AcousPlan™ — Why Your Church Has Terrible Speech Intelligibility

The Number That Explains Why Nobody Understands the Sermon

A 2017 survey by the Institute of Acoustics measured speech intelligibility in 47 English churches dating from the 12th to the 19th century. The median STI score was 0.38. Per IEC 60268-16:2020 §4.4, an STI of 0.38 falls squarely in the "poor" category — meaning the average congregation member in a historic UK church understands fewer than 70% of the words spoken from the pulpit. One in three words is lost to reverberation before it reaches the pews.

This is not a subjective complaint about "echoey" buildings. It is a measurable, quantifiable failure of the acoustic environment to transmit the spoken word. The physics are straightforward, the calculations are well-established, and the solutions exist. But the majority of churches — historic and modern — have never had their speech intelligibility measured, let alone treated.

This article explains why churches fail, what STI actually measures, and how to fix speech intelligibility without damaging heritage fabric.

Why Churches Are Acoustically Hostile to Speech

Churches were built for music. Specifically, they were built for Gregorian chant, organ music, and choral singing — sound sources that benefit from long reverberation. A Gothic cathedral with RT60 of 5 seconds transforms a choir's sustained notes into a luminous wash of sound that fills the entire volume. This is not an accident. It is the intended acoustic signature of the space. Medieval builders did not have acoustic measurement equipment, but they understood intuitively that hard, parallel surfaces and tall vaulted ceilings created the reverberant sound field that enhanced liturgical music.

The problem is that speech and music have fundamentally different acoustic requirements, and they are in direct opposition.

Speech Requires Silence Between Syllables

Human speech encodes information in rapid modulations of sound pressure — roughly 4 syllables per second in English, with brief silent gaps between them. The listener's auditory system uses these modulations to decode consonants, vowels, and word boundaries. When the room's reverberation fills in those silent gaps with decaying energy from previous syllables, the modulation depth is reduced and the listener cannot distinguish one syllable from the next.

This is exactly what happens in a stone church with RT60 of 4 seconds. The reverberant tail of the word "grace" is still decaying when the word "of" begins. By the time "God" arrives, the listener's auditory cortex is processing a continuous stream of overlapping sound rather than a sequence of distinct words.

Music Tolerates — and Benefits From — Reverberation

Choral music, by contrast, consists of sustained notes, slow harmonic transitions, and deliberate blending of voices. The same reverberant tail that destroys speech clarity enriches musical texture. A choir singing in a room with RT60 of 0.5 seconds sounds thin, dry, and disconnected. The same choir in a room with RT60 of 3 seconds sounds full, unified, and enveloping. This is why concert halls for orchestral music target RT60 of 1.8–2.2 seconds per ISO 3382-1:2009 §4.2, while speech auditoria target 0.6–1.0 seconds.

A church that serves both functions faces an irreconcilable conflict. The reverberation that makes the organ sound magnificent makes the sermon unintelligible.

What STI Actually Measures

The Speech Transmission Index, defined in IEC 60268-16:2020, quantifies how faithfully the temporal envelope of speech is preserved between the talker and the listener. It is a single number between 0 and 1 that captures the combined degradation from reverberation and background noise.

The Modulation Transfer Function

STI is calculated from the Modulation Transfer Function (MTF), which measures how much of the original intensity modulation of the speech signal survives transmission through the room. IEC 60268-16:2020 §4.1 defines the MTF as the ratio of the received modulation depth to the emitted modulation depth, evaluated across 7 octave bands (125 Hz to 8 kHz) and 14 modulation frequencies (0.63 Hz to 12.5 Hz).

For each combination of octave band f and modulation frequency F, the MTF value m(F,f) ranges from 0 (modulation completely destroyed) to 1 (modulation perfectly preserved). The final STI is a weighted average of apparent signal-to-noise ratios derived from these 98 individual MTF values.
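The conversion from MTF values to a single index can be sketched in a few lines. This is a minimal illustration, not the full standard procedure: it assumes equal weighting of all seven octave bands and omits the masking and redundancy corrections that IEC 60268-16 applies. The function names transmission_index and sti_unweighted are hypothetical, and the flat input matrix is made up purely to show the mechanics.

```python
import math

def transmission_index(m: float) -> float:
    """Map one MTF value m (0..1) to a transmission index (0..1)."""
    # Guard the logarithm at the extremes.
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    snr_apparent = 10.0 * math.log10(m / (1.0 - m))    # apparent SNR in dB
    snr_clipped = max(-15.0, min(15.0, snr_apparent))  # limit to +/-15 dB
    return (snr_clipped + 15.0) / 30.0                 # rescale to 0..1

def sti_unweighted(mtf):
    """Average the transmission index over a 7 x 14 matrix of MTF values.

    Assumption: equal band weights and no masking or redundancy
    corrections, so the result only approximates the standard's STI.
    """
    values = [transmission_index(m) for band in mtf for m in band]
    return sum(values) / len(values)

# Hypothetical input: all 98 MTF values equal to 0.5.
flat = [[0.5] * 14 for _ in range(7)]
print(round(sti_unweighted(flat), 2))  # 0.5, because m = 0.5 is 0 dB apparent SNR
```

The clipping step is why very small MTF values stop mattering: once the apparent signal-to-noise ratio falls below -15 dB, that band and modulation frequency contributes nothing further to the index.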

The IEC 60268-16 Classification Scale

STI Range	Classification	Typical Sentence Intelligibility
0.00 – 0.30	Bad	< 34% of sentences understood
0.30 – 0.45	Poor	34–67% of sentences understood
0.45 – 0.60	Fair	67–90% of sentences understood
0.60 – 0.75	Good	90–96% of sentences understood
0.75 – 1.00	Excellent	> 96% of sentences understood

For a worship space where the congregation must understand a sermon, the minimum acceptable STI is 0.45 (the threshold between "poor" and "fair"). For liturgical readings where every word matters — scripture, prayers, vows — the target should be 0.60 or higher.
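The classification scale above amounts to a simple lookup. The sketch below is illustrative only: classify_sti is a hypothetical helper, and it assumes that a value sitting exactly on a boundary belongs to the higher band, which the table itself does not specify.

```python
def classify_sti(sti: float) -> str:
    """Return the qualitative band for an STI value between 0 and 1."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    # Upper edge of each band in ascending order.
    # Assumption: a value exactly on an edge goes to the higher band.
    bands = [(0.30, "bad"), (0.45, "poor"), (0.60, "fair"), (0.75, "good")]
    for upper, label in bands:
        if sti < upper:
            return label
    return "excellent"

print(classify_sti(0.38))  # poor (the survey median quoted earlier)
print(classify_sti(0.45))  # fair (minimum acceptable for a sermon)
```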

The Relationship Between RT60 and STI

In the absence of background noise, and at seats where the direct sound from the talker is weak, STI is determined almost entirely by RT60: as RT60 increases, STI falls. For a diffuse sound field with negligible background noise, the relationship can be estimated from the modulation transfer function formula in IEC 60268-16:2020 §4.2:

m(F) = 1 / sqrt(1 + (2 pi F * T60 / 13.8)^2)

Where F is the modulation frequency in Hz and T60 is the reverberation time in seconds. At the critical modulation frequency of 4 Hz (corresponding to the syllable rate of English speech), an RT60 of 2.5 seconds yields m(4) ≈ 0.21, meaning that almost 80% of the speech modulation has been destroyed by reverberation alone, before background noise is considered.

This is why the STI threshold of 0.45 corresponds roughly to RT60 of 2.5 seconds in quiet conditions. In churches with any significant background noise — traffic, HVAC, wind — the maximum tolerable RT60 for intelligible speech is even shorter.
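The formula is easy to evaluate for a range of reverberation times. The sketch below tabulates m at 4 Hz in quiet conditions and with a steady background noise. The noise factor 1 / (1 + 10^(-SNR/10)) is the usual companion term to the reverberation formula but does not appear in this article, so treat it as an assumption here, as are the function name mtf_reverb and the 10 dB signal-to-noise ratio.

```python
import math

def mtf_reverb(mod_freq_hz, t60_s, snr_db=None):
    """m(F) for a diffuse field, optionally reduced by steady noise."""
    m = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * t60_s / 13.8) ** 2)
    if snr_db is not None:
        # Assumed noise term: steady noise lowers modulation depth at
        # every modulation frequency by the same factor.
        m *= 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return m

for t60 in (1.0, 1.5, 2.0, 2.5, 3.0, 4.0):
    quiet = mtf_reverb(4.0, t60)
    noisy = mtf_reverb(4.0, t60, snr_db=10.0)  # 10 dB SNR is illustrative
    print(f"T60 = {t60:.1f} s   m(4 Hz) quiet = {quiet:.2f}   with noise = {noisy:.2f}")
```

Because the noise factor multiplies the reverberation term, any background noise pushes every row of the table downward, which is why a noisy church tolerates less reverberation than a quiet one.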

Worked Example: A Typical English Parish Church

The Space

Consider a medium-sized parish church typical of those built in England between the 14th and 16th centuries:

Dimensions: 20 m long x 12 m wide x 10 m high (nave only, excluding chancel)
Volume: 2,400 m³
Seating capacity: approximately 200

The Surfaces

Surface	Area (m²)	Material	α at 500 Hz	α at 1 kHz	α at 125 Hz
Walls	2(20 x 10) + 2(12 x 10) = 640	Limestone ashlar	0.02	0.02	0.01
Floor	20 x 12 = 240	York stone flags	0.01	0.02	0.01
Ceiling / roof	20 x 12 = 240 (projected)	Timber open truss	0.10	0.08	0.12
Windows	60 (estimated, Gothic arched)	Stained glass	0.04	0.03	0.03
Pews	100 (estimated plan area)	Hardwood oak	0.05	0.05	0.05
Total	1,280

RT60 Calculation Using the Sabine Equation

Per ISO 3382-2:2008 §A.1, the Sabine reverberation time is:

T60 = 0.161 x V / A

Where V is the room volume in m³ and A is the total absorption in metric sabins.

At 500 Hz:

Surface	Area (m²)	α	A (sabins)
Limestone walls	640	0.02	12.80
Stone floor	240	0.01	2.40
Timber roof	240	0.10	24.00
Stained glass	60	0.04	2.40
Oak pews	100	0.05	5.00
Total	1,280		46.60

T60 at 500 Hz = 0.161 x 2,400 / 46.60 = 386.4 / 46.60 = 8.3 seconds

This is an extreme but not unusual result for an unfurnished stone church. The fundamental problem is the combination of a sizeable volume (2,400 m³) and almost entirely reflective surfaces. The only significant absorber is the timber roof, and even that provides only 24 sabins, equivalent to less than 2% of the room's 1,280 m² of surface area.

At 1 kHz:

Total absorption A = (640 x 0.02) + (240 x 0.02) + (240 x 0.08) + (60 x 0.03) + (100 x 0.05) = 12.80 + 4.80 + 19.20 + 1.80 + 5.00 = 43.60 sabins

T60 at 1 kHz = 0.161 x 2,400 / 43.60 = 386.4 / 43.60 = 8.9 seconds

RT60 is even longer at 1 kHz than at 500 Hz, mainly because the timber roof absorbs less at the higher frequency. The air absorption that normally reduces high-frequency RT60 has minimal effect at 1 kHz and becomes significant only above 2 kHz. At these reverberation times, STI at any point more than a few metres from the speaker will be well below 0.30, in the "bad" category.
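The two Sabine calculations can be reproduced with a short script. This is a sketch using the areas and absorption coefficients from the surface table; the names SURFACES, total_absorption, and sabine_t60 are illustrative, not part of any library.

```python
SABINE_CONSTANT = 0.161
VOLUME_M3 = 20 * 12 * 10  # nave only: 2,400 m3

# surface: (area in m2, alpha at 500 Hz, alpha at 1 kHz), from the table
SURFACES = {
    "limestone walls": (640, 0.02, 0.02),
    "stone floor": (240, 0.01, 0.02),
    "timber roof": (240, 0.10, 0.08),
    "stained glass": (60, 0.04, 0.03),
    "oak pews": (100, 0.05, 0.05),
}

def total_absorption(band_index):
    """Sum area x alpha over all surfaces (1 = 500 Hz, 2 = 1 kHz)."""
    return sum(row[0] * row[band_index] for row in SURFACES.values())

def sabine_t60(volume_m3, absorption_sabins):
    """Sabine reverberation time in seconds."""
    return SABINE_CONSTANT * volume_m3 / absorption_sabins

for label, idx in (("500 Hz", 1), ("1 kHz", 2)):
    a = total_absorption(idx)
    print(f"{label}: A = {a:.2f} sabins, T60 = {sabine_t60(VOLUME_M3, a):.1f} s")
# 500 Hz: A = 46.60 sabins, T60 = 8.3 s
# 1 kHz: A = 43.60 sabins, T60 = 8.9 s
```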

STI Estimate

Using the simplified MTF relationship at 4 Hz modulation frequency and the 500 Hz RT60:

m(4) = 1 / sqrt(1 + (2 x 3.14159 x 4 x 8.3 / 13.8)^2) = 1 / sqrt(1 + (15.12)^2) = 1 / sqrt(1 + 228.6) = 1 / 15.15 = 0.066

At this modulation transfer, the STI is approximately 0.15 — firmly in the "bad" category. A congregation member seated 15 metres from the pulpit would understand fewer than 30% of the words spoken. This is not a marginal failure. It is a catastrophic one.
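The step from a single value of m(4) to an overall STI can be checked by repeating the calculation at all 14 modulation frequencies and averaging. The sketch below does this under strong simplifying assumptions: the same RT60 in every octave band, no background noise, no direct sound, and equal band weights. The function name diffuse_field_sti is hypothetical. For an RT60 of 8.3 seconds it returns about 0.18, the same order as the rough figure quoted above and well inside the "bad" band.

```python
import math

# The 14 modulation frequencies used for STI, in Hz.
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

def diffuse_field_sti(t60_s):
    """Reverberation-only STI estimate (simplified, unweighted)."""
    if t60_s <= 0.0:
        raise ValueError("t60_s must be positive")
    total = 0.0
    for f in MOD_FREQS_HZ:
        m = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f * t60_s / 13.8) ** 2)
        snr = 10.0 * math.log10(m / (1.0 - m))  # apparent SNR in dB
        snr = max(-15.0, min(15.0, snr))        # clip to +/-15 dB
        total += (snr + 15.0) / 30.0            # transmission index
    return total / len(MOD_FREQS_HZ)

print(f"{diffuse_field_sti(8.3):.2f}")  # 0.18
```

Because it ignores direct sound, this estimate is pessimistic for seats close to the pulpit and is best read as the limiting value for distant seats.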

The Congregation as Absorber

An occupied church performs significantly better than an empty one. Each seated person contributes approximately 0.44 sabins at 500 Hz (per ISO 3382-2:2008 Table C.1 — audience on wooden seats). With 150 people seated:

Additional absorption = 150 x 0.44 = 66.0 sabins

New total A at 500 Hz = 46.60 + 66.0 = 112.60 sabins

T60 at 500 Hz (occupied) = 0.161 x 2,400 / 112.60 = 386.4 / 112.60 = 3.4 seconds

The congregation reduces RT60 from 8.3 seconds to 3.4 seconds — a dramatic improvement, but still far above the 2.5-second threshold for intelligible speech. STI remains below 0.45. The church full of people is better than the church empty, but still acoustically broken for speech.
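Rearranging the Sabine equation gives the total absorption needed to reach a target reverberation time, which shows how far the occupied church still has to go. A minimal sketch, with required_absorption as a hypothetical helper name:

```python
SABINE_CONSTANT = 0.161

def required_absorption(volume_m3, target_t60_s):
    """Total absorption (sabins) that yields the target Sabine T60."""
    return SABINE_CONSTANT * volume_m3 / target_t60_s

volume = 2400.0
occupied = 46.60 + 150 * 0.44              # empty room plus 150 people
needed = required_absorption(volume, 2.5)  # 2.5 s threshold from the text
print(f"needed {needed:.1f} sabins, shortfall {needed - occupied:.1f} sabins")
# needed 154.6 sabins, shortfall 42.0 sabins
```

Roughly 42 sabins of added absorption at 500 Hz is therefore the minimum any treatment package has to deliver for the occupied church.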

Treatment Options That Protect Heritage Fabric

Historic churches present a unique challenge: the building fabric itself is often listed or protected. Drilling into medieval limestone, applying permanent adhesives, or altering the visual character of the interior will not be approved by heritage bodies. The acoustic treatment must be reversible, sympathetic, and non-invasive.

Option 1: Upholstered Pew Cushions

The simplest and least visually intrusive treatment. Standard foam cushions with fabric covers, placed on wooden pew seats:

Absorption coefficient at 500 Hz: approximately 0.25–0.35 per unit area (depending on thickness and fabric)
Coverage: 100 m² of pew seating area
Additional absorption at 500 Hz: 100 x 0.30 = 30.0 sabins

This alone reduces RT60 (occupied) from 3.4 seconds to:

T60 = 0.161 x 2,400 / (112.60 + 30.0) = 386.4 / 142.60 = 2.7 seconds

Progress, but still above the 2.5-second threshold.

Option 2: Suspended Fabric Banners

Large fabric panels (typically 2.4 m x 1.2 m) hung from wire systems attached to existing roof fixings. These are free-hanging absorbers that work on both faces, roughly doubling their effective absorption per unit area compared to wall-mounted panels:

Absorption coefficient at 500 Hz: 0.65–0.85 per face (depending on fabric weight)
Typical installation: 10 banners at 2.88 m² each = 28.8 m² of fabric, exposed on both faces
Additional absorption at 500 Hz: 28.8 x 0.75 x 2 = 43.2 sabins (both faces counted)

Combined with pew cushions, total absorption at 500 Hz rises to:

A_total = 112.60 + 30.0 + 43.2 = 185.8 sabins

T60 = 0.161 x 2,400 / 185.8 = 386.4 / 185.8 = 2.1 seconds

This brings RT60 below 2.5 seconds and pushes STI above 0.45. The banners can be designed to complement the church's interior — many churches use liturgical banners already, and acoustic banners can serve the same visual purpose while providing absorption.
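The cumulative effect of each measure can be laid out as a running total. The sketch below uses the figures from the worked example (0.44 sabins per person, 0.30 for cushions, 0.75 per banner face); the names steps and cumulative_t60 are illustrative.

```python
SABINE_CONSTANT = 0.161
VOLUME_M3 = 2400.0

# (description, absorption added at 500 Hz in sabins)
steps = [
    ("empty church", 46.60),
    ("150 people at 0.44 sabins each", 150 * 0.44),
    ("pew cushions, 100 m2 at 0.30", 100 * 0.30),
    ("10 banners, 2.88 m2 x 0.75 x 2 faces", 10 * 2.88 * 0.75 * 2),
]

def cumulative_t60(steps):
    """Return (label, running absorption, Sabine T60) for each step."""
    results = []
    total = 0.0
    for label, sabins in steps:
        total += sabins
        results.append((label, total, SABINE_CONSTANT * VOLUME_M3 / total))
    return results

for label, total, t60 in cumulative_t60(steps):
    print(f"{label:40s} A = {total:6.1f} sabins   T60 = {t60:.1f} s")
# T60 runs 8.3 s, 3.4 s, 2.7 s, 2.1 s
```

Each added sabin buys less than the one before it, because T60 depends on the reciprocal of total absorption. The congregation alone removes nearly five seconds, while the banners remove about 0.6 seconds.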

Option 3: Directional Sound Reinforcement

When absorption alone cannot bring RT60 below the speech intelligibility threshold — or when heritage constraints limit the amount of absorption that can be installed — a properly designed sound reinforcement system provides an alternative path to adequate STI.

The key principle is to increase the direct-to-reverberant ratio at the listener's position. A column loudspeaker array (such as those manufactured by Bose, d&b audiotechnik, or Renkus-Heinz) can project sound in a narrow vertical beam that targets the congregation area while minimising energy directed at walls and ceiling. This increases the direct sound level at the listener without adding energy to the reverberant field.

A well-designed column array system in a church with RT60 of 3.5 seconds can achieve STI of 0.55–0.65 at all listening positions — placing the space firmly in the "fair" to "good" category. The system does not change the room's reverberation time. It bypasses the problem by ensuring that the direct sound is sufficiently strong relative to the reverberation that intelligibility is maintained.
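The direct-to-reverberant principle can be illustrated with the classical diffuse-field approximation, in which the ratio depends on the source's directivity factor Q, the room's absorption A, and the distance r. The sketch below rests on several assumptions not taken from this article: the Q values of 2 and 20 are purely illustrative, the equivalent absorption area stands in for the room constant, and the model ignores early reflections and array near-field behaviour. It is not a substitute for an STI prediction.

```python
import math

def critical_distance_m(q, absorption_sabins):
    """Distance at which direct and reverberant energy are equal."""
    return math.sqrt(q * absorption_sabins / (16.0 * math.pi))

def direct_to_reverberant_db(q, absorption_sabins, distance_m):
    """Direct-to-reverberant ratio in dB at a given distance."""
    ratio = q * absorption_sabins / (16.0 * math.pi * distance_m ** 2)
    return 10.0 * math.log10(ratio)

A_OCCUPIED = 46.60 + 150 * 0.44  # occupied church at 500 Hz, from the text
SOURCES = (("unaided talker", 2.0), ("column array", 20.0))  # Q values assumed

for name, q in SOURCES:
    dc = critical_distance_m(q, A_OCCUPIED)
    dr = direct_to_reverberant_db(q, A_OCCUPIED, 15.0)
    print(f"{name}: critical distance {dc:.1f} m, D/R at 15 m {dr:.1f} dB")
```

In this model a tenfold increase in Q raises the direct-to-reverberant ratio by 10 dB at every seat without changing the room, which is the mechanism a directional array relies on.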

Treatment Comparison

Treatment	Additional Absorption (sabins at 500 Hz)	RT60 Reduction (from 3.4s occupied)	Estimated STI	Heritage Impact	Cost (GBP, typical)
Pew cushions only	30	3.4 → 2.7 s	~0.40	None — fully reversible	£3,000–£5,000
Pew cushions + 10 banners	73	3.4 → 2.1 s	~0.50	Minimal — wire-hung	£8,000–£12,000
Pew cushions + 20 banners	116	3.4 → 1.7 s	~0.58	Moderate — visual change	£15,000–£22,000
Column array PA system	0 (different mechanism)	No change to RT60	0.55–0.65	Cable routing required	£12,000–£25,000
Combined: cushions + banners + PA	73+	3.4 → 2.1 s + PA boost	0.65+	Minimal–Moderate	£20,000–£35,000

The most effective approach for most churches is the combination: moderate absorption treatment to reduce RT60 from extreme values (5+ seconds) to manageable values (2.0–2.5 seconds), combined with a directional PA system to boost the direct-to-reverberant ratio at listening positions. This dual approach addresses both halves of the intelligibility equation.

The Music vs Speech Compromise

Any church that hosts both choral music and spoken worship faces the fundamental tension between the acoustic requirements of these two activities. There is no single RT60 that optimises both.

Target RT60 by Worship Style

Worship Style	Dominant Sound Source	Optimal RT60 Range	STI at Optimal RT60
Evangelical / sermon-led	Unamplified speech	1.0–1.5 s	0.60–0.75
Traditional Anglican (mixed)	Speech + choir + organ	1.5–2.0 s	0.50–0.60
Catholic / High Church	Choir + organ dominant	2.0–2.5 s	0.40–0.50
Choral / cathedral	Choir + organ	2.5–4.0 s	0.25–0.40

For mixed-use worship spaces, the practical compromise is RT60 of 1.5–2.0 seconds with a directional PA system for speech reinforcement. This preserves enough reverberation for music while ensuring the congregation can understand the spoken word.

Some churches achieve variable acoustics using movable absorptive elements — curtains that can be drawn across reflective walls, hinged panels with absorptive material on one face and reflective material on the other, or retractable banners. These solutions are more expensive and mechanically complex, but they allow the same space to serve both functions at closer to optimal acoustic conditions for each.

The PA System Trap

Many churches respond to intelligibility complaints by installing a PA system without any acoustic treatment. This is the most common mistake in church acoustics, and it usually makes the problem worse.

A PA system in a highly reverberant church increases the total sound energy in the room. If the loudspeakers are not carefully designed to direct energy only at the congregation (and away from walls, ceiling, and floor), the additional energy excites the reverberant field even further. The congregation hears a louder but equally unintelligible sound — more volume, same modulation depth. In some cases, the PA system's own reverberant contribution pushes STI lower than it was with the unamplified voice.

The solution is not simply "add a PA." The solution is either:

Reduce RT60 through absorption treatment so that the unamplified voice is intelligible, or
Install a directional PA system (column arrays, distributed ceiling speakers, or pew-back speakers) specifically designed to maximise the direct-to-reverberant ratio, or
Both — which is almost always the correct answer for churches with RT60 above 3 seconds.

A conventional point-source PA system (single loudspeaker on a stand, or wall-mounted horn speakers) in a church with RT60 of 4 seconds will not achieve STI above 0.45 regardless of volume. The physics does not permit it. The reverberant field dominates at every listening position beyond the first few rows.

Measurement Before Treatment

Before specifying any acoustic treatment, the church's existing acoustic conditions should be measured. The measurements required per ISO 3382-1:2009 §5 are:

RT60 in octave bands (125 Hz to 4 kHz minimum): measured using an omnidirectional source and microphone at multiple positions. Minimum 3 source positions and 3 receiver positions for a statistically valid result.
Background noise level in octave bands: measured with all typical noise sources active (HVAC if present, with doors and windows in their normal operating condition).
STI at representative listening positions: measured using a STIPA signal per IEC 60268-16:2020 §5 from the pulpit/lectern position to seats at various distances.

These measurements provide the baseline data needed to calculate the required absorption, specify treatment locations, and predict post-treatment performance. Without measurement, any treatment specification is a guess.
