The Number That Explains Why Nobody Understands the Sermon
A 2017 survey by the Institute of Acoustics measured speech intelligibility in 47 English churches dating from the 12th to the 19th century. The median STI score was 0.38. Per IEC 60268-16:2020 §4.4, an STI of 0.38 falls squarely in the "poor" category — meaning the average congregation member in a historic UK church understands fewer than 70% of the words spoken from the pulpit. One in three words is lost to reverberation before it reaches the pews.
This is not a subjective complaint about "echoey" buildings. It is a measurable, quantifiable failure of the acoustic environment to transmit the spoken word. The physics are straightforward, the calculations are well-established, and the solutions exist. But the majority of churches — historic and modern — have never had their speech intelligibility measured, let alone treated.
This article explains why churches fail, what STI actually measures, and how to fix speech intelligibility without damaging heritage fabric.
Why Churches Are Acoustically Hostile to Speech
Churches were built for music. Specifically, they were built for Gregorian chant, organ music, and choral singing — sound sources that benefit from long reverberation. A Gothic cathedral with RT60 of 5 seconds transforms a choir's sustained notes into a luminous wash of sound that fills the entire volume. This is not an accident. It is the intended acoustic signature of the space. Medieval builders did not have acoustic measurement equipment, but they understood intuitively that hard, parallel surfaces and tall vaulted ceilings created the reverberant sound field that enhanced liturgical music.
The problem is that speech and music have fundamentally different acoustic requirements, and they are in direct opposition.
Speech Requires Silence Between Syllables
Human speech encodes information in rapid modulations of sound pressure — roughly 4 syllables per second in English, with brief silent gaps between them. The listener's auditory system uses these modulations to decode consonants, vowels, and word boundaries. When the room's reverberation fills in those silent gaps with decaying energy from previous syllables, the modulation depth is reduced and the listener cannot distinguish one syllable from the next.
This is exactly what happens in a stone church with RT60 of 4 seconds. The reverberant tail of the word "grace" is still decaying when the word "of" begins. By the time "God" arrives, the listener's auditory cortex is processing a continuous stream of overlapping sound rather than a sequence of distinct words.
Music Tolerates — and Benefits From — Reverberation
Choral music, by contrast, consists of sustained notes, slow harmonic transitions, and deliberate blending of voices. The same reverberant tail that destroys speech clarity enriches musical texture. A choir singing in a room with RT60 of 0.5 seconds sounds thin, dry, and disconnected. The same choir in a room with RT60 of 3 seconds sounds full, unified, and enveloping. This is why concert halls for orchestral music target RT60 of 1.8–2.2 seconds per ISO 3382-1:2009 §4.2, while speech auditoria target 0.6–1.0 seconds.
A church that serves both functions faces an irreconcilable conflict. The reverberation that makes the organ sound magnificent makes the sermon unintelligible.
What STI Actually Measures
The Speech Transmission Index, defined in IEC 60268-16:2020, quantifies how faithfully the temporal envelope of speech is preserved between the talker and the listener. It is a single number between 0 and 1 that captures the combined degradation from reverberation and background noise.
The Modulation Transfer Function
STI is calculated from the Modulation Transfer Function (MTF), which measures how much of the original intensity modulation of the speech signal survives transmission through the room. IEC 60268-16:2020 §4.1 defines the MTF as the ratio of the received modulation depth to the emitted modulation depth, evaluated across 7 octave bands (125 Hz to 8 kHz) and 14 modulation frequencies (0.63 Hz to 12.5 Hz).
For each combination of octave band and modulation frequency, the MTF value m(F,f) ranges from 0 (modulation completely destroyed) to 1 (modulation perfectly preserved). The final STI is a weighted average of apparent signal-to-noise ratios derived from these 98 individual MTF values.
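The step from the 98 MTF values to a single index can be written out as a short derivation. This is a simplified sketch of the IEC 60268-16 procedure, not the full method: it leaves out the standard's auditory masking, hearing-threshold, and adjacent-band redundancy corrections, and the octave-band weights w_f are kept symbolic (an assumption for illustration) rather than quoted from the standard.

```latex
\begin{aligned}
\mathrm{SNR}_{\mathrm{app}}(F, f) &= 10 \log_{10} \frac{m(F, f)}{1 - m(F, f)},
  \qquad \text{clipped to } [-15, +15]\ \mathrm{dB} \\[4pt]
\mathrm{TI}(F, f) &= \frac{\mathrm{SNR}_{\mathrm{app}}(F, f) + 15}{30} \\[4pt]
\mathrm{MTI}(f) &= \frac{1}{14} \sum_{F} \mathrm{TI}(F, f) \\[4pt]
\mathrm{STI} &\approx \sum_{f} w_f \, \mathrm{MTI}(f),
  \qquad \sum_{f} w_f = 1
\end{aligned}
```

An MTF value of 0.5 maps to an apparent signal-to-noise ratio of 0 dB and a transmission index of 0.5, which is why STI sits at the midpoint of its scale when half the modulation survives.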
The IEC 60268-16 Classification Scale
| STI Range | Classification | Typical Sentence Intelligibility |
|---|---|---|
| 0.00 – 0.30 | Bad | < 34% of sentences understood |
| 0.30 – 0.45 | Poor | 34–67% of sentences understood |
| 0.45 – 0.60 | Fair | 67–90% of sentences understood |
| 0.60 – 0.75 | Good | 90–96% of sentences understood |
| 0.75 – 1.00 | Excellent | > 96% of sentences understood |
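The classification table can be expressed as a small lookup. This is a minimal sketch: the function name `classify_sti` is hypothetical, and treating each lower bound as inclusive (so 0.45 counts as "Fair") is an assumption, since the table lists shared boundary values.

```python
def classify_sti(sti: float) -> str:
    """Map an STI value to the category names used in the table above.

    Assumption: each lower bound is inclusive, so 0.45 is "Fair", not "Poor".
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    if sti < 0.30:
        return "Bad"
    if sti < 0.45:
        return "Poor"
    if sti < 0.60:
        return "Fair"
    if sti < 0.75:
        return "Good"
    return "Excellent"


# The survey median quoted at the top of the article
print(classify_sti(0.38))
```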
For a worship space where the congregation must understand a sermon, the minimum acceptable STI is 0.45 (the threshold between "poor" and "fair"). For liturgical readings where every word matters — scripture, prayers, vows — the target should be 0.60 or higher.
The Relationship Between RT60 and STI
In the absence of background noise, STI is determined almost entirely by RT60. The relationship is approximately inverse: as RT60 increases, STI decreases. For a diffuse sound field with negligible background noise, the relationship can be estimated from the modulation transfer function formula in IEC 60268-16:2020 §4.2:
m(F) = 1 / sqrt(1 + (2 pi F * T60 / 13.8)^2)
Where F is the modulation frequency in Hz and T60 is the reverberation time in seconds. At the critical modulation frequency of 4 Hz (corresponding to the syllable rate of English speech), an RT60 of 2.5 seconds yields m(4) ≈ 0.21, meaning nearly 80% of the speech modulation has been destroyed by reverberation alone, before background noise is considered.
This is why the STI threshold of 0.45 corresponds roughly to RT60 of 2.5 seconds in quiet conditions. In churches with any significant background noise — traffic, HVAC, wind — the maximum tolerable RT60 for intelligible speech is even shorter.
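The formula is easy to evaluate directly. The sketch below implements it as written above and prints the 4 Hz modulation transfer for a few reverberation times; the function name `modulation_transfer` and the chosen RT60 values are illustrative assumptions, and the diffuse-field, noise-free conditions stated in the text still apply.

```python
import math


def modulation_transfer(mod_freq_hz: float, rt60_s: float) -> float:
    """Reverberation-only MTF: m(F) = 1 / sqrt(1 + (2*pi*F*T60 / 13.8)^2)."""
    x = 2.0 * math.pi * mod_freq_hz * rt60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


# How much 4 Hz modulation survives as the room gets more reverberant
for rt60 in (1.0, 1.5, 2.5, 4.0, 8.0):
    m = modulation_transfer(4.0, rt60)
    print(f"RT60 = {rt60:.1f} s -> m(4 Hz) = {m:.2f}")
```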
Worked Example: A Typical English Parish Church
The Space
Consider a medium-sized parish church typical of those built in England between the 14th and 16th centuries:
- Dimensions: 20 m long x 12 m wide x 10 m high (nave only, excluding chancel)
- Volume: 2,400 m³
- Seating capacity: approximately 200
The Surfaces
| Surface | Area (m²) | Material | α at 500 Hz | α at 1 kHz | α at 125 Hz |
|---|---|---|---|---|---|
| Walls | 2(20 x 10) + 2(12 x 10) = 640 | Limestone ashlar | 0.02 | 0.02 | 0.01 |
| Floor | 20 x 12 = 240 | York stone flags | 0.01 | 0.02 | 0.01 |
| Ceiling / roof | 20 x 12 = 240 (projected) | Timber open truss | 0.10 | 0.08 | 0.12 |
| Windows | 60 (estimated, Gothic arched) | Stained glass | 0.04 | 0.03 | 0.03 |
| Pews | 100 (estimated plan area) | Hardwood oak | 0.05 | 0.05 | 0.05 |
| Total | 1,280 | | | | |
RT60 Calculation Using the Sabine Equation
Per ISO 3382-2:2008 §A.1, the Sabine reverberation time is:
T60 = 0.161 x V / A
Where V is the room volume in m³ and A is the total absorption in metric sabins.
At 500 Hz:
| Surface | Area (m²) | α | A (sabins) |
|---|---|---|---|
| Limestone walls | 640 | 0.02 | 12.80 |
| Stone floor | 240 | 0.01 | 2.40 |
| Timber roof | 240 | 0.10 | 24.00 |
| Stained glass | 60 | 0.04 | 2.40 |
| Oak pews | 100 | 0.05 | 5.00 |
| Total | 1,280 | | 46.60 |
T60 at 500 Hz = 0.161 x 2,400 / 46.60 = 386.4 / 46.60 = 8.3 seconds
This is an extreme but not unusual result for an empty stone church. The fundamental problem is the combination of a 2,400 m³ volume and almost entirely reflective surfaces. The only significant absorber is the timber roof, and even that provides only 24 sabins, equivalent to less than 2% of the room's 1,280 m² of surface behaving as a perfect absorber.
At 1 kHz:
Total absorption A = (640 x 0.02) + (240 x 0.02) + (240 x 0.08) + (60 x 0.03) + (100 x 0.05) = 12.80 + 4.80 + 19.20 + 1.80 + 5.00 = 43.60 sabins
T60 at 1 kHz = 0.161 x 2,400 / 43.60 = 386.4 / 43.60 = 8.9 seconds
Even longer at 1 kHz than at 500 Hz. The air absorption that normally reduces high-frequency RT60 has minimal effect at 1 kHz — it becomes significant only above 2 kHz. At these reverberation times, STI at any point more than a few metres from the speaker will be well below 0.30 — in the "bad" category.
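The two Sabine calculations above can be reproduced in a few lines. This is a sketch using the areas and absorption coefficients from the surface table; the names `SURFACES`, `total_absorption`, and `sabine_rt60` are illustrative, and air absorption is ignored, as it is in the hand calculation.

```python
SABINE_CONSTANT = 0.161          # s/m, metric Sabine constant used in the text
VOLUME_M3 = 20 * 12 * 10         # nave volume from the worked example

# name: (area in m², {octave band in Hz: absorption coefficient})
SURFACES = {
    "Limestone walls": (640, {500: 0.02, 1000: 0.02}),
    "Stone floor": (240, {500: 0.01, 1000: 0.02}),
    "Timber roof": (240, {500: 0.10, 1000: 0.08}),
    "Stained glass": (60, {500: 0.04, 1000: 0.03}),
    "Oak pews": (100, {500: 0.05, 1000: 0.05}),
}


def total_absorption(surfaces: dict, band_hz: int) -> float:
    """Sum area x alpha over all surfaces for one octave band (metric sabins)."""
    return sum(area * alphas[band_hz] for area, alphas in surfaces.values())


def sabine_rt60(volume_m3: float, absorption_sabins: float) -> float:
    """Sabine reverberation time: T60 = 0.161 * V / A."""
    return SABINE_CONSTANT * volume_m3 / absorption_sabins


for band in (500, 1000):
    a = total_absorption(SURFACES, band)
    print(f"{band} Hz: A = {a:.2f} sabins, T60 = {sabine_rt60(VOLUME_M3, a):.1f} s")
```

Keeping the coefficients in one table makes it straightforward to swap in measured or manufacturer values for a specific building.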
STI Estimate
Using the simplified MTF relationship at 4 Hz modulation frequency and the 500 Hz RT60:
m(4) = 1 / sqrt(1 + (2 x 3.14159 x 4 x 8.3 / 13.8)^2) = 1 / sqrt(1 + (15.12)^2) = 1 / sqrt(1 + 228.6) = 1 / 15.15 = 0.066
At this modulation transfer, the STI is approximately 0.15 — firmly in the "bad" category. A congregation member seated 15 metres from the pulpit would understand fewer than 30% of the words spoken. This is not a marginal failure. It is a catastrophic one.
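The jump from a single MTF value to an STI figure can be checked with a rough estimator. The sketch below is not the full IEC 60268-16 calculation: it assumes the same RT60 in every octave band, no background noise, no direct sound, and none of the standard's masking or threshold corrections. Under those assumptions the band weighting drops out and the estimate is simply the mean transmission index over the 14 modulation frequencies. All function names are hypothetical.

```python
import math

# The 14 modulation frequencies (0.63 Hz to 12.5 Hz) in one-third-octave steps
MODULATION_FREQS_HZ = (0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5)


def modulation_transfer(mod_freq_hz: float, rt60_s: float) -> float:
    """Reverberation-only MTF for a diffuse field."""
    x = 2.0 * math.pi * mod_freq_hz * rt60_s / 13.8
    return 1.0 / math.sqrt(1.0 + x * x)


def transmission_index(m: float) -> float:
    """Convert an MTF value to a 0..1 index via the apparent SNR, clipped to +/-15 dB."""
    m = min(max(m, 1e-9), 1.0 - 1e-9)           # keep the log argument finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    snr_db = max(-15.0, min(15.0, snr_db))
    return (snr_db + 15.0) / 30.0


def estimate_sti(rt60_s: float) -> float:
    """Rough reverberation-only STI: mean index over all modulation frequencies."""
    indices = [transmission_index(modulation_transfer(f, rt60_s))
               for f in MODULATION_FREQS_HZ]
    return sum(indices) / len(indices)


for rt60 in (8.3, 3.4, 2.5, 1.5):
    print(f"RT60 = {rt60:.1f} s -> estimated STI = {estimate_sti(rt60):.2f}")
```

For the empty church's 8.3 seconds the estimate stays well inside the "bad" band. Because the sketch ignores direct sound, it is pessimistic for seats close to the talker.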
The Congregation as Absorber
An occupied church performs significantly better than an empty one. Each seated person contributes approximately 0.44 sabins at 500 Hz (per ISO 3382-2:2008 Table C.1 — audience on wooden seats). With 150 people seated:
Additional absorption = 150 x 0.44 = 66.0 sabins
New total A at 500 Hz = 46.60 + 66.0 = 112.60 sabins
T60 at 500 Hz (occupied) = 0.161 x 2,400 / 112.60 = 386.4 / 112.60 = 3.4 seconds
The congregation reduces RT60 from 8.3 seconds to 3.4 seconds — a dramatic improvement, but still far above the 2.5-second threshold for intelligible speech. STI remains below 0.45. The church full of people is better than the church empty, but still acoustically broken for speech.
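The effect of attendance on reverberation time can be explored with the same Sabine relationship. This sketch uses the 46.60 sabins of the empty nave and the 0.44 sabins per person quoted above; the function name `occupied_rt60` and the attendance figures in the loop are illustrative assumptions.

```python
def occupied_rt60(volume_m3: float, empty_absorption_sabins: float,
                  people: int, sabins_per_person: float = 0.44) -> float:
    """Sabine RT60 with seated people added to the empty-room absorption."""
    total = empty_absorption_sabins + people * sabins_per_person
    return 0.161 * volume_m3 / total


# Sparse midweek service versus a full church
for people in (0, 50, 100, 150, 200):
    print(f"{people:3d} people -> T60 = {occupied_rt60(2400, 46.60, people):.1f} s")
```

A treatment that relies on a full congregation will underperform at thinly attended services, so it is worth running the calculation at the lowest realistic attendance as well.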
Treatment Options That Protect Heritage Fabric
Historic churches present a unique challenge: the building fabric itself is often listed or protected. Drilling into medieval limestone, applying permanent adhesives, or altering the visual character of the interior will not be approved by heritage bodies. The acoustic treatment must be reversible, sympathetic, and non-invasive.
Option 1: Upholstered Pew Cushions
The simplest and least visually intrusive treatment. Standard foam cushions with fabric covers, placed on wooden pew seats:
- Absorption coefficient at 500 Hz: approximately 0.25–0.35 per unit area (depending on thickness and fabric)
- Coverage: 100 m² of pew seating area
- Additional absorption at 500 Hz: 100 x 0.30 = 30.0 sabins
T60 = 0.161 x 2,400 / (112.60 + 30.0) = 386.4 / 142.60 = 2.7 seconds
Progress, but still above the 2.5-second threshold.
Option 2: Suspended Fabric Banners
Large fabric panels (typically 2.4 m x 1.2 m) hung from wire systems attached to existing roof fixings. These are free-hanging absorbers that work on both faces, roughly doubling their effective absorption per unit area compared to wall-mounted panels:
- Absorption coefficient at 500 Hz: 0.65–0.85 per face (depending on fabric weight)
- Typical installation: 10 banners at 2.88 m² each = 28.8 m² of fabric (single-face area)
- Additional absorption at 500 Hz: 28.8 x 0.75 x 2 = 43.2 sabins (both faces)
A_total = 112.60 + 30.0 + 43.2 = 185.8 sabins
T60 = 0.161 x 2,400 / 185.8 = 386.4 / 185.8 = 2.1 seconds
This brings RT60 below 2.5 seconds and pushes STI above 0.45. The banners can be designed to complement the church's interior — many churches use liturgical banners already, and acoustic banners can serve the same visual purpose while providing absorption.
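The same arithmetic can be run in reverse to ask how many banners a given target needs. The sketch below inverts the Sabine equation to find the required total absorption, then divides the shortfall by the absorption of one banner (2.88 m² x 0.75 x 2 faces, as in the calculation above). The function names are hypothetical, and the starting figure of 142.60 sabins assumes the occupied church with pew cushions already fitted.

```python
import math


def required_absorption(volume_m3: float, target_rt60_s: float) -> float:
    """Total absorption (sabins) needed to reach a target RT60, from T60 = 0.161 V / A."""
    return 0.161 * volume_m3 / target_rt60_s


def banners_needed(volume_m3: float, current_absorption_sabins: float,
                   target_rt60_s: float, banner_area_m2: float = 2.88,
                   alpha_per_face: float = 0.75) -> int:
    """Whole number of free-hanging banners needed to close the absorption shortfall."""
    shortfall = required_absorption(volume_m3, target_rt60_s) - current_absorption_sabins
    if shortfall <= 0:
        return 0                                   # target already met
    sabins_per_banner = banner_area_m2 * alpha_per_face * 2   # both faces absorb
    return math.ceil(shortfall / sabins_per_banner)


for target in (2.5, 2.0, 1.7):
    n = banners_needed(2400, 142.60, target)
    print(f"Target {target:.1f} s -> {n} banners")
```

The count rises quickly as the target shortens, because each additional tenth of a second requires more absorption than the last.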
Option 3: Directional Sound Reinforcement
When absorption alone cannot bring RT60 below the speech intelligibility threshold — or when heritage constraints limit the amount of absorption that can be installed — a properly designed sound reinforcement system provides an alternative path to adequate STI.
The key principle is to increase the direct-to-reverberant ratio at the listener's position. A column loudspeaker array (such as those manufactured by Bose, d&b audiotechnik, or Renkus-Heinz) can project sound in a narrow vertical beam that targets the congregation area while minimising energy directed at walls and ceiling. This increases the direct sound level at the listener without adding energy to the reverberant field.
A well-designed column array system in a church with RT60 of 3.5 seconds can achieve STI of 0.55–0.65 at all listening positions — placing the space firmly in the "fair" to "good" category. The system does not change the room's reverberation time. It bypasses the problem by ensuring that the direct sound is sufficiently strong relative to the reverberation that intelligibility is maintained.
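The direct-to-reverberant ratio can be illustrated with the classical diffuse-field model, in which the direct and reverberant levels are equal at the critical distance d_c = sqrt(Q x A / (16 x pi)). This is a rough sketch only: the directivity factors below (Q = 1 for an omnidirectional source, Q = 10 for a directional one) are hypothetical illustration values, not manufacturer data, and a real column array is not fully described by a single Q. The absorption figure is the occupied church's 112.60 sabins.

```python
import math


def critical_distance_m(directivity_q: float, absorption_sabins: float) -> float:
    """Distance at which direct and reverberant energy are equal (diffuse-field model)."""
    return math.sqrt(directivity_q * absorption_sabins / (16.0 * math.pi))


def direct_to_reverberant_db(distance_m: float, directivity_q: float,
                             absorption_sabins: float) -> float:
    """Direct-to-reverberant ratio in dB at a listener on the source axis."""
    d_c = critical_distance_m(directivity_q, absorption_sabins)
    return 20.0 * math.log10(d_c / distance_m)


ABSORPTION = 112.60   # occupied church at 500 Hz, from the worked example
for label, q in (("Omnidirectional (Q = 1)", 1.0), ("Directional (Q = 10, assumed)", 10.0)):
    d_c = critical_distance_m(q, ABSORPTION)
    ratio = direct_to_reverberant_db(15.0, q, ABSORPTION)
    print(f"{label}: critical distance {d_c:.1f} m, D/R at 15 m = {ratio:.1f} dB")
```

In this model a tenfold increase in directivity raises the direct-to-reverberant ratio by 10 dB at every seat without changing the room's absorption, which is the mechanism the paragraph above describes.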
Treatment Comparison
| Treatment | Additional Absorption (sabins at 500 Hz) | RT60 Reduction (from 3.4 s occupied) | Estimated STI | Heritage Impact | Cost (GBP, typical) |
|---|---|---|---|---|---|
| Pew cushions only | 30 | 3.4 → 2.7 s | ~0.40 | None — fully reversible | £3,000–£5,000 |
| Pew cushions + 10 banners | 73 | 3.4 → 2.1 s | ~0.50 | Minimal — wire-hung | £8,000–£12,000 |
| Pew cushions + 20 banners | 116 | 3.4 → 1.7 s | ~0.58 | Moderate — visual change | £15,000–£22,000 |
| Column array PA system | 0 (different mechanism) | No change to RT60 | 0.55–0.65 | Cable routing required | £12,000–£25,000 |
| Combined: cushions + banners + PA | 73+ | 3.4 → 2.1 s + PA boost | 0.65+ | Minimal–Moderate | £20,000–£35,000 |
The most effective approach for most churches is the combination: moderate absorption treatment to reduce RT60 from extreme values (5+ seconds) to manageable values (2.0–2.5 seconds), combined with a directional PA system to boost the direct-to-reverberant ratio at listening positions. This dual approach addresses both halves of the intelligibility equation.
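The RT60 column of the comparison table can be regenerated from the figures used earlier in the article. This sketch assumes the occupied baseline of 112.60 sabins, cushions at 100 m² x 0.30, and banners at 2.88 m² x 0.75 x 2 faces each; the package labels and the `treated_rt60` helper are illustrative names. It covers only the absorption-based rows, since the PA system works by a different mechanism.

```python
VOLUME_M3 = 2400
OCCUPIED_ABSORPTION = 112.60          # sabins at 500 Hz with 150 people seated
CUSHION_ABSORPTION = 100 * 0.30       # pew cushions
BANNER_ABSORPTION = 2.88 * 0.75 * 2   # one free-hanging banner, both faces

PACKAGES = {
    "Pew cushions only": CUSHION_ABSORPTION,
    "Pew cushions + 10 banners": CUSHION_ABSORPTION + 10 * BANNER_ABSORPTION,
    "Pew cushions + 20 banners": CUSHION_ABSORPTION + 20 * BANNER_ABSORPTION,
}


def treated_rt60(added_absorption_sabins: float) -> float:
    """Sabine RT60 of the occupied church after adding treatment absorption."""
    return 0.161 * VOLUME_M3 / (OCCUPIED_ABSORPTION + added_absorption_sabins)


for name, added in PACKAGES.items():
    print(f"{name}: +{added:.1f} sabins -> T60 = {treated_rt60(added):.1f} s")
```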
The Music vs Speech Compromise
Any church that hosts both choral music and spoken worship faces the fundamental tension between the acoustic requirements of these two activities. There is no single RT60 that optimises both.
Target RT60 by Worship Style
| Worship Style | Dominant Sound Source | Optimal RT60 Range | STI at Optimal RT60 |
|---|---|---|---|
| Evangelical / sermon-led | Unamplified speech | 1.0–1.5 s | 0.60–0.75 |
| Traditional Anglican (mixed) | Speech + choir + organ | 1.5–2.0 s | 0.50–0.60 |
| Catholic / High Church | Choir + organ dominant | 2.0–2.5 s | 0.40–0.50 |
| Choral / cathedral | Choir + organ | 2.5–4.0 s | 0.25–0.40 |
For mixed-use worship spaces, the practical compromise is RT60 of 1.5–2.0 seconds with a directional PA system for speech reinforcement. This preserves enough reverberation for music while ensuring the congregation can understand the spoken word.
Some churches achieve variable acoustics using movable absorptive elements — curtains that can be drawn across reflective walls, hinged panels with absorptive material on one face and reflective material on the other, or retractable banners. These solutions are more expensive and mechanically complex, but they allow the same space to serve both functions at closer to optimal acoustic conditions for each.
The PA System Trap
Many churches respond to intelligibility complaints by installing a PA system without any acoustic treatment. This is the most common mistake in church acoustics, and it usually makes the problem worse.
A PA system in a highly reverberant church increases the total sound energy in the room. If the loudspeakers are not carefully designed to direct energy only at the congregation (and away from walls, ceiling, and floor), the additional energy excites the reverberant field even further. The congregation hears a louder but equally unintelligible sound — more volume, same modulation depth. In some cases, the PA system's own reverberant contribution pushes STI lower than it was with the unamplified voice.
The solution is not simply "add a PA." The solution is either:
- Reduce RT60 through absorption treatment so that the unamplified voice is intelligible, or
- Install a directional PA system (column arrays, distributed ceiling speakers, or pew-back speakers) specifically designed to maximise the direct-to-reverberant ratio, or
- Both — which is almost always the correct answer for churches with RT60 above 3 seconds.
Measurement Before Treatment
Before specifying any acoustic treatment, the church's existing acoustic conditions should be measured. The measurements required per ISO 3382-1:2009 §5 are:
- RT60 in octave bands (125 Hz to 4 kHz minimum): measured using an omnidirectional source and microphone at multiple positions. Minimum 3 source positions and 3 receiver positions for a statistically valid result.
- Background noise level in octave bands: measured with all typical noise sources active (HVAC if present, with doors and windows in their normal operating condition).
- STI at representative listening positions: measured using a STIPA signal per IEC 60268-16:2020 §5 from the pulpit/lectern position to seats at various distances.
Related Reading
- The School Nobody Could Learn In: What ANSI S12.60 Failures Cost Students — the same STI problem in a different building type
- WELL v2 Feature 74 Decoded — how commercial standards handle speech intelligibility requirements
- The 125 Hz Problem Nobody Treats — why low-frequency reverberation defeats standard acoustic treatment