98 modulation transfer values, 7 octave bands, 14 modulation frequencies, and a single number between 0.00 and 1.00: the Speech Transmission Index (STI) is a rigorous, physically grounded objective measure of speech intelligibility. Developed by Houtgast and Steeneken at TNO (Netherlands Organisation for Applied Scientific Research) in 1973, STI has become the international benchmark for evaluating whether speech can be understood in a given acoustic environment. It is defined in IEC 60268-16:2020 and referenced by ISO 3382-3:2012, WELL v2 Feature 74, DIN 18041:2016, and building codes in over 40 countries.
Yet STI remains one of the most misunderstood acoustic parameters in practice. It is frequently confused with word intelligibility percentage (they are related but not identical), conflated with RT60 (which is only one of the factors that affect STI), and measured incorrectly (the most common field measurement error reduces apparent STI by 0.05–0.15). This reference provides the complete technical foundation for understanding, calculating, measuring, and specifying STI.
Part 1: The Modulation Transfer Function — Theory
The Fundamental Insight
Speech is not a steady-state signal. It consists of rapid amplitude modulations — the syllables, consonants, and pauses that encode linguistic information. A speaker producing the sentence "The cat sat on the mat" generates amplitude modulations at rates between approximately 0.5 Hz (sentence-level rhythm) and 12.5 Hz (consonant-level articulation).
When speech propagates through a room, these amplitude modulations are degraded by two mechanisms:
- Reverberation smears temporal contrasts. Late reverberant energy fills the gaps between syllables, reducing the modulation depth. A room with RT60 = 3.0 seconds preserves virtually none of the rapid modulation that distinguishes "bat" from "pat."
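- Background noise masks the quieter portions of the modulation (the gaps between syllables), reducing the apparent modulation depth. At a signal-to-noise ratio of 0 dB the average speech level equals the noise level: the syllable gaps are buried in noise and the modulation depth is halved.
For a diffuse sound field with exponential decay, the two effects combine into the modulation transfer function (MTF), which gives the fraction of the original modulation depth that survives at the listener:
m(F,k) = 1 / √(1 + (2πF × T(k) / 13.8)²) × 1 / (1 + 10^(-SNR(k)/10))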
where T(k) is the reverberation time at octave band k, F is the modulation frequency (Hz), and SNR(k) is the signal-to-noise ratio at octave band k in decibels. The first term captures the reverberant degradation; the constant 13.8 ≈ ln(10⁶) converts the 60 dB reverberation time into the decay rate of the energy envelope. The second term captures the noise degradation. Both terms lie between 0 and 1, and their product gives the combined modulation preservation.
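As a minimal sketch, the MTF formula translates directly into Python. The function names are hypothetical, and the example inputs (F = 2 Hz, T = 0.85 s, SNR = 5 dB) are the 125 Hz values used in the Part 6 worked example.

```python
import math

def reverb_term(F, T):
    """Reverberant factor of the MTF for an exponential decay with reverberation time T (s)."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)

def noise_term(snr_db):
    """Noise factor of the MTF for a signal-to-noise ratio in dB."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))

def mtf(F, T, snr_db):
    """Combined modulation transfer value m(F, k) for one octave band."""
    return reverb_term(F, T) * noise_term(snr_db)

print(round(reverb_term(2.0, 0.85), 2), round(noise_term(5.0), 2), round(mtf(2.0, 0.85, 5.0), 2))
# prints: 0.79 0.76 0.6
```

Keeping the two factors as separate functions makes it easy to see which mechanism dominates in a given band.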
The 98-Point MTF Matrix
IEC 60268-16:2020 §4.2 specifies 14 modulation frequencies: 0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.0, and 12.50 Hz. These span the range of temporal modulations relevant to speech.
These are evaluated across 7 octave bands: 125, 250, 500, 1000, 2000, 4000, and 8000 Hz. The result is a 14 × 7 matrix of 98 modulation transfer values m(F,k), each ranging from 0 (complete modulation loss) to 1 (perfect modulation preservation).
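The matrix can be built from one RT60 and one SNR value per octave band. This is a sketch assuming the exponential-decay MTF formula above; `mtf_matrix` and the uniform test inputs are illustrative and not taken from IEC 60268-16.

```python
import math

# Modulation frequencies (Hz) and octave-band centres (Hz) listed above
MOD_FREQS = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]
OCTAVE_BANDS = [125, 250, 500, 1000, 2000, 4000, 8000]

def mtf_matrix(rt60, snr_db):
    """Return the 14 x 7 matrix m[i][k] (rows: modulation frequency, columns: octave band).

    rt60 and snr_db are 7-element lists, one value per octave band.
    """
    matrix = []
    for F in MOD_FREQS:
        row = []
        for T, snr in zip(rt60, snr_db):
            reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)
            noise = 1.0 / (1.0 + 10.0 ** (-snr / 10.0))
            row.append(reverb * noise)
        matrix.append(row)
    return matrix

# Hypothetical room: RT60 = 1.0 s and SNR = 30 dB in every band
m = mtf_matrix([1.0] * 7, [30.0] * 7)
print(len(m), len(m[0]), len(m) * len(m[0]))  # 14 7 98
```

Within each column, m falls as the modulation frequency rises, because reverberation smears fast modulations more than slow ones.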
From MTF to STI
The 98 MTF values are processed through five steps to produce the final STI score:
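Step 1: Convert to apparent SNR and clip. Each m(F,k) value is converted to an apparent signal-to-noise ratio, SNR_app = 10 × log₁₀(m / (1-m)), which is clipped to the range -15 dB to +15 dB. (The full IEC 60268-16 procedure first corrects m for auditory masking and the absolute hearing threshold; those corrections are omitted from this outline.)
Step 2: Average across modulation frequencies. For each octave band k, the 14 clipped apparent SNR values are arithmetically averaged: SNR_app(k) = (1/14) × Σ SNR_app(F,k).
Step 3: Normalise to the 0–1 range. TI(k) = (SNR_app(k) + 15) / 30. Because this normalisation is linear, the result is identical to normalising each of the 14 values first and then averaging; IEC 60268-16 calls the per-band result the modulation transfer index (MTI).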
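Steps 1 to 3 for a single octave band can be sketched as follows. The helper name `band_transmission_index` is hypothetical, and, as noted above, the IEC masking and hearing-threshold corrections are not included.

```python
import math

def band_transmission_index(m_values):
    """Steps 1-3 for one octave band (hypothetical helper, IEC masking corrections omitted).

    m_values: the 14 modulation transfer values m(F, k) of one band.
    Returns TI(k) in the range 0..1.
    """
    clipped = []
    for m in m_values:
        # Keep m strictly inside (0, 1) so the logarithm below is defined
        m = min(max(m, 1e-12), 1.0 - 1e-12)
        # Step 1: apparent SNR in dB, clipped to -15..+15 dB
        snr_app = 10.0 * math.log10(m / (1.0 - m))
        clipped.append(min(max(snr_app, -15.0), 15.0))
    # Step 2: arithmetic mean over the modulation frequencies
    mean_snr = sum(clipped) / len(clipped)
    # Step 3: map -15..+15 dB linearly onto 0..1
    return (mean_snr + 15.0) / 30.0

print(band_transmission_index([0.5] * 14))  # 0.5: m = 0.5 is an apparent SNR of 0 dB
```

The clipping is what bounds TI(k) to 0..1: any m above about 0.97 already counts as perfect transmission.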
Step 4: Apply octave band weighting. The seven TI values are combined using octave band weighting factors specified in IEC 60268-16 Table 3. The 2020 edition uses redundancy-corrected weights that account for the statistical correlation between adjacent octave bands:
| Octave Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
|---|---|---|---|---|---|---|---|
| Male weight (αk) | 0.085 | 0.127 | 0.230 | 0.233 | 0.309 | 0.224 | 0.173 |
| Male weight (βk) | 0.085 | 0.078 | 0.065 | 0.011 | 0.047 | 0.095 | — |
| Female weight (αk) | 0.000 | 0.117 | 0.223 | 0.216 | 0.328 | 0.250 | 0.194 |
| Female weight (βk) | 0.000 | 0.099 | 0.066 | 0.062 | 0.025 | 0.076 | — |
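Step 5: Calculate final STI. STI = Σ(k=1..7) αk × TI(k) − Σ(k=1..6) βk × √(TI(k) × TI(k+1)), where βk weights the pair of adjacent bands k and k+1 (which is why the 8000 Hz column has no β value).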
The subtracted term accounts for redundancy between adjacent bands — the information carried in the 500 Hz band is partially shared with the 1000 Hz band, so simply summing the weighted TI values would double-count some intelligibility.
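Putting Steps 1 to 5 together gives a compact STI predictor. This is a simplified sketch under stated assumptions: it uses the exponential-decay MTF formula from Part 1 and the male weights from the table above, ignores the IEC auditory-masking and hearing-threshold corrections, and requires T > 0 so that m stays below 1. The function names are hypothetical.

```python
import math

MOD_FREQS = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]
# Male weights, 125 Hz ... 8000 Hz; BETA_MALE[k] applies to the band pair (k, k+1)
ALPHA_MALE = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]
BETA_MALE = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]

def mtf(F, T, snr_db):
    """Exponential-decay MTF: reverberant factor times noise factor."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

def band_ti(m_values):
    """Steps 1-3: apparent SNR, clip to +/-15 dB, average, normalise to 0..1."""
    snrs = [min(max(10.0 * math.log10(m / (1.0 - m)), -15.0), 15.0) for m in m_values]
    return (sum(snrs) / len(snrs) + 15.0) / 30.0

def sti(rt60, snr_db, alpha=ALPHA_MALE, beta=BETA_MALE):
    """Steps 4-5: weighted sum of band TIs minus the adjacent-band redundancy term."""
    ti = [band_ti([mtf(F, T, s) for F in MOD_FREQS]) for T, s in zip(rt60, snr_db)]
    weighted = sum(a * t for a, t in zip(alpha, ti))
    redundancy = sum(b * math.sqrt(ti[k] * ti[k + 1]) for k, b in enumerate(beta))
    return weighted - redundancy

print(round(sti([1.0] * 7, [100.0] * 7), 2))  # uniform RT60 = 1.0 s, negligible noise
```

For a uniform RT60 of 1.0 s with negligible noise this returns about 0.59. Because the male α weights sum to 1.381 and the β weights to 0.381, equal TI in every band gives STI equal to that TI.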
Part 2: STIPA — The Simplified Field Method
Full STI measurement requires modulating a test signal at each of the 14 modulation frequencies individually in each of the 7 octave bands — a procedure that takes several minutes per source-receiver position. STIPA (Speech Transmission Index for Public Address) was developed as a practical field method that achieves comparable accuracy in approximately 15 seconds.
How STIPA Works
STIPA uses a specially designed test signal (defined in IEC 60268-16 Annex D) that simultaneously contains two modulation frequencies per octave band, for a total of 14 modulation-band pairs:
| Octave Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
|---|---|---|---|---|---|---|---|
| Modulation 1 (Hz) | 1.60 | 1.00 | 0.63 | 2.00 | 1.25 | 0.80 | 2.50 |
| Modulation 2 (Hz) | 8.00 | 5.00 | 3.15 | 10.00 | 6.25 | 4.00 | 12.50 |
The signal is played through a loudspeaker at the source position. At the receiver position, a STIPA analyser extracts the modulation depth at each of the 14 frequencies and calculates STI using the same weighting formula as full STI. Agreement between STIPA and full STI is typically within ±0.03 (IEC 60268-16 Annex F).
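To see why two modulation frequencies per band can stand in for fourteen, the STIPA pairs can be fed through the same prediction model. This sketch is hypothetical: a real STIPA analyser measures modulation depth in the received test signal, whereas here the exponential-decay MTF formula stands in for the room and masking corrections are ignored. The frequency pairs are those specified for the STIPA signal in IEC 60268-16 (125 Hz: 1.60 and 8.00 Hz, through 8000 Hz: 2.50 and 12.5 Hz).

```python
import math

# STIPA modulation-frequency pairs per octave band, 125 Hz ... 8000 Hz
STIPA_PAIRS = [(1.60, 8.00), (1.00, 5.00), (0.63, 3.15), (2.00, 10.0),
               (1.25, 6.25), (0.80, 4.00), (2.50, 12.5)]
ALPHA_MALE = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]
BETA_MALE = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]

def stipa_from_model(rt60, snr_db):
    """Simulated STIPA: the STI steps evaluated on 2 modulation frequencies per band,
    with the exponential-decay MTF model standing in for a measurement."""
    ti = []
    for (f1, f2), T, snr in zip(STIPA_PAIRS, rt60, snr_db):
        noise = 1.0 / (1.0 + 10.0 ** (-snr / 10.0))
        snrs = []
        for F in (f1, f2):
            m = noise / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)
            snrs.append(min(max(10.0 * math.log10(m / (1.0 - m)), -15.0), 15.0))
        ti.append((sum(snrs) / 2.0 + 15.0) / 30.0)
    weighted = sum(a * t for a, t in zip(ALPHA_MALE, ti))
    redundancy = sum(b * math.sqrt(ti[k] * ti[k + 1]) for k, b in enumerate(BETA_MALE))
    return weighted - redundancy

print(round(stipa_from_model([1.0] * 7, [100.0] * 7), 2))
```

For a uniform 1.0 s RT60 with negligible noise this model gives about 0.59, within roughly 0.01 of the full 14-frequency calculation under the same assumptions. Each band pairs a slow and a fast modulation, so the average still samples both ends of the modulation range.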
RASTI — The Withdrawn Predecessor
RASTI (Rapid Speech Transmission Index) was the original simplified method, using only two octave bands (500 Hz and 2000 Hz) with a total of 9 modulation frequencies. It was defined in the 1988 and 1998 editions of IEC 60268-16 but was withdrawn from the 2003 and subsequent editions because it did not adequately capture low-frequency and high-frequency effects on intelligibility. RASTI measurements from existing acoustic surveys should be treated as approximate. They cannot be directly compared with STIPA or full STI values.
Part 3: The STI Quality Scale
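IEC 60268-16:2020 §4.6 defines the following quality categories. The CIS (Common Intelligibility Scale) column is the logarithmic rescaling CIS = 1 + log₁₀(STI), used in IEC 60849 to place different intelligibility measures on one scale. The word and sentence intelligibility columns are approximate correspondences with the percentage of items correctly identified in standardised articulation testing (modified rhyme test per ISO 8253-3).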
| STI Range | IEC Quality | CIS Score | Approx. Word Intelligibility | Approx. Sentence Intelligibility |
|---|---|---|---|---|
| 0.00–0.30 | Bad | 0.00–0.48 | < 35% | < 50% |
| 0.30–0.45 | Poor | 0.48–0.65 | 35–55% | 50–75% |
| 0.45–0.60 | Fair | 0.65–0.78 | 55–75% | 75–90% |
| 0.60–0.75 | Good | 0.78–0.88 | 75–92% | 90–97% |
| 0.75–1.00 | Excellent | 0.88–1.00 | > 92% | > 97% |
Critical threshold: The transition from "Fair" to "Good" at STI = 0.60 is the most significant boundary in practice. Below 0.60, listeners expend measurable cognitive effort to decode speech, leading to fatigue, errors, and reduced learning outcomes (Crandell and Smaldino, 2000). Above 0.60, speech processing becomes largely automatic. This is why DIN 18041:2016 §4.4, the ASHA classroom recommendation, and most modern design guides use 0.60 as the minimum target for speech communication rooms.
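The CIS conversion and the category lookup are simple enough to sketch. Treating each lower bound as inclusive (so 0.60 counts as "Good") and flooring CIS at 0 for STI ≤ 0.1 are assumptions; the function names are hypothetical.

```python
import math

# Lower bounds of the quality categories in the table above (lower bound inclusive, assumed)
CATEGORIES = [(0.75, "Excellent"), (0.60, "Good"), (0.45, "Fair"), (0.30, "Poor"), (0.00, "Bad")]

def cis_from_sti(sti):
    """Common Intelligibility Scale, CIS = 1 + log10(STI), floored at 0 (assumed for STI <= 0.1)."""
    if sti <= 0.0:
        return 0.0
    return max(0.0, 1.0 + math.log10(sti))

def quality_category(sti):
    """Return the quality label whose STI range contains the value."""
    for lower, label in CATEGORIES:
        if sti >= lower:
            return label
    return "Bad"  # negative inputs fall through to the lowest category

for value in (0.30, 0.45, 0.50, 0.60, 0.75):
    print(f"STI {value:.2f}  CIS {cis_from_sti(value):.2f}  {quality_category(value)}")
```

STI = 0.50 maps to CIS ≈ 0.70, which is why the two figures appear together in voice-alarm specifications.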
Part 4: Factors Affecting STI
Factor 1: Reverberation Time
RT60 is the dominant factor in most rooms. The relationship is approximately:
STI ≈ 0.45 × (-log₁₀(RT60)) + 0.59 (for SNR > 25 dB, mid-frequency estimate)
This yields STI ≈ 0.73 at RT60 = 0.5 s, STI ≈ 0.59 at RT60 = 1.0 s, and STI ≈ 0.45 at RT60 = 2.0 s. Each halving of RT60 therefore improves STI by 0.45 × log₁₀(2) ≈ 0.14, a meaningful but not transformative improvement.
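The approximation can be checked against the exponential-decay MTF model with no noise and the same RT60 in every band. This is a sketch under those assumptions (no masking corrections, male weights); since the male α weights minus the β weights sum to 1.0, equal TI in all bands makes STI equal to that TI, so a single band suffices. Function names are hypothetical.

```python
import math

MOD_FREQS = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]

def sti_approx(rt60):
    """The empirical single-number approximation quoted above (SNR > 25 dB)."""
    return 0.45 * (-math.log10(rt60)) + 0.59

def sti_reverb_only(rt60):
    """STI from the exponential-decay MTF, no noise, same RT60 in every band."""
    snrs = []
    for F in MOD_FREQS:
        m = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * rt60 / 13.8) ** 2)
        snrs.append(min(max(10.0 * math.log10(m / (1.0 - m)), -15.0), 15.0))
    return (sum(snrs) / len(snrs) + 15.0) / 30.0

for rt in (0.5, 1.0, 2.0):
    print(rt, round(sti_approx(rt), 2), round(sti_reverb_only(rt), 2))
```

Under these assumptions the model stays within a few hundredths of the approximation between 0.5 s and 2.0 s, which is the range where the approximation is meant to be used.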
Factor 2: Signal-to-Noise Ratio
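Background noise (from HVAC, traffic, adjacent rooms) directly reduces STI by masking the quiet portions of the speech modulation. Typical reductions are listed below; the exact figure depends on how much reverberation has already degraded the modulation: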
- SNR > 25 dB: negligible impact on STI (noise term in MTF ≈ 1.0)
- SNR 15 dB: STI reduced by approximately 0.05–0.10
- SNR 10 dB: STI reduced by approximately 0.15–0.25
- SNR 0 dB: STI reduced by approximately 0.35–0.45
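The interaction between noise and reverberation can be explored with the same model. This sketch assumes a hypothetical room with the same RT60 and SNR in every band and ignores masking corrections. Note that with no reverberation at all the apparent SNR equals the physical SNR, so 15 dB already gives the maximum band TI and 0 dB gives STI = 0.5.

```python
import math

MOD_FREQS = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]

def sti_uniform(rt60, snr_db):
    """STI from the exponential-decay MTF with the same RT60 and SNR in every band
    (male weights; equal TI in all bands makes STI equal to that TI)."""
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    snrs = []
    for F in MOD_FREQS:
        m = noise / math.sqrt(1.0 + (2.0 * math.pi * F * rt60 / 13.8) ** 2)
        snrs.append(min(max(10.0 * math.log10(m / (1.0 - m)), -15.0), 15.0))
    return (sum(snrs) / len(snrs) + 15.0) / 30.0

baseline = sti_uniform(0.6, 100.0)  # hypothetical room: RT60 = 0.6 s, negligible noise
for snr in (25, 15, 10, 0):
    value = sti_uniform(0.6, snr)
    print(f"SNR {snr:>2} dB: STI {value:.2f} (reduction {baseline - value:.2f})")
```

Changing the RT60 argument shows how the same SNR costs a different amount of STI in a different room.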
Factor 3: Early Reflections
Early reflections (arriving within 50 ms of the direct sound) can enhance STI because they reinforce the direct signal without smearing temporal modulation. A well-designed room with strong early lateral reflections and controlled late reverberation can achieve higher STI than a room with the same RT60 but weak early reflections. This effect is captured when the MTF is computed from a measured or simulated impulse response, but not by the exponential-decay MTF formula or the simplified RT60-to-STI approximation, both of which assume a purely exponential decay.
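Computing the MTF from an impulse response uses Schroeder's method: m(F) is the normalised Fourier transform of the squared impulse response, evaluated at the modulation frequency. The sketch below assumes the response has already been filtered into one octave band and contains no noise; the helper name is hypothetical. As a check, an ideal exponential decay reproduces the closed-form reverberant term exactly, while a real response with strong early reflections would yield higher m values.

```python
import cmath
import math

def mtf_from_impulse_response(h, fs, F):
    """Schroeder's method: m(F) = |sum h^2(t) e^(-j 2 pi F t)| / sum h^2(t).

    h: octave-band-filtered impulse response samples (filtering assumed done already)
    fs: sample rate in Hz; F: modulation frequency in Hz. Noise is not included.
    """
    energy = [x * x for x in h]
    num = sum(e * cmath.exp(-2j * math.pi * F * n / fs) for n, e in enumerate(energy))
    return abs(num) / sum(energy)

# Ideal exponential decay: h^2 = exp(-13.8 t / T), i.e. a 60 dB energy decay in T seconds
fs, T = 1000, 1.0
h = [math.exp(-6.9 * n / fs / T) for n in range(int(5 * T * fs))]
for F in (1.0, 4.0, 12.5):
    closed_form = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)
    print(F, round(mtf_from_impulse_response(h, fs, F), 3), round(closed_form, 3))
```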
Factor 4: Source Distance
STI decreases with source-receiver distance. In a diffuse field, the critical distance (where direct and reverberant energy are equal) is:
r_c = 0.057 × √(Q × V / RT60)
where Q is the directivity factor of the source (Q = 1 for an omnidirectional source). Beyond r_c, the reverberant field dominates and STI is determined primarily by RT60 and background noise. For a 180 m³ classroom with RT60 = 0.6 s and Q = 1, r_c ≈ 0.057 × √(180 / 0.6) ≈ 0.99 m, meaning students beyond approximately 1 m from the teacher are in the reverberant field.
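A minimal sketch of the critical-distance calculation, plus the direct-to-reverberant ratio it implies at other distances (20 × log₁₀(r_c / r), which follows from direct energy falling as 1/r² while reverberant energy stays constant). The function names are hypothetical, and Q = 1 is assumed as in the example.

```python
import math

def critical_distance(volume_m3, rt60_s, q=1.0):
    """Diffuse-field critical distance r_c = 0.057 * sqrt(Q * V / RT60), in metres."""
    return 0.057 * math.sqrt(q * volume_m3 / rt60_s)

def direct_to_reverberant_db(distance_m, r_c):
    """Direct-to-reverberant energy ratio at a given distance: 20 * log10(r_c / r)."""
    return 20.0 * math.log10(r_c / distance_m)

rc = critical_distance(180.0, 0.6)  # 180 m3 classroom, RT60 = 0.6 s, omnidirectional source
print(round(rc, 2))                 # 0.99
for r in (0.5, 1.0, 4.0):
    print(r, round(direct_to_reverberant_db(r, rc), 1))
```

At 4 m the direct sound is roughly 12 dB below the reverberant field, so a listener there hears a mostly reverberant signal.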
Factor 5: Electroacoustic Systems
Sound reinforcement systems can improve STI by increasing the direct-to-reverberant ratio at listener positions. However, poorly designed PA systems can reduce STI if they: (a) increase late reverberant energy through loudspeaker reflections, (b) introduce time delays between loudspeakers that produce comb-filtering effects, or (c) produce frequency response anomalies that distort speech spectra. IEC 60268-16 §4.5 includes provisions for measuring STI through electroacoustic systems.
Part 5: STI Targets by Room Type
| Room Type | STI Target | Source Standard | Notes |
|---|---|---|---|
| Classroom (standard) | ≥ 0.60 | DIN 18041 §4.4 | BB93 implies ≥ 0.60 via RT60/BNL limits |
| Classroom (hearing-impaired) | ≥ 0.70 | ASHA recommendation | DIN 18041 Group A4 implies ≥ 0.65 |
| Lecture hall | ≥ 0.60 | DIN 18041 §4.4 | With sound reinforcement if V > 500 m³ |
| Courtroom | ≥ 0.65 | BS 8233 (guidance) | Critical for legal proceedings |
| Meeting room | ≥ 0.60 | DIN 18041, WELL v2 F74 | Measured at furthest seat |
| Open-plan office (rD target) | STI < 0.50 beyond rD; minimise rD | ISO 3382-3 | rD (distraction distance) is where STI falls below 0.50; goal is privacy, not intelligibility |
| PA system (transport hub) | ≥ 0.50 | IEC 60268-16 §4.6 | Life safety announcements |
| PA system (railway platform) | ≥ 0.45 | EN 60849 (now EN 54-16) | Challenging due to open-air conditions |
| Emergency voice alarm | ≥ 0.50 | BS 5839-8 | UK fire alarm standard |
| Cinema | ≥ 0.55 | SMPTE ST 202 | Dialogue intelligibility |
Part 6: Worked Example — STI Prediction for a Classroom
Room: Primary school classroom, 9 m × 7 m × 3.0 m (V = 189 m³)
Step 1: Determine RT60 at each octave band.
After treatment with a Class A acoustic ceiling (25 mm mineral wool, αw = 0.90) and 10 m² of wall panels, the predicted RT60 values are:
| Octave Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 |
|---|---|---|---|---|---|---|
| RT60 (s) | 0.85 | 0.62 | 0.48 | 0.45 | 0.42 | 0.40 |
Step 2: Determine SNR at each octave band.
Teacher speech level at 1 m (typical male, per IEC 60268-16 Table E.1): 53, 56, 62, 58, 53, 49 dBA per octave band. Background noise (HVAC at NC 30): 42, 36, 31, 28, 25, 22 dBA. Between 1 m and 4 m the direct sound falls by 20 × log₁₀(4) ≈ 12 dB (inverse-square law); the reverberant field adds energy back by an amount that depends on the room constant.
Approximate SNR at 4 m (including reverberant contribution): 5, 12, 22, 20, 17, 16 dB.
Step 3: Calculate MTF at a representative modulation frequency (2 Hz) for each octave band.
Using the MTF formula: m(2, k) = 1/√(1 + (2π × 2 × T(k)/13.8)²) × 1/(1 + 10^(-SNR(k)/10))
| Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 |
|---|---|---|---|---|---|---|
| Reverb term | 0.79 | 0.87 | 0.92 | 0.93 | 0.93 | 0.94 |
| Noise term | 0.76 | 0.94 | 0.99 | 0.99 | 0.98 | 0.98 |
| m(2, k) | 0.60 | 0.82 | 0.91 | 0.92 | 0.92 | 0.92 |
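The 2 Hz row can be checked with a few lines of Python, using the RT60 values from Step 1 and the SNR values from Step 2. This is a sketch of the single-frequency step only; it does not attempt the full 14-frequency calculation.

```python
import math

bands = [125, 250, 500, 1000, 2000, 4000]
rt60 = [0.85, 0.62, 0.48, 0.45, 0.42, 0.40]   # Step 1 (s)
snr = [5, 12, 22, 20, 17, 16]                 # Step 2 (dB)
F = 2.0                                       # representative modulation frequency (Hz)

rows = []
for band, T, s in zip(bands, rt60, snr):
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-s / 10.0))
    rows.append((band, reverb, noise, reverb * noise))
    print(f"{band:>5} Hz  reverb {reverb:.2f}  noise {noise:.2f}  m {reverb * noise:.2f}")
```

In the 125 Hz band both factors are low at once, which is why that band has by far the smallest m.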
Step 4: Repeat for all 14 modulation frequencies, average, weight, and combine.
The full calculation (performed computationally) yields STI = 0.64, within the "Good" category and above the 0.60 threshold required by DIN 18041 §4.4. The 125 Hz band is the weakest link: RT60 = 0.85 s and an SNR of only 5 dB sharply reduce that band's contribution. Additional low-frequency absorption, or quieter HVAC plant in the 125 Hz band, would address this weakness, although the achievable gain is modest because 125 Hz carries the smallest male weighting (αk = 0.085).
Part 7: Measurement Equipment
Dedicated STIPA Analysers
| Instrument | Manufacturer | STIPA Accuracy | Additional Features | Approx. Cost |
|---|---|---|---|---|
| XL2 Sound Level Meter | NTi Audio | ±0.02 | Class 1 SLM, RT60, spectrum | £5,000–7,000 |
| AM100 | Bedrock Audio | ±0.02 | Dedicated STI/STIPA | £2,500–3,500 |
| Type 2270 | Brüel & Kjær | ±0.02 | Class 1 SLM, building acoustics suite | £12,000–18,000 |
| Digicheck by Gold Line | Gold Line | ±0.03 | Budget STIPA | £1,500–2,000 |
STIPA Signal Source
The STIPA test signal must conform to IEC 60268-16 Annex D. Pre-recorded WAV files are available from NTi Audio and Bedrock. The signal must be played through a loudspeaker that does not distort the modulation (THD < 3%), positioned at the typical talker height (1.5 m standing, 1.2 m seated) and oriented toward the primary listener area. The signal level should be calibrated to 60 dBA at 1 m (normal male speech level per IEC 60268-16 Table E.1).
Common Measurement Errors
- Source level too high or too low. If the source exceeds 70 dBA at 1 m, the SNR is artificially high and the measured STI will overestimate real-speech performance. If below 55 dBA, the signal may not overcome background noise.
- Measurement time too short. STIPA requires a minimum 15-second integration time for stable results. Shorter measurements produce statistical scatter of ±0.05.
- Source not at talker position. Placing the loudspeaker on a table (0.8 m) instead of at head height (1.5 m) changes the early reflection pattern and can shift STI by ±0.03.
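- Background noise not representative. Measuring in an empty room, or with the HVAC switched off, when the specification calls for occupied or operational conditions misrepresents the SNR; with the HVAC off the noise is understated and STI is overstated.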
Related Reading:
- The School Nobody Could Learn In: What ANSI S12.60 Failures Cost Students — an STI failure case study in a real school
- WELL v2 Feature 74 Acoustic Requirements Decoded — how STI fits into the WELL v2 acoustic framework
- Open Plan Office Acoustic Design: The Complete Guide — STI, rD, and ISO 3382-3 in open offices