
Speech Transmission Index (STI) — The Complete Technical Reference

The definitive technical reference for STI (Speech Transmission Index) covering MTF theory, the full 98-point calculation method per IEC 60268-16:2020, STIPA and RASTI simplified methods, the CIS intelligibility scale, measurement equipment and methodology, STI targets by room type, and the relationship between RT60, background noise, and speech intelligibility.

AcousPlan Editorial · March 14, 2026

98 modulation transfer indices, 7 octave bands, 14 modulation frequencies, and a single number between 0.00 and 1.00 — the Speech Transmission Index is the most mathematically rigorous and physically meaningful measure of speech intelligibility ever devised. Developed by Houtgast and Steeneken at TNO (Netherlands Organisation for Applied Scientific Research) in 1973, STI has become the international benchmark for evaluating whether speech can be understood in a given acoustic environment, referenced by IEC 60268-16:2020, ISO 3382-3:2012, WELL v2 Feature 74, DIN 18041:2016, and building codes in over 40 countries.

Yet STI remains one of the most misunderstood acoustic parameters in practice. It is frequently confused with word intelligibility percentage (they are related but not identical), conflated with RT60 (which is only one of the factors that affect STI), and measured incorrectly (the most common field measurement error reduces apparent STI by 0.05–0.15). This reference provides the complete technical foundation for understanding, calculating, measuring, and specifying STI.

Part 1: The Modulation Transfer Function — Theory

The Fundamental Insight

Speech is not a steady-state signal. It consists of rapid amplitude modulations — the syllables, consonants, and pauses that encode linguistic information. A speaker producing the sentence "The cat sat on the mat" generates amplitude modulations at rates between approximately 0.5 Hz (sentence-level rhythm) and 12.5 Hz (consonant-level articulation).

When speech propagates through a room, these amplitude modulations are degraded by two mechanisms:

  1. Reverberation smears temporal contrasts. Late reverberant energy fills the gaps between syllables, reducing the modulation depth. A room with RT60 = 3.0 seconds preserves virtually none of the rapid modulation that distinguishes "bat" from "pat."
  2. Background noise masks the quieter portions of the modulation (the gaps between syllables), reducing the apparent modulation depth. At a signal-to-noise ratio of 0 dB the average speech level only equals the noise level, so the modulation valleys are buried in noise and the modulation depth is halved.
The Modulation Transfer Function (MTF) quantifies how well a room preserves amplitude modulations at each frequency. For a given octave band k and modulation frequency F, the modulation transfer index m(F,k) is defined as:

m(F,k) = 1 / √(1 + (2πF × T(k) / 13.8)²) × 1 / (1 + 10^(-SNR(k)/10))

where T(k) is the reverberation time at octave band k, F is the modulation frequency (Hz), and SNR(k) is the signal-to-noise ratio at octave band k in decibels. The first term captures the reverberant degradation; the second captures the noise degradation. Both terms range from 0 to 1, and their product gives the combined modulation preservation.
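As a minimal sketch, the formula above can be evaluated for a single modulation frequency and octave band as follows. The function and argument names are illustrative choices and do not come from IEC 60268-16.

```python
import math


def modulation_transfer_index(mod_freq_hz: float, rt60_s: float, snr_db: float) -> float:
    """Return m(F, k) for one modulation frequency and one octave band."""
    # Reverberation term: 1 / sqrt(1 + (2*pi*F*T / 13.8)^2)
    reverb_term = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    # Noise term: 1 / (1 + 10^(-SNR/10))
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term
```

With no reverberation (`rt60_s = 0`) and an SNR of 0 dB, the function returns 0.5: the noise term alone halves the modulation depth.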

The 98-Point MTF Matrix

IEC 60268-16:2020 §4.2 specifies 14 modulation frequencies: 0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.0, and 12.50 Hz. These span the range of temporal modulations relevant to speech.

These are evaluated across 7 octave bands: 125, 250, 500, 1000, 2000, 4000, and 8000 Hz. The result is a 14 × 7 matrix of 98 modulation transfer indices, each ranging from 0 (complete modulation loss) to 1 (perfect modulation preservation).
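The matrix described above can be sketched in Python as a mapping from octave band to a list of 14 indices. The band-major layout, the function name `mtf_matrix`, and the input values at the bottom are assumptions made for illustration only.

```python
import math

# Modulation frequencies and octave bands as listed in the text.
MOD_FREQS_HZ = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50,
                3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def mtf_matrix(rt60_by_band, snr_db_by_band):
    """Return {band_hz: [m(F, band) for each F]} from per-band RT60 and SNR."""
    if len(rt60_by_band) != 7 or len(snr_db_by_band) != 7:
        raise ValueError("expected one RT60 and one SNR value per octave band (7 each)")
    matrix = {}
    for band, rt60, snr in zip(OCTAVE_BANDS_HZ, rt60_by_band, snr_db_by_band):
        noise_term = 1.0 / (1.0 + 10.0 ** (-snr / 10.0))
        matrix[band] = [
            noise_term / math.sqrt(1.0 + (2.0 * math.pi * f * rt60 / 13.8) ** 2)
            for f in MOD_FREQS_HZ
        ]
    return matrix


# Hypothetical inputs, chosen only to exercise the function.
example = mtf_matrix([0.8, 0.7, 0.6, 0.6, 0.5, 0.5, 0.4], [20.0] * 7)
```

Within each band the indices fall as the modulation frequency rises, because the reverberation term penalises fast modulations more heavily than slow ones.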

From MTF to STI

The 98 MTF values are processed through five steps to produce the final STI score:

Step 1: Limit the apparent SNR. Each m(F,k) value is converted to an apparent signal-to-noise ratio: SNR_app = 10 × log₁₀(m / (1-m)). This is clipped to the range -15 dB to +15 dB.

Step 2: Average across modulation frequencies. For each octave band k, the 14 apparent SNR values are arithmetically averaged to produce a single Modulation Transfer Index (MTI) per band: MTI(k) = (1/14) × Σ SNR_app(F,k).

Step 3: Normalise to 0–1 range. Each MTI is normalised: TI(k) = (MTI(k) + 15) / 30.

Step 4: Apply octave band weighting. The seven TI values are combined using octave band weighting factors specified in IEC 60268-16 Table 3. The 2020 edition uses redundancy-corrected weights that account for the statistical correlation between adjacent octave bands:

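| Octave Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Male weight (αk) | 0.085 | 0.127 | 0.230 | 0.233 | 0.309 | 0.224 | 0.173 |
| Male weight (βk) | 0.085 | 0.078 | 0.065 | 0.011 | 0.047 | 0.095 | |
| Female weight (αk) | 0.000 | 0.117 | 0.223 | 0.216 | 0.328 | 0.250 | 0.194 |
| Female weight (βk) | 0.000 | 0.099 | 0.066 | 0.062 | 0.025 | 0.076 | |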

Step 5: Calculate final STI. STI = Σ(αk × TI(k)) − Σ(βk × √(TI(k) × TI(k+1)))

The subtracted term accounts for redundancy between adjacent bands — the information carried in the 500 Hz band is partially shared with the 1000 Hz band, so simply summing the weighted TI values would double-count some intelligibility.
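The five steps can be sketched in Python as below, using the male weights from the table above. This is an illustrative reading of the steps as described here, not a validated implementation of the standard; the function names and the band-by-band input layout are assumptions.

```python
import math

# Male weights from the table above (alpha: 7 bands, beta: 6 adjacent band pairs).
ALPHA_MALE = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]
BETA_MALE = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]


def transmission_index(m_values):
    """Steps 1-3 for one band: clip apparent SNR, average, normalise to 0-1."""
    snrs = []
    for m in m_values:
        if m <= 0.0:
            snr = -15.0  # log of zero is undefined; use the lower clip limit
        elif m >= 1.0:
            snr = 15.0   # division by zero; use the upper clip limit
        else:
            snr = 10.0 * math.log10(m / (1.0 - m))
        snrs.append(max(-15.0, min(15.0, snr)))
    mean_snr = sum(snrs) / len(snrs)
    return (mean_snr + 15.0) / 30.0


def sti_from_mtf(mtf_by_band, alpha=ALPHA_MALE, beta=BETA_MALE):
    """Steps 4-5: weight the per-band indices and subtract the redundancy term.

    mtf_by_band is a list of 7 lists (125 Hz to 8 kHz), each holding the
    modulation transfer indices for that band.
    """
    ti = [transmission_index(row) for row in mtf_by_band]
    weighted = sum(a * t for a, t in zip(alpha, ti))
    redundancy = sum(b * math.sqrt(ti[k] * ti[k + 1]) for k, b in enumerate(beta))
    return weighted - redundancy
```

A quick sanity check on the weights: the α values sum to 1.381 and the β values to 0.381, so a matrix of perfect indices (all m = 1) gives STI = 1.00 and a matrix of m = 0.5 throughout gives STI = 0.50.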

Part 2: STIPA — The Simplified Field Method

Full STI measurement requires modulating a test signal at each of the 14 modulation frequencies individually in each of the 7 octave bands — a procedure that takes several minutes per source-receiver position. STIPA (Speech Transmission Index for Public Address) was developed as a practical field method that achieves comparable accuracy in approximately 15 seconds.

How STIPA Works

STIPA uses a specially designed test signal (defined in IEC 60268-16 Annex D) that simultaneously contains two modulation frequencies per octave band, for a total of 14 modulation-band pairs:

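| Octave Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Modulation 1 (Hz) | 1.60 | 1.00 | 0.63 | 2.00 | 1.25 | 0.80 | 2.50 |
| Modulation 2 (Hz) | 8.00 | 5.00 | 3.15 | 10.00 | 6.25 | 4.00 | 12.50 |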

The signal is played through a loudspeaker at the source position. At the receiver position, a STIPA analyser extracts the modulation depth at each of the 14 frequencies and calculates STI using the same weighting formula as full STI. Agreement between STIPA and full STI is typically within ±0.03 (IEC 60268-16 Annex F).
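One common way to recover the modulation depth at a known modulation frequency is to correlate the received intensity envelope with a sine and a cosine at that frequency. The sketch below assumes the signal has already been octave-band filtered and squared to give an intensity envelope; the function name and the synthetic test envelope are illustrative, and this is not presented as the exact procedure of any particular analyser.

```python
import math


def modulation_depth(intensity, sample_rate_hz, mod_freq_hz):
    """Estimate the modulation depth of an intensity envelope at one frequency."""
    total = sum(intensity)
    if total <= 0.0:
        return 0.0
    w = 2.0 * math.pi * mod_freq_hz / sample_rate_hz
    # Fourier components of the envelope at the modulation frequency.
    sin_sum = sum(x * math.sin(w * n) for n, x in enumerate(intensity))
    cos_sum = sum(x * math.cos(w * n) for n, x in enumerate(intensity))
    # Ratio of the modulated component to the mean intensity, capped at 1.
    return min(1.0, 2.0 * math.hypot(sin_sum, cos_sum) / total)


# Synthetic envelope: 10 s sampled at 200 Hz, modulated at 2 Hz with depth 0.6.
fs, f_mod, depth = 200, 2.0, 0.6
envelope = [1.0 + depth * math.cos(2.0 * math.pi * f_mod * n / fs) for n in range(10 * fs)]
recovered = modulation_depth(envelope, fs, f_mod)
```

Because the synthetic envelope spans a whole number of modulation periods, `recovered` matches the depth of 0.6 that was put in. The ratio of received to transmitted depth is the modulation transfer index for that band and frequency.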

RASTI — The Withdrawn Predecessor

RASTI (Rapid Speech Transmission Index) was the original simplified method, using only two octave bands (500 Hz and 2000 Hz) with a total of 9 modulation frequencies. It was defined in the 1988 and 1998 editions of IEC 60268-16 but was withdrawn from the 2003 and subsequent editions because it did not adequately capture low-frequency and high-frequency effects on intelligibility. RASTI measurements from existing acoustic surveys should be treated as approximate. They cannot be directly compared with STIPA or full STI values.

Part 3: The STI Quality Scale

IEC 60268-16:2020 §4.6 defines the following quality categories. The CIS (Common Intelligibility Scale) mapping provides an approximate correspondence between STI and the percentage of words correctly identified in standardised articulation testing (modified rhyme test per ISO 8253-3).

| STI Range | IEC Quality | CIS Score | Approx. Word Intelligibility | Approx. Sentence Intelligibility |
| --- | --- | --- | --- | --- |
| 0.00–0.30 | Bad | 0.00–0.31 | < 35% | < 50% |
| 0.30–0.45 | Poor | 0.31–0.52 | 35–55% | 50–75% |
| 0.45–0.60 | Fair | 0.52–0.72 | 55–75% | 75–90% |
| 0.60–0.75 | Good | 0.72–0.89 | 75–92% | 90–97% |
| 0.75–1.00 | Excellent | 0.89–1.00 | > 92% | > 97% |
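A small lookup makes the category boundaries explicit. The ranges in the table share their end points, so this sketch assumes a value that falls exactly on a boundary belongs to the higher category; the function name is illustrative.

```python
def sti_quality(sti: float) -> str:
    """Map an STI value to the quality label used in the table above."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0.00 and 1.00")
    # Assumption: a value exactly on a boundary takes the higher category.
    thresholds = [(0.75, "Excellent"), (0.60, "Good"), (0.45, "Fair"), (0.30, "Poor")]
    for lower_bound, label in thresholds:
        if sti >= lower_bound:
            return label
    return "Bad"
```

For example, `sti_quality(0.64)` returns "Good" and `sti_quality(0.59)` returns "Fair".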

Critical threshold: The transition from "Fair" to "Good" at STI = 0.60 is the most significant boundary in practice. Below 0.60, listeners expend measurable cognitive effort to decode speech, leading to fatigue, errors, and reduced learning outcomes (Crandell and Smaldino, 2000). Above 0.60, speech processing becomes largely automatic. This is why DIN 18041:2016 §4.4, the ASHA classroom recommendation, and most modern design guides use 0.60 as the minimum target for speech communication rooms.

Part 4: Factors Affecting STI

Factor 1: Reverberation Time

RT60 is the dominant factor in most rooms. The relationship is approximately:

STI ≈ 0.45 × (-log₁₀(RT60)) + 0.59 (for SNR > 25 dB, mid-frequency estimate)

This yields STI ≈ 0.73 at RT60 = 0.5 s, STI ≈ 0.59 at RT60 = 1.0 s, and STI ≈ 0.45 at RT60 = 2.0 s. Each halving of RT60 therefore improves STI by 0.45 × log₁₀(2) ≈ 0.14, a meaningful but not transformative improvement.
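The approximation is simple enough to evaluate directly. In this sketch the function name is illustrative, and the clamp to the 0 to 1 range is an added safeguard that is not part of the formula as stated.

```python
import math


def sti_from_rt60_estimate(rt60_s: float) -> float:
    """Rough mid-frequency STI estimate from RT60, valid only for high SNR."""
    if rt60_s <= 0.0:
        raise ValueError("RT60 must be positive")
    estimate = 0.59 - 0.45 * math.log10(rt60_s)
    # Assumption: clamp to the valid STI range for very short or very long RT60.
    return max(0.0, min(1.0, estimate))
```

The difference between any two results whose RT60 values differ by a factor of two is always 0.45 × log₁₀(2), which is why the gain per halving is constant in this model.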

Factor 2: Signal-to-Noise Ratio

Background noise (from HVAC, traffic, adjacent rooms) directly reduces STI by masking the quiet portions of the speech modulation. The effect is:

  • SNR > 25 dB: negligible impact on STI (noise term in MTF ≈ 1.0)
  • SNR 15 dB: STI reduced by approximately 0.05–0.10
  • SNR 10 dB: STI reduced by approximately 0.15–0.25
  • SNR 0 dB: STI reduced by approximately 0.35–0.45
For a typical teacher speaking at 60 dBA at 1 m, and background noise of 45 dBA (ANSI S12.60 limit = 35 dBA), the SNR at 4 m distance is approximately: 60 − 10log₁₀(4²) − 45 = 60 − 12 − 45 = 3 dB. At this SNR, STI is severely compromised regardless of the room's reverberation time.
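The same arithmetic can be sketched as a helper for trying other distances and noise levels. It assumes free-field inverse-square spreading of the direct sound only, with no reverberant contribution; the function name is illustrative.

```python
import math


def direct_snr_db(level_at_1m_dba: float, distance_m: float, noise_dba: float) -> float:
    """SNR of the direct sound, assuming inverse-square spreading from 1 m."""
    if distance_m <= 0.0:
        raise ValueError("distance must be positive")
    spreading_loss_db = 20.0 * math.log10(distance_m)  # equals 10*log10(d^2)
    return level_at_1m_dba - spreading_loss_db - noise_dba
```

`direct_snr_db(60, 4, 45)` gives about 3 dB, matching the figure above. Lowering the background noise to 35 dBA raises the same case to about 13 dB.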

Factor 3: Early Reflections

Early reflections (arriving within 50 ms of the direct sound) can enhance STI because they reinforce the direct signal without smearing temporal modulation. A well-designed room with strong early lateral reflections and controlled late reverberation can achieve higher STI than a room with the same RT60 but weak early reflections. This effect is captured by the detailed MTF calculation but is obscured by the simplified RT60-to-STI approximation.

Factor 4: Source Distance

STI decreases with source-receiver distance. In a diffuse field, the critical distance (where direct and reverberant energy are equal) is:

r_c ≈ 0.057 × √(V / RT60) (omnidirectional source, V in m³, RT60 in s)

Beyond r_c, the reverberant field dominates and STI is determined primarily by RT60 and background noise. For a 180 m³ classroom with RT60 = 0.6 s, r_c ≈ 0.057 × √(180 / 0.6) ≈ 0.99 m, meaning students beyond approximately 1 m from the teacher are in the reverberant field.
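A minimal sketch of the critical distance calculation follows, using the widely quoted Sabine-based form r_c ≈ 0.057 × √(Q × V / RT60). The directivity factor `directivity_q` is an added parameter that does not appear in the text; the default of 1 assumes an omnidirectional source.

```python
import math


def critical_distance_m(volume_m3: float, rt60_s: float, directivity_q: float = 1.0) -> float:
    """Distance at which direct and reverberant energy are equal (diffuse field)."""
    if volume_m3 <= 0.0 or rt60_s <= 0.0 or directivity_q <= 0.0:
        raise ValueError("volume, RT60 and directivity must be positive")
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)
```

For the 180 m³ classroom with RT60 = 0.6 s, `critical_distance_m(180, 0.6)` returns about 0.99 m, consistent with students beyond roughly 1 m sitting in the reverberant field.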

Factor 5: Electroacoustic Systems

Sound reinforcement systems can improve STI by increasing the direct-to-reverberant ratio at listener positions. However, poorly designed PA systems can reduce STI if they: (a) increase late reverberant energy through loudspeaker reflections, (b) introduce time delays between loudspeakers that produce comb-filtering effects, or (c) produce frequency response anomalies that distort speech spectra. IEC 60268-16 §4.5 includes provisions for measuring STI through electroacoustic systems.

Part 5: STI Targets by Room Type

| Room Type | STI Target | Source Standard | Notes |
| --- | --- | --- | --- |
| Classroom (standard) | ≥ 0.60 | DIN 18041 §4.4 | BB93 implies ≥ 0.60 via RT60/BNL limits |
| Classroom (hearing-impaired) | ≥ 0.70 | ASHA recommendation | DIN 18041 Group A+ implies ≥ 0.65 |
| Lecture hall | ≥ 0.60 | DIN 18041 §4.4 | With sound reinforcement if V > 500 m³ |
| Courtroom | ≥ 0.65 | BS 8233 (guidance) | Critical for legal proceedings |
| Meeting room | ≥ 0.60 | DIN 18041, WELL v2 F74 | Measured at furthest seat |
| Open-plan office (rD target) | STI < 0.50 at rD | ISO 3382-3 | Goal is privacy, not intelligibility |
| PA system (transport hub) | ≥ 0.50 | IEC 60268-16 §4.6 | Life safety announcements |
| PA system (railway platform) | ≥ 0.45 | EN 60849 (now EN 54-16) | Challenging due to open-air conditions |
| Emergency voice alarm | ≥ 0.50 | BS 5839-8 | UK fire alarm standard |
| Cinema | ≥ 0.55 | SMPTE ST 202 | Dialogue intelligibility |

Part 6: Worked Example — STI Prediction for a Classroom

Room: Primary school classroom, 9 m × 7 m × 3.0 m (V = 189 m³)

Step 1: Determine RT60 at each octave band.

After treatment with a Class A acoustic ceiling (25 mm mineral wool, αw 0.90) and 10 m² of wall panels, predicted RT60 values are:

| Octave Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 |
| --- | --- | --- | --- | --- | --- | --- |
| RT60 (s) | 0.85 | 0.62 | 0.48 | 0.45 | 0.42 | 0.40 |

Step 2: Determine SNR at each octave band.

Teacher speech level at 1 m (typical male, per IEC 60268-16 Table E.1): 53, 56, 62, 58, 53, 49 dBA per octave band. Background noise (HVAC at NC 30): 42, 36, 31, 28, 25, 22 dBA. At 4 m distance, direct sound attenuates by 12 dB. Reverberant field addition depends on room constant.

Approximate SNR at 4 m (including reverberant contribution): 5, 12, 22, 20, 17, 16 dB.

Step 3: Calculate MTF at a representative modulation frequency (2 Hz) for each octave band.

Using the MTF formula: m(2, k) = 1/√(1 + (2π × 2 × T(k)/13.8)²) × 1/(1 + 10^(-SNR(k)/10))

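| Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 |
| --- | --- | --- | --- | --- | --- | --- |
| Reverb term | 0.79 | 0.87 | 0.92 | 0.93 | 0.93 | 0.94 |
| Noise term | 0.76 | 0.94 | 0.99 | 0.99 | 0.98 | 0.98 |
| m(2, k) | 0.60 | 0.82 | 0.91 | 0.92 | 0.92 | 0.92 |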
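The Step 3 terms can be recomputed from the RT60 and SNR values given in Steps 1 and 2. The script below is a sketch that applies the MTF formula at 2 Hz to those inputs; the variable names are illustrative.

```python
import math

# Inputs taken from Steps 1 and 2 of the worked example.
BANDS_HZ = [125, 250, 500, 1000, 2000, 4000]
RT60_S = [0.85, 0.62, 0.48, 0.45, 0.42, 0.40]
SNR_DB = [5, 12, 22, 20, 17, 16]
MOD_FREQ_HZ = 2.0

rows = []
for band, rt60, snr in zip(BANDS_HZ, RT60_S, SNR_DB):
    # Reverberation and noise terms of the MTF formula, evaluated separately.
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * MOD_FREQ_HZ * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr / 10.0))
    rows.append((band, reverb, noise, reverb * noise))

for band, reverb, noise, m in rows:
    print(f"{band:>5} Hz  reverb={reverb:.2f}  noise={noise:.2f}  m={m:.2f}")
```

Keeping the two terms separate shows which mechanism limits each band: at 125 Hz both terms are well below 1, while from 500 Hz upward the noise term is close to 1 and reverberation accounts for most of the loss.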

Step 4: Repeat for all 14 modulation frequencies, average, weight, and combine.

The full calculation (performed computationally) yields: STI = 0.64 — within the "Good" category and above the 0.60 threshold required by DIN 18041 §4.4. The 125 Hz band is the weakest link (RT60 = 0.85 s and low SNR reduce that band's contribution significantly). Additional low-frequency absorption would improve STI by addressing this weakness.

Part 7: Measurement Equipment

Dedicated STIPA Analysers

| Instrument | Manufacturer | STIPA Accuracy | Additional Features | Approx. Cost |
| --- | --- | --- | --- | --- |
| XL2 Sound Level Meter | NTi Audio | ±0.02 | Class 1 SLM, RT60, spectrum | £5,000–7,000 |
| AM100 | Bedrock Audio | ±0.02 | Dedicated STI/STIPA | £2,500–3,500 |
| Type 2270 | Brüel & Kjær | ±0.02 | Class 1 SLM, building acoustics suite | £12,000–18,000 |
| Digicheck by Gold Line | Gold Line | ±0.03 | Budget STIPA | £1,500–2,000 |

STIPA Signal Source

The STIPA test signal must conform to IEC 60268-16 Annex D. Pre-recorded WAV files are available from NTi Audio and Bedrock. The signal must be played through a loudspeaker that does not distort the modulation (THD < 3%), positioned at the typical talker height (1.5 m standing, 1.2 m seated) and oriented toward the primary listener area. The signal level should be calibrated to 60 dBA at 1 m (normal male speech level per IEC 60268-16 Table E.1).

Common Measurement Errors

  1. Source level too high or too low. If the source exceeds 70 dBA at 1 m, the SNR is artificially high and the measured STI will overestimate real-speech performance. If below 55 dBA, the signal may not overcome background noise.
  2. Measurement time too short. STIPA requires a minimum 15-second integration time for stable results. Shorter measurements produce statistical scatter of ±0.05.
  3. Source not at talker position. Placing the loudspeaker on a table (0.8 m) instead of at head height (1.5 m) changes the early reflection pattern and can shift STI by ±0.03.
  4. Background noise not representative. Measuring with HVAC off or with the room empty, when the standard requires occupied/operational conditions, produces an STI that does not reflect in-use performance.
