AcousPlan™ — IEC 60268-16 Complete Guide: Speech Transmission Index (STI) Standard

The Standard That Defines Speech Intelligibility

IEC 60268-16 is the international standard that defines how to quantify speech intelligibility in any acoustic environment. Published and maintained by IEC Technical Committee TC 100 (Audio, video and multimedia systems and equipment), it provides both the full calculation method — the Speech Transmission Index (STI) — and the practical field measurement method — STIPA. If your project requires speech intelligibility verification for building code compliance, voice alarm certification, or acoustic design validation, IEC 60268-16 is the standard your acoustic consultant uses.

The standard answers a deceptively simple question: can a listener at a given position in a room understand what a speaker is saying? Reverberation time alone cannot answer that question. Background noise level alone cannot answer it either. STI captures both degradation mechanisms — reverberation and noise — in a single number between 0 and 1 that correlates directly with word recognition scores measured in controlled listening tests.

This guide covers every aspect of the standard that practitioners need: the edition history, the underlying physics, the full calculation procedure, the STIPA measurement method, the quality scale, how building codes worldwide reference the standard, and the most common errors that lead to incorrect STI values.

Standard History and Editions

The development of STI as a metric predates the standard itself. Houtgast and Steeneken at TNO (Netherlands Organisation for Applied Scientific Research) published the foundational work on the Modulation Transfer Function approach to speech intelligibility in 1973. Their research demonstrated that speech perception could be predicted from how well a transmission channel preserves the temporal modulations of the speech signal — without needing to transmit actual speech.

IEC adopted this work into a formal standard through the following editions:

First edition (1988): Established the STI calculation method based on the Modulation Transfer Function. Defined the 7-octave-band, 14-modulation-frequency matrix. This edition laid the groundwork but lacked a standardised field measurement procedure.

Second edition (1998): Introduced the RASTI (Rapid Speech Transmission Index) method — a simplified two-band measurement using only the 500 Hz and 2000 Hz octave bands. RASTI was widely adopted for field measurements but had significant accuracy limitations in rooms with non-flat frequency responses.

Third edition (2003): Replaced RASTI with STIPA (Speech Transmission Index for Public Address systems). STIPA uses all 7 octave bands but with a reduced set of modulation frequencies, making it far more robust than RASTI while remaining practical for field measurement. This edition formally deprecated RASTI.

Fourth edition (2011): Refined the weighting factors and updated the auditory masking model. Added clarity on the treatment of non-linear distortion in sound reinforcement systems.

Fifth edition (2020): The current edition. Updated the male and female speech spectrum levels to reflect contemporary research. Revised the redundancy-corrected weighting factors. Strengthened the STIPA methodology and extended guidance on measurement uncertainty. This is the edition that all current acoustic work should reference.

Each edition refined the metric while maintaining backward compatibility with the fundamental MTF-based approach. An STI value of 0.60 means essentially the same thing across all editions, though the precise numerical result for a given room may differ slightly due to updated weighting factors.

What STI Measures: The Modulation Transfer Function

To understand STI, you must first understand what makes speech intelligible. Speech is not a steady-state signal. It is an amplitude-modulated signal: the sound pressure level fluctuates rapidly as the speaker produces syllables, consonants, and vowels. These envelope fluctuations, called temporal modulations, carry the information content of speech. The modulation frequencies that matter for intelligibility range from roughly 0.5 Hz to 12.5 Hz. The envelope spectrum of running speech peaks at around 3 to 4 Hz, close to the rate of syllable production. Slower modulations reflect word and phrase rhythm, and the fastest reflect transitions between individual phonemes.

When speech travels through a room, two things degrade these modulations:

Reverberation smears the peaks. Each syllable produces a burst of sound energy. In a reverberant room, that energy persists and overlaps with the next syllable. The result is that the modulation peaks are reduced — the contrast between loud syllables and quiet gaps is diminished. The longer the reverberation time, the greater the smearing.

Background noise fills the troughs. Between syllables, the speech signal drops to a low level. In a quiet room, this drop is clearly audible. In a noisy room, background noise fills in the quiet gaps between syllables. The modulation troughs are raised — again reducing the contrast.

The Modulation Transfer Function (MTF) quantifies exactly how much modulation contrast survives. For octave band k and modulation frequency F, the MTF value m(k, F) ranges from 0 (no modulation contrast preserved: the output is essentially steady-state noise) to 1 (perfect modulation contrast preserved: the temporal pattern is transmitted without degradation).

STI is, at its core, a weighted average of 98 MTF values, converted through a series of transformations into a single number that predicts intelligibility.

The Full Calculation Method (Section 4)

The STI calculation in IEC 60268-16:2020 Section 4 proceeds through the following steps. This is the complete method — not a simplification.

Step 1: Define the Frequency and Modulation Grid

The calculation operates on a matrix of 7 octave bands and 14 modulation frequencies:

Octave band centre frequencies: 125, 250, 500, 1000, 2000, 4000, 8000 Hz

Modulation frequencies: 0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5 Hz

This creates a 7 x 14 matrix of 98 modulation transfer values. Each cell in the matrix represents how well the room preserves a specific modulation rate within a specific frequency band.

Step 2: Calculate the MTF for Each Cell

For each octave band k and modulation frequency F, the modulation transfer value is calculated from two contributions:

Reverberation reduction factor:

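m_rev(F) = 1 / sqrt(1 + (2 pi F T / 13.8)^2)

where T is the reverberation time (in seconds) in octave band k. The constant 13.8 (equal to 6 ln 10) converts the 60 dB decay that defines T into the time constant of the exponential energy decay. The formula comes from the statistical theory of room acoustics: assuming a diffuse field with a purely exponential decay, it describes how reverberation reduces modulation depth as a function of both the modulation frequency and the decay time.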

Signal-to-noise ratio contribution:

m_noise = 1 / (1 + 10^(-SNR/10))

where SNR is the signal-to-noise ratio (in dB) in octave band k, calculated from the speech level and the background noise level at the receiver position.

Combined MTF value:

m(k, F) = m_rev(F) * m_noise

The combined MTF assumes that reverberation and noise act as independent degradation mechanisms, which is a reasonable approximation in most practical situations.
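Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration, assuming the caller supplies one RT60 and one SNR value per octave band; the function names (`m_reverb`, `m_noise`, `mtf_matrix`) are hypothetical, and the sketch leaves out the masking correction of Step 3. The reverberation term uses the exponential-decay form with the 13.8 constant, m_rev = 1 / sqrt(1 + (2 pi F T / 13.8)^2).

```python
import math

# Step 1: the 7 octave bands and 14 modulation frequencies (Hz).
BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]
MOD_FREQS_HZ = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50,
                3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]


def m_reverb(mod_freq_hz: float, rt60_s: float) -> float:
    """Modulation reduction caused by an exponential (diffuse-field) decay."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)


def m_noise(snr_db: float) -> float:
    """Modulation reduction caused by steady background noise at snr_db."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))


def mtf_matrix(rt60_by_band, snr_by_band):
    """Step 2: the 7 x 14 matrix m(k, F), one row per octave band."""
    if len(rt60_by_band) != len(BANDS_HZ) or len(snr_by_band) != len(BANDS_HZ):
        raise ValueError("expected one RT60 and one SNR per octave band")
    return [
        [m_reverb(f, rt) * m_noise(snr) for f in MOD_FREQS_HZ]
        for rt, snr in zip(rt60_by_band, snr_by_band)
    ]


# Usage: RT60 = 0.6 s and SNR = 15 dB in every band.
m = mtf_matrix([0.6] * 7, [15.0] * 7)
print(len(m), len(m[0]))  # 7 14
```

Each row falls from left to right, because higher modulation frequencies are smeared more by the same reverberant decay.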

Step 3: Apply Auditory Masking Corrections

The 2020 edition includes an auditory masking model that accounts for the upward spread of masking — low-frequency energy partially masks higher-frequency content. For each octave band above 125 Hz, the effective noise level is adjusted to include a contribution from the band below, weighted by the masking factor. This correction is particularly important in rooms with strong low-frequency noise (e.g., HVAC rumble).

Step 4: Convert MTF to Apparent Signal-to-Noise Ratio

Each MTF value is converted to an apparent signal-to-noise ratio:

SNR_app(k, F) = 10 * log10(m(k, F) / (1 - m(k, F)))

This transformation maps the 0-to-1 MTF range onto a dB scale. The apparent SNR is then clipped to the range -15 dB to +15 dB, because intelligibility does not improve above approximately +15 dB SNR, and it does not get worse below approximately -15 dB SNR.
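A minimal sketch of Step 4, assuming MTF values produced as in Step 2; `apparent_snr_db` is a hypothetical helper name. Values at or beyond 0 and 1 are clipped directly, which also avoids taking the logarithm of zero.

```python
import math


def apparent_snr_db(m: float) -> float:
    """Step 4: MTF value -> apparent SNR in dB, clipped to the range [-15, +15] dB."""
    if m >= 1.0:
        return 15.0   # perfect modulation transfer
    if m <= 0.0:
        return -15.0  # modulation completely lost
    snr = 10.0 * math.log10(m / (1.0 - m))
    return max(-15.0, min(15.0, snr))


for m in (0.5, 0.9, 0.99):
    print(m, round(apparent_snr_db(m), 2))
# 0.5 0.0
# 0.9 9.54
# 0.99 15.0   (about 19.96 dB before clipping)
```

When m is the noise term alone, m / (1 - m) equals 10^(SNR/10), so for a reverberation-free channel the apparent SNR is exactly the physical SNR; reverberation is what pushes the apparent SNR below it.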

Step 5: Average Across Modulation Frequencies

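For each octave band k, each clipped apparent SNR is first converted to a transmission index (TI) on a 0-to-1 scale, and the 14 TI values are then arithmetically averaged to produce a single Modulation Transfer Index (MTI) per band:

TI(k, F) = (SNR_app(k, F) + 15) / 30

MTI(k) = (1/14) * SUM[TI(k, F)] for all 14 modulation frequencies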
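Step 5 as a short sketch, assuming the MTF values of one band arrive as a list; the helper names are hypothetical. Each clipped apparent SNR is mapped to a transmission index TI = (SNR_app + 15) / 30, so that -15 dB becomes 0 and +15 dB becomes 1, and the band's MTI is the mean of those TIs.

```python
import math


def transmission_index(m: float) -> float:
    """MTF value -> clipped apparent SNR (dB) -> transmission index in [0, 1]."""
    if m >= 1.0:
        snr = 15.0
    elif m <= 0.0:
        snr = -15.0
    else:
        snr = max(-15.0, min(15.0, 10.0 * math.log10(m / (1.0 - m))))
    return (snr + 15.0) / 30.0


def modulation_transfer_index(m_row) -> float:
    """MTI of one octave band: the mean TI over its modulation frequencies."""
    if not m_row:
        raise ValueError("need at least one MTF value")
    return sum(transmission_index(m) for m in m_row) / len(m_row)


print(modulation_transfer_index([0.5] * 14))             # 0.5
print(modulation_transfer_index([1.0] * 7 + [0.0] * 7))  # 0.5
```

Averaging TI values rather than raw dB values is what keeps each MTI, and therefore the final STI, on the 0-to-1 scale.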

Step 6: Apply Frequency Weighting and Calculate STI

The seven MTI values are combined using octave-band weighting factors to produce the final STI. The 2020 edition specifies separate weighting factors for male and female speech:

Male speech weighting factors (alpha) and redundancy factors (beta):

Band (Hz)	125	250	500	1000	2000	4000	8000
alpha	0.085	0.127	0.230	0.233	0.309	0.224	0.173
beta	0.085	0.078	0.065	0.011	0.047	0.095	—

Female speech weighting factors (alpha) and redundancy factors (beta); the 125 Hz band is not used for female speech:

Band (Hz)	125	250	500	1000	2000	4000	8000
alpha	—	0.117	0.223	0.216	0.328	0.250	0.194
beta	—	0.099	0.066	0.062	0.025	0.076	—

The final STI is:

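STI = SUM over k = 1..7 [alpha(k) * MTI(k)] - SUM over k = 1..6 [beta(k) * sqrt(MTI(k) * MTI(k+1))]

where k runs from the 125 Hz band (k = 1) to the 8000 Hz band (k = 7); bands without a factor in the table are omitted from the sums.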

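The alpha terms are the direct weighting. The beta terms are redundancy correction factors that account for the correlation between adjacent octave bands: information carried in one band is partially redundant with information in the adjacent band, so the raw weighted sum would overestimate intelligibility without this correction. The beta value listed under band k applies to the pair formed by band k and band k+1. For the male factors, the alphas sum to 1.381 and the betas to 0.381, so their difference is exactly 1 and a channel with MTI = 1 in every band scores STI = 1.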

The result is a single number between 0 and 1.
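Step 6 as a sketch using the male-speech factors from the table above; `sti_from_mti` is a hypothetical name. The beta list has six entries because each one applies to a pair of adjacent bands.

```python
import math

# Male-speech factors from the table above; BETA_MALE[k] applies to bands k and k+1.
ALPHA_MALE = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]
BETA_MALE = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]


def sti_from_mti(mti, alpha=ALPHA_MALE, beta=BETA_MALE) -> float:
    """Step 6: weighted MTI sum minus the adjacent-band redundancy correction."""
    if len(mti) != len(alpha) or len(beta) != len(alpha) - 1:
        raise ValueError("expected 7 MTIs, 7 alpha and 6 beta factors")
    direct = sum(a * x for a, x in zip(alpha, mti))
    redundancy = sum(b * math.sqrt(mti[k] * mti[k + 1]) for k, b in enumerate(beta))
    return direct - redundancy


print(round(sti_from_mti([1.0] * 7), 3))  # 1.0 for a perfect channel
print(round(sti_from_mti([0.6] * 7), 3))  # 0.6: equal MTIs pass straight through
```

The second line shows why the factors are normalised: with the same MTI x in every band, the result is x times (1.381 minus 0.381), which is exactly x.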

The STI Quality Scale (Section 6)

IEC 60268-16 defines five quality categories based on the calculated or measured STI value. These categories correlate with word and sentence recognition scores obtained from controlled listening tests with panels of listeners.

STI Range	Quality Category	Word Recognition	Typical Applications
< 0.30	Bad	Below 60%	Essentially unintelligible. Unacceptable for any application requiring speech communication.
0.30 - 0.45	Poor	60-75%	Background music venues, some industrial spaces. Speech can be partially understood with effort and context.
0.45 - 0.60	Fair	75-90%	Large concourses, railway stations, corridors. Speech is generally understandable but listeners must concentrate. Not acceptable for classrooms or meeting rooms.
0.60 - 0.75	Good	90-96%	Classrooms, offices, meeting rooms, worship spaces. Speech is clearly understood with minimal listener effort. This is the target range for most occupied spaces.
0.75 - 1.00	Excellent	96-100%	Lecture halls, courtrooms, recording studios, critical listening environments. Near-perfect intelligibility.

Two aspects of this scale deserve emphasis. First, the relationship between STI and word recognition is non-linear. Raising STI from 0.45 to 0.60 lifts word recognition from 75% to 90%, a 15 percentage point gain, while the same 0.15 step from 0.60 to 0.75 adds only about 6 points (90% to 96%) as recognition approaches its ceiling. Second, sentence recognition scores are higher than word recognition scores at any given STI, because sentence context provides redundancy. A listener can correctly identify a sentence even if individual words within it are not fully perceived. Building codes that specify STI requirements are implicitly targeting word-level intelligibility, which is the more conservative (and appropriate) criterion.
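For reporting, the scale can be applied with a simple lookup. This sketch assumes each range includes its lower bound (so 0.45 counts as Fair and 0.60 as Good), a boundary convention the table leaves open; `sti_category` is a hypothetical helper.

```python
def sti_category(sti: float) -> str:
    """Return the quality category for an STI value (lower bounds inclusive)."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    for lower_bound, label in ((0.75, "Excellent"), (0.60, "Good"),
                               (0.45, "Fair"), (0.30, "Poor")):
        if sti >= lower_bound:
            return label
    return "Bad"


for value in (0.28, 0.45, 0.58, 0.72, 0.80):
    print(value, sti_category(value))
```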

STIPA: The Practical Measurement Method (Section 5)

The full STI calculation requires knowledge of the room's reverberation time across all seven octave bands and the signal-to-noise ratio at the measurement position. In existing buildings, these values can be calculated from measured impulse responses — but this is a complex, time-consuming process that requires specialised equipment and expertise.

STIPA (Speech Transmission Index for Public Address systems) was developed as a direct measurement method that produces an STI-equivalent result without requiring separate RT60 and noise measurements.

How STIPA Works

The STIPA method uses a specially designed test signal — a broadband noise signal that is simultaneously amplitude-modulated at specific modulation frequencies in each octave band. The signal contains modulations in all 7 octave bands, with 2 modulation frequencies per band (14 total), carefully chosen to avoid overlap between bands.

The measurement procedure is:

Generate the STIPA test signal using a calibrated loudspeaker positioned at the talker location. The signal level is set to represent a typical speech level (60-65 dBA at 1 metre).

Record the received signal using a STIPA-capable sound level meter at the listener position. The measurement duration is a minimum of 15 seconds, though the standard recommends 20-30 seconds for improved statistical reliability.

The meter analyses the received signal, extracting the modulation depth in each octave band at each modulation frequency. The ratio of received modulation depth to transmitted modulation depth gives the MTF value for each band-modulation combination.

The meter calculates STIPA from the extracted MTF values using the same conversion, averaging and weighting procedure as the full STI calculation (Steps 4-6 above), except that each band's MTI is the average of 2 values rather than 14.
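The STIPA band layout and per-band averaging can be sketched as follows. The modulation-frequency pairs are the ones commonly tabulated for STIPA, two per band with no frequency repeated; check them against the edition you are working to. The helper names and the measured depths in the usage line are hypothetical, and real analysers apply further processing that this sketch omits.

```python
import math

# STIPA modulation frequencies (Hz): two per octave band, none shared between bands.
STIPA_MOD_FREQS_HZ = {
    125: (1.60, 8.00),
    250: (1.00, 5.00),
    500: (0.63, 3.15),
    1000: (2.00, 12.5),
    2000: (1.25, 6.25),
    4000: (0.80, 4.00),
    8000: (2.50, 10.0),
}


def mtf_from_depths(received_depth: float, transmitted_depth: float) -> float:
    """MTF value as the ratio of received to transmitted modulation depth."""
    return received_depth / transmitted_depth


def transmission_index(m: float) -> float:
    """MTF value -> apparent SNR clipped to +/-15 dB -> TI in [0, 1]."""
    if m >= 1.0:
        return 1.0
    if m <= 0.0:
        return 0.0
    snr = max(-15.0, min(15.0, 10.0 * math.log10(m / (1.0 - m))))
    return (snr + 15.0) / 30.0


def stipa_band_mti(m_pair) -> float:
    """STIPA MTI for one band: the mean of its two TI values (not fourteen)."""
    return sum(transmission_index(m) for m in m_pair) / 2.0


# Usage with hypothetical depths for the 1000 Hz band (2 Hz and 12.5 Hz modulations).
pair = (mtf_from_depths(0.45, 0.50), mtf_from_depths(0.25, 0.50))
print(round(stipa_band_mti(pair), 3))  # 0.659
```

The seven band MTIs produced this way then go through the same alpha/beta weighting as the full method.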

STIPA Accuracy

The standard states that STIPA correlates with the full STI to within plus or minus 0.03 under typical conditions. This means that a STIPA measurement of 0.58 corresponds to a true STI between 0.55 and 0.61. For most building code compliance purposes, this accuracy is more than sufficient.

However, STIPA accuracy degrades in certain conditions:

Highly non-diffuse sound fields (e.g., strongly directional loudspeakers in a very absorptive room) can produce STIPA values that deviate from the full STI by more than 0.03.
Non-linear distortion in sound reinforcement systems can inflate STIPA readings because the distortion products add energy at the modulation frequencies. The 2020 edition includes guidance on identifying and correcting for this effect.
Impulsive noise during the measurement (e.g., door slams, dropped objects) will corrupt the result. The standard recommends rejecting measurements with impulsive contamination.

Equipment Requirements

STIPA measurements require a Class 1 or Class 2 sound level meter with STIPA analysis capability, and a calibrated STIPA signal source. Major manufacturers of STIPA-capable meters include NTi Audio, Bedrock, and Brüel & Kjær. The test signal is standardised, so any compliant signal source should produce the same result with any compliant analyser, provided both conform to the same edition of the standard.

How Building Codes Reference IEC 60268-16

IEC 60268-16 is not a building code. It does not specify what the STI should be in any particular type of room. It defines how to calculate and measure STI. Building codes, certification schemes, and performance standards then reference it to establish specific requirements.

Code / Standard	STI Requirement	Application	Reference Clause
WELL v2 Feature 74	STI < 0.50 between workstations	Speech privacy in open plan offices	Part 3 (Sound Concept S05)
ANSI S12.60-2010	STI >= 0.60 (unoccupied)	Classrooms in the United States	Section 5
BB93:2015	STI referenced for verification	School buildings in the United Kingdom	Section 1.4
EN 50849:2017	STI >= 0.50	Voice alarm and emergency sound systems	Section 5.3
NFPA 72:2022	STI >= 0.50	Emergency communications systems (US)	Section 18.4.10
DIN 18041:2016	STI >= 0.60 for category A rooms	Rooms for speech communication (Germany)	Section 5
AS 2107:2016	STI referenced for design guidance	Recommended noise levels in buildings (Australia)	Appendix A
ISO 22201-1	STI >= 0.50	Electroacoustics — sound systems for emergency purposes	Section 5.2

Notice the split in how codes use STI. Speech intelligibility codes (ANSI S12.60, DIN 18041) require STI above a threshold — the goal is to ensure that listeners can understand the speaker. Speech privacy codes (WELL v2 Part 3) require STI below a threshold — the goal is to ensure that listeners at adjacent workstations cannot understand private conversations. These are opposite design objectives, and a room optimised for one will typically fail the other.

The voice alarm standards (EN 50849, NFPA 72, ISO 22201-1) all converge on STI >= 0.50 as the minimum for life-safety communication. This threshold sits inside the "fair" band, 0.05 above the 0.45 boundary between "poor" and "fair" intelligibility, and is intended as the minimum level at which short, pre-recorded emergency announcements can be understood reliably by a general population, including elderly listeners and non-native speakers.

Factors That Affect STI

Understanding the sensitivity of STI to different parameters is essential for acoustic design. Here are the primary factors and their approximate impact.

Reverberation Time

RT60 is the dominant factor in most rooms. The relationship is approximately:

Every 0.1 second increase in RT60 reduces STI by approximately 0.03 (in the range RT60 = 0.5 to 2.0 seconds, with background noise below 35 dBA).

For a typical classroom with RT60 = 0.6 s and low background noise, STI is approximately 0.72. Increasing RT60 to 1.0 s drops STI to approximately 0.60. Increasing it to 1.5 s drops STI to approximately 0.48 — from "good" to "fair" intelligibility. This is why acoustic ceiling tiles (which primarily reduce RT60) are the single most effective intervention for improving speech intelligibility in most rooms.
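The trend can be checked with a compact, noise-free version of the full chain (Steps 2, 4, 5 and 6). This is a sketch under simplifying assumptions: the same RT60 in every band, no background noise, and no auditory masking correction, so it will not match a complete IEC 60268-16 calculation to the second decimal place. The function names are hypothetical.

```python
import math

MOD_FREQS_HZ = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50,
                3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]
ALPHA_MALE = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]
BETA_MALE = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]


def band_mti(rt60_s: float, snr_db: float = math.inf) -> float:
    """MTI of one band from reverberation and (optionally) noise; no masking."""
    total = 0.0
    for f in MOD_FREQS_HZ:
        m = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f * rt60_s / 13.8) ** 2)
        m /= 1.0 + 10.0 ** (-snr_db / 10.0)  # noise term; no effect when snr_db is inf
        snr_app = 15.0 if m >= 1.0 else 10.0 * math.log10(m / (1.0 - m))
        total += (max(-15.0, min(15.0, snr_app)) + 15.0) / 30.0
    return total / len(MOD_FREQS_HZ)


def sti(rt60_by_band, snr_by_band) -> float:
    """Male-weighted STI with the adjacent-band redundancy correction."""
    mti = [band_mti(rt, snr) for rt, snr in zip(rt60_by_band, snr_by_band)]
    direct = sum(a * x for a, x in zip(ALPHA_MALE, mti))
    redundancy = sum(b * math.sqrt(mti[k] * mti[k + 1]) for k, b in enumerate(BETA_MALE))
    return direct - redundancy


for rt in (0.6, 1.0, 1.5):
    print(rt, round(sti([rt] * 7, [math.inf] * 7), 2))
```

Under these assumptions the sketch gives roughly 0.70, 0.59 and 0.50, close to the figures quoted above. The drop per 0.1 s is steepest at short reverberation times and flattens as RT60 grows.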

Background Noise

Noise is the second most important factor. The impact depends on the speech level and the spectral shape of the noise:

Every 5 dB increase in broadband background noise reduces STI by approximately 0.05 (when the SNR is in the range 5 to 20 dB).

Low-frequency noise (e.g., HVAC rumble, traffic) is particularly damaging because it masks the 250-500 Hz octave bands where fundamental speech energy is concentrated. A room with RT60 = 0.6 s and background noise of 35 dBA might achieve STI = 0.70. The same room with 45 dBA background noise drops to STI = 0.58.
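The effect of noise can be explored with the same kind of simplified chain, holding RT60 at 0.6 s and lowering the per-band SNR in 5 dB steps (equivalent to raising broadband noise while the speech level stays fixed). The same caveats apply: identical bands, no auditory masking, and a hypothetical function name, so the output illustrates the trend rather than reproducing the figures above exactly.

```python
import math

MOD_FREQS_HZ = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50,
                3.15, 4.00, 5.00, 6.30, 8.00, 10.0, 12.5]


def uniform_sti(rt60_s: float, snr_db: float) -> float:
    """STI for a room whose RT60 and SNR are the same in every octave band.

    With identical MTIs in all seven bands, the male-speech alphas minus betas
    sum to 1, so the weighted STI equals that common MTI. No masking is applied.
    """
    total = 0.0
    for f in MOD_FREQS_HZ:
        m_rev = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f * rt60_s / 13.8) ** 2)
        m = m_rev / (1.0 + 10.0 ** (-snr_db / 10.0))
        snr_app = 15.0 if m >= 1.0 else 10.0 * math.log10(m / (1.0 - m))
        total += (max(-15.0, min(15.0, snr_app)) + 15.0) / 30.0
    return total / len(MOD_FREQS_HZ)


previous = None
for snr in (25, 20, 15, 10, 5):
    value = uniform_sti(0.6, snr)
    step = "" if previous is None else f"  (change {value - previous:+.2f})"
    print(f"SNR {snr:>2} dB: STI {value:.2f}{step}")
    previous = value
```

In this sketch the cost of each 5 dB step grows as the SNR falls: the step from 25 to 20 dB barely registers, while the step from 10 to 5 dB costs several times more. A single "per 5 dB" figure is therefore an average over a limited SNR range.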

Source-Receiver Distance

In a diffuse sound field, the direct-to-reverberant ratio decreases with distance from the source. Beyond the critical distance (the distance at which direct and reverberant energy are equal), STI decreases with increasing distance. In a typical untreated classroom, the critical distance is approximately 1.5-2.0 metres. Students sitting beyond this distance receive more reverberant energy than direct energy, and their STI is lower than students in the front row.
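A common diffuse-field estimate of the critical distance is r_c = sqrt(Q * A / (16 pi)), where A = 0.161 * V / T60 is the Sabine absorption area; this works out to roughly 0.057 * sqrt(Q * V / T60), with V the room volume in m^3 and Q the directivity factor of the source. The sketch below applies it to a hypothetical classroom; the volume, RT60 and talker directivity Q = 2.5 are assumptions for illustration, not values from the standard.

```python
import math


def critical_distance_m(volume_m3: float, rt60_s: float, directivity_q: float = 1.0) -> float:
    """Distance (m) at which direct and reverberant energy are equal.

    Uses r_c = sqrt(Q * A / (16 * pi)) with the Sabine absorption area
    A = 0.161 * V / T60, i.e. roughly 0.057 * sqrt(Q * V / T60).
    """
    absorption_area_m2 = 0.161 * volume_m3 / rt60_s
    return math.sqrt(directivity_q * absorption_area_m2 / (16.0 * math.pi))


# Hypothetical classroom: 250 m^3, RT60 = 0.8 s, talker directivity Q = 2.5.
print(round(critical_distance_m(250.0, 0.8, 2.5), 2))  # 1.58
```

Because r_c scales with 1 / sqrt(T60), halving the reverberation time pushes the critical distance out by a factor of about 1.4 (the square root of 2), which is one reason absorptive treatment helps listeners in the rear rows.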

Sound Reinforcement Systems

PA and sound reinforcement systems can either improve or degrade STI, depending on design quality:

A well-designed distributed loudspeaker system can extend the critical distance and maintain high STI throughout a large space (e.g., a lecture hall or concourse).
A poorly designed system — with excessive reverberation from delayed reflections, incorrect time alignment between distributed speakers, or non-linear distortion — can reduce STI below what would be achieved with unamplified speech.

The interaction between sound system design and room acoustics is one of the most challenging aspects of achieving STI compliance in large spaces. The system must be designed with knowledge of the room's acoustic properties, and IEC 60268-16 is the standard used to verify the combined performance.

Common Measurement Errors

STIPA is a robust measurement method, but several common errors can produce incorrect results. Awareness of these pitfalls is essential for anyone commissioning or conducting STI measurements.

Using a Non-Compliant Test Signal

The STIPA test signal must conform precisely to the specification in IEC 60268-16. Using white noise, pink noise, or any signal other than the standardised STIPA signal will produce meaningless results. The signal must contain the correct modulation frequencies at the correct depths in each octave band. Always verify that your signal source is from the meter manufacturer and conforms to the current edition of the standard.

Measuring During Transient Noise

Construction noise, passing aircraft, slamming doors, and other transient sounds will corrupt STIPA measurements. The standard requires that the acoustic environment during measurement be representative of the intended operating condition. Measurements during construction are not valid. Measurements should be taken with the HVAC system running at its normal operating point, with typical occupancy (or unoccupied, depending on the building code requirement), and without extraneous noise sources.

Insufficient Measurement Positions

A single STIPA measurement characterises a single source-receiver path. STI varies significantly with position in most rooms — front-row seats in a classroom may have STI = 0.75 while rear seats have STI = 0.55. Building codes typically require measurements at multiple positions representing the range of listener locations. ANSI S12.60, for example, requires measurements at a minimum of 3 positions representing the most and least favourable conditions.

Incorrect Source Level

The STIPA test signal must be reproduced at a level representative of the speech source it is simulating. For unamplified speech, this is typically 60-65 dBA at 1 metre. Setting the source level too high will produce optimistic STI values because the SNR at all measurement positions will be artificially elevated. Setting it too low will produce pessimistic values.

Ignoring the Octave-Band Detail

A STIPA meter reports a single STI number, but most also provide the per-band MTI values. Examining these values is essential for diagnosis. A room might achieve an overall STI of 0.58 — apparently "fair" — but the per-band analysis might reveal that the 250 Hz band has an MTI of 0.30 while all other bands are above 0.65. This pattern indicates a specific problem (low-frequency noise or excessive low-frequency reverberation) that a targeted intervention could address. Reporting only the single-number STI misses this diagnostic information.
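A simple screening rule can make that per-band check routine. The sketch below flags any band whose MTI sits well below the median of the seven bands; the 0.15 margin is an arbitrary, hypothetical threshold chosen for illustration, not a value from IEC 60268-16.

```python
from statistics import median

BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def weak_bands(mti_by_band, margin: float = 0.15):
    """Return the octave bands whose MTI is more than `margin` below the median MTI."""
    if len(mti_by_band) != len(BANDS_HZ):
        raise ValueError("expected one MTI per octave band")
    typical = median(mti_by_band)
    return [hz for hz, mti in zip(BANDS_HZ, mti_by_band) if mti < typical - margin]


# The pattern from the text: 250 Hz at 0.30, every other band above 0.65.
print(weak_bands([0.68, 0.30, 0.70, 0.72, 0.71, 0.69, 0.66]))  # [250]
```

Comparing against the median rather than the mean keeps a single very poor band from dragging the reference down and hiding itself.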

STI vs. Other Intelligibility Metrics

STI is not the only speech intelligibility metric, but it is the most widely accepted and the only one with full international standardisation. Other metrics include:

RASTI (Rapid Speech Transmission Index): Deprecated since the 2003 edition. Used only 2 octave bands. Still encountered in older literature and some legacy measurement equipment, but should not be used for new work.

%ALcons (Percentage Articulation Loss of Consonants): An older metric developed by Peutz, calculated from RT60 and distance. Useful for quick estimates but does not account for background noise spectrum. A widely quoted empirical conversion is %ALcons ≈ 170.5405 * exp(-5.419 * STI), so STI 0.60 corresponds to roughly 6.6% and STI 0.45 to roughly 15%. It is a curve fit, not an exact equivalence.

CIS (Common Intelligibility Scale): A logarithmic transformation of STI designed to make the scale more intuitive. CIS = 1 + log10(STI). Rarely used in practice.

U50: The ratio of useful-to-detrimental energy, with 50 ms as the boundary. Related to C50 (clarity) but not directly equivalent to STI. Used in some German standards.
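Both STI conversions above are easy to script. This sketch uses the exponential fit %ALcons ≈ 170.5405 * exp(-5.419 * STI), which is widely quoted in sound-system design literature, together with CIS = 1 + log10(STI). The function names are hypothetical, and the %ALcons figure should be treated as an empirical estimate rather than an exact equivalence.

```python
import math


def alcons_from_sti(sti: float) -> float:
    """Approximate %ALcons from STI using an empirical exponential fit."""
    return 170.5405 * math.exp(-5.419 * sti)


def cis_from_sti(sti: float) -> float:
    """Common Intelligibility Scale value, CIS = 1 + log10(STI)."""
    if sti <= 0.0:
        raise ValueError("CIS is undefined for STI <= 0")
    return 1.0 + math.log10(sti)


for sti in (0.45, 0.50, 0.60):
    print(f"STI {sti:.2f}: %ALcons ~ {alcons_from_sti(sti):.1f}, CIS {cis_from_sti(sti):.2f}")
```

Under this fit, the STI 0.50 voice-alarm threshold corresponds to roughly 11% ALcons.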

For any new acoustic design or compliance verification, STI per IEC 60268-16 is the appropriate metric. All major building codes that specify speech intelligibility requirements reference this standard.

AcousPlan and IEC 60268-16

AcousPlan calculates STI directly from the inputs you provide — room dimensions, surface materials (which determine RT60 across all octave bands), and background noise levels — using the IEC 60268-16:2020 methodology. The calculation implements the full 7-band, 14-modulation-frequency MTF matrix, the auditory masking correction, the apparent SNR conversion with clipping, and the redundancy-corrected frequency weighting.

When you run a simulation in AcousPlan, the results dashboard shows your STI alongside the quality category and the per-band Modulation Transfer Index values. If your STI falls below the target for your room type, the AI prescription engine recommends specific material changes and identifies which octave bands are limiting intelligibility — so you know exactly where to focus your treatment.

You do not need to own a STIPA meter or understand the mathematics of modulation transfer functions. You need to know what your room's STI is before it is built, so you can fix problems on paper rather than on site.

Calculate your room's STI with AcousPlan — enter your room dimensions, select surface materials, set your background noise level, and get an IEC 60268-16 compliant Speech Transmission Index in seconds.
