TUTORIALS · 19 min read

What Is STI (Speech Transmission Index) — Can People Actually Understand Speech in Your Room?

STI measures how much a room degrades speech from source to listener, on a scale from 0 (unintelligible) to 1 (perfect). An STI of around 0.50 means roughly one in four sentences is not fully understood. Here is how STI works, what scores you need, and why reverberation time alone is not enough.

AcousPlan Editorial · March 14, 2026

Think of a perfectly tuned FM radio station. The DJ's voice comes through crisp, every consonant sharp, every word distinct. That is what a Speech Transmission Index of 1.0 sounds like. Now imagine driving out of range. Static creeps in. The signal fades. You catch fragments — "traffic on the..." — but the rest dissolves into noise. By the time the signal is half static, you are guessing at every other sentence.

That is exactly what happens inside buildings. Except the static is not radio interference. It is reverberation, background noise from HVAC systems, and the sheer distance between the person speaking and the person trying to listen. The room itself becomes the interference. STI measures how much interference a room adds between the talker's mouth and the listener's ear — and it does so on a single, elegant scale from 0.00 to 1.00.

The Formal Definition

The Speech Transmission Index is defined in IEC 60268-16:2020, the international standard for the objective rating of speech intelligibility. The standard describes STI as a measure of how well a transmission channel — in our case, a room — preserves the temporal envelope of speech.

What does "temporal envelope" mean in plain language? When you speak, the volume of your voice is constantly changing. Vowels are louder. Consonants are quieter and shorter. The rapid fluctuations between loud and quiet — dozens of times per second — are what carry meaning. Your brain decodes these fluctuations into words. If the room smears those fluctuations (through reverberation) or buries them (through background noise), your brain receives a degraded signal. You hear sound, but you do not hear words.

STI quantifies exactly how much degradation occurs. A value of 1.00 means the room transmits speech perfectly — every modulation arrives intact. A value of 0.00 means total destruction — the listener hears noise with no discernible speech content. Real rooms fall somewhere in between, and the differences between 0.45 and 0.65 are the differences between a room where people constantly ask "can you repeat that?" and a room where communication flows effortlessly.

The STI Quality Scale

IEC 60268-16 provides a five-level quality scale that maps STI values to subjective intelligibility. This table is the single most referenced element of the standard, and for good reason: it translates an abstract number into something a building owner can understand.

STI Range | Intelligibility Rating | Approximate % Sentences Understood | Typical Application
0.00 – 0.30 | Bad | Less than 35% | Avoid: no legitimate use case for occupied spaces
0.30 – 0.45 | Poor | 35% – 65% | Background music venues only
0.45 – 0.60 | Fair | 65% – 85% | Restaurants, transit corridors, retail
0.60 – 0.75 | Good | 85% – 95% | Classrooms, meeting rooms, courtrooms
0.75 – 1.00 | Excellent | Greater than 95% | Lecture halls, command centres, hearing-impaired facilities
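
For readers who want to apply the scale programmatically, here is a minimal Python sketch of the lookup. The function name sti_rating is hypothetical, and treating each upper bound as exclusive (so that exactly 0.60 counts as "Good") is an assumption of this sketch, since the table lists shared boundaries.

```python
def sti_rating(sti: float) -> str:
    """Map an STI value to the five-level rating from the table above."""
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0.00 and 1.00")
    # Upper bounds are treated as exclusive (assumption of this sketch),
    # so a value sitting exactly on a boundary gets the higher rating.
    bands = [(0.30, "Bad"), (0.45, "Poor"), (0.60, "Fair"), (0.75, "Good")]
    for upper, label in bands:
        if sti < upper:
            return label
    return "Excellent"


print(sti_rating(0.52))  # Fair
print(sti_rating(0.67))  # Good
```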

A few things stand out from this table. First, the "Fair" category — STI between 0.45 and 0.60 — is where most untreated commercial interiors land. An open-plan office with a plasterboard ceiling, no acoustic treatment, and a 38 dBA HVAC system will typically score around 0.50 to 0.55. People can communicate, but they strain to do so. Fatigue accumulates over a working day. Errors increase. This is the acoustic equivalent of reading a document printed in light grey ink — technically possible, but unnecessarily hard.

Second, the jump from "Fair" to "Good" at 0.60 is not a subtle improvement. Research consistently shows that listener satisfaction, task performance, and comprehension all improve sharply once STI crosses 0.60. This threshold appears so frequently in building standards that it has become a de facto minimum for spaces where speech communication is the primary function.

Third, achieving "Excellent" (above 0.75) is difficult and expensive. It requires very short reverberation times, very low background noise, and short distances between speaker and listener. Purpose-built lecture theatres and courtrooms can reach this range. Most ordinary rooms cannot, and the cost of the last 0.10 of STI improvement is disproportionately high compared to the first 0.10.

How STI Is Calculated

The full STI calculation in IEC 60268-16 is mathematically involved, but the logic is elegant once you see the structure. Here is a simplified walkthrough that captures the essential mechanism without requiring signal processing expertise.

Step 1: Define the Modulation Frequencies

Speech carries information through amplitude modulations — the rising and falling volume that distinguishes one syllable from the next. IEC 60268-16 tests how well a room preserves modulations at 14 specific frequencies: 0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50, 3.15, 4.00, 5.00, 6.30, 8.00, 10.00, and 12.50 Hz.

Why these frequencies? Because they correspond to the rate at which phonemes, syllables, and words occur in natural speech. Slow modulations (0.63 Hz) represent sentence-level rhythm. Fast modulations (12.5 Hz) represent individual consonant transitions. A room that preserves fast modulations well will transmit crisp, intelligible speech. A room that smears fast modulations will sound "muddy" — you hear the voice, but the words blur together.

Step 2: Test Across 7 Octave Bands

The 14 modulation frequencies are tested across 7 octave bands centred at 125, 250, 500, 1000, 2000, 4000, and 8000 Hz. This matters because different octave bands carry different speech information. The 2000 Hz and 4000 Hz bands carry most of the consonant energy that distinguishes "b" from "d" and "s" from "f." The 500 Hz and 1000 Hz bands carry vowel energy. A room might preserve modulations well at 500 Hz (where the ceiling tiles are effective) but poorly at 4000 Hz (where the hard glass walls reflect aggressively). The multi-band approach captures this uneven behaviour.

The result is a grid of 98 values (14 modulation frequencies times 7 octave bands). Each value represents the modulation transfer function — the ratio of output modulation depth to input modulation depth — for one combination.
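
The grid can be pictured as a simple list of (octave band, modulation frequency) pairs. The sketch below only enumerates the cells using the values listed in Steps 1 and 2; the variable names are illustrative.

```python
# Modulation frequencies (Hz) and octave-band centres (Hz) as listed above.
MODULATION_FREQS_HZ = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50,
                       3.15, 4.00, 5.00, 6.30, 8.00, 10.00, 12.50]
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

# One cell per (octave band, modulation frequency) combination.
grid_cells = [(band, fm) for band in OCTAVE_BANDS_HZ for fm in MODULATION_FREQS_HZ]

print(len(grid_cells))  # 98
```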

Step 3: Apply the Modulation Transfer Function

For each cell in the grid, the modulation transfer function (MTF) is calculated. In a simplified model, the MTF depends on two things:

Reverberation reduces the MTF because reflected sound arrives late and fills in the gaps between loud and quiet passages. The longer the reverberation time, the more the modulations are smoothed out. Mathematically, the reverberation contribution to MTF reduction is a function of the modulation frequency and the reverberation time in that octave band.

Background noise reduces the MTF because it adds a constant level of sound that partially masks the quieter portions of speech. The higher the background noise relative to the speech level, the more the modulations are masked. The noise contribution is a function of the signal-to-noise ratio in each octave band.

The combined MTF for each cell ranges from 0.0 (complete destruction of the modulation) to 1.0 (perfect preservation).
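
The two contributions can be sketched in a few lines of Python. The article does not spell out the expressions, so this sketch assumes the widely quoted closed form for an ideal exponential (diffuse-field) decay combined with steady background noise. The function name mtf and its parameters are illustrative, and the signal-to-noise ratio is taken per octave band.

```python
import math


def mtf(mod_freq_hz: float, rt60_s: float, snr_db: float) -> float:
    """Simplified modulation transfer factor for one grid cell.

    Assumes an ideal exponential decay and steady noise; real rooms with
    echoes or strong direct sound will deviate from this.
    """
    # Reverberation term: longer RT60 or faster modulation lowers the value.
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    # Noise term: depends only on the speech-to-noise ratio in the band.
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


# Fast modulations suffer more from the same reverberation than slow ones.
print(mtf(0.63, 0.75, 20.0) > mtf(12.5, 0.75, 20.0))  # True
```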

Step 4: Convert to Apparent Signal-to-Noise Ratios

Each MTF value is converted to an apparent signal-to-noise ratio using the formula:

SNR_apparent = 10 log10 (MTF / (1 - MTF))

This conversion maps the 0-to-1 MTF scale onto a decibel scale, which is then clipped to the range -15 dB to +15 dB. The clipping reflects the psychoacoustic reality that signal-to-noise ratios below -15 dB contribute no intelligibility, and ratios above +15 dB provide no additional benefit.
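
As a minimal sketch, the conversion and clipping described in this step look like this in Python. The function name is illustrative, and the handling of the exact endpoints 0 and 1 (where the logarithm is undefined) is a choice made for this sketch.

```python
import math


def apparent_snr_db(mtf_value: float) -> float:
    """Convert an MTF value (0 to 1) to an apparent SNR clipped to +/-15 dB."""
    # The formula diverges at exactly 0 or 1, so return the clip limits directly.
    if mtf_value <= 0.0:
        return -15.0
    if mtf_value >= 1.0:
        return 15.0
    snr = 10.0 * math.log10(mtf_value / (1.0 - mtf_value))
    return max(-15.0, min(15.0, snr))


print(apparent_snr_db(0.5))  # 0.0
```

An MTF of 0.5 maps to 0 dB: the preserved modulation and the "lost" modulation are equal, which is the midpoint of the clipped range.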

Step 5: Average and Weight

The apparent SNR values are averaged across the 14 modulation frequencies within each octave band, producing 7 band-level values. These are then combined using octave-band weighting factors that reflect the relative importance of each frequency band to speech intelligibility. The male speech weighting emphasises lower bands slightly; female speech weighting emphasises higher bands.

Step 6: Final STI Value

The weighted average is normalised to produce the final STI value between 0.00 and 1.00.
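
Steps 1 to 6 can be strung together in a short Python sketch. It is not the full IEC 60268-16 procedure: it assumes the simplified reverberation-plus-noise MTF, omits the standard's masking and redundancy corrections, and uses equal band weights as a placeholder rather than the standard's weighting factors. The function name simplified_sti and its parameters are hypothetical.

```python
import math

MODULATION_FREQS_HZ = [0.63, 0.80, 1.00, 1.25, 1.60, 2.00, 2.50,
                       3.15, 4.00, 5.00, 6.30, 8.00, 10.00, 12.50]
N_BANDS = 7  # octave bands from 125 Hz to 8000 Hz


def simplified_sti(rt60_s, snr_db, band_weights=None):
    """Estimate STI from per-band RT60 (s) and speech-to-noise ratio (dB)."""
    if len(rt60_s) != N_BANDS or len(snr_db) != N_BANDS:
        raise ValueError("expected one value per octave band")
    if band_weights is None:
        # Placeholder: equal weights, NOT the standard's weighting factors.
        band_weights = [1.0 / N_BANDS] * N_BANDS

    band_indices = []
    for t, snr in zip(rt60_s, snr_db):
        indices = []
        for fm in MODULATION_FREQS_HZ:
            # Step 3: simplified MTF (exponential decay plus steady noise).
            m = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * fm * t / 13.8) ** 2)
            m *= 1.0 / (1.0 + 10.0 ** (-snr / 10.0))
            m = min(max(m, 1e-9), 1.0 - 1e-9)  # keep the logarithm finite
            # Step 4: apparent SNR, clipped to the range -15 dB to +15 dB.
            snr_app = 10.0 * math.log10(m / (1.0 - m))
            snr_app = max(-15.0, min(15.0, snr_app))
            # Normalising before averaging gives the same result as
            # averaging first, because the mapping is linear.
            indices.append((snr_app + 15.0) / 30.0)
        # Step 5: average over the 14 modulation frequencies in this band.
        band_indices.append(sum(indices) / len(indices))

    # Steps 5 and 6: weighted combination across bands, already on a 0 to 1 scale.
    return sum(w * mti for w, mti in zip(band_weights, band_indices))


# Shorter reverberation raises the estimate when noise is unchanged.
print(simplified_sti([0.45] * 7, [20.0] * 7) > simplified_sti([0.75] * 7, [20.0] * 7))
```

Because of the simplifications, the numbers this sketch produces should not be expected to match the figures quoted elsewhere in this article.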

The key insight from this calculation is that STI integrates two independent degradation mechanisms — reverberation and noise — into a single number. Neither RT60 alone nor background noise level alone can tell you the STI. You need both.

The Three Enemies of Speech Intelligibility

Understanding STI becomes intuitive once you think of it as a battle between the speech signal and three adversaries. Each one attacks the signal in a different way, and their effects compound.

Enemy 1: Reverberation

Reverberation is the persistence of sound after the source stops, caused by reflections from room surfaces. In the context of STI, reverberation smears the temporal detail of speech. Each syllable is "stretched" in time by the reflected energy, causing it to overlap with the next syllable. The effect is similar to motion blur in photography — the faster the sequence of events (or syllables), the worse the smearing.

Rooms with RT60 above 0.8 seconds will almost always have STI below 0.60 unless background noise is exceptionally low and the listener is very close to the speaker. Rooms with RT60 above 1.2 seconds rarely achieve STI above 0.50 regardless of other conditions.

Enemy 2: Background Noise

Background noise — from HVAC systems, traffic, adjacent spaces, or equipment — adds a constant masking signal that reduces the depth of speech modulations as perceived by the listener. Even in a room with zero reverberation, an STI of 1.0 is impossible if the background noise approaches the speech level.

The critical metric is the signal-to-noise ratio (SNR) at the listener's position. Normal conversational speech at 1 metre is approximately 60 dBA. If background noise is 45 dBA, the SNR is +15 dB — adequate for good intelligibility. If background noise is 55 dBA, the SNR drops to +5 dB, and intelligibility degrades sharply even in an acoustically treated room.
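
The arithmetic in the paragraph above can be checked with a short sketch. It treats the broadband dBA difference as the signal-to-noise ratio in every band, which is a simplification, and it uses the standard steady-noise expression for how much modulation depth survives. The function name is illustrative.

```python
def noise_modulation_factor(speech_dba: float, noise_dba: float):
    """Return (SNR in dB, fraction of modulation depth surviving the noise)."""
    snr_db = speech_dba - noise_dba
    # Steady noise fills in the quiet parts of the speech envelope.
    factor = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return snr_db, factor


for noise in (45.0, 55.0):
    snr, factor = noise_modulation_factor(60.0, noise)
    print(f"noise {noise:.0f} dBA -> SNR {snr:+.0f} dB, modulation kept {factor:.2f}")
```

At +15 dB about 97% of the modulation depth survives the noise alone; at +5 dB only about 76% does, before any reverberation is taken into account.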

HVAC noise is the most common culprit in commercial buildings. A poorly specified or unbalanced air handling system can easily produce 40–50 dBA at desk level, particularly in the 250 Hz and 500 Hz octave bands where it overlaps with speech energy.

Enemy 3: Distance

As the listener moves farther from the speaker, two things happen. The direct sound level decreases (inverse square law — 6 dB per doubling of distance in free field, somewhat less in rooms due to reflected energy). And the ratio of direct sound to reverberant sound decreases, meaning the listener hears proportionally more "smeared" reflections and less crisp direct sound.
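
The free-field part of this is easy to sketch. The function below applies the inverse square law only, so it overstates the drop inside a real room, where reflected energy props up the level. The function name and the 1 metre reference distance are assumptions of the sketch.

```python
import math


def direct_level_db(level_at_ref_db: float, distance_m: float,
                    ref_distance_m: float = 1.0) -> float:
    """Direct sound level at a distance, free-field inverse square law only."""
    return level_at_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


# Speech at 60 dBA at 1 m, evaluated at 2 m and 4 m.
for d in (1.0, 2.0, 4.0):
    print(f"{d:.0f} m: {direct_level_db(60.0, d):.1f} dBA")
```

Each doubling of distance costs about 6 dB of direct sound, so the signal-to-noise ratio at the listener shrinks even though the room and the noise have not changed.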

In a typical meeting room, STI might be 0.70 at 2 metres from the speaker and 0.55 at 6 metres from the speaker — the same room, the same acoustics, but a dramatically different experience depending on where you sit.

This is why distance is the often-forgotten variable in STI assessments. A consultant who calculates STI at a single receiver position near the front of a classroom is not capturing the experience of students in the back row.
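
A multi-position assessment can report the worst seat rather than a single value. This is a minimal sketch using the two meeting-room positions quoted above; the function name and the dictionary layout (distance in metres mapped to STI) are illustrative choices.

```python
def worst_seat(sti_by_distance_m: dict) -> tuple:
    """Return (distance, STI) for the receiver position with the lowest STI."""
    distance = min(sti_by_distance_m, key=sti_by_distance_m.get)
    return distance, sti_by_distance_m[distance]


# Values from the meeting-room illustration above.
positions = {2.0: 0.70, 6.0: 0.55}
print(worst_seat(positions))  # (6.0, 0.55)
```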

Worked Example: The Meeting Room That Almost Failed

Consider a meeting room with these dimensions and conditions:

  • Room dimensions: 6.0 m long, 5.0 m wide, 2.7 m high (volume = 81 m³)
  • RT60: 0.75 seconds (measured broadband average)
  • Background noise: 40 dBA (from ceiling-mounted HVAC diffuser)
  • Source-receiver distance: 4 metres (across the table)

Using the simplified STI estimation method (which approximates the full IEC 60268-16 calculation for rooms dominated by reverberation and diffuse-field noise), this room produces an STI of approximately 0.52.

That is in the "Fair" range. People can communicate, but they will lean forward, ask for repetitions, and leave meetings more tired than they should be. If the room is used for video conferencing, remote participants will fare even worse because the microphone picks up the degraded signal.

Now watch what happens when we make targeted improvements.

Intervention 1: Reduce RT60 to 0.45 seconds

Adding acoustic ceiling tiles with an NRC of 0.85 (replacing the existing plasterboard ceiling) and installing two absorptive wall panels behind the presenter drops the RT60 from 0.75 to 0.45 seconds. The STI rises to approximately 0.67.

That single change — costing perhaps $2,000 to $4,000 in materials and installation — moves the room from "Fair" to "Good." The percentage of sentences understood jumps from roughly 75% to 90%. Meeting productivity improves measurably.
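
To get a feel for how much absorption that RT60 change implies, the sketch below uses Sabine's equation (RT60 = 0.161 × V / A). This article does not use Sabine's equation itself, so treat it as an assumption that holds only for reasonably diffuse rooms. The function name is illustrative.

```python
def sabine_absorption_m2(volume_m3: float, rt60_s: float) -> float:
    """Equivalent absorption area (m^2) implied by an RT60, via Sabine's equation."""
    return 0.161 * volume_m3 / rt60_s


volume = 6.0 * 5.0 * 2.7                        # 81 m^3, from the room dimensions
before = sabine_absorption_m2(volume, 0.75)     # roughly 17.4 m^2
after = sabine_absorption_m2(volume, 0.45)      # roughly 29.0 m^2

print(f"extra absorption needed: {after - before:.1f} m^2")  # about 11.6 m^2
```

Under that assumption the treatment has to add on the order of 12 m² of equivalent absorption area, which gives a rough sense of the scale of the ceiling and wall treatment involved.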

Intervention 2: Reduce Background Noise to 30 dBA

Rebalancing the HVAC system to reduce supply air velocity, adding duct lining to the branch serving this room, and replacing the diffuser with a low-turbulence model drops background noise from 40 dBA to 30 dBA. Combined with the RT60 reduction, the STI rises to approximately 0.72.

The room is now near the top of the "Good" range. Communication is effortless. Video conference participants can hear clearly. The room has been transformed from a source of frustration into a space that people actively choose for important conversations.

The Lesson

The meeting room's original RT60 of 0.75 seconds would pass many building code requirements. The background noise of 40 dBA meets NR 35, which is the typical criterion for meeting rooms in BS 8233. On paper, this room "complies." But an STI of 0.52 means that one in four sentences is not fully understood. The room is code-compliant but functionally inadequate.

This is why STI matters. It measures the outcome — can people understand speech? — rather than the inputs (RT60, noise level) in isolation.

STI vs RT60: Why They Are Not Interchangeable

The most common misconception in architectural acoustics is that controlling reverberation time automatically ensures good speech intelligibility. It does not. RT60 and STI are related but distinct parameters, and a room can satisfy one while failing the other.

Scenario 1: Good RT60, Poor STI

A classroom with RT60 of 0.5 seconds (meeting the BB93 target of 0.6 seconds for furnished, unoccupied classrooms) but background noise of 50 dBA from road traffic. The RT60 is excellent. The STI is approximately 0.42 — in the "Poor" range. Students in this classroom will struggle to understand the teacher, despite the room meeting its reverberation time target.

The cause is obvious once you consider STI: the high background noise masks the speech modulations regardless of how well-controlled the reverberation is. An RT60 check alone would give this room a clean bill of health.

Scenario 2: Long RT60, Acceptable STI

A small tutorial room with RT60 of 0.9 seconds (above the typical target) but background noise of only 22 dBA (a very quiet building in a rural area) and a source-receiver distance of 1.5 metres (tutor sitting next to the student). The RT60 is technically non-compliant. The STI is approximately 0.62, just inside the "Good" range.

This scenario is unusual but illustrative. The very low background noise and short distance compensate for the excessive reverberation. An RT60-only assessment would flag this room as a problem. An STI assessment would show it is functionally adequate for one-to-one instruction.

The Takeaway

RT60 is an input to the STI calculation, not a substitute for it. Background noise is the other input. Distance is a modifier. You need all three to predict whether people can understand speech in a room. Specifying RT60 alone is like predicting a car's fuel economy from engine size alone — it is a relevant factor, but ignoring weight, aerodynamics, and driving conditions will give you the wrong answer.

Standards That Require STI

Several major building and workplace standards now explicitly require or reference STI, reflecting the growing recognition that reverberation time alone is insufficient.

IEC 60268-16:2020

The defining standard for STI. It specifies the calculation method, the measurement method, and the quality scale. Any STI value reported in a building assessment should reference this standard.

ANSI S12.60-2010 (American National Standard)

The US standard for acoustical performance criteria in classrooms. It requires an STI of at least 0.60 in core learning spaces when measured at the most distant listener position. This standard also sets background noise limits (35 dBA for small classrooms, 40 dBA for large ones) and RT60 limits (0.6 seconds for small, 0.7 seconds for large), but the STI requirement is the performance target that integrates all acoustic factors.

WELL v2 Feature 74 (Sound)

The WELL Building Standard includes STI requirements in two contexts. Part 3 (Sound Masking) requires that STI between workstations in open-plan offices be below 0.50 to ensure speech privacy — here, a lower STI is the goal, because you want to prevent one person's conversation from being intelligible at a neighbour's desk. Part 7 (Sound Reinforcement) requires STI above 0.60 in spaces where speech reinforcement systems are used.

This dual requirement — high STI where speech needs to be understood, low STI where speech privacy is needed — highlights the versatility of the metric.
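
Because the target can point in either direction, a compliance check has to know which goal applies. The sketch below is hypothetical: the function name, the goal labels, and the choice of an inclusive threshold for intelligibility and a strict one for privacy are assumptions, with the 0.50 and 0.60 figures taken from the text above.

```python
def meets_target(sti: float, target: float, goal: str) -> bool:
    """Check an STI value against a target whose direction depends on the goal."""
    if goal == "intelligibility":
        # Speech should be understood: higher is better.
        return sti >= target
    if goal == "privacy":
        # Speech should NOT be understood at the neighbouring desk: lower is better.
        return sti < target
    raise ValueError(f"unknown goal: {goal!r}")


print(meets_target(0.67, 0.60, "intelligibility"))  # True
print(meets_target(0.55, 0.50, "privacy"))          # False
```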

BB93:2015 (UK Building Bulletin 93)

The UK standard for acoustic design of schools. While BB93 primarily specifies RT60 limits, it references STI as the preferred metric for assessing speech intelligibility in classrooms and recommends an STI of 0.60 or above. The associated performance specification, Section 1.3, notes that reverberation time targets are set to achieve adequate STI and that direct STI measurement or prediction should be used where possible.

DIN 18041:2016 (German Standard)

The German standard for acoustic quality in small to medium-sized rooms includes STI as a verification metric for rooms in usage group A (rooms where speech communication is the primary function). The standard provides STI targets tied to room type and distance from speaker.

STIPA: The Practical Measurement Method

Measuring the full STI requires generating 98 individual MTF measurements (14 modulation frequencies across 7 octave bands). In the laboratory, this is straightforward. On a construction site with contractors working in the next room, it is impractical.

STIPA (Speech Transmission Index for Public Address systems) is a streamlined measurement method defined in IEC 60268-16, Annex B. It uses a specially designed test signal: modulated noise in which each of the 7 octave bands carries two modulation frequencies at once. A single measurement therefore yields 14 modulation transfer values, rather than the 98 of the full method, from which an STI value is extracted.

The STIPA test signal is played through a loudspeaker (or the room's PA system), and a STIPA analyser at the listener position measures how much the room has degraded the modulations. The entire measurement takes approximately 15 seconds.

STIPA results correlate with full STI results to within plus or minus 0.03 in most room conditions. For practical purposes, they are interchangeable. Nearly all field measurements of speech intelligibility in buildings today use the STIPA method.

The equipment required is modest: a STIPA-compatible measurement microphone and analyser (many handheld sound level meters now include STIPA capability), and a loudspeaker with a reasonably flat frequency response. The cost of a single STIPA measurement kit starts at approximately $3,000, putting it within reach of any acoustic consultancy.

What Affects STI in Practice

Beyond the three primary enemies — reverberation, noise, and distance — several practical factors influence STI in real buildings.

Room Shape and Geometry

Parallel walls create flutter echoes — rapid, distinct reflections that degrade STI more severely than diffuse reverberation of the same duration. A room with an RT60 of 0.6 seconds but strong flutter echoes can have an STI 0.05 to 0.10 lower than a room with the same RT60 but non-parallel walls or diffusing surfaces.

Ceiling Height

Lower ceilings produce earlier first reflections from the ceiling, which can either help or hinder STI depending on whether the ceiling is absorptive or reflective. A low reflective ceiling in a small room produces strong early reflections that reinforce speech. A low reflective ceiling in a large room (such as an open-plan office) creates excessive reverberant energy that degrades STI.

Furnishings and Occupancy

People are excellent sound absorbers. A fully occupied classroom has a lower RT60 than an empty one, and consequently a higher STI. Standards typically specify acoustic targets for the unoccupied condition because that is the measurable, repeatable condition. The occupied STI will usually be better, provided the occupants themselves do not raise the background noise.

Upholstered furniture, bookshelves, and curtains all add absorption that reduces RT60 and improves STI. An unfurnished room during measurement will always perform worse than the same room in use.

Speaker Orientation

STI is directional. A speaker facing the listener projects more high-frequency energy toward the listener than a speaker facing away. The difference can be 0.05 to 0.10 in STI. This is particularly relevant in classrooms where the teacher frequently turns to write on a board, directing speech toward a reflective wall surface rather than toward the students.

Predicting STI Without Measuring

Not every project can afford or justify field STI measurements. In many cases — particularly during the design phase, before the room exists — STI must be predicted from the room's acoustic parameters.

AcousPlan calculates STI from two inputs you likely already know or can estimate: reverberation time (RT60) and background noise level. The calculation follows the IEC 60268-16 methodology, applying the modulation transfer function across octave bands and accounting for both reverberant and noise degradation.

This prediction is not a substitute for field measurement. It is a design tool that tells you, before a single ceiling tile is installed, whether your room is likely to achieve the STI target your standard requires. If the prediction shows STI of 0.48 and your target is 0.60, you know — early, cheaply — that additional acoustic treatment or noise control is needed.

The alternative is to discover the problem after construction, when the remediation cost is four to ten times higher and the building owner is already frustrated.

Key Takeaways

STI is the metric that answers the question building owners actually care about: can people understand speech in this room? Reverberation time and background noise level are inputs to that question, not answers to it.

An STI of 0.60 is the minimum for rooms where speech communication matters. Below 0.60, listener effort increases, comprehension drops, and fatigue accumulates. Above 0.60, communication flows naturally.

You cannot achieve good STI by controlling reverberation alone. Background noise is an equal partner in the degradation of speech. A room with perfect RT60 and noisy HVAC will have poor STI.

STI varies with position. A single STI value for a room is an approximation. The listener farthest from the speaker usually has the worst STI. Design for the worst seat, not the best one.

Prediction during design is far cheaper than remediation after construction. If you know the room dimensions, the target RT60, and the expected background noise level, you can estimate STI before the room is built.

Try It Yourself

AcousPlan calculates STI automatically from your room geometry, surface materials, and background noise inputs. Enter your room dimensions, select your materials, set a background noise level, and see the predicted STI alongside the RT60 and compliance assessment — all in seconds, directly in your browser.
