AcousPlan™ — What Is Clarity (C80) and Definition (D50)? — Room Acoustic Parameters Explained

Imagine you are watching a live orchestra perform Beethoven's Fifth Symphony. The opening motif — those four famous notes — needs to land with precision. If the hall is designed well, each note is distinct: you hear the attack, the full body of the tone, and then a brief decay before the next note arrives. The music is intelligible. Now imagine the same performance in a marble-walled sports arena. The notes blur together into a smear of reverberant energy. You can hear that the orchestra is playing, but the individual notes lose their edges. The music becomes impressionistic rather than crisp.

What you are experiencing in those two scenarios is a difference in clarity. Two parameters standardised in ISO 3382-1 quantify it: C80 (Clarity, used mainly for music) and D50 (Definition, used mainly for speech).

The Core Idea: Early Energy vs Late Energy

Before getting into the formal definitions, here is the intuition you need.

When a musician plays a note, the sound reaches your ears in two stages. First comes the direct sound, which travels the straight-line path from the instrument to your ears. Within the next 80 milliseconds or so, the first wave of early reflections arrives. This is sound that has bounced off the floor, the side walls, the ceiling, and the balcony faces before reaching you. These early reflections reinforce the direct sound, and your brain perceives them as part of the same note.

After 80 milliseconds, the late reverberant energy begins to dominate. This is the complex field of multiply-reflected sound that makes a room feel alive. Late reverberation is musically useful (it gives notes warmth and fullness), but too much of it relative to the early arrivals masks the detail of subsequent notes and syllables.

Clarity (C80) is the ratio of early energy (the first 80 ms) to late energy (everything after 80 ms), expressed in decibels. Definition (D50) is the fraction of total energy that arrives within the first 50 ms. That window is the one most relevant to recognising speech syllables.

Formal Definitions: ISO 3382-1:2009

Both parameters are defined in ISO 3382-1:2009, the international standard for measuring room acoustic parameters in performance spaces. Section 4 of that standard provides the mathematical definitions.

C80 (Clarity)

C80 is defined as the logarithmic ratio of the sound energy arriving within the first 80 milliseconds after the direct sound to the sound energy arriving after 80 milliseconds:

$$C_{80} = 10 \log_{10} \frac{\int_{0}^{80\,\mathrm{ms}} p^{2}(t)\,\mathrm{d}t}{\int_{80\,\mathrm{ms}}^{\infty} p^{2}(t)\,\mathrm{d}t} \quad \text{dB}$$

where p(t) is the instantaneous sound pressure of the impulse response at time t.
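The definition translates directly into code. The sketch below is a minimal pure-Python illustration. It assumes the impulse response has already been trimmed so the direct sound arrives at sample 0. `c80_db` and `exponential_ir` are hypothetical helper names. The test signal is an idealised, noise-free exponential decay rather than a measured room.

```python
import math

def c80_db(p, fs):
    """Clarity C80 in dB from impulse response samples p at sample rate fs (Hz).

    Assumes p has been trimmed so the direct sound arrives at sample 0.
    The sample spacing dt cancels in the ratio, so plain sums of p^2 stand in
    for the two integrals.
    """
    n80 = round(0.080 * fs)                # samples in the first 80 ms
    early = sum(x * x for x in p[:n80])    # energy from 0 to 80 ms
    late = sum(x * x for x in p[n80:])     # energy from 80 ms to the end of the record
    return 10.0 * math.log10(early / late)

def exponential_ir(rt60, fs, duration):
    """Idealised, noise-free impulse response whose energy falls 60 dB in rt60 seconds."""
    rate = 3.0 * math.log(10.0) / rt60     # amplitude falls to 10**-3 (energy to 10**-6) at t = rt60
    return [math.exp(-rate * n / fs) for n in range(int(duration * fs))]

fs = 48_000
ir = exponential_ir(rt60=1.4, fs=fs, duration=3.0)
print(f"C80 = {c80_db(ir, fs):+.1f} dB")
```

For this idealised 1.4 s decay, the sketch prints about +0.8 dB. A measured response would first need its onset located and, usually, octave-band filtering applied. This sketch leaves out both steps.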

The 80 ms boundary comes from listening research on music. That research found that the auditory system integrates reflections arriving within roughly the first 80 ms with the direct sound. Energy arriving within that window is perceived as reinforcing the direct sound, and energy arriving later is perceived as reverberation.

The result is expressed in decibels. Positive values mean more early energy than late energy. Negative values mean late energy dominates. The "correct" value depends on the room's purpose:

Room type	Target C80
Orchestral concert hall	-4 dB to +1 dB
Opera house	-2 dB to +2 dB
Chamber music	-1 dB to +3 dB
Multi-use hall	0 dB to +4 dB
Cinema	+3 dB to +6 dB

D50 (Definition)

D50 (the D stands for Deutlichkeit, German for distinctness or clarity) measures the fraction of total sound energy that arrives within the first 50 milliseconds after the direct sound:

$$D_{50} = \frac{\int_{0}^{50\,\mathrm{ms}} p^{2}(t)\,\mathrm{d}t}{\int_{0}^{\infty} p^{2}(t)\,\mathrm{d}t}$$

D50 is expressed as a dimensionless ratio between 0 and 1 (or sometimes as a percentage). The 50 ms boundary is shorter than the 80 ms used for C80 because speech syllables are shorter than musical notes — roughly 100-150 ms per syllable — and intelligibility depends on catching the onset of each syllable before the next one begins.

A D50 value of 0.5 (50%) is widely cited as the threshold for acceptable speech intelligibility. Above 0.5, most listeners in a room can follow a speaker with reasonable effort. Below 0.5, intelligibility begins to degrade noticeably. For critical speech spaces — courtrooms, conference rooms, classrooms — a D50 above 0.6 is preferable. For broadcast studios, values above 0.7 are common.
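The same approach gives D50. This is again a sketch. It assumes a trimmed impulse response with the direct sound at sample 0 and uses an idealised exponential decay as test input. The helper names are hypothetical.

```python
import math

def d50(p, fs):
    """Definition D50 (a ratio between 0 and 1) from impulse response samples p at fs Hz.

    Assumes p has been trimmed so the direct sound arrives at sample 0; sums of
    p^2 stand in for the integrals because the sample spacing cancels.
    """
    n50 = round(0.050 * fs)                # samples in the first 50 ms
    early = sum(x * x for x in p[:n50])    # energy from 0 to 50 ms
    total = sum(x * x for x in p)          # energy over the whole record
    return early / total

def exponential_ir(rt60, fs, duration):
    """Idealised, noise-free impulse response whose energy falls 60 dB in rt60 seconds."""
    rate = 3.0 * math.log(10.0) / rt60     # amplitude falls to 10**-3 (energy to 10**-6) at t = rt60
    return [math.exp(-rate * n / fs) for n in range(int(duration * fs))]

fs = 48_000
ir = exponential_ir(rt60=1.0, fs=fs, duration=3.0)
print(f"D50 = {d50(ir, fs):.2f}")
```

For an idealised 1.0 s decay, the result is about 0.50, right at the threshold above.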

The Relationship Between C80, D50, and RT60

C80 and D50 are not independent of reverberation time; they are closely linked to it. As RT60 increases, more late energy accumulates relative to early energy, so both C80 and D50 fall. A rough rule of thumb:

A room with RT60 = 0.4 s typically has C80 ≈ +6 dB and D50 ≈ 0.75 (excellent for speech, too dry for orchestral music)
A room with RT60 = 1.4 s typically has C80 ≈ 0 dB and D50 ≈ 0.50 (balanced, suitable for mixed-use)
A room with RT60 = 2.5 s typically has C80 ≈ -5 dB and D50 ≈ 0.25 (excellent for organ music, poor for speech)
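The simplest diffuse-field model shows how RT60 drives both parameters. It assumes the energy decays as a single exponential, with no direct sound and no discrete reflections. Under that assumption both parameters have closed forms. At 80 ms the energy has fallen by 60 × 0.08 / RT60 dB, so C80 = 10 log₁₀(10^(0.48/RT60) − 1). Likewise, D50 = 1 − 10^(−0.3/RT60). The function names below are illustrative.

```python
import math

def c80_exponential(rt60):
    """C80 in dB for a pure exponential energy decay (no direct sound, no discrete reflections)."""
    # At 80 ms the energy has dropped to 10**(-0.48 / rt60) of its initial value,
    # so early/late = (1 - x) / x = 10**(0.48 / rt60) - 1.
    return 10.0 * math.log10(10.0 ** (0.48 / rt60) - 1.0)

def d50_exponential(rt60):
    """D50 for the same idealised decay: fraction of the energy in the first 50 ms."""
    return 1.0 - 10.0 ** (-0.30 / rt60)

for rt60 in (0.4, 1.4, 2.5):
    print(f"RT60 = {rt60} s: C80 = {c80_exponential(rt60):+.1f} dB, "
          f"D50 = {d50_exponential(rt60):.2f}")
```

The loop prints roughly the following values:

- 0.4 s: +11.7 dB and 0.82
- 1.4 s: +0.8 dB and 0.39
- 2.5 s: −2.5 dB and 0.24

These idealised figures differ from the rule of thumb above by up to several decibels. Real rooms add direct sound and discrete early reflections that this model ignores. That is one likely reason for the gap, and one reason RT60 alone does not settle C80 or D50.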

However, RT60 alone does not fully predict C80 or D50. Room geometry, source and receiver positions, and the spatial distribution of absorption all influence how quickly early reflections arrive. Two rooms with identical RT60 values can have very different C80 values if one has a low ceiling that sends useful early reflections to listeners while the other has a tall vault where the ceiling reflection arrives after 150 ms.

A Worked Example: Office Conference Room

Consider a rectangular conference room, 8 m long × 5 m wide with a 2.7 m ceiling. The room has a carpeted floor, painted concrete walls, and a suspended ceiling of mineral-fibre tiles.

Step 1: Estimate RT60 using Sabine

Total surface area S = 2(8×5) + 2(8×2.7) + 2(5×2.7) = 80 + 43.2 + 27 = 150.2 m²

Absorption coefficients at 1000 Hz:

Floor (carpet, 40 m²): α = 0.35
Ceiling (mineral fibre tile, 40 m²): α = 0.70
Walls (painted concrete, 70.2 m²): α = 0.05

Mean absorption: ā = (40×0.35 + 40×0.70 + 70.2×0.05) / 150.2 = (14 + 28 + 3.51) / 150.2 = 45.51 / 150.2 ≈ 0.303

Total absorption A = ā × S = 0.303 × 150.2 ≈ 45.5 m² (metric sabins)

RT60 = 0.161 × V / A = 0.161 × (8×5×2.7) / 45.5 = 0.161 × 108 / 45.5 ≈ 0.38 seconds
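The Step 1 arithmetic is easy to script and recheck. This minimal sketch assumes the three surfaces and 1000 Hz coefficients listed above. `sabine` is an illustrative helper, not an AcousPlan function.

```python
def sabine(volume_m3, surfaces):
    """Return (total absorption A in m^2, RT60 in s) from Sabine's RT60 = 0.161 V / A.

    surfaces: (area in m^2, absorption coefficient) pairs for a single frequency band.
    """
    absorption = sum(area * alpha for area, alpha in surfaces)   # A, in metric sabins (m^2)
    return absorption, 0.161 * volume_m3 / absorption

length, width, height = 8.0, 5.0, 2.7
surfaces_1khz = [
    (length * width, 0.35),                  # carpeted floor
    (length * width, 0.70),                  # suspended mineral-fibre ceiling tiles
    (2 * (length + width) * height, 0.05),   # painted concrete walls
]
volume = length * width * height
total_area = sum(area for area, _ in surfaces_1khz)
A, rt60 = sabine(volume, surfaces_1khz)
print(f"S = {total_area:.1f} m^2, mean alpha = {A / total_area:.3f}, "
      f"A = {A:.1f} m^2, RT60 = {rt60:.2f} s")
```

The script prints S = 150.2 m², a mean α of 0.303, A = 45.5 m² and RT60 = 0.38 s. Running the same function once per octave band, with that band's coefficients, gives the frequency-dependent RT60 profile.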

Step 2: Estimate C80

For a room with RT60 of 0.38 s, using the relationship between room volume, RT60, and early-to-late ratio (an approximation based on Barron's revised theory for diffuse-field rooms):

The reverberant energy level relative to direct sound ∝ RT60 / V. With RT60 = 0.38 s and V = 108 m³, the room is absorption-dominated and early reflections will arrive predominantly from the low ceiling (2.7 m → reflection path ≈ 5.4 m → arrival delay ≈ 16 ms) and the end wall (≈ 8 m → 24 ms). Both fall well within the 80 ms window.

Estimated C80 ≈ +7 dB. This is good for speech but would feel overly dry for any musical performance.

Step 3: Estimate D50

For RT60 = 0.38 s, the impulse response decays rapidly. The ratio of energy in the first 50 ms to total energy is high — estimated at approximately 0.75–0.80.
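Steps 2 and 3 can be checked against Barron's revised theory in its commonly quoted form. In that form, the direct energy is 100/r² and the total reflected energy is (31200 T / V) e^(−0.04 r / T). The part arriving after time t is that total multiplied by e^(−13.82 t / T). This sketch rests on three assumptions not given in the text:

- a source-receiver distance r = 4 m;
- c = 343 m/s;
- the Step 1 values T = 0.38 s and V = 108 m³.

The function names are illustrative. With those inputs the model gives a C80 of roughly +13 dB and a D50 of about 0.85, higher than the estimates in the text. On either set of numbers, the room sits far above the musical C80 targets in the table, so the conclusion below is unchanged.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air

def barron_energies(rt60, volume, r, t_split):
    """Direct, early-reflected and late energies (relative units), Barron's revised theory.

    rt60: reverberation time (s); volume: room volume (m^3);
    r: source-receiver distance (m); t_split: early/late boundary (s).
    """
    direct = 100.0 / r**2
    # Total reflected energy; the exp(-0.04 r / T) term is 13.82 r / (c T) with c = 343 m/s.
    reflected = (31200.0 * rt60 / volume) * math.exp(-13.82 * r / (SPEED_OF_SOUND * rt60))
    # Portion of the reflected energy arriving after t_split, for a 60 dB-per-RT60 decay.
    late = reflected * math.exp(-13.82 * t_split / rt60)
    return direct, reflected - late, late

def barron_c80(rt60, volume, r):
    direct, early, late = barron_energies(rt60, volume, r, 0.080)
    return 10.0 * math.log10((direct + early) / late)

def barron_d50(rt60, volume, r):
    direct, early, late = barron_energies(rt60, volume, r, 0.050)
    return (direct + early) / (direct + early + late)

# Step 1 values for the conference room; r = 4 m is an assumed listener distance.
c80 = barron_c80(rt60=0.38, volume=108.0, r=4.0)
d50 = barron_d50(rt60=0.38, volume=108.0, r=4.0)
print(f"C80 = {c80:+.1f} dB, D50 = {d50:.2f}")
```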

Conclusion: This conference room, as specified, has excellent speech clarity (D50 ≈ 0.75) and acceptable C80 for presentation audio. The client may want to use the room for piano recitals. In that case, the absorption from the carpet and the mineral-fibre ceiling would need to be substantially reduced to bring C80 down into a more musical range.

Common Mistakes When Designing for Clarity

Over-treating a space: Adding too much absorption raises C80 and D50 so high that the room sounds anechoic — a deeply unpleasant acoustic environment. Most listeners prefer some reverberation, even in speech-dominant spaces. A well-designed office meeting room targets D50 ≈ 0.6–0.7, not 0.95.

Ignoring seat position variation: C80 varies significantly across a room. In a typical rectangular classroom, the front rows might have C80 = +4 dB while the back rows are at -2 dB due to greater distance from the source and less benefit from early reflections. ISO 3382-1 recommends measuring at multiple receiver positions and reporting the average.

Conflating clarity with intelligibility: C80 and D50 are physical measurements of the acoustic field. Speech Transmission Index (STI) is a better predictor of perceived intelligibility because it also accounts for background noise. A room can have a good D50 but poor STI if HVAC noise is high.

Frequency dependence: C80 and D50 are frequency-dependent. ISO 3382-1 specifies measurement at octave bands from 125 Hz to 4000 Hz. A room may have excellent C80 at 1000 Hz but poor C80 at 250 Hz because low frequencies are not absorbed by the same materials that work at mid frequencies. Always check the full octave-band profile.
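This minimal sketch of the octave-band check assumes per-band C80 values have already been measured or simulated. The band values below are hypothetical. They are chosen only to reproduce the situation described above, where C80 is fine at 1000 Hz but too low at 250 Hz. The target range is the chamber-music row from the table, and `bands_outside` is an illustrative helper.

```python
OCTAVE_BANDS_HZ = (125, 250, 500, 1000, 2000, 4000)   # octave bands from 125 Hz to 4 kHz

def bands_outside(c80_by_band, low_db, high_db):
    """Return the octave bands whose C80 falls outside [low_db, high_db]."""
    return [band for band in OCTAVE_BANDS_HZ
            if not low_db <= c80_by_band[band] <= high_db]

# Hypothetical per-band results for one receiver position (illustrative only).
c80_by_band = {125: -3.5, 250: -2.8, 500: 0.5, 1000: 1.5, 2000: 2.0, 4000: 2.5}

print(bands_outside(c80_by_band, low_db=-1.0, high_db=3.0))   # [125, 250]
```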

How AcousPlan Helps

AcousPlan's acoustic simulation engine calculates C80 and D50 alongside RT60, EDT, and STI for every room configuration you build. You can see immediately how a material change — replacing painted concrete walls with fabric panels, for example — shifts the early-to-late energy balance at each octave band.

The auto-solve function accepts a target C80 or D50 range alongside RT60 targets and searches the materials library for treatment combinations that satisfy all constraints simultaneously. For multi-use spaces that need to serve both speech presentations and small musical performances, this is particularly useful: the engine finds the narrow window of absorption coefficient and placement that keeps C80 between -2 dB and +3 dB while holding D50 above 0.50.

Open AcousPlan's room simulator and enter your room dimensions to see C80 and D50 alongside the full suite of ISO 3382-1 parameters.
