3D Audio on Headphones: How Does It Work?

Feb 04, 2021

Listening to sound in the real world and listening on headphones are two different experiences. This makes it harder for headphone mixes to translate to speakers. Learn why, and how you can bridge the gap with 3D audio software.

The perception of spatial sound in the real world (for example, when you listen to audio played through speakers) is a complex phenomenon. It combines the interaction of the acoustic sound waves with the room or space, their interaction with our head and ears, the response of our middle and inner ear and the auditory nerve, and finally our brain’s cognition and interpretation of the acoustic scene.

The perception of sound over headphones is a completely different experience. As a result, mixes done on headphones might translate poorly to speakers in a room.

Here are some of the main differences—and how you can overcome them with 3D audio plugins for monitoring on headphones, such as CLA Nx, Abbey Road Studio 3 and Nx Ocean Way Nashville.

1. Channel Crosstalk: The Stereo Magic

When we listen to a left/right speaker configuration, the signal from the left speaker arrives at both our left and right ears, and at each ear it is summed with the signal from the right speaker. When we listen to the same content on headphones, the left ear receives only the left channel and the right ear receives only the right channel.
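To make the crosstalk idea concrete, here is a minimal Python sketch of a basic "crossfeed": each ear gets its own channel plus a delayed, attenuated copy of the opposite channel. This is an illustration only, not the Nx algorithm. The function name, the 12-sample delay and the 0.5 gain are assumed values chosen for the example, and real crosstalk is also frequency dependent, which this sketch ignores.

```python
def crossfeed(left, right, delay_samples=12, gain=0.5):
    """Add a delayed, attenuated copy of each channel to the opposite ear.

    delay_samples and gain are illustrative assumptions, not measured values.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    ear_left, ear_right = [], []
    for i in range(len(left)):
        j = i - delay_samples  # index of the sample that has crossed the head
        from_right = right[j] * gain if j >= 0 else 0.0
        from_left = left[j] * gain if j >= 0 else 0.0
        ear_left.append(left[i] + from_right)
        ear_right.append(right[i] + from_left)
    return ear_left, ear_right


# A single click in the left channel only.
left = [1.0] + [0.0] * 19
right = [0.0] * 20
ear_l, ear_r = crossfeed(left, right)
```

With plain headphone playback the right ear would hear nothing from this click. With the crossfeed applied, the right ear receives a quieter copy 12 samples later, which is closer to what happens in front of a pair of speakers.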

Channel crosstalk

2. HEADS UP! Head and Ear Filtering and Delays

After propagating through the air and before arriving at the eardrum, the sound wave is filtered and delayed by the size and shape of our head and ears. The wavefront arrives at the two ears at different times and with different frequency content. These delays and filters depend on the angle from which the sound originates. When listening on headphones, this filtering and delay is essentially bypassed, and the signal is delivered almost directly to our eardrums (depending on the headphone type).
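The delay part of this effect can be estimated with a few lines of Python. The sketch below uses the Woodworth spherical-head approximation, a common textbook model, to estimate the arrival-time difference between the two ears for a source at a given angle. It is not taken from the Nx plugins. The head radius and speed of sound are assumed typical values, and the direction-dependent filtering by the outer ear is not modeled at all.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, in meters
SPEED_OF_SOUND = 343.0   # meters per second in air at about 20 degrees C


def itd_seconds(azimuth_deg):
    """Interaural time difference for a distant source.

    Woodworth approximation for a rigid spherical head, intended for
    azimuths between -90 and 90 degrees (0 = straight ahead).
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


for az in (0, 30, 90):
    print(f"{az:>2} degrees: {itd_seconds(az) * 1e6:.0f} microseconds")
```

Under these assumptions a source straight ahead produces no time difference, while a source directly to one side arrives at the far ear roughly 0.66 milliseconds late. Ordinary headphone playback supplies none of this timing information unless it is already in the recording.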

Ear filtering: Direction-dependent frequency change

3. Early Reflections: We Are Not Alone

In the real world, even in the driest studio, the direct sound from the speakers is not the only thing that arrives at the ears. The sound wave interacts with the room by bouncing off the walls and other physical objects, creating multiple highly correlated signals coming from numerous directions. These are referred to as early reflections, and they also undergo filtering and delay based on the direction from which they arrive. Our brain uses the gains, times of arrival, and directions of these early reflections relative to the direct sound to estimate the distance of the source and the dimensions and acoustic characteristics of the listening space. Again, on headphones, none of this happens; only the dry signal reaches the ear, and there is no indication of how it will interact with a physical environment.
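As a rough illustration of the timing and level cues involved, the following Python sketch computes a single first-order reflection off one wall using the image-source idea: mirror the speaker across the wall and treat the mirror image as a second, more distant source. Everything here is an assumption made for the example (a 2D room, one wall at x = 0, a frequency-independent reflection factor of 0.7, simple 1/distance level loss). It is not a description of how any Nx plugin models a room.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second


def first_reflection(source, listener, wall_x=0.0, reflection_factor=0.7):
    """Extra delay (seconds) and relative gain of one wall reflection.

    source and listener are (x, y) positions in meters; the wall is the
    line x = wall_x. reflection_factor is an assumed amplitude factor.
    """
    sx, sy = source
    lx, ly = listener
    image_x = 2.0 * wall_x - sx  # mirror the source across the wall
    direct = math.hypot(sx - lx, sy - ly)
    reflected = math.hypot(image_x - lx, sy - ly)
    extra_delay = (reflected - direct) / SPEED_OF_SOUND
    # Level relative to the direct sound: wall loss plus 1/distance spreading.
    relative_gain = reflection_factor * direct / reflected
    return extra_delay, relative_gain


# Speaker 2 m in front of the listener, both 1 m from the side wall.
delay, gain = first_reflection(source=(1.0, 2.0), listener=(1.0, 0.0))
print(f"reflection arrives {delay * 1000:.2f} ms late at {gain:.2f} x direct level")
```

In this layout the reflection reaches the listener a couple of milliseconds after the direct sound and at about half its level. A real room produces many such reflections, each with its own delay, gain and direction.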

Early reflections

4. Head Motion: “And Yet It Moves!”

Since all of the above-described phenomena depend on the direction of sound, even the slightest nudge to our head causes the complete audio scene to shift in the opposite direction because the external world is not moving with the head. Now, this cue is as crucial as any of the others—perhaps even more so. Our brain, being highly sensitive to change, remembers where the sound used to be and where it is now, combines this with its knowledge that it itself (and not the source) has moved, and uses this information to locate the static external source. When we listen on headphones, the audio scene constantly moves with the head, contradicting any previous supposition the brain has retained regarding the location of sound sources.
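The geometry behind this cue is simple enough to sketch in Python. If a source is fixed in the room and the head turns, the angle of the source relative to the head changes by the same amount in the opposite direction. The function name and the sign convention (positive angles to the listener's right) are assumptions made for this example, and only rotation around the vertical axis is considered.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a room-fixed source as seen from the rotated head.

    Both angles are measured in the room, positive to the right.
    The result is wrapped to the range [-180, 180).
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# A speaker fixed at 30 degrees to the right of the room's front direction.
for yaw in (0.0, 10.0, 30.0, 60.0):
    print(f"head yaw {yaw:>4.0f}: speaker appears at {relative_azimuth(30.0, yaw):.0f}")
```

When the head turns 10 degrees toward the speaker, the speaker appears at 20 degrees instead of 30. On ordinary headphones this value never changes, however the head moves. A head-tracked renderer would need to recompute it continuously and update the direction-dependent cues to match.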

Head movement

Why These Differences Matter to Mixing

All of the above are essential cues that the brain uses while continually making decisions about the location of sound sources. Now, our brain is not a rash decision maker, and it is not easy to fool. It has developed the ability to localize sound through millions of years of evolution. To know by listening where a predator might be lurking or where prey can be caught is obviously crucial for survival. When sound cues are missing or contradictory, the brain becomes confused until it ultimately gives up the attempt to locate sound, and the scene collapses into our head.

This is the experience of headphone listening. Since the cues that help us locate sound sources in space are missing, we hear the sounds as if they are nested within the head. All the elements we are hearing are crowded along the line stretching through the head from ear to ear, instead of occupying the three-dimensional space outside of our head.

Three-dimensional stereo image on speakers

Flat stereo image on ordinary headphones

The flatness of the ordinary headphone experience can have several negative consequences:

♦ Wrong or missing spatial image: When listening on headphones, we either fail to perceive or misunderstand the spatial intentions of the mix that the producer or recording artist wanted to convey.

For example, in the Beatles’ song “A Day in the Life,” the vocals start on the right channel and the piano on the left. Then, in the course of the song’s first verse, they gradually move towards each other, until by the second verse they’ve completely traded places. This is an essential part of the listening experience, and we can hear the transition properly occurring in the auditory space when we experience it through speakers. Listening to the song on headphones with flat headphone sound will not reproduce this auditory scene properly, and the experience will be greatly reduced.

♦ Mixing decisions: When you mix audio on headphones, the missing three-dimensional spatial image makes it harder to judge mix depth. The missing room reflections may make it harder to make judgments about reverb. The missing sustain of a natural acoustic environment makes it very difficult to judge how different frequencies (especially in the low end) will resonate once played over speakers. Highly experienced engineers will know how to compensate adequately for these differences and predict how a headphone mix will sound over speakers, and even then proper translation may be hard to achieve. Those who are less experienced might find that their good-sounding headphone mixes translate very poorly to speakers.

♦ Listening fatigue: The inside-the-head experience created on headphones can cause listening fatigue, since the brain is not used to this type of sensation. The brain is continuously trying to comprehend the spatial audio scene, but the cues are either contradictory or missing, leaving the brain in a constant state of confusion.

♦ Surround Sound: It is practically impossible to create surround sound on ordinary headphones, primarily because they cannot convey the surround image of sources located behind the listener.

Waves Nx technology has been developed in order to bridge the gap between listening to sound from external sources and listening on headphones. Available now in four plugins (Nx Virtual Mix Room, CLA Nx, Abbey Road Studio 3 and Nx Ocean Way Nashville), the Nx algorithm inserts all of the above-described missing cues into the signal in order to convince the brain that sound is coming from virtual speaker positions in space, with options for both stereo and surround.

3D stereo image on headphones with Nx

3D surround image on headphones with Nx

Nx does all this in a subtle manner, adding only the critical and global cues required in order to recreate the spatial 3D audio image, without otherwise modifying or coloring the sound. The filters and ambience are optimized to create a transparent-sounding room in order to minimize the frequency alteration, such that all changes are perceived as relating to space rather than equalization. By adjusting the sound to the user’s head movements, the 3D perception is created without dramatic changes to the frequency response.

In the Nx Virtual Mix Room, this is achieved by recreating, over headphones, an 'idealized' mix room. In the CLA Nx, Abbey Road Studio 3 and Nx Ocean Way Nashville plugins, this is achieved by recreating the acoustic environment of a real studio control room, combining the Nx algorithm with impulse response measurements from the actual famed studios.

Whichever of these plugins you choose, the result should help you craft mixes on headphones that will translate reliably once you hear them on speakers in the real world.

See frequently asked questions about the CLA Nx plugin.

Originally published June 23, 2016
