Need to mix audio for 360 videos or VR projects and not sure where to start? Get a jumpstart with this basic guide to Ambisonics B-format, the most widely used audio format for 360 applications.
In recent years, VR game engines, 3D installations, and 360° videos on social media such as YouTube and Facebook have all become dependent on full-immersion, three-dimensional, 360-degree audio. If you’re a sound engineer, you are more likely than ever to be working on projects that involve converting, mixing, panning and playing back 360 audio.
The most popular audio standard for handling and delivering such audio is called Ambisonics, and professional Ambisonics audio tools adapted to the familiar workflow of audio engineers are now available. The information on this page will help you understand the basic concepts of Ambisonics and the basic workflow of handling 360-degree audio projects in Ambisonics format.
What is Ambisonics?
Ambisonics is a method for recording, mixing and playing back three-dimensional 360-degree audio. It was invented in the 1970s but saw little commercial adoption until recently, when the growth of the VR industry created demand for 360° audio solutions.
The basic approach of Ambisonics is to treat an audio scene as a full 360-degree sphere of sound coming from different directions around a center point. The center point is where the microphone is placed while recording, or where the listener’s ‘sweet spot’ is located while playing back.
The most popular Ambisonics format today, widely used in VR and 360 video, is Ambisonics B-format, which uses as few as four channels (more on which below) to reproduce a complete sphere of sound.
Why Ambisonics, why now? Ambisonics vs. Surround
Ambisonics 360 audio is sometimes confused with traditional surround sound technologies, but there are major differences between the two. These differences are the reason Ambisonics, rather than classic surround formats, has been adopted as the technology of choice for emerging VR and 360 applications.
Traditional surround technologies are more immersive than simple two-channel stereo, but the principle behind them is the same: they all create an audio image by sending audio to a specific, pre-determined array of speakers. Stereo sends audio to two speakers; 5.1 surround to six; 7.1 to eight; and so on.
By contrast, Ambisonics does not send audio signal to any particular number of speakers; it is “speaker-agnostic.” Instead, Ambisonics can be decoded to any speaker array (more on which below). Ambisonic audio represents a full, uninterrupted sphere of sound, without being restricted by the limitations of any specific playback system.
All this helps explain why Ambisonics has become standard in 360 video and VR:
- Traditional surround formats can provide good imaging when static; but as the sound field rotates, the sound tends to ‘jump’ from one speaker to another. By contrast, Ambisonics can create a smooth, stable and continuous sphere of sound, even when the audio scene rotates (as, for example, when a gamer wearing a VR headset moves her head around). This is because Ambisonics is not pre-limited to any particular speaker array.
- Traditional surround speaker systems are usually ‘front-biased’: information from the side or rear speakers is not as focused as the sound from the front. By contrast, Ambisonics is designed to spread the sound evenly throughout the three-dimensional sphere.
- Finally, whereas traditional surround systems have various difficulties representing sound beyond the horizontal dimension, Ambisonics is designed to deliver a full sphere complete with elevation, where sounds are easily represented as coming from above and below as well as in front or behind the user.
4 channels = Full 360 degrees
At this point you may be wondering what all this means for a working audio engineer: how exactly is Ambisonics recorded, mixed and played back, if it does not eventually route to individual speakers? But before we jump ahead to the practical aspects, let’s spend a little more time on a more basic theoretical question:
How does Ambisonics represent an entire 360-degree soundfield with as few as four channels?
Let's take a look at the most basic (and today the most widely used) Ambisonics format, the 4-channel B-format, also known as first-order Ambisonics B-format.
The four channels in first-order B-format are called W, X, Y and Z. One simplified and not entirely accurate way to describe these four channels is to say that each represents a different directionality in the 360-degree sphere: center, left-right, front-back, and up-down.
A more accurate explanation is that each of these four channels represents, in mathematical language, a different spherical harmonic component – or, in language more familiar to audio engineers, a different microphone polar pattern pointing in a specific direction, with the four being coincident (that is, conjoined at the center point of the sphere):
- W is an omni-directional polar pattern, containing all sounds in the sphere, coming from all directions at equal gain and phase.
- X is a figure-8 bi-directional polar pattern pointing forward.
- Y is a figure-8 bi-directional polar pattern pointing to the left.
- Z is a figure-8 bi-directional polar pattern pointing up.
An even fuller explanation will sound familiar to anyone acquainted with omni and figure-8 bi-directional microphones. Take the X channel described above. Like any figure-8 microphone, it has a positive side and a negative (inverse phase) side. While the X channel’s figure-8 polar pattern points forwards, its negative side points backwards. The resulting audio signal on the X channel contains all the sounds from the front of the sphere with positive phase, and all the sounds from the back of the sphere with negative phase. Also, as with figure-8 microphones, the gain picked up for each direction is different: the signal directly in front or behind will be picked up with full gain, but as you move away from this bi-directional axis the gain drops, until at exactly 90 degrees to the axis you will get zero gain.
The same goes for the Y and Z channels: The Y channel picks up the left side of the sphere with positive phase and the right side with negative phase. The Z channel picks up the top side of the sphere with positive phase and the bottom with negative phase. This way, by means of differential gain and phase relations, the four channels combined represent the entire three-dimensional, 360-degree sphere of sound.
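Because direction is carried only by the gain and phase relations among the four channels, rotating the whole scene (for example, to follow a listener's head) is just a small remix of X and Y. The sketch below illustrates this for a single sample frame. It assumes X points forward and Y points left, as described above; the function name `rotate_yaw` and the sign convention for the angle are our own choices for illustration, not part of any particular tool.

```python
import math

def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate one first-order B-format sample frame about the vertical axis.

    Assumption: X points forward and Y points left, so a positive angle
    turns the whole scene counterclockwise (toward the left) seen from above.
    """
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    # W (omni) and Z (up-down) do not change under a rotation about the vertical axis.
    return w, x_rot, y_rot, z

# A source directly in front, moved to the listener's left.
frame = rotate_yaw(1.0, 1.0, 0.0, 0.0, 90.0)
```

To compensate for a listener turning their head to the left by some angle, you would rotate the scene by the opposite (negative) angle, so that sources stay fixed in the virtual world.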
All this, by the way, should sound very familiar to any audio engineer with an understanding of mid-side (M/S) stereo processing. Recording M/S requires two coincident microphones:
- M: an omni mic for the mid (analogous to the W channel in B-format).
- S: a figure-8 mic for the sides (analogous to the Y channel in B-format).
Together, the M and S channels capture the entire L/R stereo field via differences in gain and phase on the side channel (where L=M+S, R=M-S). The exact same principle holds for the WXYZ channels of B-format, only with two extra channels providing depth (X) and elevation (Z).
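For readers who want to see the M/S arithmetic spelled out, here is a minimal sketch of the L=M+S, R=M-S relation and its inverse. The function names are ours, and the samples are plain Python lists purely for illustration.

```python
def ms_to_lr(mid, side):
    """Decode mid/side sample lists to left/right: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def lr_to_ms(left, right):
    """Inverse relation: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

# A sound that sits only on the left has equal mid and side content.
left, right = ms_to_lr([0.5, 0.25], [0.5, 0.25])
```

Decoding B-format follows the same sum-and-difference idea, only with three figure-8 channels instead of one.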
AmbiX vs. FuMa
It is also worth mentioning at this point that there are two conventions within the Ambisonics B-format standard: AmbiX and FuMa. They are quite similar, but not interchangeable: they differ in the order in which the four channels are arranged (AmbiX, for example, is ordered WYZX instead of WXYZ) and in how the channel levels are normalized. The Waves Ambisonics plugins use the AmbiX format, which is why on the plugin interfaces you’ll see the channels ordered WYZX. To enable you to move back and forth between AmbiX and FuMa, the Waves B360 Ambisonics Encoder plugin includes AmbiX-to-FuMa and FuMa-to-AmbiX converters.
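To make the difference concrete, here is a rough sketch of a first-order conversion between the two conventions for a single sample frame. Besides the channel reordering, it assumes the commonly described level difference in which FuMa carries W about 3 dB lower (scaled by 1/√2) than AmbiX. The function names are hypothetical, and this is not a description of how the B360 converters are implemented.

```python
import math

SQRT2 = math.sqrt(2.0)

def fuma_to_ambix(frame):
    """Convert one first-order frame from FuMa (W, X, Y, Z) to AmbiX (W, Y, Z, X)."""
    w, x, y, z = frame
    # Assumption: FuMa W is attenuated by 1/sqrt(2), so restore it for AmbiX.
    return (w * SQRT2, y, z, x)

def ambix_to_fuma(frame):
    """Convert one first-order frame from AmbiX (W, Y, Z, X) to FuMa (W, X, Y, Z)."""
    w, y, z, x = frame
    return (w / SQRT2, x, y, z)

ambix_frame = fuma_to_ambix((0.5, 0.1, 0.2, 0.3))
```

Mixing up the two conventions typically shows up as sources appearing in the wrong direction, so it is worth confirming which one each tool in your chain expects.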
First-order to sixth-order Ambisonics
Before we get back to the more practical aspects of working with Ambisonics, it is also worth noting that the 4-channel format described above is only a simple, first-order form of B-format, which is what most Ambisonics microphones and playback platforms support today. While even first-order B-format provides higher-resolution spatial immersion than traditional surround technologies, higher-order B-format audio can provide even higher spatial resolutions, with additional channels representing additional polar patterns. Thus, second-order Ambisonics uses 9 channels, third-order Ambisonics uses 16 channels, and so on up to sixth-order Ambisonics with 49 channels.
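The channel counts quoted above follow a simple pattern: a full-sphere Ambisonics signal of order N uses (N + 1)² channels. A tiny sketch (the function name is ours):

```python
def ambisonic_channel_count(order):
    """Number of channels in a full-sphere Ambisonics signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

# Channel counts for first- through sixth-order Ambisonics.
counts = [ambisonic_channel_count(n) for n in range(1, 7)]
```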
Recording, encoding and playing back Ambisonics B-format
Let’s get back now to the more practical aspects of working with first-order B-format. An audio engineer delivering Ambisonics audio for a VR or 360 project could be dealing with one of two basic scenarios:
- The entire session may already be in Ambisonics B-format (for example, it may have been originally recorded using an Ambisonics microphone); or
- The session may be in traditional surround, in which case you would need to convert it to Ambisonics; or perhaps it includes several separate mono or stereo elements from which you want to create a new Ambisonics mix, in which case, again, you would need to convert the tracks and also position them in the final 360° mix.
An Ambisonics recording microphone is built of four microphone capsules mounted closely together. These capsules have cardioid polar patterns, and the signals they record are usually referred to as “Ambisonics A-format.” The A-format is then transformed to B-format by a simple matrix that produces the WXYZ channels.
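As a rough illustration of that matrix, the sketch below assumes the common tetrahedral capsule layout, with capsules facing front-left-up (FLU), front-right-down (FRD), back-left-down (BLD) and back-right-up (BRU). The capsule names and the lack of any gain scaling are assumptions for illustration; real microphones also apply manufacturer-specific calibration and filtering that is not shown here.

```python
def a_to_b(flu, frd, bld, bru):
    """Convert one A-format sample frame to B-format (W, X, Y, Z).

    Assumption: tetrahedral capsules named by the direction they face.
    Output is unscaled; practical converters apply gain and EQ corrections.
    """
    w = flu + frd + bld + bru   # everything summed: omni
    x = flu + frd - bld - bru   # front minus back
    y = flu - frd + bld - bru   # left minus right
    z = flu - frd - bld + bru   # up minus down
    return w, x, y, z

# Equal signal in all four capsules leaves only the omni channel.
frame = a_to_b(1.0, 1.0, 1.0, 1.0)
```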
Encoding (converting) mono, stereo or surround into B-format
B-format audio can also be encoded, or synthesized, out of regular audio recordings by an Ambisonics encoder.
When encoding a mono track into B-format, you will need to decide where (in which direction) to position the mono signal in the 360-degree soundfield (the Waves B360 Ambisonics Encoder has panner-like controls which enable you to do that). The output of the encoding process will be a 4-channel B-format track, and the mono track will be present in each of these channels with the specific gain and phase that corresponds to its direction in the soundfield.
Encoding multi-channel (stereo or surround) audio into Ambisonics follows the same principle. Each channel is encoded individually, like a mono track, in a set direction, and the results are summed together.
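Here is a minimal sketch of that encoding step, to show what "specific gain and phase" means in practice. It assumes azimuth is measured counterclockwise from straight ahead (so +90 degrees is hard left), elevation is measured upward from the horizon, W is kept at unity gain, and the output is ordered W, X, Y, Z. The function names `encode_mono` and `mix` are hypothetical and do not describe the internals of any particular encoder.

```python
import math

def encode_mono(samples, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample list into four channel lists ordered W, X, Y, Z."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                          # W: omni, same gain for every direction
        math.cos(az) * math.cos(el),  # X: front (+) / back (-)
        math.sin(az) * math.cos(el),  # Y: left (+) / right (-)
        math.sin(el),                 # Z: up (+) / down (-)
    )
    return [[g * s for s in samples] for g in gains]

def mix(*bformat_tracks):
    """Sum several encoded 4-channel tracks channel by channel."""
    return [
        [sum(frame) for frame in zip(*channels)]
        for channels in zip(*bformat_tracks)
    ]

# A stereo pair encoded as two mono sources at +30 and -30 degrees, then summed.
left_in, right_in = [1.0, 0.5], [1.0, 0.5]
stereo_as_bformat = mix(encode_mono(left_in, 30.0), encode_mono(right_in, -30.0))
```

In this example the left and right inputs are identical, so their Y (left-right) contributions cancel and the result sits straight ahead, just as a centered signal would in a stereo mix.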
Encoding regular audio into B-format is useful if you want to add sources to an existing B-format recording; mix a complete B-format track out of regular audio tracks by encoding each one separately and then combining them; or simply convert an entire multichannel mix into B-format.
The Waves B360 plugin can address all the above use cases. It has mono, stereo, 5.1 and 7.1 components that encode the input into B-format, with controls which allow you to position (pan) each element in the soundfield.
In principle, you can play back Ambisonics on almost any speaker array, recreating the spherical soundfield at the listening spot. But to do that, you need to decode the four B-format channels for the specific speaker array.
Once again, decoding Ambisonics to speaker feeds is analogous to decoding M/S signals for stereo, only more complex. All four B-format channels are summed to each speaker feed. Each of the four channels is summed with different gain and phase, depending on the direction of the speaker. Some of the sources in the mix are summed in-phase while others are summed out-of-phase at each specific speaker. The result is that sources aligned with the direction of the speaker are louder, while those not aligned in the direction of the speaker are lower or cancel out. (Of course, if the speaker array is not fully spherical – for example, if it is just a regular stereo setup – the entire mix will be folded down when decoded to the available speakers.)
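The simplest way to picture this is as a "virtual microphone" pointed at each speaker. The sketch below computes one speaker feed from one B-format sample frame using a cardioid-style weighting. It assumes W at unity gain, azimuth measured counterclockwise from the front, and channels passed as W, X, Y, Z; the function name is ours. Practical decoders use more refined weightings and level compensation, so treat this only as an illustration of the principle.

```python
import math

def decode_to_speaker(w, x, y, z, azimuth_deg, elevation_deg=0.0):
    """Basic first-order decode of one sample frame to one speaker feed.

    Assumption: a virtual cardioid aimed at the speaker direction.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    directional = (
        x * math.cos(az) * math.cos(el)
        + y * math.sin(az) * math.cos(el)
        + z * math.sin(el)
    )
    return 0.5 * (w + directional)

# One frame containing a source directly in front, decoded to a square of speakers.
front_source = (1.0, 1.0, 0.0, 0.0)
feeds = [decode_to_speaker(*front_source, az) for az in (45.0, 135.0, -135.0, -45.0)]
```

With this weighting, a source in line with a speaker comes out of it at full level, a source at right angles comes out at half level, and a source directly opposite cancels out.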
Ambisonics on headphones?
Recently, Ambisonics has been adopted by the VR industry to deliver 360 audio for 360 videos, gaming and virtual reality experiences. Usually, the audio is experienced by the end user via headphones and a head-mounted display. This means that audio engineers who wish to hear the end result the way the user will hear it should monitor their Ambisonics mix on headphones.
In addition, a multi-speaker spherical array for Ambisonics playback is very expensive and often impractical even for professional studios.
For both these reasons, it is advisable for audio engineers to be able to monitor their Ambisonics sessions on headphones.
How does this work?
Spatial sound on headphones is made possible by binaural audio technologies. In essence, a binaural processor receives an audio input and a direction in which to position it. The processor adds auditory cues to the signal, so that when played back on headphones it is experienced at the set virtual position.
The most common way to process Ambisonics for binaural spatial playback on headphones is to decode the Ambisonics channels for a certain speaker array – but then, instead of being sent to actual speakers, the feeds are sent to a binaural processor which virtually positions them at the direction that the actual speaker would have been. The result is that the immersive spherical soundfield is experienced by the listener when monitoring on headphones.
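The "virtual speakers" approach can be sketched as follows. We assume here that the binaural processor works by convolving each virtual speaker feed with a pair of head-related impulse responses (one per ear) measured for that speaker's direction, then summing the results per ear. That is a common technique, but it is our assumption rather than a description of any specific product; the impulse responses and function names below are placeholders that you would replace with real measured data.

```python
def convolve(signal, impulse):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

def binauralize(speaker_feeds, hrirs):
    """Render virtual speaker feeds to a (left, right) headphone pair.

    speaker_feeds: one sample list per virtual speaker.
    hrirs: one (left_ir, right_ir) pair per virtual speaker (placeholder data).
    """
    left, right = [], []
    for feed, (ir_left, ir_right) in zip(speaker_feeds, hrirs):
        for ear, ir in ((left, ir_left), (right, ir_right)):
            rendered = convolve(feed, ir)
            # Grow the ear buffer if this contribution is longer, then sum into it.
            ear.extend([0.0] * (len(rendered) - len(ear)))
            for n, value in enumerate(rendered):
                ear[n] += value
    return left, right

# Two virtual speakers with toy one-tap responses standing in for real HRIRs.
feeds = [[1.0, 0.0], [0.0, 1.0]]
toy_hrirs = [([1.0], [0.5]), ([0.5], [1.0])]
left, right = binauralize(feeds, toy_hrirs)
```

Head tracking fits into this picture naturally: the B-format scene is rotated to counter the listener's head movement before it is decoded to the virtual speakers, so the virtual speakers themselves never have to move.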
The Waves Nx Virtual Mix Room plugin has a component called Nx Ambisonics which does just that. You feed Ambisonics channels into the plugin, and hear the soundfield reproduced on headphones, complete with head tracking. (For convenience, all Waves Ambisonics tools – the B360 and Nx plugins, plus the Nx Head Tracker – are available together as part of the Waves 360° Ambisonics Tools package.)
We’ve just scratched the surface of Ambisonics theory and practice, with the goal of offering audio engineers some understanding of the basic concepts and workflow of Ambisonics B-format. Future information will cover the finer points of mixing Ambisonics audio, so stay tuned!