Digital Audio Basics #2: Best Practices

Dec 08, 2021

Which sample rate should I use? What are inter-sample peaks? Dithering. Headroom. Bit resolution. Learn all these critical digital audio concepts to enhance your recordings and optimize them for playback & streaming.

By Craig Anderton

In Digital Audio Basics #1: What You Need to Know, we covered the basic elements of digital audio. Now, let’s take a deeper dive, and find out how to use digital audio to its maximum potential.

Hardware Resolution vs. Recording-Software Resolution

Recording resolution, as discussed in Part 1, describes the accuracy with which your system can capture audio and convert that data into numbers. However, once those numbers are in your computer, they’ll be manipulated—which brings us to a different kind of resolution.

Here’s why it’s needed, and how to optimize it in your software.

Changing levels in your software involves multiplying and dividing the numbers that represent digital audio. So, it’s easy to end up with totals that require higher resolution. Consider this simple example: If you multiply 2 x 2, you need only 1 digit (4) to represent the result. But if you multiply 2 x 9—which are both single-digit numbers—you now need two digits to give the result of 18.
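The same growth happens with the binary numbers inside your computer. Here is a minimal Python sketch of the idea; the helper name bits_needed is made up for this illustration, and it counts binary digits rather than the decimal digits used in the example above.

```python
def bits_needed(value: int) -> int:
    """Number of binary digits needed to hold a non-negative integer."""
    return max(1, value.bit_length())


max_24 = 2**24 - 1           # largest unsigned 24-bit value
product = max_24 * max_24    # worst case when multiplying two 24-bit numbers

print(bits_needed(2 * 2))    # 3  (4 is 100 in binary)
print(bits_needed(max_24))   # 24
print(bits_needed(product))  # 48
```

In other words, multiplying two 24-bit values can produce a result that needs up to twice as many bits.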

So, performing mathematical operations on a 24-bit number can create results that require more than 24 bits of resolution. If you round off the result to 24 bits, after multiple mathematical operations the rounding off could lead to errors that might be audible.
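To see how rounding between steps can hurt, consider this rough Python sketch. It is not how any particular program's audio engine works; the sample value and the function names are hypothetical. It turns a sample down by 60 dB and back up by 60 dB, once with rounding to a whole number after every step, and once keeping the in-between results at floating-point resolution.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear multiplier."""
    return 10 ** (db / 20)


def apply_gains(sample: int, gains_db, round_each_step: bool) -> int:
    """Apply a series of gain changes, optionally rounding after each one."""
    value = float(sample)
    for db in gains_db:
        value *= db_to_gain(db)
        if round_each_step:
            value = float(round(value))  # squeeze back into an integer sample
    return round(value)


original = 1_234_567      # hypothetical 24-bit sample value
gains = [-60.0, +60.0]    # turn down 60 dB, then back up 60 dB

fixed_path = apply_gains(original, gains, round_each_step=True)
float_path = apply_gains(original, gains, round_each_step=False)

print(fixed_path)  # 1235000
print(float_path)  # 1234567
```

The version that rounds after every step comes back 433 steps away from where it started, while the higher-resolution version returns the original value.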

As a result, the resolution that’s used by the audio engine inside your computer to process audio will have a higher resolution than, and be independent of, the recording resolution. This audio engine resolution, also called processing resolution, will be set in your program’s preferences.

For best results, choose the highest available resolution (typically 64 bits), but note this may stress out your computer more than lower resolutions. If your system runs more smoothly with 32-bit resolution, use that instead (fig. 1).

Figure 1: Most recording programs offer a choice of audio engine resolution. 64-bit resolution is being chosen here in Studio One.

It’s a myth that using a higher audio engine resolution requires storing 32- or 64-bit audio files. Although you could, it’s not necessary because the calculations happen in real time on the 16- or 24-bit files. Eventually, you’ll do a mixdown, and the audio that’s being processed with 32 or 64 bits of resolution, in real time, will end up with 16 or 24 bits of resolution in your delivery medium. Although some programs let you choose 24-bit audio engine resolution, there’s no advantage to doing so.

Accuracy vs. Resolution

Accuracy is not the same as resolution. If your audio interface has 24-bit resolution, then it theoretically has about 144 dB of dynamic range (approximately 6 dB per bit). Also, in theory, the 16,777,216 values are all equally spaced. But in the real world that’s not true, because 24 bits reaches the technical limits of analog-to-digital and digital-to-analog converters. No 24-bit converter truly delivers 24 bits of resolution. Noise can reduce the dynamic range, circuit board layout can result in interference for low-level signals, and manufacturing tolerances for analog-to-digital and digital-to-analog converters affect accuracy.

These errors are small (some would say insignificant), and accuracy has improved dramatically over the years, but be aware that digital circuitry isn’t perfect yet. As a result, a 24-bit converter will more likely deliver a real-world resolution between 20 and 22 bits. Older 16- and 20-bit interfaces typically delivered 14 and 18 “real” bits of resolution respectively, so it’s worth investing in a modern interface with 24-bit conversion.
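The "approximately 6 dB per bit" rule of thumb is easy to check. This short Python sketch uses the standard formula of 20 times the base-10 logarithm of the number of available values; the function name dynamic_range_db is just for illustration, and the result is the theoretical figure, not what a real converter delivers.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range for a given number of bits."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 22, 24):
    print(bits, 2 ** bits, round(dynamic_range_db(bits), 1))
# 16 65536 96.3
# 20 1048576 120.4
# 22 4194304 132.5
# 24 16777216 144.5
```

So a 24-bit converter that delivers 20 to 22 "real" bits has roughly 120 to 132 dB of usable dynamic range rather than the theoretical 144 dB.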

When Higher Sample Rates Can Make a Difference

Professional studios tend to use higher sample rates, like 96 kHz (some even record at 192 kHz). But why? The vast majority of people can’t hear any difference between audio recorded at 44.1 or 96 kHz.

However, there are some circumstances where higher sample rates can make an audible improvement. These mostly involve sounds generated inside a computer, such as those from virtual instruments or guitar amp simulators. This is because these types of sounds may generate harmonics above half the sample rate, which is the highest frequency the sample rate can represent. Those harmonics conflict with the frequency of the clock that sets the sample rate and fold back into the audible range as new, unrelated frequencies (an effect called aliasing). This conflict causes a subtle type of distortion.

(Side note: modern plugins are not as prone to this issue, because they often implement a process called oversampling. This makes a plugin perform as if it were running at a higher sample rate.)

The following audio examples of a virtual instrument with lots of high-frequency harmonics were recorded at 44.1 kHz, 96 kHz and 192 kHz. They were then converted back to 44.1 kHz. You can hear, even when data-compressed into MP3s, that the high frequencies in the 96 and 192 kHz versions have more clarity than the 44.1 kHz version.

It may seem counter-intuitive that the benefits of recording at higher sample rates remain even when converted to a lower sample rate. This is because once the instrument sound has been captured as audio, rather than a virtual sound generated within a computer, then any possibility of distortion caused by a lower sample rate is no longer relevant. Audio is audio, and 44.1 kHz is perfectly capable of reproducing audio.
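One common way to picture this kind of distortion is as aliasing, where a frequency above half the sample rate folds back to a lower frequency. The Python sketch below is a simplified model under that assumption. The 30 kHz harmonic is a hypothetical example, and the sketch assumes the plugin does no oversampling or band-limiting of its own.

```python
def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a tone shows up after sampling at sample_rate_hz."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


harmonic = 30_000.0  # hypothetical harmonic from a virtual instrument

print(alias_frequency(harmonic, 44_100.0))  # 14100.0
print(alias_frequency(harmonic, 96_000.0))  # 30000.0
```

At 44.1 kHz, the harmonic lands at 14.1 kHz, well within the audible range and unrelated to the note being played. At 96 kHz it stays at 30 kHz, above the range of hearing, where the filtering used when converting to 44.1 kHz can remove it.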

Most programs will let you change a 44.1 or 48 kHz session’s sample rate temporarily to 96 or 192 kHz. Then you can export or bounce the virtual instrument or amp sim sound, as captured by the higher sample rate, and import it back into your lower-sample-rate project to gain the benefits of recording at a higher sample rate.

Headroom in Digital Systems

Although the audio engine inside your program has an almost unlimited dynamic range, audio going into, or coming out of, your computer goes through hardware. Even modern audio hardware doesn’t have an infinite dynamic range. Exceeding the dynamic range produces distortion, so it’s good practice to allow for some headroom—the difference in level between a signal’s peak, and the maximum level an analog-to-digital converter, or digital-to-analog converter, can handle.

For example, if a signal’s peaks reach 0 on your software’s virtual meters when recording, then it has used up the interface’s available headroom prior to entering your computer. Any level increases at the interface will result in distortion (unfortunately, digital distortion sounds harsher than the type of distortion associated with tube amps and most analog circuitry). On the other hand, if the peaks go no higher than -6 dB, then there’s 6 dB of headroom prior to the onset of distortion. When recording, many engineers recommend setting digital audio levels at least 6 dB below 0 (or even lower—peak levels of -12 dB or -15 dB are common).
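If you like to see the numbers, the following Python sketch converts between peak levels in dB and linear amplitude, where 1.0 represents the maximum level (0 on the meters). The function names are made up for this example.

```python
import math


def peak_db(peak_amplitude: float) -> float:
    """Peak level in dB relative to full scale (1.0 equals 0 dB)."""
    return 20 * math.log10(peak_amplitude)


def headroom_db(peak_amplitude: float) -> float:
    """Distance in dB between a signal's peak and the maximum level."""
    return 0.0 - peak_db(peak_amplitude)


def amplitude_for_peak(db: float) -> float:
    """Linear amplitude that corresponds to a peak level in dB."""
    return 10 ** (db / 20)


for target in (-6, -12, -15):
    amp = amplitude_for_peak(target)
    print(target, round(amp, 3), round(headroom_db(amp), 1))
# -6 0.501 6.0
# -12 0.251 12.0
# -15 0.178 15.0
```

Note that a peak of -6 dB uses about half the available amplitude range, and -12 dB uses about a quarter.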

During playback, keep a digital mixer’s master fader close to 0, and adjust levels within individual channels to prevent overloads when the master output feeds your audio interface for playback. This is a better way to manage levels than keeping the channel faders high, and then reducing the master gain to bring the output level down to 0 dB (or preferably, somewhat lower).

Intersample Distortion and True Peak

This is a more esoteric concept, but it’s baked into the best practices for streaming services like Spotify.

When mixing, another reason to leave a few dB of headroom at the master output is that most digital metering measures the level of the digital audio samples. However, converting digital audio back to analog may result in higher values than the samples themselves, which can cause intersample distortion. This type of distortion can occur when the digital-to-analog converter’s smoothing filter reconstructs the original waveform. This reconstructed waveform might have a higher amplitude than the peak level of the samples, which means the waveform now exceeds the maximum available headroom (fig. 2). This peak level is called true peak. As with regular peak levels, you don’t want it to go over 0.

Figure 2: With the analog audio waveform sampled in 2A, raising the digital audio’s level to 0 dB as shown in 2B may seem safe, but can still cause problems. In 2C, the smoothing filter causes the audio to exceed the maximum available headroom.
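A classic worst-case illustration is a sine wave at one quarter of the sample rate, positioned so the samples always miss the crests. The Python sketch below is only a simplified stand-in: instead of modeling a converter's smoothing filter, it evaluates the same known sine wave on a grid 16 times finer. Real true peak meters estimate this with oversampling filters.

```python
import math


def tone(t_samples: float) -> float:
    """Sine at one quarter of the sample rate, offset by 45 degrees."""
    return math.sin(math.pi / 2 * t_samples + math.pi / 4)


def to_db(ratio: float) -> float:
    """Convert an amplitude ratio to dB."""
    return 20 * math.log10(ratio)


# What a sample-based meter sees: one reading per sample period.
samples = [tone(n) for n in range(64)]
sample_peak = max(abs(s) for s in samples)

# Stand-in for the reconstructed waveform: the same tone, 16 points per sample.
fine = [tone(n / 16) for n in range(64 * 16)]
true_peak = max(abs(s) for s in fine)

overshoot_db = to_db(true_peak / sample_peak)

print(round(sample_peak, 3))   # 0.707
print(round(true_peak, 3))     # 1.0
print(round(overshoot_db, 2))  # 3.01
```

If you raised this signal so the samples read exactly 0 dB, the waveform between the samples would peak about 3 dB higher.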

Most streaming services will ask for a file whose true peak readings are no higher than -1 or -2 dB. This minimizes the potential for distortion if the streaming service applies data compression to your file, like turning it into an MP3 format file.

About Dithering

We’ve described what happens when levels exceed the available headroom, but issues can also occur when audio runs out of low-level resolution. Dithering, which is becoming more of a “legacy” process, dates back to when the best resolution you could expect from digital audio was 16 bits. As a result, very low-level audio could be compromised. When audio engines started adopting 24-bit resolution, CDs still used 16 bits. To accommodate the CD, 24-bit files simply discarded 8 bits of resolution when mastered for CDs. This could produce an annoying, buzzing noise, but only at extremely low levels (like during the fadeouts of delicate acoustic music).

To compensate, dithering adds a controlled amount of noise to the lowest-level signals that, without going into technical details, trades off the nasty buzzy sound for a virtually inaudible amount of background hiss. However, note that these very low-level signals would usually be masked in normal listening environments anyway.
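For the curious, here is a toy Python sketch of that tradeoff. It makes several simplifying assumptions: it uses a steady signal only 0.3 of one 16-bit step in size, it rounds to the nearest step instead of discarding bits, and it uses one common kind of dither noise (two random values added together, often called TPDF dither). The names and the random seed are arbitrary choices for this example.

```python
import math
import random


def requantize(value_in_steps: float, rng=None) -> int:
    """Round a value, measured in target steps, to the nearest whole step.

    If rng is given, add dither noise (two uniform random values) first.
    """
    if rng is not None:
        value_in_steps += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return math.floor(value_in_steps + 0.5)


rng = random.Random(0)  # fixed seed so runs are repeatable
quiet = 0.3             # hypothetical signal: 0.3 of one 16-bit step
count = 20_000

plain = [requantize(quiet) for _ in range(count)]
dithered = [requantize(quiet, rng) for _ in range(count)]

plain_mean = sum(plain) / count
dithered_mean = sum(dithered) / count

print(plain_mean)     # 0.0, so the quiet signal has vanished
print(dithered_mean)  # close to 0.3, so the signal survives on average
```

Without dither, every sample rounds to zero and the signal is lost. With dither, individual samples jump randomly between neighboring steps (that is the hiss), but on average they still carry the original level.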

You need to apply dithering only when trying to shoehorn higher-resolution audio into a lower-resolution file format. Practically speaking, this matters if your mixed file is 24 bits, but needs to be mastered to 16-bit resolution for CD. Otherwise, dither is no longer a crucial issue, especially because most pop has a limited dynamic range anyway.

If there’s a choice of dithering noise, choose one with shaped noise. This creates the benefits of standard dithering, but with lower perceived noise…assuming you can even hear something at that low a level.

Learn more about Dithering here!

Hey, Remember CDs?

Just to show how far digital audio has come, when CDs first appeared, even though the CD itself used 16-bit resolution, the players themselves often used 12-bit converters to reduce costs. For the reasons given above, they only delivered 10 bits or so of “real” conversion. So, when people first heard CDs and didn’t like the sound, it’s not surprising. No self-respecting recording studio would consider 10-bit resolution sufficient.

But we don’t have to worry about that anymore—with 24-bit resolution, ever-higher resolution audio engines, and 96 kHz sample rates for those who want a step above 44.1/48 kHz, we’re covered for high-quality audio. In fact, one of the ironies of today’s digital audio technology is that it has become so sophisticated it can emulate analog sound quality with exceptional accuracy—but without the drawbacks.

Want more on digital audio? Check out part #1 of this two-part series here.

Musician/author Craig Anderton is an internationally recognized authority on music and technology. He has played on, produced, or mastered over 20 major label recordings and hundreds of tracks, authored 45 books, toured extensively during the 60s, played Carnegie Hall, worked as a studio musician in the 70s, written over a thousand articles, lectured on technology and the arts (in 10 countries, 38 U.S. states, and three languages), and done sound design and consulting work for numerous music industry companies. He is the current President of the MIDI Association www.craiganderton.org.
