The Science of Mix Referencing: The 5-Step Method

Apr 12, 2021

A reference track can serve as a welcome reality check when you’re mixing or mastering – but if you’re just giving it a quick A/B you are probably not getting the full picture. Time to learn how to do it right.

By Craig Anderton

Comparing your music to similar recordings that were mixed and mastered by stellar engineers is a common practice and can help reveal where your music might need some course corrections. But if you’re just getting into mixing and mastering your own music, you may wonder why, despite your best efforts to match your levels and equalization to your favorite reference recordings, the sound still doesn’t quite stack up.

Of course, experience, ears, and monitoring systems are among the reasons. But let’s dig deeper. Rather than simply comparing your work to a world-class engineer’s, consider that it can be more valuable to reverse-engineer what makes their mixes world-class. Then you can apply what you learn to your own work.

Curve-matching software is a limited version of this process. It analyzes a source track and a target track, then reverse-engineers an EQ curve that makes the target more like the source. Although this can be useful, the results don’t always translate across genres or different styles of music. Besides, as we’ll see, there’s much more to analyzing tracks than just EQ.

The five main steps in comparing music to a reference involve matching the following characteristics, usually in this order:

  • Perceived levels
  • Musical dynamics
  • Equalization
  • Panning
  • Imaging and Transient Response

1. Matching Perceived Levels

Without matching the reference’s overall level, any additional comparisons will be less valid. Your ears have different frequency responses at different levels, with bass and treble falling off the most at low levels. Unless the overall levels are matched, you can’t analyze EQ properly.

The EBU R128 standard quantifies music’s perceived level, measured in LUFS (Loudness Units relative to Full Scale). The more negative the LUFS reading, the softer the perceived level. Turning down a hyper-compressed metal song to the same LUFS reading as an acoustic folk song makes them sound like they’re at the same volume. This is an advantage for broadcasters and streaming services because they don’t have to compress everything (or change levels constantly) to give a consistent listening experience for different recordings. When making comparisons, we also want to match the perceived levels.

When musicians start comparing their mixes to commercial releases, usually the first question is, “the peaks of my music hit 0; why is my mix so much softer?” The answer is that the commercial release was mastered, which likely involved dynamics processing in creating a hotter, louder sound. Limiters like the Waves L3 Multimaximizer can raise a source LUFS level to match a target’s LUFS level (Fig. 1).

Figure 1: The LUFS readings toward the left, taken with the WLM Plus Loudness Meter, show the before-and-after results of limiting with the L3 Multimaximizer (-15 and -9, respectively).

For example, suppose your club mix has an LUFS value of -15, but it’s going to be played in a set with songs that were mastered really hot and measure -9 LUFS. In Fig. 1, applying 6 dB of limiting raised the song’s LUFS to -9. Limiting and LUFS are correlated: each dB of limiting raises the LUFS reading by approximately 1 LU (the equivalent of 1 dB). So if you need to raise a -15 LUFS song to -12 LUFS, start with 3 dB of limiting, then tweak further if needed.
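The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a loudness meter: it assumes the LUFS readings have already been taken with a meter such as WLM Plus, and the function names are made up for this example.

```python
def lufs_offset(source_lufs: float, target_lufs: float) -> float:
    """Return how many loudness units (LU) the source must move to reach the target."""
    return target_lufs - source_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def starting_limiting_db(source_lufs: float, target_lufs: float) -> float:
    """Rule-of-thumb starting point: roughly 1 dB of limiting per 1 LU of increase.

    Returns 0 when the source is already at or above the target.
    """
    return max(0.0, lufs_offset(source_lufs, target_lufs))


# Readings from the example in the text (assumed to come from a loudness meter).
mix_lufs = -15.0
reference_lufs = -9.0

# Option 1: for A/B listening, turn the reference down to the mix's level.
trim_db = lufs_offset(reference_lufs, mix_lufs)
trim_linear = db_to_linear(trim_db)

# Option 2: raise the mix with a limiter, then re-measure and tweak.
limit_db = starting_limiting_db(mix_lufs, reference_lufs)

print(f"Trim reference by {trim_db:+.1f} dB (x{trim_linear:.2f})")
print(f"Or start with about {limit_db:.0f} dB of limiting on the mix")
```

Either route gets the two tracks to a comparable perceived level; the rule of thumb is only a starting point, so re-measure after limiting.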

2. Matching the Loudness Range (LRA)

The second number in Fig. 1, Range, corresponds to the music’s overall dynamics in the musical sense, not the technical sense (i.e., this range is not about the available headroom or the signal-to-noise ratio). Think of range, also called LRA, as measuring the prevalence of crescendos, decrescendos, and other musical dynamics. Classical music will have a wide range, while most rap music won’t.

Lower numbers indicate less dynamic range. Classical music will typically have an LRA around 9 or 10, rock about 5 or 6, EDM 4 or 5, and rap is usually around 3. But these vary greatly within genres. A rock ballad could have a wider loudness range than Bach, while a DJ’s EDM set might have an arrangement with so many variations that it ends up with an LRA of 7 or 8.
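For readers curious how a meter arrives at the Range number, here is a simplified sketch modeled on the method described in EBU Tech 3342: take a series of short-term loudness readings, discard very quiet passages with two gates, and report the spread between the 10th and 95th percentiles. The short-term values are assumed to come from a meter; the function name, the sample values, and the simple percentile interpolation are illustrative assumptions, so results may differ slightly from a real meter.

```python
import math


def loudness_range(short_term_lufs: list[float]) -> float:
    """Estimate LRA (in LU) from short-term loudness readings (in LUFS)."""
    # Absolute gate: ignore anything below -70 LUFS (silence, noise floor).
    gated = [v for v in short_term_lufs if v >= -70.0]
    if not gated:
        return 0.0

    # Relative gate: 20 LU below the energy-averaged level of what remains.
    mean_power = sum(10 ** (v / 10) for v in gated) / len(gated)
    relative_gate = 10 * math.log10(mean_power) - 20.0
    gated = sorted(v for v in gated if v >= relative_gate)

    def percentile(values: list[float], pct: float) -> float:
        # Linear interpolation between the two nearest sorted values.
        pos = (len(values) - 1) * pct / 100
        lo, hi = math.floor(pos), math.ceil(pos)
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)

    return percentile(gated, 95) - percentile(gated, 10)


# Made-up readings: a steady groove versus a track that builds from a quiet intro.
steady = [-14.0, -13.5, -14.2, -13.8, -14.1, -13.9]
dynamic = [-24.0, -22.0, -18.0, -14.0, -12.0, -11.0]

print(round(loudness_range(steady), 1))
print(round(loudness_range(dynamic), 1))
```

The steady groove yields a much smaller number than the track that builds, which is the behavior the Range reading is meant to capture.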

The loudness range measurement is not a value judgement where a wider range is “good.” For example, some Bob Marley songs have a very narrow loudness range, like 3. However, his tracks were often cut to a low perceived level, like -15 LUFS. This is perhaps why he could create such hypnotic grooves—he hit a sweet spot of fairly constant dynamics, with restrained levels.

Note that there’s no specific correlation between LUFS and LRA. In Fig. 1 above, although the LUFS got a lot hotter, the dynamic range remained the same. This is because limiting basically just shaves off the peaks, so reasonable amounts of limiting retain most of the dynamic range. But we can also do the reverse: reduce the range of dynamics, yet keep the same LUFS reading (Fig. 2).

Figure 2: Compression has transformed a song with -15 LUFS and 7 LRA into -15 LUFS and 3 LRA.

A compressor like the Renaissance Compressor is ideal for narrowing the dynamic range. How you narrow the range is subjective because a high threshold and high ratio, which compress a lot at the top of the dynamic range, can give similar numerical (but not sonic) results to a low threshold and low ratio, which apply subtler compression over a wider dynamic range. The settings in Fig. 2 are between these two extremes.

With the “loudness wars” starting to dissipate due to the adoption of LUFS, music is making better use of dynamics because trying to have THE LOUDEST POSSIBLE LEVEL is no longer the primary goal. Nonetheless, a wide dynamic range is not always desirable. Some people assume that because Spotify adjusts tracks to an LUFS of -14, they can’t limit their music anymore because the end result will be louder than -14. But that misses the point.

Some music, like rock, can benefit stylistically from less dynamics—which is why many engineers apply a bus compressor (e.g., the SSL G-Master Buss Compressor) to their final mix. So, you can use a compressor and/or limiter, end up with a song that sounds “right” at -12 or -9 LUFS or whatever, and Spotify will simply turn it down to -14 LUFS. The music will have the same perceived level as other music, but the character will be different because of compression, limiting, or both.
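To make the point concrete, here is a minimal sketch of loudness normalization as described above. It assumes a playback target of -14 LUFS (the figure quoted for Spotify) and that the service applies a single static gain to the whole track; the function name is hypothetical, and real services have additional rules, for instance about whether quiet tracks are turned up.

```python
def normalization_gain_db(track_lufs: float, target_lufs: float = -14.0) -> float:
    """Static gain (dB) that moves a track's integrated loudness to the target."""
    return target_lufs - track_lufs


# Integrated loudness of three hypothetical masters of the same song.
for mastered_lufs in (-9.0, -12.0, -14.0):
    gain = normalization_gain_db(mastered_lufs)
    print(f"mastered at {mastered_lufs} LUFS -> playback gain {gain:+.1f} dB")
```

All three versions play back at the same perceived level, but the gain change is a plain volume adjustment: whatever compression or limiting shaped the hotter masters is still audible in their character.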

SSL G-Master Buss Compressor

The audio examples below demonstrate the results of altering LUFS and LRA. The first example has LUFS = -16 and LRA = 5. The second example is the same, but with LRA = 3 (more compressed, less dynamics). The third example is LUFS = -12 (from limiting) and LRA = 5, so it maintains the same dynamics as the first example but has a higher perceived level. The fourth example is the most “squashed,” with LUFS = -12 and LRA = 3.

The fifth example is the same as the third (my favorite), but the level is turned down to match the first example. Note how examples 1 and 5 have the same perceived level, yet 5 has a slightly different character because of the added limiting.

The takeaway is that matching levels is only part of matching a reference. You need to take into account the dynamic range (LRA), perceived level (LUFS), and how the two interact.

3. Matching Equalization

The PAZ Analyzer displays relative levels at different frequencies. Matching EQ based solely on a graph is not a good idea because the graph will have variations specific to a particular piece of music. However, you can see the general emphasis in certain frequency ranges.

Listen carefully to your music and compare it—frequency range by frequency range—to the reference. Start with the bass and make sure it’s full but not overpowering or muddy. Then listen to the lower mids. Often, commercial recordings pull back a bit in the 250 – 500 Hz range. (However, this might not be reflected in a PAZ graph because that range might have already been cut somewhat.)

Check carefully in the midrange area around 1,000 Hz, where vocals and guitars often become more prominent. The upper midrange is more about clarity and articulation, and the high frequencies about contributing “air” and a sparkly sheen.

It’s interesting to see how EQ has changed over the years, as well as levels (Fig. 3).

Figure 3: Left, top to bottom: Abba (“Dancing Queen”), Led Zeppelin (“Rock and Roll”), Steely Dan (“Josie”), Peter Gabriel (“In Your Eyes”). Right, top to bottom: Madonna (“Ray of Light”), The Lox (“The Interview”), Carl Cox (“Global”), Kassav (“Si’w Pa La”).

The left-hand column is mostly from several decades ago. Abba is all about the midrange and vocals—not surprising—but they add some air in the 8 kHz to 10 kHz range. Led Zeppelin also has some serious midrange, but check out the bass bump—John Paul Jones’s bass really pushes the low end. Note how the lower mids are pulled back to make room for the bass and the rest of the midrange.

Steely Dan was known for their clean sound. This curve shows they resisted the temptation of lots of bass but didn’t shy away from it as much as Abba. The cut around 5 kHz is interesting—that’s usually associated with harshness, so maybe the reduction contributed to their “smooth” sound. This mix has a comparatively even frequency response.

The Gabriel cut was recorded later than his earlier works, and it started showing some more “modern” touches. The bass peak was moved higher in frequency (compared to a song like “Digging in the Dirt”), which made it more suitable for playback systems that lacked deep bass response. The major treble boost around 8 – 10 kHz became more common in later decades to add “air” to the music.

The right-hand column shows some more extreme mixes. In particular, note how the level often exceeds the -20 line. (All these examples were ripped from CDs, without modification.) Madonna’s cut is all levels, all the time—with a noticeable bass peak, a 1 kHz bump to push the voice and melodic instruments, a bit of relief in the harsh frequencies around 4 – 5 kHz, and a bright high end.

The Lox cut is highly representative of rap. There’s a big bass/kick, consistent midrange, and present (not hyped) highs. But when it comes to bass, DJ Carl Cox delivers the huge low end that dance floors crave, along with midrange articulation and plenty of highs. Finally, Kassav is a long-running Caribbean band that leans heavily into dance. Note the strong bass, and again, that “ear candy” high-end air, around 8 kHz.

A lot of what determines EQ is genre-specific. You’re not going to find rap music that’s light on bass or pop music that doesn’t have brightness. Don’t follow the curves generated by PAZ blindly, but analyze multiple curves from music that relates to your genre and look for commonalities.
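One way to look for commonalities is to reduce each curve to a handful of broad bands, average the references, and compare your mix against that average. The sketch below assumes you have already read approximate band levels (in dB) off an analyzer such as PAZ; the band names, the numbers, and the function names are placeholders invented for illustration, not measurements of any real track. Averaging dB values arithmetically is itself a simplification.

```python
BANDS = ("bass", "low_mids", "mids", "upper_mids", "highs")


def remove_overall_level(curve: dict[str, float]) -> dict[str, float]:
    """Subtract the curve's average so only its shape (tonal balance) remains."""
    avg = sum(curve[b] for b in BANDS) / len(BANDS)
    return {b: curve[b] - avg for b in BANDS}


def average_curve(curves: list[dict[str, float]]) -> dict[str, float]:
    """Average several level-independent curves band by band."""
    shapes = [remove_overall_level(c) for c in curves]
    return {b: sum(s[b] for s in shapes) / len(shapes) for b in BANDS}


def band_differences(mix: dict[str, float],
                     references: list[dict[str, float]]) -> dict[str, float]:
    """Positive values: the mix has more of that band than the references do."""
    mix_shape = remove_overall_level(mix)
    ref_shape = average_curve(references)
    return {b: round(mix_shape[b] - ref_shape[b], 1) for b in BANDS}


# Placeholder readings in dB (hypothetical, for illustration only).
references = [
    {"bass": -18, "low_mids": -26, "mids": -22, "upper_mids": -24, "highs": -30},
    {"bass": -16, "low_mids": -25, "mids": -21, "upper_mids": -24, "highs": -29},
]
my_mix = {"bass": -20, "low_mids": -21, "mids": -22, "upper_mids": -25, "highs": -32}

for band, diff in band_differences(my_mix, references).items():
    print(f"{band:>10}: {diff:+.1f} dB relative to the reference average")
```

With these placeholder numbers the largest difference is an excess in the low mids, the range that commercial recordings often pull back. Treat such differences as prompts for listening, not as EQ settings to dial in.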

4. Stereo Placement: The Engineer’s Viewpoint

Why does Madonna’s “Ray of Light” explode out of your speakers, even at low volumes? Why does Peter Gabriel’s “Digging in the Dirt” sound more sedate yet still deliver a huge emotional impact? We need two answers to these questions: the engineer’s and the artist’s.

The engineer can load up the PAZ Stereo Position Meter (SPD), which shows the distribution of energy across the stereo field in a way that’s much more useful than a traditional phase meter (Fig. 4).

Figure 4: Peter Gabriel (left), Madonna (right).

The lobes on the sides represent the amount of energy in the left and right channels, while the lobe in the middle is the center. A lobe’s height represents the level. Also, note the space between the lobes. With Gabriel’s “Digging in the Dirt,” the left and right are not only quieter than Madonna’s but also more distinct and separated from the center. Clearly, “Ray of Light” pushes a lot of level into the left, right, and center. Gabriel pushes up the center, where you can hear every nuance of his expressive voice.

Let’s look at two more graphs (Fig. 5): Bach’s superb “Brandenburg Concerto No. 5,” and Carl Cox’s “Global,” which remains some of my favorite head-banging EDM.

Figure 5: Bach (left), Carl Cox (right).

Bach’s position in the stereo field is what you’d expect: well-balanced, with the soloists sufficiently prominent in the center, the sides spread out equally, and no massive level differences among the various lobes. With Carl Cox, the center rules. That extra level in the center lobe has a lot to do with the prominent kick. Nonetheless, there’s plenty to fill out the sides. But note the gap between the center and left/right. This is not too surprising; it’s music for clubs, which lives mostly in a mono world.

Fig. 6 compares The Lox and Led Zeppelin.

Figure 6: The Lox (left), Led Zeppelin (right).

The Lox graph is a representation of rap: It’s all about the center because that’s where the vocals, the kick and the snare live. The sides are for the ear candy because you don’t want distractions from the message. On the other hand, Led Zeppelin covers the stereo field in a way that’s similar to the Madonna example above but a bit more understated. However, note the level bump in the center. It might as well be labeled “Robert Plant Meets Bonham’s Kick.”

Stereo Placement: The Artist’s Viewpoint

We have the engineer’s answers. But we still don’t know why “Ray of Light” explodes out of the speakers, while Peter Gabriel’s voice in “Digging in the Dirt” is so captivating. To find out the kind of energy that’s distributed across the stereo field, not just the amount, let’s look at the artist’s answer.

Our tool of choice here is the Scheps 73 EQ or the Abbey Road RS56 EQ. Both can monitor the center or the sides of a stereo mix (Fig. 7), so you can hear what was placed in the center and what was panned off to the sides.

Figure 7: The Scheps 73 on the left is showing the sides level for Madonna. The Abbey Road RS56 EQ is showing the sides for Peter Gabriel—note the difference in levels.

Even better, with these plugins, you can hear the center or sides, in mono, by turning one control. This is much more revealing than having the sides in one ear and the center in the other, as you normally have with a mid-side encoder.
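For readers who want to try the same comparison outside a plugin, the conventional mid-side math is simple: the mid (center) signal is half the sum of left and right, and the sides signal is half their difference. The sketch below applies that to plain lists of samples and reports each component's RMS level. It is not how the Scheps 73 or RS56 are implemented, and the sample values and function names are made up for illustration.

```python
import math


def mid_side(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Split a stereo signal into mid (L+R)/2 and side (L-R)/2 components."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def rms_db(samples: list[float]) -> float:
    """RMS level in dB relative to full scale (1.0); -inf for silence."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


# Hypothetical samples: a centered part plus a quieter part in the left channel only.
center = [0.5, -0.5, 0.5, -0.5]
left_only = [0.1, 0.1, -0.1, -0.1]
left = [c + x for c, x in zip(center, left_only)]
right = list(center)

mid, side = mid_side(left, right)
print(f"mid:  {rms_db(mid):.1f} dB")
print(f"side: {rms_db(side):.1f} dB")
```

Anything identical in both channels cancels out of the side signal, so the side component here contains only the part that was panned away from the center, at a much lower level than the mid.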

The sides for “Ray of Light” are brash synthesizer and guitar parts, mixed as if they were the star of the show. But because they’re placed so far to the sides, they leave plenty of room for vocals, bass, and kick. Meanwhile, the sides for the Gabriel cut are all about putting a frame around the song, mostly using ambiance, with the same purpose as a frame around a painting: aesthetic in its own right, but not the center of attention.

Comparing sides and mids for reference tracks is revealing. With Led Zeppelin’s “Rock and Roll,” a lot of Page’s overdubs are in the sides, as well as Bonham’s hi-hat and toms. The bass is completely down the center, along with the bulk of the drums and Plant’s voice. For Bob Marley, the sides seem reserved mostly for highly rhythmic, melodic instruments.

Kassav places the reverberation and ambiance from multitracked, call-and-response vocals in the sides, along with ambiance from the instruments, to fill out the rhythm. For many of their cuts, the sides give the feel of being in a live space. With The Lox, the sides were almost always for the ear candy sounds and the secondary rap vocals.

I also compared Peter Gabriel’s “In Your Eyes,” which was recorded several years before “Digging in the Dirt.” Here, there was much more use of the sides, including Jerry Marotta’s toms, as well as “decorative” guitar and keyboard parts.

5. Transients & Additional Fixes

We covered the main aspects of referencing, but there are a few more. Stereo width is primarily a function of panning, but panning takes you only so far—particularly because, increasingly, artists use image enhancement plugins like the S1 Stereo Imager to create wider-than-life stereo. If you need to tweak your stereo imaging, though, the S1 isn’t just about widening. It can also re-balance the right and left channels, as well as increase the sense of spaciousness with bass. Note that although generally thought of as a processor you’d use on masters or full mixes, the S1 can enhance the imaging of individual stereo tracks.

S1 Stereo Imager

Another consideration is transient response. Sometimes the difference between a pro studio and a budget setup is that transients can “smear” with budget gear. This can take away some of the excitement of percussive instruments. The Smack Attack plugin can help restore these transients or emphasize ones that already exist. Some people boost the treble to make transients more apparent, but that’s a broader brush and affects the sound more than just boosting transients. You need to be careful not to overdo using a plugin like Smack Attack, but a slight boost to transients, particularly with drums and other percussive instruments, can make a mix come alive.

Smack Attack

After analyzing these final tweaks, you’ll have a better idea of what went into making a world-class mix sound the way it does, and our reverse engineering is complete. All these elements are like a combination lock: if one number is off, the lock won’t open. But if you can optimize level, dynamics, EQ, stereo placement, imaging and transient response for your music, you’ll be on your way to creating your own world-class mixes.

Musician/author Craig Anderton is an internationally recognized authority on music and technology. He has played on, produced, or mastered over 20 major label recordings and hundreds of tracks, authored 45 books, toured extensively during the 60s, played Carnegie Hall, worked as a studio musician in the 70s, written over a thousand articles, lectured on technology and the arts (in 10 countries, 38 U.S. states, and three languages), and done sound design and consulting work for numerous music industry companies. He is the current President of the MIDI Association.
