Using a vocal de-esser, music producers reduce the harsh “Ess” sounds (AKA sibilance, usually between 4kHz and 10kHz) that can build up over the course of recording or production. De-essing vocals is a task usually done towards the end of working on a vocal effects chain, similar to smoothing out the rough edges on a finished piece of cut wood.
Like compression and other less overt audio effects, it takes a trained ear to identify the presence of sibilance, and to decide how much to reduce it. There’s also such a thing as too little sibilance – without enough “Ess” sounds, the voice can have a lispy quality.
Getting de-essing right takes time, comparison of your work to others’, and referencing your work on as many speakers and headphones as you can.
In this article, we’ll discuss de-esser settings and where to put your de-esser in your vocal chain, and we’ll give you specific step-by-step instructions for de-essing a vocal recording for professional results.
TL;DR - How to De-Ess Vocals
Read on for a full, step-by-step explanation, but here’s a summary of the de-essing process if this is all you need:
Understanding Sibilance and Why It Happens
Sibilance refers to the harsh higher frequencies within the ‘S’ sound, but can also describe ‘T’ and ‘Z’ sounds. Sibilance is a natural byproduct of speech. It isn’t a mistake in and of itself, but a feature of the human voice. In fact, one reason sibilance shows up in recordings is that a singer can sing so close to a microphone. If someone sang right next to your ear, it would sound harsh as well.
Some people have more sibilant speech than others, and therefore require more attention when positioning the microphone in relation to their mouth (the closer and more on-axis the mic, the more sibilance). Some microphone types, such as cardioid condensers, are also more sensitive to these higher frequencies, resulting in more sibilance.
Plosives (consonants which block air, such as p, t, b, d, k, and g) create unwanted audio byproducts that can be prevented at the source by using a pop shield. Sibilance is different: it can be turned into a problem later on by the use of compression or EQ within a vocal chain.
What De-Essing Does to a Vocal
De-essing a vocal means reducing, or in extreme cases removing, sibilance from a microphone signal or recording, often in the frequencies between 4kHz and 10kHz. The vocal becomes less harsh and more natural sounding.
The process should leave the rest of the vocal sounding as intact as possible, and should only be applied to problematic areas of sibilance to retain the bright, natural sound of the voice.
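To put a rough number on how much of a sound sits in that 4kHz to 10kHz range, you can measure what share of a short frame's energy falls inside the band. The sketch below is only an illustration: the function name, the band edges as defaults, and the plain (slow but dependency-free) DFT are all assumptions made for this example, not how any particular de-esser detects sibilance.

```python
import cmath
import math


def band_energy_ratio(frame, sample_rate, low_hz=4000.0, high_hz=10000.0):
    """Return the fraction (0.0 to 1.0) of a frame's spectral energy
    that lies between low_hz and high_hz.

    Hypothetical helper: uses a naive DFT, so keep frames short
    (a few hundred samples).
    """
    n = len(frame)
    total = 0.0
    band = 0.0
    for k in range(1, n // 2):  # skip DC, stop before Nyquist
        freq = k * sample_rate / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        power = abs(coeff) ** 2
        total += power
        if low_hz <= freq <= high_hz:
            band += power
    return band / total if total > 0.0 else 0.0
```

A frame dominated by an 'S' should score much closer to 1.0 than a frame of a sustained vowel. The value at which a frame counts as "too sibilant" is still a judgment call for your ears.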
Manual Approaches to De-Essing Vocals
Manual methods of de-essing a vocal can be as simple as riding the fader or automating the track's volume in moments of sibilance, which, depending on the amount of problematic sibilance in your recording, can turn into a massive task.
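As a minimal sketch of what volume automation amounts to, the function below turns the whole track down inside hand-picked time regions. The name `ride_fader`, the region format, and the default cut are assumptions for illustration. A real automation lane would also ramp in and out of each region rather than stepping abruptly, which this sketch leaves out.

```python
def ride_fader(samples, sample_rate, regions, cut_db=-6.0):
    """Return a copy of samples with the level lowered by cut_db
    inside each (start_seconds, end_seconds) region.

    Hypothetical helper: the gain steps abruptly at region edges.
    """
    gain = 10.0 ** (cut_db / 20.0)
    out = list(samples)
    for start_s, end_s in regions:
        first = max(0, int(round(start_s * sample_rate)))
        last = min(len(out), int(round(end_s * sample_rate)))
        for i in range(first, last):
            # Scale the original sample so overlapping regions
            # are not turned down twice.
            out[i] = samples[i] * gain
    return out
```

Every sibilant moment needs its own region, which is why this approach scales so badly on a long take.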
Instead of reducing the overall level of the whole voice track when sibilance occurs, how about reducing the frequencies using an EQ? A static EQ cut in the high frequencies will dull sibilant frequencies – but it will do so even while sibilance isn’t a problem. A dynamic EQ allows you to set a band that only reduces frequencies when their build-up passes a certain threshold. This is far closer to what we want.
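The difference between the two can be sketched in a few lines. The code below splits the signal into a low and a high band with a very simple one-pole filter, then either always turns the high band down (static) or only turns it down while the high band is louder than a threshold (dynamic). The function names, the 4kHz crossover, the peak-follower envelope, and the default values are all assumptions for this example. Real EQs and de-essers use far better filters and smoother gain changes.

```python
import math


def split_bands(samples, sample_rate, crossover_hz=4000.0):
    """Split into (low, high) bands with a one-pole low-pass.
    low + high adds back up to the input (up to rounding)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, high, state = [], [], 0.0
    for x in samples:
        state += alpha * (x - state)
        low.append(state)
        high.append(x - state)
    return low, high


def static_high_cut(samples, sample_rate, cut_db=-6.0):
    """Static cut: the high band is always turned down."""
    gain = 10.0 ** (cut_db / 20.0)
    low, high = split_bands(samples, sample_rate)
    return [lo + hi * gain for lo, hi in zip(low, high)]


def dynamic_high_cut(samples, sample_rate, threshold=0.1,
                     cut_db=-6.0, release_ms=20.0):
    """Dynamic cut: the high band is turned down only while its
    level is above the threshold, and never by more than cut_db."""
    gain_floor = 10.0 ** (cut_db / 20.0)
    decay = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    low, high = split_bands(samples, sample_rate)
    env = 0.0
    out = []
    for lo, hi in zip(low, high):
        # Peak follower: instant attack, exponential release.
        env = max(abs(hi), env * decay)
        if env <= threshold:
            gain = 1.0
        else:
            gain = max(gain_floor, threshold / env)
        out.append(lo + hi * gain)
    return out
```

With these assumed settings, a quiet high-frequency sound passes through `dynamic_high_cut` untouched but is still dulled by `static_high_cut`, while a loud one is reduced by both.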
While these techniques have their place, the ideal solution is custom-built: a de-esser plugin is tailor-made to reduce sibilance and to leave the rest of the signal as it should be. Next we’ll show you exactly how to do it yourself.
How to De-Ess Vocals with a Plugin
In this example we’ll be using Waves Sibilance, which, unlike a lot of de-esser plugins, automatically identifies and targets problematic sibilance. Other de-esser plugins may require you to identify the frequency range of the sibilance yourself.
Setting the Threshold
Start with your threshold and range at 0dB, and you should notice that no change occurs to your sound. However, at the top of the plugin is a visual representation of your waveform, with the green areas highlighting moments where it has identified sibilance within your recording.
By lowering the threshold, you’ll notice two turquoise lines moving toward the center of the waveform. These indicate the threshold level. Lower the threshold until these lines overlap with the green sibilance indicators on the waveform.
You don’t necessarily need to lower it so all instances of sibilance are peaking above the threshold – the intention is to only reduce the moments that actually sound harsh. Of course, while these visual aids help, use your ears to find the problematic moments and set the threshold accordingly.
Setting the Range
The Range control determines the maximum amount of gain reduction which occurs when the signal crosses the threshold. The amount by which the signal is being reduced is visible from the yellow line on the Range scale. To decide how much reduction you need, it often helps to turn the Range up quite high and then reduce it until the result sounds natural to you.
If it sounds natural but you’re unsure if the de-esser is doing anything, this is likely a good sign. If you bypass the de-esser and the issue returns, then it shows you have found the right amount of reduction. If the signal doesn’t sound any different, you may not need to de-ess as much as you thought, or may only need to target particularly loud moments of sibilance.
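One way to picture how Threshold and Range work together is as a simple gain calculation: nothing happens below the threshold, and above it the reduction grows with the overshoot but is capped by the range. The sketch below is a generic model under that assumption, not the actual algorithm inside Waves Sibilance. The function names and the convention of writing the range as a negative dB value are hypothetical.

```python
def reduction_db(level_db, threshold_db, range_db):
    """Return the gain change in dB (zero or negative) for a detected
    sibilance level.

    range_db is the maximum reduction, written as a negative number
    (for example -6.0). Hypothetical model, not a plugin's algorithm.
    """
    overshoot = level_db - threshold_db
    if overshoot <= 0.0:
        return 0.0  # below the threshold: leave the signal alone
    # Reduce by the overshoot, but never by more than the range allows.
    return max(-overshoot, range_db)


def db_to_gain(db):
    """Convert a dB change to a linear multiplier."""
    return 10.0 ** (db / 20.0)
```

In this model, a sibilant peak at -17dB with the threshold at -20dB and a range of -6dB is reduced by 3dB, while a peak at -5dB is reduced by the full 6dB and no more. With the range at 0dB, nothing is reduced no matter where the threshold sits, which matches the starting point described above.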
Detection Width, Mode, and Monitor
Detection Width affects what the plugin identifies as problematic sibilance, with lower values detecting narrow frequency ranges (Ss), and higher values detecting wider frequency ranges (Shh). If set too low, the sibilance may not be caught; too high and other sounds within the signal may be affected. Detected problem sounds appear as yellow on the graph.
The Mode control affects how gain reduction is applied to the detected sibilance, with its lowest position affecting the ‘Shh’ sounds and the highest position targeting the ‘Ss’ sounds. This is because as the mode increases it focuses less on the entire targeted signal range, and more on frequencies above 4kHz. It is usually best to set it somewhere in the middle so that it targets both ranges.
The Monitor button lets you hear what is being removed from the signal. While balancing the above controls, switching between what you are removing and the resulting sound will give you a better idea of what you are doing.
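Conceptually, "what is being removed" is just the difference between the signal going in and the signal coming out. The helper below is a sketch of that idea, assuming both signals are time-aligned lists of samples. The function name is hypothetical, and this is not a description of how the plugin's Monitor is implemented.

```python
def removed_signal(dry, processed):
    """Return the part of the dry signal that processing took away,
    sample by sample (dry minus processed).

    Assumes both lists are the same length and time-aligned.
    """
    if len(dry) != len(processed):
        raise ValueError("dry and processed must be the same length")
    return [d - p for d, p in zip(dry, processed)]
```

If the removed signal contains recognizable words rather than just bursts of 'S', the de-esser is probably reaching too far into the rest of the vocal.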
How De-Essing Affects Vocal Tone and Presence
Overly de-essing a signal can have a strange psychoacoustic effect, which is why it’s often suggested that a light touch is preferable to carving out any and all sibilance. The reason for this is simple: we’ve evolved to hear the human voice above all other things, and have been hearing it all our lives. So unnaturally changing the balance of frequencies within singing or speech can create the audio equivalent of an uncanny valley effect.
Presence isn’t just the mid to low frequency content within a voice, but how the low, mid, and high frequencies are balanced together. More often than not, it’s the higher frequency content within a sound which gives us the psychological cue that a sound is close, as lower frequencies travel further through air than higher ones.
A well de-essed vocal will still sound bright and present, and you should be unaware that de-essing has even taken place. You are shaping the tone of the higher frequencies within a voice, so think in terms of making them sound crisp and clear, present but not pokey. If the vocal tone is becoming muted and you are noticing the missing brightness, you have gone too far. If you are becoming unsure how far to take it in either direction, take an ear break. The worst way de-essing can affect tone and presence is when it is applied too hastily with burnt-out ears.
Practical Tips for Using Waves DeEsser Plugins
Frequently Asked Questions (FAQs)
Why Do Vocals Sound Dull After De-Essing?
Dullness occurs when too much of the high-frequency content is being removed, which can be due to the reduction amount being too high or the de-esser affecting too wide a range of the higher register of the voice. This results in the voice sounding muted and unnatural or, in extreme cases, as if the vocalist has an artificial lisp.
Where to Put a De-Esser in the Vocal Chain?
Typically a de-esser should be put late in the vocal chain, certainly after EQ and compression. Both processes can be the source of the problem, and can accentuate sibilance in an otherwise balanced recording.
What’s the Difference Between a De-Esser and Dynamic EQ?
A de-esser could be described as similar to a dynamic EQ, and you can use a dynamic EQ as a de-esser if push comes to shove. The main difference is that a de-esser has one frequency band aimed at the range where sibilance is an issue, whereas a dynamic EQ has multiple frequency bands that can cover this range and others.
What Matters More in De-Essing, Smoothness or Clarity?
It depends on the context and what vocal sound you’re aiming for. A pop song might prioritize smoothness to keep the volume levels of the voice very uniform, allowing the vocals to be loud and upfront. Dialogue in a film often calls for natural-sounding dynamics, so clarity is more important than uniformity.
Conclusion
De-essing is a vital process for smoothing out vocal channels in a mix, and leads to a more professional vocal sound when done right. De-essers like Waves Sibilance are ideal tools for reducing any sibilance that’s inherent in a recording, or that has been introduced by heavy-handed compression or EQ. A de-esser should be used with care, so as not to introduce a lisping effect into a vocal recording.