Audiophiles are well aware of the negative sonic effects of lossy audio compression: limiting the number of bits per second used to encode digital audio introduces distortions relative to the original. That’s why they insist on listening to digital audio sources that don’t employ lossy compression, such as CDs and hi-res downloads.
For many folks, audio compression is either acceptable or a trade-off they’re willing to make for convenience’s sake. But those same folks might have second thoughts if they knew just how much distortion the compression process adds to audio signals, specifically through non-linear quantizing.
Simply put, non-linear quantizing takes advantage of our inability to detect subtle changes in amplitude when listening to loud music. Imagine someone is whispering to you at a low volume in a library. You hear this whispering fine. Now imagine you’re at a loud rock concert and the same person standing next to you whispers at the same volume as in the library. Can you hear the whispering?
Human hearing has a very wide dynamic range: approximately 120 dB, or a sound-pressure ratio of about 1,000,000:1, from the softest sound we can perceive to the loudest sound that does not cause physical pain. But the sensitivity our hearing exhibits at very low sound levels is not maintained at very loud sound levels. That’s where non-linear quantizing comes in.
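The 120 dB and 1,000,000:1 figures are two ways of stating the same range. Here is a minimal sketch of the conversion. It assumes the ratio is one of amplitudes (sound pressure), which is why the factor is 20 rather than the 10 used for power ratios. The function names are illustrative.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to an amplitude (sound-pressure) ratio."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude ratio back to a level difference in dB."""
    return 20 * math.log10(ratio)


# 120 dB of hearing range expressed as an amplitude ratio, and back again.
print(f"120 dB -> {db_to_amplitude_ratio(120):,.0f}:1")          # 120 dB -> 1,000,000:1
print(f"1,000,000:1 -> {amplitude_ratio_to_db(1_000_000):.1f} dB")  # 1,000,000:1 -> 120.0 dB
```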
Let’s look at how engineers use this knowledge when designing audio compression schemes. For our discussion, we will address only a signal’s amplitude, or how loud it is at any given moment. Each amplitude value is a single sample in sound digitization using traditional (PCM) sampling techniques, the same kind employed for uncompressed 16-bit CD audio and 24-bit high-resolution audio.
For CD audio, 16 bits are used for each sample, yielding a dynamic range of 96 dB. (Each bit of resolution adds about 6 dB of dynamic range, so 16 x 6 dB = 96 dB.)
For 24-bit audio, the dynamic range is about 144 dB (24 x 6 dB), which is greater than the dynamic range of human hearing. On this front alone, you can see why 24-bit audio would provide a superior audio experience.
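The 6 dB-per-bit rule of thumb comes from the fact that each added bit doubles the number of quantizing steps, and a doubling of amplitude is 20 x log10(2), or about 6.02 dB. A small sketch (the function name is illustrative) that computes the ideal figures for linear quantizing:

```python
import math


def pcm_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one step for linear PCM, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


# Each extra bit doubles the number of steps, adding 20*log10(2) ~= 6.02 dB.
print(f"per bit: {pcm_dynamic_range_db(1):.2f} dB")
for bits in (16, 24):
    print(f"{bits}-bit: {pcm_dynamic_range_db(bits):.1f} dB "
          f"(rule of thumb {bits} x 6 dB = {bits * 6} dB)")
```

The computed values (about 96.3 dB and 144.5 dB) come out slightly above the rule of thumb because 6.02 dB per bit, not 6, is the precise figure.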
More Bang From Each Bit
Lossy data compression pushes things in the opposite direction: instead of more bits, fewer bits are used, resulting in a more limited dynamic range. When used aggressively, compression will make music sound “canned,” like it’s all the same volume. (Dynamic range compression, a separate technique, is deliberately applied in the mix of some pop music to achieve a similar effect.)
The contribution of non-linear quantization is to create a wider dynamic range by assigning each bit pattern a value other than its normal binary value. The new values are chosen so that, as signal levels get higher, the spacing between increments gets bigger.
Let’s look at an example. Suppose we have a 4-bit audio system. The lowest number is 0000, the highest is 1111, and the normal dynamic range of the system is 16:1 or 24 dB (4 bits x 6 dB). Not very wide at all! However, the assumption that each bit represents its normal value in the binary number system is now going to be thrown out the window. Instead, we will “assign” new values, or weights, to each of these 4-bit combinations, as shown in this chart:
Now, using our non-linear (also known as non-uniform) quantizing scheme, the dynamic range is 36:1, because 1111 is now assigned a value of 36. That’s more than double the 16:1 ratio provided by normal quantizing. With this system, we get better dynamic range than 5-bit audio would provide (32:1) while using only 4 bits.
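To see how such a scheme stretches 4 bits, here is a sketch built on a level table that is hypothetical, not the chart above. It was chosen only to satisfy the constraints the text states: codes for 1 through 6 keep their values, 14 and 17 are neighboring levels, and 1111 maps to 36. The sketch lists the table, shows the step sizes growing, and converts the quoted ratios to dB.

```python
import math

# HYPOTHETICAL 4-bit level table: not the article's chart. It only matches the
# constraints stated in the text (1..6 exact, 14 and 17 adjacent, 1111 -> 36).
NONUNIFORM_LEVELS = [0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 17, 20, 24, 29, 36]


def ratio_db(ratio: float) -> float:
    """Amplitude ratio expressed in dB."""
    return 20 * math.log10(ratio)


# Every 4-bit code (0000..1111) gets exactly one level.
assert len(NONUNIFORM_LEVELS) == 2 ** 4
for code, level in enumerate(NONUNIFORM_LEVELS):
    print(f"{code:04b} -> {level}")

# Spacing between neighboring levels grows as the level rises.
steps = [b - a for a, b in zip(NONUNIFORM_LEVELS, NONUNIFORM_LEVELS[1:])]
print("step sizes:", steps)

# Dynamic-range ratios quoted in the text, converted to dB.
for label, ratio in [("4-bit linear", 16), ("5-bit linear", 32), ("4-bit non-linear", 36)]:
    print(f"{label}: {ratio}:1 = {ratio_db(ratio):.1f} dB")
```

In dB terms, the non-linear scheme’s 36:1 (about 31.1 dB) edges past 5-bit linear audio’s 32:1 (about 30.1 dB).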
While non-linear quantizing provides a way to get more dynamic range without using more bits, it also introduces distortions. For example, suppose the actual value that needs to be represented is 15. With our non-linear quantizing system, the available values jump from 14 to 17. There is no way to represent 15; we must round the value up or down to the nearest increment.
Of course, linear quantizing faces the same situation with a value such as 15.5, or 11.074 for that matter. All quantizing entails rounding actual values up or down to the nearest increment the system allows. But with linear quantizing the increments are equally spaced, whereas with non-linear quantizing they spread further apart as signal levels get higher.
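The rounding step can be sketched as a nearest-level search. This assumes the same hypothetical 16-level table (not the article’s chart) and, for comparison, a linear quantizer with unit steps over the same 0 to 36 range. The function name `quantize` is illustrative.

```python
# HYPOTHETICAL 16-level table (not the article's chart), consistent with the text.
NONUNIFORM_LEVELS = [0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 17, 20, 24, 29, 36]
# Assumed linear comparison: unit steps over the same 0..36 range.
LINEAR_LEVELS = list(range(37))


def quantize(x: float, levels: list[int]) -> int:
    """Round x to the nearest allowed level; ties go to the lower level."""
    # min() returns the first minimum, and the levels are in ascending order.
    return min(levels, key=lambda level: abs(level - x))


for x in (15, 15.5, 11.074):
    lin = quantize(x, LINEAR_LEVELS)
    nonlin = quantize(x, NONUNIFORM_LEVELS)
    print(f"{x}: linear -> {lin} (error {abs(x - lin):.3f}), "
          f"non-linear -> {nonlin} (error {abs(x - nonlin):.3f})")
```

Both quantizers round, but the non-linear one rounds 15 down to 14 and 11.074 up to 12, errors that the evenly spaced grid avoids or keeps smaller.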
In the real world, non-linear quantizing is applied much more carefully than this example suggests, to minimize its effect on audio quality. Nevertheless, there is an impact.
Note that for low-level signals (1, 2, 3, 4, 5, or 6 in this example), distortion related to non-linear quantizing doesn’t occur. (Remember your friend whispering in the library?) But with higher-level signals, the errors become larger (the rock concert). The assumption here is that when things get loud enough, you can’t perceive tiny differences in volume, so greater amounts of error are acceptable.
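A short sketch, again using the hypothetical level table (not the article’s chart), shows the pattern: integer signal values from 0 to 6 land exactly on a level, and the worst-case rounding error grows as the signal gets louder. The grouping of values into four bands and the helper names are arbitrary, chosen only for illustration.

```python
# HYPOTHETICAL 16-level table (not the article's chart), consistent with the text.
NONUNIFORM_LEVELS = [0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 17, 20, 24, 29, 36]


def quantize(x: int, levels: list[int]) -> int:
    """Round x to the nearest allowed level; ties go to the lower level."""
    return min(levels, key=lambda level: abs(level - x))


def worst_error(lo: int, hi: int, levels: list[int]) -> int:
    """Largest rounding error over the integer signal values lo..hi inclusive."""
    return max(abs(x - quantize(x, levels)) for x in range(lo, hi + 1))


# From quiet (the library) to loud (the rock concert).
for lo, hi in [(0, 6), (7, 14), (15, 24), (25, 36)]:
    print(f"values {lo}-{hi}: worst-case error {worst_error(lo, hi, NONUNIFORM_LEVELS)}")
```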
Non-linear quantizing is a great trick for making a children’s toy talk or giving a phone answering machine better dynamic range without adding the cost of more bits. For music listening, however, it simply has no place. Still, in the early days of Internet audio, when bandwidth was scarce, compression was king, which is why lossy formats such as MP3 employ non-linear quantizing. But if you care about eliminating distortion from the music you listen to, uncompressed audio sources are the way to go.
Author’s bio: Cliff Roth, a tech writer, consultant, and teacher, is on the faculty at SUNY Cortland. He was formerly an editor for EETimes, a teacher at The Institute of Audio Research, and he has also written for The New York Times. He can be reached at firstname.lastname@example.org.