“If a tree falls in the woods and nobody is there to hear it, does it make a sound?”
Sorry, philosophers: I'm afraid it does. Sound waves bombard us constantly from every direction, whether we're actively listening or not. In fact, most of the sound reaching us is filtered out by our brains before we consciously register it; otherwise we'd quickly go into perceptual overload. Albert Einstein poked at the same puzzle from the other side when he asked whether the moon exists only when somebody is looking at it.
Other than what goes on in our imaginations, most of us think of sound as only what our two ears happen to be capturing at any given moment. It's a reasonable way of looking at the world. Humans can't hear the ultra-high frequencies of a dog whistle, for example, so for all intents and purposes that sound doesn't meaningfully exist for us, at least until a neighbor's dog hears it. But that doesn't mean there isn't a lot of sonic information occurring while the whistle blows; obviously, there is.
In fact, it doesn't even mean that your body doesn't perceive these sounds. Sounds are vibrations moving through a medium, usually air, though whales and other animals do a fine job of talking through water. Young, healthy human ears can interpret vibrations as discernible pitches at rates as low as 20 vibrations per second (Hertz, or Hz) and as high as 20,000 Hz. Vibrations slower than 20 per second (infrasonic) or faster than 20,000 (ultrasonic) simply aren't interpreted by our brains as an identifiable pitch.
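If it helps to make those boundaries concrete, here's a minimal sketch (the 20 Hz and 20,000 Hz cutoffs are the nominal textbook figures; real thresholds vary by listener and age):

```python
def classify_vibration(hz: float) -> str:
    """Sort a vibration rate into the nominal bands of human hearing."""
    if hz < 20:
        return "infrasonic (felt, not heard as pitch)"
    if hz > 20_000:
        return "ultrasonic (a dog whistle lives up here)"
    return "audible (interpreted by the brain as pitch)"

for freq in (5, 20, 440, 16_000, 25_000):
    print(f"{freq:>6} Hz -> {classify_vibration(freq)}")
```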
Yet they exist. The lowest notes of a (true) subwoofer are felt through our bones more than they are heard; are these sensations not “sounds”? Similarly, very high sound frequencies that are above our range of hearing, such as the dog whistle, are also perceived by the body as vibrations; they’re felt on our skin and in our hair. Our ears and brains may not be able to interpret these vibrations as pitches, but they do have a physiological effect on our bodies and neural processing, and a good case can be made that they contribute substantially to our perception of sound.
Witness the high-frequency, near-ultrasonic deterrents installed in public spaces and Quickie Mart parking lots to drive away teen loiterers. To teens with young, sensitive ears, the sound is unbearable; to the adults nearby, there is only silence and an absence of skateboarders. The adults don't notice the sound (at least not consciously), but there's no question that it exists.
The scientists who developed perceptual audio coding, the set of techniques by which full-bodied musical recordings are turned into low-res MP3, AAC, and other "compressed" formats, devoted years of work to what is "heard" and what isn't. Their job was to figure out a way to reduce the amount of digital data needed to accurately reproduce a music or media file. The mission was to determine just how much data could be discarded before the listener could no longer "hear" the difference between the original and the reduction.
Back when they were doing this, digital music files needed to be made smaller to be commercially practical for the connected era. A compact disc holds about 700 MB of digital data; back when people used dial-up modems to get on the Internet, a download of that much music could take all night and then some. With today's fast broadband, the same transfer takes less than a minute. Storage was similarly primitive: my first hard drive purchase, back in 1991, cost over $300 for a paltry 20 MB. The other day, I bought a terabyte drive for $75.
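The arithmetic is easy to check. Here's a quick sketch, assuming a 700 MB album, an ideal 56 kbps dial-up modem, and a 100 Mbps broadband line (the specific rates are illustrative, not gospel):

```python
ALBUM_BYTES = 700 * 1000 * 1000  # a full CD's worth of audio, roughly 700 MB

def transfer_seconds(bits_per_second: float) -> float:
    """Ideal time to move the whole album at a given line rate."""
    return ALBUM_BYTES * 8 / bits_per_second

dialup = transfer_seconds(56_000)          # 56 kbps modem
broadband = transfer_seconds(100_000_000)  # 100 Mbps line

print(f"Dial-up:   {dialup / 3600:.0f} hours")  # ~28 hours: all night, and then some
print(f"Broadband: {broadband:.0f} seconds")    # under a minute
```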
The space-saving solutions developed for an earlier age are no longer really needed (or, in HRAC's view, wanted), but they not only persist, they still dominate: at download sites like iTunes and Amazon, and on streaming services like Spotify and Pandora. Let's take a brief look at how these space-saving techniques achieve their aim, to the lamentable detriment of music.
Sound As Instant Coffee
The process of shrinking music from its original size is achieved by throwing away musical information that an algorithm believes you can't hear. While many algorithms are quite clever (a simple example being the convenient auto-fill on your Google searches), others are not so smart. Witness what happens when you shop online for shoes on a Monday, buy them on Tuesday, and then get bombarded with algorithm-driven shoe advertisements everywhere online for weeks afterward. Algorithms are formulas; they can represent many things with a good deal of accuracy, but they are basically glorified good guesses.
The perceptual coding algorithms that reduce musical recordings to shadows of their original state are based on models of how humans perceive sounds. These algorithms decide which data in a music file can be thrown away to save space and which are most important to retain. The end product may be a triumph of science and economy (small file sizes that are fast and cheap!) but it is clearly a blow against art and emotion. Computer algorithms aren't nearly as complex, powerful, or subtle as human brains, and while economy may be a virtue, poverty is not.
Scientific explanations of how psychoacoustic perceptual coding works abound throughout the Internet. But for the layperson, here are two major culprits that allow MP3 and AAC files to denude your favorite music. Think of them as the brass knuckles and cudgels of lossy music files:
Frequency Masking

This technique is based on the idea that when your ears hear two sounds that are very close together in pitch, they can't really distinguish one from the other. Since the algorithm believes you can't hear both clearly at the same time, it's programmed to throw out the pitches it thinks will be missed the least, assuming the listener won't really notice or care. Oy.
Think for a moment of all the musical textures that depend on two (or many) instruments playing the same note in different registers. Think of all the doubled lines you hear played in a band or orchestra. Think of the slide in a blues guitar, or the close harmonies of choral singing. Think of all the tiny sound effects that go into a highly produced hip-hop record. In fact, consider the simplest triad chord on a guitar or a piano: three notes with lots of built-in harmonics. Which frequencies should the algorithm throw out? The algorithm decides. You lose.
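To see the shape of that trade, here's a deliberately crude sketch. Real encoders work with critical bands and measured masking curves, not the fixed "too close to keep" gap invented below, but the kind of choice is the same:

```python
# Toy frequency-masking filter: of any two tones closer together than
# min_gap_hz, keep only the louder one. Real codecs use perceptual
# critical bands, not a fixed gap; this only shows the kind of choice.

def mask_tones(tones, min_gap_hz=50.0):
    """tones: list of (frequency_hz, level_db) pairs. Returns the survivors."""
    survivors = []
    for freq, level in sorted(tones):            # scan from low to high pitch
        if survivors:
            prev_freq, prev_level = survivors[-1]
            if freq - prev_freq < min_gap_hz:    # "too close to hear both"
                if level > prev_level:           # keep only the louder tone
                    survivors[-1] = (freq, level)
                continue
        survivors.append((freq, level))
    return survivors

# A 440 Hz note, a quieter 465 Hz nuance beside it, and a distinct 880 Hz octave.
print(mask_tones([(440.0, -12.0), (465.0, -20.0), (880.0, -15.0)]))
# -> [(440.0, -12.0), (880.0, -15.0)]  ...the 465 Hz nuance is gone
```

The point isn't the particular numbers; it's that a formula, not a musician, decides which of two neighboring tones survives.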
Temporal Masking

This concept works along the same lines as frequency masking, only now it's not about pitch (frequency) but about loudness and time. The idea is that when a loud sound arrives, the ear and brain focus on it, and it takes time, on the order of milliseconds, for the ear to "recover" and interpret the next sound clearly. In essence, the algorithm assumes that any quieter musical information occurring close in time to a louder sound is at least partly redundant, and therefore discardable.
This "smart" decision to discard extremely pertinent musical information is dispiriting, to say the least. It violates the intent of any guitarist who ever slid down the fretboard, any pianist who hid a little ghost note in a chord, any bassist who ever played a grace note, any drummer who subtly shuffled the hi-hat into the next beat. Which pieces of all this musical intent get thrown away? Lots of them, and the algorithm decides. You lose again.
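The same kind of toy sketch applies here. Actual temporal masking follows level-dependent curves that decay over tens of milliseconds; the hard 50 ms window and 18 dB threshold below are invented purely for illustration:

```python
# Toy temporal-masking filter: after a loud event, drop any much quieter
# event inside the "recovery" window. Real codecs apply a decaying
# masking curve rather than this hard cutoff.

def mask_in_time(events, window_ms=50.0, drop_below_db=18.0):
    """events: list of (time_ms, level_db) in time order. Returns the survivors."""
    survivors = []
    last_loud = None  # (time_ms, level_db) of the most recent masker
    for t, level in events:
        if last_loud is not None:
            t0, l0 = last_loud
            if t - t0 < window_ms and level < l0 - drop_below_db:
                continue  # the ghost note the algorithm assumes you missed
        survivors.append((t, level))
        if last_loud is None or level > last_loud[1]:
            last_loud = (t, level)
    return survivors

# A snare hit at 0 ms, a quiet hi-hat shuffle 20 ms later, the next beat at 500 ms.
print(mask_in_time([(0.0, -6.0), (20.0, -30.0), (500.0, -8.0)]))
# -> [(0.0, -6.0), (500.0, -8.0)]  ...the shuffle is deemed redundant
```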
There's more to perceptual coding than these two techniques, of course. The aforementioned ultrasonic pitches that our skin "hears" are simply thrown out; the infrasonic frequencies that may be the missing link between sex and rock and roll are also eliminated. Tons of audible harmonics are discarded because the algorithm assumes your ear is focused on the "root tone." Strike a middle C on a piano and you'll hear a stack of mathematically related overtones ringing above that middle C, in addition to the fundamental itself. Run the same middle C through perceptual coding and you'll hear a clear rendition of the fundamental note, with many of those harmonics now missing. Hey, that's all you really needed anyway, right?
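The overtone math itself is simple arithmetic: a vibrating string's harmonics fall at integer multiples of the fundamental, which for middle C is about 261.63 Hz in standard A440 tuning. A quick sketch of the series that lossy encoders feel free to thin out:

```python
MIDDLE_C = 261.63  # Hz, fundamental frequency of middle C (A440 tuning)

# A vibrating string's overtones fall at integer multiples of the
# fundamental; it's this upper part of the series that lossy encoders
# most readily thin out or low-pass away.
for n in range(1, 9):
    label = "fundamental" if n == 1 else f"harmonic {n}"
    print(f"{label:>12}: {n * MIDDLE_C:8.2f} Hz")
```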
There’s a price we pay for tiny, low-res music files, the kind that were needed in 1999, not now. What’s the cost? Richness, dimensionality, subtlety, color, timing, finesse, drive, space, and emotion, for starters. In other words, all the things that make music more than math.
Smaller is not the same as better. Neither is faster or cheaper. Only better is better. In the case of music, better means the original: what the artist put forward. With hi-res audio, we can now listen to something closer to the original than we've ever been able to, while keeping the convenience that small, cheap, and fast have made us accustomed to.
These are challenging times for the music and audio industries, and for music itself. For music lovers, though, hi-res audio is an indisputable win.