An early morning in August, and I take a moment to be quiet and soak up the garden. I am bathed in a gentle morning breeze. Not yet hot, the air is soft and refreshing. Fragrances from flowers and damp earth tell me it is high summer after a rain. My eyes take in the chaotic wonderland of bursting plants presenting a cornucopia of colors and forms. Butterflies and bees animate my view. Cardinals sing, wrens scold, cicadas hum, the leaves in the trees “shhuuush”. Of all the senses that are stimulated, I find that sound is the most transportive - either to the original place of the sound or to a heightened sense of the magnificent. If I close my eyes, the symphony of my garden can conjure all the garden has to offer. Likewise, a beautiful song can lift my mood and connect me to something greater. But what exactly is sound? And how can we disentangle the warbles, the scolds, the “zzzz”s, and the swishes, or distinguish Joni Mitchell’s voice from Bob Dylan’s?
Sound itself is actually a fairly straightforward phenomenon. It is a wave of compressions and expansions of the medium through which the sound is traveling. For human hearing, sound is pressure ripples in air. When a bell is rung, it ever so slightly contracts and expands, over and over again. In other words, it vibrates. This causes the air next to the bell to contract and expand, which causes the next parcel of air, further away from the bell, to contract and expand, and so on. As the bell continues to vibrate, a stream of contractions and expansions flows outward from the bell through the surrounding air - this is a sound wave. Another way to picture a sound wave is to imagine a long slinky that is stretched in a line and pushed along its axis in short pulses. Each pulse travels as a compression down the length of the toy. If a series of pushes is given to the slinky, as in the case of the vibrating bell, there will be a series of compressions (and expansions) traveling down the slinky. If you are watching the slinky, or the air, at one point, the number of compressions that pass that point each second is the frequency of the traveling wave.
It is the frequency of a sound wave that we call pitch. A middle C has a different frequency than a D or an F# or even a higher pitched C. A bell tuned to the note “middle C” vibrates 256 times a second (or 256 Hertz, written as 256 Hz) in the convenient “scientific pitch” convention (modern concert tuning puts middle C closer to 262 Hz), and will trigger 256 compressions of air to pass by a given point in space every second. A bell that vibrates 6% more times every second, or 271 Hz, is a C sharp bell. When we double the frequency of a note, we jump a full octave. The human ear and brain are built to distinguish a sound’s frequencies, and we can pick out differences in frequency as small as 0.2% - much less than the 6% separating adjacent piano keys. How we do this is really quite remarkable.
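The 6% step between neighboring notes and the doubling at the octave can be checked with a little arithmetic. The sketch below assumes equal temperament, in which each of the 12 keys in an octave multiplies the frequency by the same ratio, 2^(1/12), and it starts from the 256 Hz middle C used in this article. The function and constant names are only illustrative.

```python
SEMITONE_RATIO = 2 ** (1 / 12)  # about 1.0595, i.e. roughly 6% per piano key
MIDDLE_C_HZ = 256.0             # the middle C value used in this article


def note_frequency(semitones_above_middle_c: int) -> float:
    """Frequency of the note a given number of piano keys above middle C."""
    return MIDDLE_C_HZ * SEMITONE_RATIO ** semitones_above_middle_c


if __name__ == "__main__":
    # Twelve steps of about 6% each compound to exactly one doubling.
    for name, steps in [("C", 0), ("C#", 1), ("D", 2), ("F#", 6), ("high C", 12)]:
        print(f"{name:>6}: {note_frequency(steps):7.1f} Hz")
```

Running it shows C sharp landing near 271 Hz and the higher C at twice the frequency of middle C, which is the octave.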
The ear collects sound waves and converts their variations in air pressure into electrical signals that are sent to the brain. This change, from pressure variations to electrical signal, takes place through 3 transformations within the ear. The outer ear channels sound waves into the ear canal and to the eardrum - a membrane about half the diameter of a dime. The compression waves of the sound set the eardrum vibrating - much like a stereo speaker. This is the first transformation of the sound wave - from a pressure wave into a vibrating membrane. The eardrum, in turn, is connected to the three bones of the middle ear. As the eardrum vibrates, it sets the 3 bones pumping as well. The last of the 3 little bones, the stapes or stirrup, is the smallest bone in the human body, and as it pumps, it taps on a window to the inner ear. The window is part of a membrane that encloses a fluid. Together the membrane and the fluid make up the cochlea, or the inner ear. The cochlea is a snail-shaped vessel, which altogether is about the size of a garden pea. As the stirrup of the middle ear taps on the window of the cochlea, the fluid inside the cochlea begins to vibrate. This is transformation number two - turning the vibrations of the eardrum into vibrations in the fluid of the cochlea. Getting this cochlear fluid to vibrate in response to sound waves is something of a feat, for when sound passes from air to liquid only around 0.1% of the sound’s energy is transferred. Thus we’d be unlikely to hear anything if it weren’t for the mechanics of the middle ear, which amplify the pressure in the sound wave by a factor of 17.
The third transformation of the sound waves occurs in the cochlea, and it is the most fascinating part of hearing. The cochlea is covered in tiny hair-like structures called stereocilia, bundled together into groups of 30 to 300 hair-like threads. When these hairs bend, in response to vibrations in the fluid inside the cochlea, pores open. The open pores allow positively charged ions to enter the hair cells. These ions trigger the release of neurotransmitters, which send an electric signal to the brain via the auditory nerve.
It’s insane that hearing works at all. To review, the process goes like this: sound is collected, it sets the eardrum vibrating, which pumps the 3 bones of the middle ear, which vibrate the fluid inside the cochlea, causing the hairs on the cochlea to bend, which lets ions into the hairs, which trigger neurotransmitters to send a signal to our brains that we’ve heard something. But it gets even more trippy. Each hair bundle on the cochlea bends only in response to specific frequencies of vibration in the cochlear fluid, with other bundles responding to different frequencies. Our brain knows which frequency-tuned bundle sent the “we heard something” signal and thus creates our experience of hearing pitches.
The way in which the hair bundles are tuned to different frequencies is rather clever. Spiraling inward on the cochlea’s coiled snail shape, the stiffness on the surface of the cochlea decreases. Hairs attached to the more flexible inner parts of the cochlea vibrate more slowly than hairs on the stiffer outer part of the cochlea. Stereocilia on the outer cochlea are sensitive to the highest frequencies we can hear, while those that sit closer to the center of the spiral respond to lower frequencies. While the musical notes we know and love are separated by 6% in frequency, stereocilia are sensitive to changes in frequency of just 0.2%, and thus we can detect tiny changes in pitch.
So this explains why we can distinguish different frequencies, but how do we hear an orchestra? The answer is actually rather simple, though slightly mind bending. Let’s start with a C note from a tuning fork. A tuning fork produces a C sound wave that has a very specific shape. That is, if we were to plot the pressure of the air versus time as a sound wave passes, we would get a sine wave curve with some extra wiggles. The sine wave is shown as the red “Fundamental frequency” in the first panel of the figure below, whereas one of the extra wiggles is plotted in red in the second panel. These extra wiggles arise because the tuning fork also vibrates with frequencies that are multiples of 256 Hz, such as 512 Hz and 768 Hz, albeit with smaller intensities. These are the harmonics. The air that carries these various frequencies of vibration responds by superposing all of the vibrations at a given point - just summing them all up. Thus the pressure in air that is transmitting the fundamental 256 Hz wave and the first harmonic at 512 Hz would look like the purple wave in the third panel.
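The summing-up of a fundamental and its harmonic can be sketched in a few lines. This is only an illustration: the pressure is in arbitrary units, and the relative strength of the 512 Hz harmonic (0.5 here) is an assumed value, not a measurement of any real tuning fork.

```python
import math

FUNDAMENTAL_HZ = 256.0
HARMONIC_HZ = 512.0
HARMONIC_AMPLITUDE = 0.5  # assumed: the harmonic is quieter than the fundamental


def pressure(t: float) -> float:
    """Combined pressure variation (arbitrary units) at time t, in seconds."""
    fundamental = math.sin(2 * math.pi * FUNDAMENTAL_HZ * t)
    harmonic = HARMONIC_AMPLITUDE * math.sin(2 * math.pi * HARMONIC_HZ * t)
    # Superposition: the air simply adds the two vibrations together.
    return fundamental + harmonic


if __name__ == "__main__":
    period = 1 / FUNDAMENTAL_HZ  # one full cycle of the 256 Hz wave
    for i in range(9):
        t = i * period / 8
        print(f"t = {t * 1000:.3f} ms  pressure = {pressure(t):+.3f}")
```

The printed values trace one cycle of a lopsided wave: it still repeats 256 times a second, but it is no longer a clean sine curve.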
If a wave like this purple one were to hit our eardrum, the eardrum would transmit this pattern of vibrations to the middle ear bones, which would transmit the pattern to the fluid in the cochlea. Different stereocilia bundles, distributed along the membrane of the cochlea, would pick up the two different frequencies, which are both present within the more complex wave pattern. These bundles in turn would send electric signals to the brain saying that a 256 Hz sound came in and a quieter 512 Hz sound came in.
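As a loose mathematical analogy for this sorting by frequency (not a model of what the stereocilia physically do), one can test a combined wave against a handful of candidate frequencies and see which ones are present. The sketch below computes single terms of a discrete Fourier transform by hand. The sample rate, the one-second duration, and the 0.5 strength of the harmonic are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 8192  # samples per second (assumed for this sketch)
N_SAMPLES = 8192    # one second of sound


def make_wave() -> list[float]:
    """A 256 Hz tone plus a quieter 512 Hz harmonic (amplitude 0.5 is assumed)."""
    return [
        math.sin(2 * math.pi * 256 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 512 * n / SAMPLE_RATE)
        for n in range(N_SAMPLES)
    ]


def strength_at(wave: list[float], freq_hz: float) -> float:
    """Amplitude of the component of `wave` at freq_hz (one Fourier term)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, x in enumerate(wave))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
             for n, x in enumerate(wave))
    return 2 * math.hypot(re, im) / len(wave)


if __name__ == "__main__":
    wave = make_wave()
    for freq in (128, 256, 384, 512, 768):
        print(f"{freq:4d} Hz: {strength_at(wave, freq):.3f}")
```

Only the 256 Hz and 512 Hz tests come back with a sizeable strength, and the 512 Hz one is weaker, matching the "loud 256, quieter 512" message described above.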
The same process of wave superposition occurs when different instruments are played. Striking a C key on a piano will produce a set of harmonics, but which harmonics and how strong they are will depend on the material the piano is made of, its shape, and its age, as well as the weather. Furthermore, the piano “C” will not produce a ‘perfect’ mathematical sine wave as shown in the figure above. Rather, the rise and fall of the pressure wave might be more like a box: rising to a maximum and staying there for part of the cycle and then falling back down for a time, repeating this pattern. In music circles this character is called the sound’s timbre, and it contributes to the unique sound of every instrument and indeed every voice. And because our ears and our brains can uncouple the different frequencies within the sound, we can distinguish different voices and different instruments.
But sound isn’t just for people. Humans hear sounds with frequencies between about 20 Hz and 20,000 Hz, so we refer to sound with frequencies greater than 20,000 Hz as ultrasound. Dogs can hear in the ultrasonic and can detect notes with twice the frequency we can - up to 45,000 Hz. Bats use ultrasound for navigation and hear frequencies up to 120,000 Hz. At the other end of the spectrum, sound below 20 Hz is referred to as infrasound. Elephants use infrasound to communicate with other elephants over long distances. Whales also use infrasound and can communicate over hundreds of miles. Oddly, there is a frequency which only teenagers can hear, 17,400 Hz, and stores have used this to send would-be loiterers on their way. Teenagers have turned this to their advantage, using this frequency as a secret phone ring tone.
While sound can’t travel in a vacuum, it can travel through liquids and solids such as rocks and your skull. Indeed, much of what we know about the interior of the Earth comes from studying sound waves. There are various sources of sound waves in the Earth - ranging from earthquakes to landslides to man-made explosions. When the waves from an earthquake reach us, we may feel the buckling of the Earth’s surface. In addition to the short term waves of earthquakes, the Earth also vibrates continuously at very low frequencies. These continuous vibrations of the Earth are called ‘normal modes’, which are more generally known as standing waves. The simplest of Earth’s normal modes is the ‘breathing mode’. The whole Earth expands and contracts ever so slightly every 20 minutes in a sort of Gaian chant. There is also the ‘rugby mode’, in which the Earth expands along 2 alternating directions with a cycle of 54 minutes. In fact, we’ve detected thousands of normal modes in the Earth. The Earth also has vibrational patterns that result from the coupling of the solid Earth with the oceans and the atmosphere. For instance, ordinary winds set up microseisms with periods of 1-30 seconds (or 1 to 0.033 Hz), which are referred to as the Earth’s hum. Now that’s pretty trippy.
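The periods quoted for the Earth's vibrations can be turned into frequencies with the same rule used for the microseisms: frequency is one divided by the period in seconds. A small sketch, with an illustrative function name, makes it easy to compare these slow vibrations with the hertz values used for audible sound.

```python
def period_to_frequency_hz(period_seconds: float) -> float:
    """Frequency in hertz of a vibration that repeats every period_seconds."""
    return 1.0 / period_seconds


if __name__ == "__main__":
    # Periods taken from the text, converted to seconds.
    examples = {
        "breathing mode (20 min)": 20 * 60,
        "rugby mode (54 min)": 54 * 60,
        "microseism (1 s)": 1,
        "microseism (30 s)": 30,
    }
    for name, period in examples.items():
        print(f"{name}: {period_to_frequency_hz(period):.6f} Hz")
```

The normal modes come out at less than a thousandth of a hertz, tens of thousands of times lower than the lowest note a human ear can pick up.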
Having learned about sound, I understand a bit more why sound is so transportive. Sound operates in the dark, wraps around corners, and travels long distances. When we hear, we experience the whole soundscape. The sound that arrives at our ears is an integration of all the sounds in the landscape, whereas our vision is directional and requires light and a line of sight. Sound is always present. We also process sounds much faster than vision, and therefore sound influences our interpretation of all our other senses. And let’s not forget the transcendent nature of music. Music can transport us to our youth, to certain times of the year, to a traumatic event, or to poignant memories. There is something almost mythical in music. So the next time you meet someone, you may like to ask “What have you been listening to?”
I get the logic, but it still surprises me that the wiggly wave form sounds like two simultaneous but different notes.
Fascinating. Really amazing how our bodies have evolved to work in such complicated ways. It also makes me wonder what breaks down as we age so that some of us become hard of hearing? Is it the fluid dries up, the hairs fall out, or something else?
thanks Pru I always enjoy your articles.