The Science of Sound and Digital Audio (Designing Web Audio)

1969: Jimi Hendrix uses feedback to play "The Star-Spangled Banner" at Woodstock, taking the electric guitar where it has never gone before.

1978: 100,000 people stand in awe and silence in the sweltering heat of Jamaica's national stadium, struck by the power and emotion of Bob Marley's "Redemption Song."

1998: The last note of the Bach Cello Suites reverberates throughout a state-of-the-art concert hall, and the audience stands in applause as Yo-Yo Ma takes a final bow with his 1712 Davidoff Stradivarius cello.

2000: Riding the wave of a new music revolution, an up-and-coming singer-songwriter without a record contract builds an enthusiastic fan base by publishing her music on MP3 sites.

These four moments in history demonstrate music's power to transform, excite, and entertain people. They illustrate a technical and artistic mastery of sound production and music-making by highly skilled artists. And they show the dedication of audiences around the world who travel great distances and ignore numerous technical challenges to hear their music.

Musicians spend years practicing the manipulation of sound. And they are always on the lookout for just the right instrument. Fans spend thousands of dollars on music collections, hi-fi stereo equipment, and concert tickets, all in pursuit of a good listening experience. So what does all this boil down to? The scientific phenomena of acoustics and sound wave production.

The difference between the sound an artist produces in a concert hall and the sound a beginning student produces at home with a MIDI synthesizer has everything to do with the quality, tone, richness, and substance of the sound waves that reach our ears. This chapter won't teach you how to become the next Hendrix, but it does examine the scientific phenomena of sound to help you design and produce better web audio.

2.1. The science of sound

Sound is the vibration of air molecules or variation in air pressure that can be sensed by the ear. The pattern and rate of audible vibrations give sound its unique quality. The range of auditory perception is approximately 20 cycles (or fluctuations) of air pressure per second to 20,000 cycles per second. Air pressure fluctuations outside of this range are not audible to the human ear and are called subsonic (less than 20 cycles per second) and ultrasonic (more than 20,000 cycles per second) vibrations.

Sound is generated by a friction-producing force such as a drum stick hitting a cymbal, a bow moving across a cello string, or a vibrating speaker cone that sets the surrounding air molecules into motion. From the point of impact or disturbance, sound waves or patterns of vibrating air molecules radiate outwards through the atmosphere to the ear like the ripples of water on a pond. Figure 2-1 shows a sound produced by a cello as it emanates from a speaker cone.

Figure 2-1. A speaker cone produces sound waves by vibrating surrounding air molecules in fast, rhythmic bursts of energy

As sound waves produced from a cello move rapidly outward through the atmosphere, they reflect off the various surfaces in the room and divide and multiply into thousands of reflections. Once these reflections reach the ear, they are converted into electrical nerve impulses and sent to the brain where they are stored and interpreted as a beautiful "cello" sound. Similarly, these air pressure fluctuations or reflections can be converted into electrical waves or signals with a microphone and sent to a recording device that stores a "waveform" pattern.

The unique patterns of air pressure variations or sound reflections produced by an instrument or a speaker cone are known as waveforms. Figure 2-2 illustrates three different sounds and their unique waveforms. In audio recording and web broadcasting, it is the ability to accurately capture and reproduce these waveforms that results in good sound quality. Improper recording and mastering techniques, poor equipment, and other mistakes can blur or distort the replica of the original waveform and degrade the sound quality.

Figure 2-2. Snapshots of three distinct waveforms, each displaying a unique pattern of air pressure variations or sound reflections

Figure 2-3. A five-millisecond snapshot of harmonic waveforms created by a plucked guitar string and a cymbal crash

Most musical sounds vibrate in orderly repetitive patterns (with the exception of percussive instruments such as rattles and shakers). In contrast, noise produces random, chaotic wave patterns. Figure 2-3 shows these two types of waveforms.
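One rough way to see the difference between an orderly pattern and a chaotic one is to compare a signal with a copy of itself shifted by one cycle. The sketch below is only an illustration: the 8,000-samples-per-second rate, the 100 Hz test tone, and the function names are assumptions made for the example, not part of any audio tool.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def sine_wave(freq_hz, num_samples=SAMPLE_RATE):
    """Return samples of a pure tone, an orderly repeating pattern."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(num_samples)]

def white_noise(num_samples=SAMPLE_RATE, seed=0):
    """Return random samples in [-1, 1], a pattern that never repeats."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]

def self_similarity(samples, lag):
    """Correlate a signal with a copy of itself shifted by `lag` samples.

    Values near 1 mean the pattern repeats after `lag` samples;
    values near 0 mean it does not.
    """
    pairs = len(samples) - lag
    num = sum(samples[i] * samples[i + lag] for i in range(pairs))
    den = sum(samples[i] ** 2 for i in range(pairs))
    return num / den

lag = SAMPLE_RATE // 100  # one full cycle of a 100 Hz tone is 80 samples
tone_score = self_similarity(sine_wave(100), lag)
noise_score = self_similarity(white_noise(), lag)
print("tone:", round(tone_score, 3), "noise:", round(noise_score, 3))
```

The tone scores very close to 1 because every cycle looks like the one before it, while the noise scores close to 0 because no stretch of it predicts the next.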

Each type of disturbance in the atmosphere produces a distinct pattern of vibrations. These patterns of vibrating air molecules give a sound its unique quality. The sound is composed of three elements: loudness, pitch, and timbre.

These elements are the three fundamental qualities of sound that scientists, audio professionals, and equipment manufacturers use to understand, measure, and control the audio production process. A clear understanding of how these terms are used will help you master proper recording, editing, and encoding techniques for web audio broadcasting.

2.1.1. Loudness

Loudness, or volume, is the perception of the strength or weakness of a sound wave resulting from the amount of pressure produced. Sound waves with more intensity or larger variations in air pressure produce louder sounds. Sound waves with smaller fluctuations in air pressure produce quieter sounds, as shown in Figure 2-4. Through the duration of the start and decay of most sounds and passages of music, these sound waves fluctuate at various intensities. This is known as the dynamic range of a sound or a passage of music. The dynamic range in an audio file refers to the difference or range in loudness to softness throughout the piece.

Figure 2-4. An example of a "quieter" sound wave with less air pressure versus a "louder" sound wave with more air pressure

Loudness and amplitude

Loudness is not to be confused with amplitude. Loudness refers to the human perception of sound, while amplitude refers to quantifiable measurements of air pressure variations. Amplitude is the magnitude of the change in air pressure and is commonly expressed in decibels (dB).
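Because the decibel is a logarithmic unit, a short calculation can make the relationship between pressure and dB more concrete. The following sketch assumes the standard reference pressure of 20 micropascals used for sound pressure level (dB SPL); the function names are chosen only for this example.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # 20 micropascals, the usual dB SPL reference

def pressure_to_db_spl(pressure_pa):
    """Convert a sound pressure in pascals to decibels (dB SPL)."""
    return 20 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)

def amplitude_ratio_to_db(ratio):
    """Express the ratio between two amplitudes in decibels."""
    return 20 * math.log10(ratio)

print(round(pressure_to_db_spl(1.0)))        # 94
print(round(amplitude_ratio_to_db(2.0), 2))  # 6.02
print(round(amplitude_ratio_to_db(0.5), 2))  # -6.02
```

Doubling the amplitude adds about 6 dB and halving it subtracts about 6 dB, which is why small changes in dB can correspond to large changes in air pressure.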

Waveform intensity affects the blend of sounds in a mix, as shown in Figure 2-5. The louder a sound is, the more it will mask or dominate other sounds in a mix. Masking -- the phenomenon of louder sounds overpowering quieter sounds -- is advantageous for an interview with someone at a noisy tradeshow but not for a voice-over with background music.

Figure 2-5. The intensity of waveforms from multiple instruments affects the volume and the perception of a soundtrack creating a masking effect

2.1.2. Pitch

Pitch is the psychoacoustic term for how high or low a sound is perceived by the human ear. Pitch is determined by a sound's frequency, or rate of repetition. Figure 2-6 shows a one-second duration waveform and the frequency rate for two different instruments. Middle C on the piano, for example, vibrates at 261 cycles per second. Frequency is measured in hertz or Hz (also known as cycles per second). The higher the frequency, the higher the pitch. The lower the frequency, the lower the pitch. On home stereo equipment, high-frequency sounds are referred to as treble, and low frequency sounds as bass. Figure 2-7 shows waveforms of relatively lower- and higher-frequency sounds.

Figure 2-6. Two different waveforms and their approximate frequency rate
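To make "cycles per second" concrete, the sketch below generates one second of a pure tone and counts how many cycles begin in that second. The sample rate, the two test notes (110 Hz and 440 Hz), and the function names are assumptions chosen for illustration; a real instrument produces a far more complex waveform than a single sine wave.

```python
import math

SAMPLE_RATE = 44100  # samples per second (an assumed, CD-style rate)

def sine_wave(freq_hz, seconds=1):
    """Return `seconds` of a pure tone at `freq_hz` cycles per second."""
    num_samples = SAMPLE_RATE * seconds
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(num_samples)]

def count_cycle_starts(samples):
    """Count upward zero crossings; each one marks the start of a cycle."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a <= 0 < b)

for name, freq_hz in (("low A (bass)", 110), ("high A (treble)", 440)):
    print(name, count_cycle_starts(sine_wave(freq_hz)))
# low A (bass) 110
# high A (treble) 440
```

The higher note packs four times as many cycles into the same second, which is exactly what the ear interprets as a higher pitch.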

Notice that it requires more cycles in the same period of time to reproduce a high-pitched sound than it does to reproduce a low-pitched sound, as shown in Figure 2-7. Thus, high-pitched sounds such as a woman's voice or a buzzing fly require more digital information to reproduce accurately than do lower-pitched sounds such as a man's voice or a bass guitar. This is why low-pitched sounds are less degraded when a sound is encoded (converted) to a low-quality format.

Figure 2-7. Rapid vibrations of air molecules create a high-pitched sound (treble); a slower rate of vibration creates a low-pitched sound (bass).
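The link between pitch and "digital information" can be sketched with simple arithmetic. A digital recording measures the waveform a fixed number of times per second, so a faster vibration gets fewer measurements per cycle. The two sample rates and two test frequencies below are assumptions picked for illustration; the half-the-sample-rate ceiling is the standard limit from sampling theory.

```python
def samples_per_cycle(freq_hz, sample_rate_hz):
    """How many measurements land on each cycle of the wave."""
    return sample_rate_hz / freq_hz

def highest_representable_hz(sample_rate_hz):
    """A sampled signal needs at least two samples per cycle, so the
    highest frequency it can represent is half the sample rate."""
    return sample_rate_hz / 2

for rate in (44100, 8000):  # a high-quality rate and a low-quality rate
    print("sample rate:", rate, "ceiling:", highest_representable_hz(rate))
    for label, freq in (("bass, 100 Hz", 100), ("treble, 3,000 Hz", 3000)):
        print("  ", label, round(samples_per_cycle(freq, rate), 1))
```

At the lower rate the bass tone still gets 80 samples per cycle, while the treble tone gets fewer than three, so the treble is the first thing to suffer when quality is reduced.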

Harmonics

A given note is composed of a series of "pitches" that vibrate in harmony with its fundamental frequency or pitch. Musical tones contain many such pitches, known as harmonics. You can experience this phenomenon both aurally and visually by listening to and watching a guitar string being plucked. The string will vibrate at a root, or fundamental, frequency, as well as at higher whole-number multiples of this frequency. These additional frequencies are the harmonics. For example, a cello note playing the pitch of middle C will predominantly resonate at 261 cycles per second, but it will also contain frequencies vibrating at roughly 522, 783, and 1,044 cycles per second (two, three, and four times the fundamental).
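Taking "higher multiples" to mean whole-number multiples of the fundamental, the harmonic series of a note is easy to compute. The sketch below uses the rounded middle C value of 261 cycles per second from the text; the relative strengths given to each harmonic are invented for the example, since a real instrument's mix of harmonics depends on how it is built and played.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed)

def harmonic_series(fundamental_hz, count):
    """Return the fundamental followed by its whole-number multiples."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def harmonic_tone(fundamental_hz, strengths, num_samples):
    """Add sine waves at each harmonic, scaled by the given strengths.

    strengths[0] applies to the fundamental, strengths[1] to the
    second harmonic, and so on (values here are illustrative only).
    """
    return [
        sum(level * math.sin(2 * math.pi * fundamental_hz * (k + 1) * i / SAMPLE_RATE)
            for k, level in enumerate(strengths))
        for i in range(num_samples)
    ]

print(harmonic_series(261, 4))  # [261, 522, 783, 1044]
tone = harmonic_tone(261, [1.0, 0.5, 0.25, 0.125], 441)
```

Changing the list of strengths while keeping the fundamental fixed changes the shape of the waveform but not its pitch.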

It is also important to note that most sounds are a mixture of waves at various frequencies. A cello note, for example, is composed of many frequencies across the frequency spectrum. The frequency spectrum is the complete range of frequencies we can hear, just as the color spectrum is the range of colors we can see. The term also is used relative to a particular sound, meaning the frequency spectrum is the range of frequencies present in that sound. A mid-range cello note, for instance, has a range of 500 Hz to 12,000 Hz.

The various frequencies that make up a sound can be amplified or reduced with equalization (EQ) to change the sound's overall tone and character. Equalization for the Web is discussed in detail in Chapter 4, "Optimizing Your Sound Files".

2.1.3. Timbre

Unlike loudness or amplitude, measured in dB, and pitch or frequency, measured in Hz, timbre is difficult to quantify. Timbre is loosely defined as the tone, color, or texture that enables the brain to distinguish one type of instrument sound from another. The term generally encompasses all the qualities of a sound besides loudness and pitch, such as "smooth," "rough," "hollow," "peaceful," "shrill," "warm," and so on. In simple terms, timbre is the sonic difference between a violin and a trumpet playing the same note at the same loudness or amplitude level.

Musical acoustics

For further study, read Donald E. Hall's Musical Acoustics (Wadsworth Publishing Co., 1980). This book gives one of the best in-depth explanations of the science of sound and human hearing. You can also check out more information about sound and recording at the University of California, Santa Cruz Electronic Music Studios' web site, http://arts.ucsc.edu/recording/techinfo.html.

Much of a sound's unique timbre is a result of its particular transient qualities. Transients are the attack and decay, or beginning and ending characteristics, of a sound, as shown in Figure 2-8. For example, the quiver of a violin bow as it strikes the strings or the brief squawk of a saxophone as the air begins to vibrate the reed are transients. Different instruments have unique transients that affect the way we hear a series of notes being played together.

Figure 2-8. Transients, or attack and decay, of a sound
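Attack and decay can be pictured as a volume envelope that shapes an otherwise steady tone. The sketch below is a deliberately crude model that uses straight-line ramps; the attack times, the 440 Hz tone, and the names "plucked" and "bowed" are assumptions for illustration, and real instrument transients are far less regular.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed)

def attack_decay_envelope(total_samples, attack_samples):
    """Rise linearly from silence to full level, then fall back linearly.

    Requires 0 < attack_samples < total_samples.
    """
    decay_samples = total_samples - attack_samples
    envelope = []
    for i in range(total_samples):
        if i < attack_samples:
            envelope.append(i / attack_samples)
        else:
            envelope.append(1.0 - (i - attack_samples) / decay_samples)
    return envelope

def shaped_tone(freq_hz, total_samples, attack_samples):
    """Apply the envelope to a pure tone, sample by sample."""
    envelope = attack_decay_envelope(total_samples, attack_samples)
    return [level * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i, level in enumerate(envelope)]

# Two half-second notes at the same pitch with different attacks.
plucked = shaped_tone(440, 22050, 220)   # about 5 ms to reach full level
bowed = shaped_tone(440, 22050, 8820)    # 200 ms to reach full level
```

Both notes have the same pitch and the same peak level, yet the fast attack reads as a sharp, percussive onset and the slow attack as a gradual swell.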

A sound's timbre is derived from two physical phenomena. The first is the acoustic properties of a particular instrument being played or an object being hit or vibrated. The second is the acoustics of the environment in which a sound is produced. A fine cello made with a resonant soundboard, the proper proportion of resin or lacquer coating, and good wood grain contains certain acoustic properties that imbue a sound with richness and purity. In the same way, you can hear the sound being colored by the acoustics of the concert hall where Yo-Yo Ma is playing that cello.

2.1.4. Sound propagation and acoustics

Sound waves move like ripples of water after a pebble has been dropped on the smooth surface of a pond. As the ripples of water travel outward, they begin to reflect off the surrounding edges of the pond into ever smaller and more complex patterns. In the same manner, sound waves reflect and disperse off various surfaces in our environment such as the walls of a concert hall, as shown in Figure 2-9. But before these sound waves reach our ears, they've already traveled through the air, bouncing off any number of objects. We rarely ever hear the pure direct vibration of a sound wave before it is masked or altered by the coloration of thousands of small reflections.

Figure 2-9. Sound waves from a speaker reflecting off surfaces of a room

In addition to being colored by reflections, sounds are also colored by the materials and substances they travel through. For example, a voice projected through a wall sounds different than a voice projected directly into the ear. As sound travels through the dense materials of a wall, the high-frequency energy is absorbed into the wood, leaving behind only the lower-frequency, muffled version of the sound. A voice spoken directly into the ear, on the other hand, can produce an unbearably loud sound. Without the impediment of a wall or other sound-absorbing materials, the energy of the higher frequencies of the voice travels straight to the eardrum.
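The muffling effect of a wall can be loosely imitated with a low-pass filter, which lets low frequencies through while weakening high ones. The sketch below uses a simple one-pole smoothing filter as a stand-in; a real wall does not behave exactly like this filter, and the smoothing value of 0.05 and the two test frequencies are assumptions chosen to make the effect obvious.

```python
import math

SAMPLE_RATE = 44100  # samples per second (assumed)

def sine_wave(freq_hz, num_samples):
    """Return a full-level pure tone."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(num_samples)]

def one_pole_lowpass(samples, smoothing):
    """Smooth a signal: each output moves a fraction of the way toward
    the input. Smaller `smoothing` (0 to 1) means a more muffled sound."""
    output, previous = [], 0.0
    for x in samples:
        previous += smoothing * (x - previous)
        output.append(previous)
    return output

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

for freq_hz in (200, 5000):
    muffled = one_pole_lowpass(sine_wave(freq_hz, 4410), 0.05)
    # Skip the first half so the filter has settled before measuring.
    print(freq_hz, "Hz peak after filtering:", round(peak(muffled[2205:]), 2))
```

The 200 Hz tone passes through nearly intact, while the 5,000 Hz tone is reduced to a small fraction of its original level, much like the treble lost through a wall.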

Try this simple test. Take a metronome, ticking clock, or radio and set it down in the middle of a room. Listen to how the sound changes as you walk to different locations in the room. Try walking outside the room. Listen to the sound with the door open. Then close the door and listen again. Next, listen to the sound in different environments such as a tile bathroom, outdoors in the open air, or in a large stairwell. As you will observe, changing the environment creates subtle changes in tone quality, equalization, and timbre of a sound.

At first glance, this phenomenon may seem rudimentary, but imagine if you had to recreate this effect artificially in the studio, as many sound designers do when producing a film soundtrack.

2.1.5. Reverberation and delay

When we speak inside a large cathedral or stairwell, our voices reflect back and forth off the surfaces of the walls for several seconds, creating a rich sound composed of thousands of reflections, as shown in Figure 2-9. If a sound reflects off a wall that is close to our ears, we hear the reflections instantly as part of the richness of the original sound wave decay. Rapid reflections that strike our eardrum within 40 milliseconds of the original sound are called reverberation. If a sound bounces off a hard reflective surface that is far away, we hear the reflection as a second, distinct sound: an echo, or delay. Figure 2-10 illustrates the difference between a short reverb under 40 milliseconds in length and a longer reflection heard as a delay.

Figure 2-10. Left: Reflections that reach the ear after 40ms are perceived as a distinct echo (delay). Right: Reflections that reach the ear within 40ms are perceived as richness and warmth (reverberation).
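The 40-millisecond boundary can be related to room size with a little arithmetic. The sketch below assumes sound travels at roughly 343 meters per second (dry air at about 20 degrees Celsius) and treats the 40 ms figure from the text as a hard cutoff, which in practice it is not; the path lengths are invented for the example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate, dry air near 20 degrees C
REVERB_LIMIT_MS = 40.0          # boundary used in this chapter

def reflection_delay_ms(direct_path_m, reflected_path_m):
    """How long after the direct sound a reflection arrives, in ms."""
    extra_distance_m = reflected_path_m - direct_path_m
    return extra_distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0

def classify_reflection(delay_ms):
    """Label a reflection using the 40 ms rule of thumb."""
    return "reverberation" if delay_ms <= REVERB_LIMIT_MS else "echo (delay)"

# A listener 2 m from the source: a nearby wall versus a distant one.
for reflected_path_m in (6.0, 40.0):
    delay = reflection_delay_ms(2.0, reflected_path_m)
    print(reflected_path_m, "m path:", round(delay, 1), "ms,",
          classify_reflection(delay))
```

Under these assumptions a reflection must travel about 13.7 meters farther than the direct sound before it crosses the 40 ms line, which is why small rooms add warmth while canyons and stadiums produce distinct echoes.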

These reflections blend with the initial direct sound source of a musical note or a voice to create an entirely different auditory experience than the original "dry" sound with no reflections added. The terms "wet" and "dry" are often used to describe sounds with or without reverberation and delay. Wet sounds have lots of reverberation and richness from the thousands of small reflections added to the original sound wave. The more reflections, the denser, or "wetter," the sound becomes. A dry sound such as a musical note or a voice produced in a dense forest where sounds are absorbed into the random surfaces of trees and bushes contains little or no reverberation or delay.
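The idea of blending a dry sound with its reflections can be sketched in a few lines. The example below adds a handful of progressively quieter delayed copies to a signal and then mixes the result with the original. It is a toy model: genuine reverberation involves thousands of dense reflections, and the delay length, decay factor, and function names here are assumptions made for illustration.

```python
def add_reflections(dry, delay_samples, decay, repeats):
    """Return the dry signal plus `repeats` delayed copies, each one
    quieter than the last by a factor of `decay`."""
    wet = list(dry) + [0.0] * (delay_samples * repeats)
    for r in range(1, repeats + 1):
        gain = decay ** r
        offset = delay_samples * r
        for i, sample in enumerate(dry):
            wet[offset + i] += gain * sample
    return wet

def mix(dry, wet, wet_amount):
    """Blend dry and wet signals; 0.0 is fully dry, 1.0 is fully wet."""
    padded_dry = list(dry) + [0.0] * (len(wet) - len(dry))
    return [(1.0 - wet_amount) * d + wet_amount * w
            for d, w in zip(padded_dry, wet)]

# A single click followed by silence makes the reflections easy to see.
click = [1.0, 0.0, 0.0, 0.0]
wet_click = add_reflections(click, delay_samples=2, decay=0.5, repeats=2)
print(wet_click)  # [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.0, 0.0]
print(mix(click, wet_click, 0.5))
```

Raising `wet_amount` or the number of repeats moves the result toward the "wet" end of the scale; setting `wet_amount` to zero returns the original dry sound, padded with silence.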

Learning how to emulate the reverberation effects produced by the acoustics of real-world environments is crucial for good sound design. A sound designer should be able to artificially recreate any environment or perspective by applying the right effects to an audio clip. Just as imaging professionals use lighting tricks to enhance and manipulate an image, sound designers use effects such as reverb and equalization to enhance a sound or make a soundtrack more realistic. For example, if you are creating a button sound or narration for a web page that resembles a dark cavern, you will need to add the appropriate reverb and background ambiance for that environment.