Chapter 2. The Science of Sound and Digital Audio
1969: Jimi Hendrix uses feedback to play "The Star Spangled Banner" at Woodstock, taking the electric guitar where it has never gone before.
1978: 100,000 people stand in awe and silence in the sweltering heat of Jamaica's national stadium, struck by the power and emotion of Bob Marley's "Redemption Song."
1998: The last note of the Bach Cello Suites reverberates throughout
a state-of-the-art concert hall, and the audience stands in applause
as Yo-Yo Ma takes a final bow with his 1712 Davidoff Stradivarius
cello.
2000: Riding the wave of a new music revolution, an up-and-coming
singer-songwriter without a record contract builds an enthusiastic
fan base by publishing her music on MP3 sites.
These four moments in history demonstrate music's power to
transform, excite, and entertain people. They illustrate a technical
and artistic mastery of sound production and music-making by highly
skilled artists. And they show the dedication of audiences around the world, who travel great distances and endure numerous technical challenges to hear their music.
Musicians spend years practicing the manipulation of sound. And they
are always on the lookout for just the right instrument. Fans spend
thousands of dollars on music collections, hi-fi stereo equipment,
and concert tickets, all in pursuit of a good listening experience.
So what does all this boil down to? The scientific phenomena of
acoustics and sound wave production.
The difference between the sound an artist produces in a concert hall and the sound produced by a beginning student with a MIDI synthesizer at home has everything to do with how the quality, tone, richness, and substance of the sound waves reach our ears.
This chapter won't teach you how to become the next Hendrix,
but it does examine the scientific phenomena of sound to help you
design and produce better web audio.
Psychophysical versus acoustic audio terms
Sound is often measured and described from two distinct reference points: the psychophysical perception of sound in the human ear and brain, and the scientific quantification of sound with acoustic measuring devices. Psychophysical terms such as loudness and pitch describe the human perception of the same acoustical phenomena that amplitude and frequency measure. For example, amplitude is the measured range of air pressure variation in a waveform, quantified in decibels (dB); loudness is the human perception of that range. Frequency is the measured rate of repetition of a waveform; pitch is the perception of how high or low the waveform sounds. Many audio equipment manuals and professionals use these terms interchangeably, which can create confusion for newcomers.
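To make the acoustic side of this distinction concrete, here is a minimal Python sketch (an illustrative addition, not from the original text) that converts an amplitude ratio to decibels using the standard formula dB = 20 * log10(a / a_ref); the reference level of 1.0 is an arbitrary assumption for demonstration.

    import math

    def amplitude_to_db(amplitude, reference=1.0):
        # Acoustic measurement: decibels express the ratio of an amplitude
        # to a reference level on a logarithmic scale.
        return 20 * math.log10(amplitude / reference)

    # Halving the amplitude lowers the measured level by about 6 dB,
    # though the perceived loudness change is a separate, psychophysical question.
    print(amplitude_to_db(0.5))   # approximately -6.02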
2.1. The science of sound
Sound is the vibration of air molecules, or variation in air pressure, that can be sensed by the ear. The pattern and rate of audible vibrations give sound its unique quality. The range of auditory perception is approximately 20 cycles (or fluctuations) of air pressure per second to 20,000 cycles per second. Air pressure fluctuations outside of this range are not audible to the human ear and are called infrasonic (less than 20 cycles per second, often loosely termed subsonic) and ultrasonic (more than 20,000 cycles per second) vibrations.
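As a quick illustration of these boundaries, the following Python sketch (an illustrative addition, not part of the original text) classifies a vibration rate against the approximate limits of human hearing:

    def classify_frequency(hz):
        # The audible range is roughly 20 to 20,000 cycles per second (Hz).
        if hz < 20:
            return "infrasonic"
        if hz > 20_000:
            return "ultrasonic"
        return "audible"

    print(classify_frequency(261))      # audible (roughly middle C)
    print(classify_frequency(40_000))   # ultrasonic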
Sound is generated by a vibration-producing force, such as a drumstick hitting a cymbal, a bow moving across a cello string, or a pulsing speaker cone, that sets the surrounding air molecules into motion. From the point of impact or disturbance, sound waves, or patterns of vibrating air molecules, radiate outward through the atmosphere to the ear like the ripples of water on a pond. Figure 2-1 shows a sound produced by a cello as it emanates from a speaker cone.
Figure 2-1. A speaker cone produces sound waves by vibrating surrounding air molecules in fast, rhythmic bursts of energy
As sound waves produced from a cello move rapidly outward through the
atmosphere, they reflect off the various surfaces in the room and
divide and multiply into thousands of reflections. Once these
reflections reach the ear, they are converted into electrical nerve
impulses and sent to the brain where they are stored and interpreted
as a beautiful "cello" sound. Similarly, these air
pressure fluctuations or reflections can be converted into electrical
waves or signals with a microphone and sent to a recording device
that stores a "waveform" pattern.
The unique pattern of air pressure variations or sound reflections produced by an instrument or a speaker cone is known as a waveform. Figure 2-2 illustrates three different sounds and their unique waveforms. In audio recording and web broadcasting, it is the ability to accurately capture and reproduce these waveforms that results in good sound quality. Improper recording and mastering techniques, poor equipment, and other mistakes can blur or distort the replication of the original waveform, resulting in poor sound quality.
Figure 2-2. Snapshots of three distinct waveforms, each displaying a unique pattern of air pressure variations or sound reflections
Figure 2-3. A five-millisecond snapshot of the orderly, harmonic waveform of a plucked guitar string and the chaotic noise waveform of a cymbal crash
Most musical sounds vibrate in orderly, repetitive patterns (with the exception of percussive instruments such as rattles and shakers). In contrast, noise produces random, chaotic wave patterns. Figure 2-3 shows these two types of waveforms.
Each type of disturbance in the atmosphere produces a distinct pattern of vibrations, and these patterns of vibrating air molecules give a sound its unique quality. Every sound is composed of three elements: loudness, pitch, and timbre.
These elements are the three fundamental qualities of sound that
scientists, audio professionals, and equipment manufacturers use to
understand, measure, and control the audio production process. A
clear understanding of how these terms are used will help you master
proper recording, editing, and encoding techniques for web audio
broadcasting.
2.1.2. Pitch
Pitch is the psychoacoustic term for how high or low a sound is perceived by the human ear. Pitch is determined by a sound's frequency, or rate of repetition. Figure 2-6 shows a one-second waveform and the frequency rate for two different instruments. Middle C on the piano, for example, vibrates at approximately 261 cycles per second.
Frequency is measured in hertz (Hz), also known as cycles per second. The higher the frequency, the higher the pitch; the lower the frequency, the lower the pitch. On home stereo equipment, high-frequency sounds are referred to as treble and low-frequency sounds as bass. Figure 2-7 shows waveforms of relatively lower- and higher-frequency sounds.
Figure 2-6. Two different waveforms and their approximate frequency rate
Notice that it takes more cycles in the same period of time to reproduce a high-pitched sound than it does to reproduce a low-pitched sound, as shown in Figure 2-7. Thus, high-pitched sounds such as a woman's voice or a buzzing fly require more digital information to reproduce accurately than do lower-pitched sounds such as a man's voice or a bass guitar. This is why low-pitched sounds suffer less degradation when a sound is converted (encoded) to a low-quality format.
Figure 2-7. Rapid vibrations of air molecules create a high-pitched sound (treble); a slower rate of vibration creates a low-pitched sound (bass).
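A short Python sketch makes the point about cycles and digital information concrete. It is an illustrative addition assuming a CD-style sample rate of 44,100 samples per second; the sine_wave() helper is hypothetical, not from any audio library:

    import math

    SAMPLE_RATE = 44_100   # samples per second, as used on audio CDs

    def sine_wave(frequency_hz, duration_s=1.0, sample_rate=SAMPLE_RATE):
        # One pure tone: the waveform repeats frequency_hz times per second.
        n = int(duration_s * sample_rate)
        return [math.sin(2 * math.pi * frequency_hz * t / sample_rate)
                for t in range(n)]

    low = sine_wave(261)      # roughly middle C
    high = sine_wave(2_610)   # ten times higher in pitch

    # At a fixed sample rate, each cycle of the higher tone is described
    # by far fewer samples, so high frequencies are the first to suffer
    # when the data rate is reduced.
    print(SAMPLE_RATE / 261)     # about 169 samples per cycle
    print(SAMPLE_RATE / 2_610)   # about 17 samples per cycle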
Harmonics
A given note is comprised of a series of "pitches" that vibrate in harmony with its fundamental frequency or pitch. Musical tones contain many such pitches, known as harmonics. You can experience this phenomenon both aurally and visually by listening to and watching a guitar string being plucked. The string vibrates at a root, or fundamental, frequency, as well as at whole-number multiples of this frequency. These additional frequencies are the harmonics. For example, a cello playing the pitch of middle C will predominantly resonate at roughly 261 cycles per second, but it will also contain harmonics vibrating at about 522, 783, and 1,044 cycles per second (two, three, and four times the fundamental).
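Because harmonics are whole-number multiples of the fundamental, they are trivial to compute. This small Python sketch (an illustrative addition) lists the first few harmonics of middle C:

    def harmonic_series(fundamental_hz, count=4):
        # The fundamental plus its harmonics at 2x, 3x, 4x, and so on.
        return [fundamental_hz * n for n in range(1, count + 1)]

    print(harmonic_series(261))   # [261, 522, 783, 1044]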
It is also important to note that most sounds are a mixture of waves at various frequencies. A cello note, for example, is composed of many frequencies across the frequency spectrum. The frequency spectrum is the complete range of frequencies we can hear, just as the color spectrum is the range of colors we can see. The term is also used relative to a particular sound, in which case it means the range of frequencies present in that sound. A mid-range cello note, for instance, contains frequencies spanning roughly 500 Hz to 12,000 Hz.
The various frequencies that comprise a sound can be amplified or
reduced with equalization (EQ) to change the
sound's overall tone and character. Equalization for the Web is
discussed in detail in Chapter 4, "Optimizing Your Sound Files".
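Real equalizers use banks of band filters to boost or cut specific frequency regions; the following Python sketch is only the simplest possible illustration of the idea, a hypothetical one-pole low-pass filter that attenuates high frequencies while leaving lows largely intact:

    def low_pass(samples, alpha=0.1):
        # Each output sample moves only part of the way toward the input,
        # smoothing out rapid (high-frequency) fluctuations.
        # alpha near 0 cuts more treble; alpha = 1 passes the signal unchanged.
        out, prev = [], 0.0
        for s in samples:
            prev = prev + alpha * (s - prev)
            out.append(prev)
        return out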
2.1.3. Timbre
Unlike loudness or amplitude, measured in dB, and pitch or frequency,
measured in Hz, timbre is difficult to quantify.
Timbre is loosely defined as the tone, color, or
texture that enables the brain to distinguish one type of instrument
sound from another. The term generally encompasses all the qualities
of a sound besides loudness and pitch, such as "smooth,"
"rough," "hollow," "peaceful,"
"shrill," "warm," and so on. In simple terms,
timbre is the sonic
difference between a violin and a trumpet playing the same note at
the same loudness or amplitude level.
Musical acoustics
For further study, read Donald E. Hall's Musical Acoustics (Wadsworth Publishing Co., 1980). This book gives one of the best in-depth explanations of the science of sound and human hearing.
You can also check out more information about sound and recording at the University of California, Santa Cruz Electronic Music Studios' web site, http://arts.ucsc.edu/recording/techinfo.html.
Much of a sound's unique timbre is a result of its particular transient qualities. Transients are the attack and decay, or beginning and ending characteristics, of a sound, as shown in Figure 2-8. For example, the quiver of a violin bow as it strikes the strings or the brief squawk of a saxophone as the air begins to vibrate the reed are transients. Different instruments have unique transients that affect the way we hear a series of notes played together.
Figure 2-8. Transients, or attack and decay, of a sound
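Transients can be imitated digitally with an amplitude envelope. The sketch below (an illustrative addition, with arbitrary attack and decay constants) shapes a tone with a quick linear attack and a gradual exponential decay:

    def apply_envelope(samples, attack_s=0.01, sample_rate=44_100):
        # Fade in quickly (the attack), then let the level die away (the decay).
        attack_n = max(1, int(attack_s * sample_rate))
        out, level = [], 1.0
        for i, s in enumerate(samples):
            if i < attack_n:
                gain = i / attack_n    # linear attack ramp
            else:
                level *= 0.9999        # slow exponential decay
                gain = level
            out.append(s * gain)
        return out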
A sound's timbre is derived from two physical phenomena. The first is the acoustic properties of the particular instrument being played or object being struck or vibrated. The second is the acoustics of the environment in which the sound is produced. A fine cello made with a resonant soundboard, the proper proportion of resin or lacquer coating, and good wood grain has acoustic properties that imbue its sound with richness and purity. In the same way, the acoustics of the concert hall where Yo-Yo Ma is playing that cello imbue the sound you hear with their own character.
2.1.4. Sound propagation and acoustics
Sound waves move
like ripples of water after a pebble has been dropped on the smooth
surface of a pond. As the ripples of water travel outward, they begin
to reflect off the surrounding edges of the pond into ever smaller
and more complex patterns. In the same manner, sound waves reflect
and disperse off various surfaces in our environment such as the
walls of a concert hall, as shown in Figure 2-9.
But before these sound waves reach our ears, they have already traveled through the air, bouncing off any number of objects. We rarely hear the pure, direct vibration of a sound wave before it is masked or altered by the coloration of thousands of small reflections.
Figure 2-9. Sound waves from a speaker reflecting off surfaces of a room
In addition to being colored by reflections, sounds are also colored by the materials and substances they travel through. A voice projected through a wall, for example, sounds different than a voice projected directly into the ear. As sound travels through the dense materials of a wall, the high-frequency energy is absorbed into the wood, leaving behind only a lower-frequency, muffled version of the sound. A voice spoken directly into the ear, on the other hand, can produce an unbearably loud sound: without the impediment of a wall or other sound-absorbing materials, the energy of the higher frequencies of the voice travels straight to the eardrum.
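To a first approximation, this muffling is the same low-pass filtering sketched in the equalization discussion earlier. Reusing the hypothetical sine_wave() and low_pass() helpers from those sketches, a wall might be crudely simulated like this:

    voice = sine_wave(2_000, duration_s=0.1)   # stand-in for high-frequency voice energy
    muffled = low_pass(voice, alpha=0.02)      # a small alpha absorbs most of the highs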
Try this simple test. Take a metronome, ticking clock, or radio and
set it down in the middle of a room. Listen to how the sound changes
as you walk to different locations in the room. Try walking outside
the room. Listen to the sound with the door open. Then close the door
and listen again. Next, listen to the sound in different environments
such as a tile bathroom, outdoors in the open air, or in a large
stairwell. As you will observe, changing the environment creates subtle changes in the tone quality, equalization, and timbre of the sound. At first, this phenomenon may seem rudimentary, but imagine if you had to recreate the effect artificially in the studio, as many sound designers do when producing a film soundtrack.
2.1.5. Reverberation and delay
When we speak inside a large cathedral or stairwell, our voices reflect back and forth off the surfaces of the walls for several seconds, creating a rich sound comprised of thousands of reflections, as shown in Figure 2-9. If a sound reflects off a wall that is close to our ears, we hear the reflections instantly as part of the richness of the original sound wave's decay. Rapid reflections that strike our eardrums within 40 milliseconds of the original sound are called reverberation. If a sound bounces off a hard, reflective surface that is far away, we hear the reflection as a second distinct sound: an echo, or delay. Figure 2-10 illustrates the difference between reflections arriving within 40 milliseconds and a longer reverberation or delay.
Figure 2-10. Left: Reflections that reach the ear after 40ms are perceived as a distinct echo (delay). Right: Reflections that reach the ear within 40ms are perceived as richness and warmth (reverberation).
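A real reverb blends thousands of reflections, but a single delay tap is enough to illustrate the 40-millisecond boundary. The Python sketch below is an illustrative addition; the delay and decay values are arbitrary assumptions:

    SAMPLE_RATE = 44_100   # samples per second

    def echo(samples, delay_ms=300, decay=0.5, sample_rate=SAMPLE_RATE):
        # Mix a quieter, delayed copy of the signal back into the original.
        offset = int(sample_rate * delay_ms / 1000)
        out = list(samples) + [0.0] * offset
        for i, s in enumerate(samples):
            out[i + offset] += s * decay
        return out

    # A delay under about 40 ms blends into the source as reverberant warmth;
    # a delay well above 40 ms is heard as a distinct echo.
    # reverb_like = echo(samples, delay_ms=12)
    # echo_like   = echo(samples, delay_ms=300)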
These reflections blend with the initial direct sound source of a musical note or a voice to create an entirely different auditory experience than the original "dry" sound with no reflections added. The terms "wet" and "dry" are often used to describe sounds with or without reverberation and delay. Wet sounds have lots of reverberation and richness from the thousands of small reflections added to the original sound wave; the more reflections, the denser, or "wetter," the sound becomes. A dry sound, such as a musical note or a voice produced in a dense forest where sound is absorbed by the random surfaces of trees and bushes, contains little or no reverberation or delay.
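In code terms, "wet" and "dry" are simply a blend ratio. Here is a minimal Python sketch of a hypothetical wet/dry mix control (an illustrative addition):

    def mix_wet_dry(dry, wet, wet_amount=0.4):
        # 0.0 is fully dry (no effect); 1.0 is fully wet (effect only).
        return [d * (1 - wet_amount) + w * wet_amount
                for d, w in zip(dry, wet)]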
Learning how to emulate the reverberation effects produced by the
acoustics of real-world environments is crucial for good sound
design. A sound designer should be able to artificially recreate any
environment or perspective by applying the right effects to an audio
clip. Just as imaging professionals use lighting tricks to enhance
and manipulate an image, sound designers use effects such as reverb
and equalization to enhance a sound or make a soundtrack more
realistic. For example, if you are creating a button sound or narration for a web page designed to evoke a dark cavern, you will need to add the appropriate reverb and background ambiance for that environment.
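Pulling the earlier sketches together, a cavern-like treatment might look something like the following; the tone, delay, and mix settings are purely illustrative assumptions:

    dry = sine_wave(261, duration_s=0.2)        # a short, plain button tone
    wet = echo(dry, delay_ms=80, decay=0.6)     # add a long, cave-like reflection
    cavern = mix_wet_dry(dry, wet, wet_amount=0.5)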