8.2. What is MP3?
The format known as MPEG-1, Layer III (or MP3 for
short) was developed in the late 1980s and early 1990s and was
finalized in November 1992 by the Motion Pictures Expert Group
(MPEG) as part of the original MPEG-1 standard. The MPEG committee is
a gathering of scientists and engineers who work under the auspices
of the International Organization for Standardization (ISO) and the
International Electrotechnical Commission (IEC). The members of the
MPEG group are responsible for establishing standards for digital
coding of moving pictures and audio. (See the sidebar "About the
Motion Pictures Expert Group.")
MP3 is more than a simple compression scheme. Most people are
familiar with file compressors such as zip. But if you've ever tried
to zip up a WAV file, you've probably found that raw audio
doesn't compress well at all: zip shaves only a small
percentage from the original file size. Instead, MP3 gets most of its
compression from the science of psychoacoustics, the modeling of human
auditory perception. The theory is that uncompressed audio streams
carry a lot of data that humans don't actually perceive, for a variety
of reasons. The logic follows: why store data that can't be perceived?
MP3 encoders analyze audio streams and compare them to mathematical
models of human psychoacoustics. This analysis is far more complex and
mathematically intensive than zip compression, so it takes more time
and processing power, but it achieves far more effective compression.
Of course, discarding data by definition produces an imperfect audio
stream: no MP3 file contains all the data found in the original
uncompressed source. In practice, though, MP3s encoded at high enough
bitrates can be indistinguishable from the source even to very
discerning listeners, and at mid-level bitrates (quality levels) most
people cannot tell the difference. The trick is finding the best
balance between file size and quality.
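To put numbers on the size side of that balance, the arithmetic below compares an uncompressed CD-quality stream with a constant-bitrate MP3. It is a sketch under stated assumptions: the four-minute track length is hypothetical, 128 kbps stands in as an example mid-level bitrate, and ID3 tags and other metadata are ignored.

```python
# Compare the size of uncompressed CD audio with a constant-bitrate MP3.
# Assumptions (illustrative only): a 4-minute track, 128 kbps as an
# example mid-level MP3 bitrate, and no ID3 tags or other metadata.

CD_SAMPLE_RATE = 44_100  # samples per second, per channel
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2

def file_size_bytes(bitrate_bps: int, seconds: float) -> float:
    """Size of a stream at a constant bitrate: bits per second * seconds / 8."""
    return bitrate_bps * seconds / 8

cd_bitrate = CD_SAMPLE_RATE * CD_BITS_PER_SAMPLE * CD_CHANNELS  # 1,411,200 bit/s
track_seconds = 4 * 60

cd_size = file_size_bytes(cd_bitrate, track_seconds)
mp3_size = file_size_bytes(128_000, track_seconds)

print(f"CD audio: {cd_size / 1e6:.1f} MB")      # CD audio: 42.3 MB
print(f"MP3:      {mp3_size / 1e6:.1f} MB")     # MP3:      3.8 MB
print(f"Ratio:    {cd_size / mp3_size:.1f}:1")  # Ratio:    11.0:1
```

Raising the bitrate to 320 kbps in the same sketch gives 9.6 MB, a ratio of roughly 4.4:1; the extra space buys fidelity, which is exactly the trade-off an encoder's bitrate setting controls.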
About the Motion Pictures Expert Group
The Motion Pictures Expert Group (MPEG) was established in January
1988 with the mandate to develop standards for coded representation
of moving pictures, audio, and their combination. It operates in the
framework of the Joint ISO/IEC Technical Committee (JTC 1) on
Information Technology. MPEG is the group that
is responsible for defining the standard we call MP3, as well as
numerous other standards.
Since its first meeting in May 1988, when 25 experts participated,
MPEG has grown into an unusually large committee. Some 350 experts from
200 companies and organizations located in more than 20 countries
typically take part in MPEG meetings. As a rule, MPEG meets three
times a year.
A large part of the membership of the MPEG working group is composed
of individuals working in research and academia. Even though the
MPEG environment looks rather informal, the group must keep in mind
that its standards can be of high strategic relevance.
MPEG exists to produce standards.
Published standards are the last stage of a long process that starts
with the proposal of new work within a committee. The journey of the
MPEG-1 Layer III standard that we know as MP3 may have begun in 1989,
and the standard became final in November 1992. However, the process of defining
the MPEG family itself began long before that. The group is now
focused on the finalization of MPEG-4, an architecture encompassing
multichannel audio, video, multimedia, dimensional presentations,
security mechanisms, and more.
8.2.1. MP3 technical details
As stated earlier, MP3 is a perceptual audio coding scheme. MP3
encoders analyze an audio signal and compare it to psychoacoustic
models representing limitations in human auditory perception. They
then encode as much useful information as possible within the limits
set by the bitrate and sampling frequency chosen in the encoder
application. The encoding process comprises a number of distinct
steps, including:
- Minimal audition threshold
There are many aspects of audio to which the human ear is
insensitive. For starters, many audio streams contain frequencies
outside the range that humans can hear. The human hearing range
generally falls between 20 Hz and 20 kHz, and the ear is most
sensitive between 2 kHz and 4 kHz. Auditory acuity diminishes with
age; many people cannot hear tones higher than 16 kHz. MP3 encoders
can discard frequencies that fall outside this range. More generally,
the minimal audition threshold (also called the absolute threshold of
hearing) is the softest level at which the ear perceives a sound, and
that level varies with frequency. Sounds that fall below this
threshold need not be encoded, because they won't be perceived.
- Masking effects
When two or more sounds are played simultaneously and one is
louder than the other, the louder sound hides, or "masks,"
the softer one (this concept is also discussed in Chapter 2, "The Science of Sound and Digital Audio").
If you record both sounds, the softer sound is still present in the
recorded spectrum. However, since the softer sound is masked and
therefore imperceptible, it can be safely removed from the
recording. Similarly, if two tones are close together on the
frequency spectrum, they may be indistinguishable from one another.
However, if the two tones are sufficiently far apart, they will be
independently perceptible and must both be encoded. Removing
undetectable (or barely detectable) sounds from the recording cuts
file size dramatically.
Masking comes in two forms: simultaneous (or frequency) masking, in
which a loud sound hides softer sounds at nearby frequencies, and
temporal masking, in which a loud sound hides softer sounds occurring
just before or just after it. Both may best be understood by analogy.
If you watch a flying bird, its outline may be distinct against the
sky. But if the bird passes in front of the sun, the sun's brightness
completely overpowers the bird's outline. As the bird moves past the
other edge of the sun, it becomes visible again. The same principle
applies to masking effects in MP3 encoding.
- Reservoir of bytes
MP3 files are stored as a series of "frames," which can be thought of
much like the frames that make up a movie. Each frame carries only a
fraction of a second's worth of audio data (1,152 samples, or about
26 ms at a 44.1 kHz sampling rate) and is preceded by a header
describing the bitrate, encoding method, and other metadata pertaining
to the frame to come. Sometimes a portion of the audio stream can be
adequately encoded with room left over in its frame. The reservoir of
bytes lets the MP3 encoder "borrow" that unused space: a later frame
that needs more room than its own slot provides can store the overflow
in space left unfilled by preceding frames. The reservoir of bytes is
a sort of space-lending scheme that keeps the data rate constant while
helping to keep quality consistent.
- Joint stereo
While not an essential part of the MP3 encoding process, joint stereo
is an option in most encoders and is typically enabled by default.
When joint stereo is enabled, stereo sound is represented using a
mixture of true stereo and monophonic sound, along with some
spatializing information. Joint stereo is useful because humans cannot
locate very low and very high frequencies in space as precisely as
midrange frequencies. The MP3 format exploits this fact by encoding
very low and very high frequencies in mono, thereby saving storage
space in the resultant file. To save additional storage space, try
encoding in mono. To preserve all possible spatial information, encode
in stereo mode. Most users find that joint stereo is adequate for most
purposes.
- Huffman encoding
As mentioned earlier, compressing a WAV file with zip doesn't shave
much off the file size, which is why psychoacoustics is employed.
However, the MP3 encoding process does employ the classic Huffman
encoding algorithm. After all psychoacoustic methods have been
applied, the Huffman encoding pass seeks out and compresses any
remaining redundancies in the bit pattern, using a set of predefined
Huffman code tables specified in the MP3 standard. It's as though
zip-type encoding were being run internally on the psychoacoustically
encoded data. While psychoacoustic coding is great at dealing with
polyphonic sections, it's not as efficient when dealing with highly
repetitive, or "pure," sections. The Huffman pass, on the other hand,
is great at handling redundancy: values that occur often get short
codes and rare values get long ones, so highly repetitive data shrinks
dramatically.
The Huffman encoding pass is very fast and reduces file size by about
20% on average. The Huffman pass therefore makes a perfect complement
to perceptual coding techniques.
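The minimal audition threshold from the list above varies with frequency. The sketch below uses Terhardt's widely cited curve-fit approximation of the absolute threshold of hearing (in dB SPL) to decide whether a pure tone would be heard. The function names and the sample components are hypothetical, and real encoders use their own tuned threshold tables, so treat this as an illustration of the idea rather than what any particular MP3 encoder does.

```python
import math

def threshold_in_quiet_db(freq_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def is_audible(level_db_spl: float, freq_hz: float) -> bool:
    """A tone is perceptible only if it rises above the threshold at its frequency."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)

# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(100, 20.0), (1_000, 20.0), (3_300, 0.0), (16_000, 60.0)]
kept = [(f, lvl) for f, lvl in components if is_audible(lvl, f)]
print(kept)  # [(1000, 20.0), (3300, 0.0)]
```

The curve dips below 0 dB around 3 to 4 kHz and climbs steeply at the extremes, which is why a 0 dB tone at 3.3 kHz survives while a 60 dB tone at 16 kHz is dropped.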
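The simultaneous masking described under "Masking effects" can be sketched with a simplified spreading model: a loud masker raises the hearing threshold around its own frequency, and that raised threshold falls off with distance measured on the Bark (critical-band) scale. The Bark conversion below is Zwicker and Terhardt's well-known approximation; the 10 dB offset and the 27 dB/Bark and 12 dB/Bark slopes are illustrative assumptions, not values from the MP3 standard or any encoder, and the function names are hypothetical.

```python
import math

# Illustrative parameters (assumptions, not taken from the MP3 standard).
MASK_OFFSET_DB = 10.0  # threshold sits this far below the masker's level
SLOPE_BELOW_DB = 27.0  # dB per Bark for maskees below the masker
SLOPE_ABOVE_DB = 12.0  # dB per Bark for maskees above (masking spreads upward more)

def hz_to_bark(freq_hz: float) -> float:
    """Zwicker-Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def masking_threshold_db(masker_db: float, masker_hz: float, maskee_hz: float) -> float:
    """Level below which a tone at maskee_hz is hidden by the masker."""
    dz = hz_to_bark(maskee_hz) - hz_to_bark(masker_hz)
    slope = SLOPE_ABOVE_DB if dz > 0 else SLOPE_BELOW_DB
    return masker_db - MASK_OFFSET_DB - slope * abs(dz)

def is_masked(masker_db: float, masker_hz: float, maskee_db: float, maskee_hz: float) -> bool:
    """True if the softer tone falls under the masker's spread threshold."""
    return maskee_db < masking_threshold_db(masker_db, masker_hz, maskee_hz)

# An 80 dB tone at 1 kHz hides a 50 dB tone at 1.1 kHz (close in frequency)...
print(is_masked(80, 1_000, 50, 1_100))  # True
# ...but not a 50 dB tone at 4 kHz (far away on the Bark scale).
print(is_masked(80, 1_000, 50, 4_000))  # False
```

This covers only simultaneous masking; modeling temporal masking would add a threshold that decays over time around each loud event.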
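The frame and reservoir mechanics described under "Reservoir of bytes" can be modeled in a few lines. The frame-length formula for MPEG-1 Layer III (144 × bitrate ÷ sampling rate, where 144 is 1,152 samples ÷ 8 bits per byte, plus one padding byte when set) and the 511-byte limit on how far back a frame's data may start (the 9-bit main_data_begin field) come from the MPEG-1 format. The allocate function, however, is a hypothetical, simplified model: it ignores header and side-information overhead and treats each frame's budget as a single number.

```python
SAMPLES_PER_FRAME = 1152   # MPEG-1 Layer III
MAX_RESERVOIR_BYTES = 511  # limit of the 9-bit main_data_begin back-pointer

def frame_length_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Length of an MPEG-1 Layer III frame in bytes, header included."""
    return 144 * bitrate_bps // sample_rate_hz + padding

def frame_duration_s(sample_rate_hz: int) -> float:
    """Playing time of one frame in seconds."""
    return SAMPLES_PER_FRAME / sample_rate_hz

def allocate(demands: list[int], budget: int) -> list[int]:
    """Simplified bit reservoir (hypothetical model): each frame may use its own
    budget plus bytes left unused by earlier frames, capped at MAX_RESERVOIR_BYTES."""
    reservoir = 0
    granted = []
    for need in demands:
        available = budget + reservoir
        used = min(need, available)
        granted.append(used)
        # Whatever this frame leaves unused can be borrowed by later frames.
        reservoir = min(available - used, MAX_RESERVOIR_BYTES)
    return granted

print(frame_length_bytes(128_000, 44_100))        # 417
print(round(frame_duration_s(44_100) * 1000, 1))  # 26.1
# Two quiet frames bank space that a demanding third frame can borrow.
print(allocate([100, 100, 1500], budget=400))     # [100, 100, 911]
```

Because every frame still occupies exactly its nominal length in the file, the bitrate stays constant; only the placement of each frame's audio data shifts backward into space earlier frames left free.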
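One joint stereo technique available in MP3 is mid/side (M/S) stereo: instead of coding the left and right channels directly, the encoder codes their normalized sum (mid) and difference (side). When the channels are similar, the side signal is close to zero and costs few bits. The sketch below shows only the transform and its exact inverse; the function names and sample values are hypothetical, and a real encoder decides per frame whether to use M/S, applies it to spectral values rather than raw time-domain samples, and can also use intensity stereo.

```python
import math

SQRT2 = math.sqrt(2.0)

def to_mid_side(left, right):
    """Mid/side transform with 1/sqrt(2) normalization."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Exact inverse: recover left and right from mid and side."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right

# Nearly identical channels (a centered sound with slight differences).
left = [0.50, 0.80, -0.30, 0.10]
right = [0.48, 0.80, -0.31, 0.10]
mid, side = to_mid_side(left, right)
print([round(s, 3) for s in side])  # [0.014, 0.0, 0.007, 0.0]
```

The transform itself loses nothing; the savings come afterward, when the near-zero side values quantize to small numbers that the Huffman pass codes cheaply.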
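The Huffman pass rests on a simple idea: frequent values get short codes and rare values get long ones. The sketch below builds a code from symbol counts using the classic Huffman algorithm. The MP3 standard itself selects among predefined tables rather than building a tree per file, so this illustrates the principle only; the quantized values in the example are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: one distinct symbol still needs 1 bit
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        n1, _, codes1 = heapq.heappop(heap)
        n2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + c for sym, c in codes1.items()}
        merged.update({sym: "1" + c for sym, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical quantized spectral values: mostly zeros, a few small magnitudes.
values = [0] * 90 + [1] * 6 + [-1] * 3 + [2] * 1
code = huffman_code(values)
bits = sum(len(code[v]) for v in values)
print(code)  # {2: '000', -1: '001', 1: '01', 0: '1'}
print(bits)  # 114, versus 200 bits at a fixed 2 bits per value
```

The dominant value 0 gets a 1-bit code, so 100 values fit in 114 bits instead of 200; the more skewed the distribution after psychoacoustic quantization, the larger the gain.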
A plethora of players
While we cover only a few players in this chapter, there are literally hundreds of MP3 players on the market for virtually every operating system. Some are free; some require a small fee; some are basic and lightweight; others are full-featured and sometimes even bloated. Some work from the command line, others run in a normal window, and still others operate within funky, irregularly shaped interfaces. Check software download libraries for lists of MP3 players for your operating system.
Note that when AOL purchased Nullsoft in 1999, it made Winamp
freeware. Traditional audio players have acquired MP3 playback
capabilities as well, including Liquid Audio's LiquidPlayer (see the sidebar "Liquid Audio: building a viable e-music system" later in this chapter), RealNetworks' RealPlayer (and its popular RealJukebox), Apple's QuickTime, and Microsoft's Windows Media Player.
Copyright © 2002 O'Reilly & Associates. All rights reserved.