Introduction to Streaming Media (Designing Web Audio)

Internet streaming media changed the Web as we knew it -- changed it from a static text- and graphics-based medium into a multimedia experience populated by sound and moving pictures. Now streaming media is poised to become the de facto global media broadcasting and distribution standard, incorporating all other media, including television, radio, and film. The low cost, convenience, worldwide reach, and technical simplicity of using one global communications standard make web broadcasting irresistible to media publishers, broadcasters, corporations, and individuals. Businesses and individuals once denied access to such powerful means of communication are now using the Web to connect with people all over the world.

The remarkable technology that allows a web site visitor to click on a button and seconds later listen to a sporting event, tradeshow keynote, or CD-quality music is the result of a rather simple but powerful technical innovation -- streaming media. Streaming works by first compressing a digital audio file and then breaking it into small packets, which are sent, one after another, over the Internet. When the packets reach their destination (the requesting user), they are decompressed and reassembled into a form that can be played by the user's system. To maintain the illusion of seamless play, the packets are "buffered" so a number of them are downloaded to the user's machine before playback. As those buffered or preloaded packets play, more packets are being downloaded and queued up for playback. However, when the stream of packets gets too slow (due to network congestion), the client audio player has nothing to play, and you get the all-too-familiar drop-out that every user has encountered.
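The buffering behavior described above can be sketched as a small simulation. This is only an illustrative model, not how any particular player is implemented: the function name, the one-packet-per-tick playback rate, and the prebuffer size of three packets are all assumptions made for the example.

```python
from collections import deque


def simulate_playback(arrivals_per_tick, prebuffer=3):
    """Return one status per tick: 'buffering', 'play', or 'dropout'.

    arrivals_per_tick: number of packets arriving from the network each tick.
    prebuffer: packets queued before playback starts (hypothetical value).
    Once playback has started, the player consumes one packet per tick.
    """
    buffer = deque()
    started = False
    next_packet = 0
    timeline = []
    for arrivals in arrivals_per_tick:
        # Newly arrived packets are queued behind the ones already waiting.
        for _ in range(arrivals):
            buffer.append(next_packet)
            next_packet += 1
        if not started:
            if len(buffer) >= prebuffer:
                started = True
            else:
                timeline.append("buffering")
                continue
        if buffer:
            buffer.popleft()
            timeline.append("play")
        else:
            # Nothing left to play: the listener hears a drop-out.
            timeline.append("dropout")
    return timeline


steady = simulate_playback([1, 1, 1, 1, 1, 1, 1, 1])
congested = simulate_playback([1, 1, 1, 1, 0, 0, 0, 0, 1, 1])
print(steady)
print(congested)
```

In the congested run, the preloaded packets cover the first two ticks without arrivals; the drop-out only begins once the buffer is empty.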

5.1. Streaming protocols

The big breakthrough that enabled the streaming revolution was the adoption of the User Datagram Protocol (UDP) for media delivery, together with new encoding techniques that compressed audio files into extremely small packets of data. UDP made streaming media feasible by sending packets from the host server over the Internet to the client player or end listener without the acknowledgment and retransmission overhead of the protocols previously used on the Web. More recent protocols such as the Real Time Streaming Protocol (RTSP) are making the transmission of data even more efficient.

UDP and RTSP are well suited to audio broadcasting since they place a higher priority on continuous streaming than on delivering every packet intact. When a UDP audio packet is lost, the server keeps sending the packets that follow, causing only a brief glitch instead of a long gap of silence. TCP (and HTTP, which runs on top of it), on the other hand, keeps trying to resend the lost packet before delivering anything further, causing greater delays and breakups in the audio broadcast.
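The difference in how the two approaches react to a lost packet can be illustrated with a deliberately simplified model. It is a sketch under stated assumptions, not a network implementation: there are no real sockets, the player has no buffer, each packet carries one tick of audio, and the retransmission delay of four ticks is a hypothetical value.

```python
def udp_style(num_packets, lost):
    """Datagram-style delivery: a lost packet is skipped and later
    packets are unaffected, so each loss costs one tick of audio."""
    return ["gap" if i in lost else "play" for i in range(num_packets)]


def tcp_style(num_packets, lost, retransmit_ticks=4):
    """Reliable in-order delivery: nothing after a lost packet is
    delivered until it has been resent, so playback stalls meanwhile.
    retransmit_ticks is a hypothetical resend delay."""
    timeline = []
    for i in range(num_packets):
        if i in lost:
            timeline.extend(["stall"] * retransmit_ticks)
        timeline.append("play")
    return timeline


lost_packets = {3}
udp_timeline = udp_style(10, lost_packets)
tcp_timeline = tcp_style(10, lost_packets)
print("UDP-style:", udp_timeline.count("gap"), "tick(s) of missing audio")
print("TCP-style:", tcp_timeline.count("stall"), "tick(s) of stalled playback")
```

In practice a player's buffer absorbs part of such a stall, but the model shows why a single loss is cheaper when the sender simply moves on.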

Prior to UDP and RTSP transmission, data was sent over the Web primarily via TCP and HTTP. TCP transmission, in contrast to UDP and RTSP transmission, is designed to transfer text documents, email, and HTML web pages over the Internet with maximum reliability and data integrity rather than timeliness. Since HTTP transmission is based on TCP, it is also not well suited for transmitting multimedia presentations that rely on time-based operation or for large-scale broadcasting.

Later in the chapter, you will learn why protocols are important. Some streaming technologies such as RealAudio and Windows Media utilize dedicated servers that support superior UDP and RTSP transmission. Other formats such as Shockwave, Flash, MIDI, QuickTime, and Beatnik are primarily designed to stream from a standard HTTP web server. While these formats are cheaper and often easier to use since they do not require the installation of a new server, they are typically not used in professional broadcasting situations that require the delivery of hundreds or thousands of simultaneous streams.

HTTP streaming is often referred to as pseudo-streaming: it is technically possible to stream via HTTP, but doing so is much more likely to cause major drop-outs, and it cannot deliver nearly the same number of simultaneous streams as UDP and RTSP transmission. Herein lies the difference between most low-end solutions and more professional broadcasting solutions that require dedicated servers and extra bandwidth and server capacity.

5.1.1. Lossy compression

Regardless of the advances in UDP and RTSP transmission protocols, streaming media would not be possible without the rapid innovation in encoding algorithms, or codecs, that compress and decompress audio and video data. Uncompressed audio files are huge. One minute of CD-quality stereo audio requires roughly 10 MB of data, approximately enough disk space to hold a small library of books or a 200-page web site.
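The size of one minute of uncompressed audio follows directly from the CD-quality parameters (16-bit samples, 44.1 kHz sample rate, two channels). The short calculation below is a sketch of that arithmetic; the function and constant names are illustrative only.

```python
SAMPLE_RATE_HZ = 44_100   # CD-quality sample rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def uncompressed_bytes(seconds):
    """Bytes of raw PCM audio for the given duration."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


one_minute = uncompressed_bytes(60)
print(one_minute, "bytes")                      # 10584000 bytes
print(round(one_minute / 1_000_000, 1), "MB")   # 10.6 MB
```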

Standard modem speed connections -- including cable modems and xDSL systems -- do not have the capacity to deliver pure, uncompressed CD-quality 16-bit, 44.1 kHz audio. In order to stream across the limited bandwidth of the Web, audio has to be compressed and optimized with codecs, which are compression-decompression encoding algorithms. In general, compression schemes can be classified as either "lossy" or "lossless."
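To see how far uncompressed audio is from what a dial-up connection can carry, the raw bit rate can be compared with a link speed. The sketch below does that arithmetic; the link speeds of 28.8 and 56 kbps are assumed example values, not figures taken from the text, and real-world throughput is usually lower than the nominal rate.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def cd_bitrate_bps():
    """Bits per second needed for uncompressed CD-quality stereo audio."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def required_ratio(link_bps):
    """Compression ratio needed to fit the raw stream into link_bps."""
    return cd_bitrate_bps() / link_bps


# Hypothetical nominal link speeds, in bits per second.
for name, link_bps in [("28.8k modem", 28_800), ("56k modem", 56_000)]:
    print(f"{name}: about {required_ratio(link_bps):.1f}:1 compression needed")
```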

Lossy compression schemes reduce file size by discarding some amount of data during the encoding process before the file is sent over the Internet. Once the data is received on the client side, the codec attempts to approximate the information that was discarded. The benefit of this sort of compression lies in the smaller file size that results from discarding the "lost" information. The JPEG image format uses lossy compression to sample an image and discard unnecessary color information. Similarly, lossy audio compression discards frequencies at the high and low ends of the spectrum and attempts to locate and remove unnecessary audio data. The technique is often referred to as "perceptual encoding" since the user is unlikely to notice the absence of this information. Lossy compression offers reductions in file size on the order of 10:1.

Since small file size is so important on the Internet, practically all of the formats we're interested in employ lossy compression. Here's how it works. First, the client player decompresses the audio file as it downloads to your computer. Then it fills in the missing information according to the instructions set by the codec. To illustrate, consider the phrase, "Now is the time for all good men to come to the aid of their country". One way to compress this would simply be to remove all the vowels and spaces: "Nwsthtmfrllgdmntcmtthdfthrcntry".

That cuts the message from 68 characters (counting spaces) to 31, a savings of about 54%, but of course our compressed message is unintelligible. Imagine that our codec, however, has appropriate rules for decompressing this message with minimal distortion. The conversion likely wouldn't be perfect, but it would be good enough to understand the message, something like, "Now's tha ti'm for oll gudm en to com to the aad of their country".

This is essentially what happens with lossy audio compression. The compressed file is unintelligible to the listener; the decompressed file is intelligible but of a lower quality than the original.
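The compression half of the sentence example is easy to reproduce. The sketch below strips vowels and spaces and counts characters directly (spaces included, no closing period); the function name is illustrative, and the decompression rules are not attempted because the text does not specify them.

```python
PHRASE = "Now is the time for all good men to come to the aid of their country"


def strip_vowels_and_spaces(text):
    """Lossy 'encoder': drop a, e, i, o, u (either case) and spaces.
    The letter y is kept, as in the example string."""
    return "".join(ch for ch in text if ch.lower() not in "aeiou ")


compressed = strip_vowels_and_spaces(PHRASE)
savings = 1 - len(compressed) / len(PHRASE)

print(compressed)
print(len(PHRASE), "->", len(compressed), f"characters ({savings:.0%} smaller)")
```

Because the discarded vowels and spaces are not stored anywhere, no decoder can recover them with certainty; it can only guess, which is what makes the scheme lossy.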

For example, a RealAudio speech file encoded from a standard AIFF or WAV file is generally one-tenth the size of the original file after encoding. To reduce that file's size, the encoder first preserves the integrity of the 1,000 Hz to 4,000 Hz frequency range of the human voice and then discards the frequencies above and below that range. By eliminating the unnecessary low- and high-end frequencies, the encoder is able to reduce the file size while maintaining speech intelligibility. It should be noted that speech tends to have audible characteristics that extend into the 7,000 Hz range. When the area between 4,000 Hz and 7,000 Hz is reduced or removed entirely, encoded speech will still be intelligible, but it may lose clarity and sound unnatural. Furthermore, since some voices and sounds reach into even higher frequency ranges, lossy compression and encoding can result in dull, muted, or abrasive sounds.
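The idea of keeping the 1,000 Hz to 4,000 Hz band and discarding the rest can be illustrated with a naive discrete Fourier transform. This is a toy sketch, not a description of how RealAudio or any real codec works: actual encoders rely on perceptual models rather than a brick-wall cut, and the sample rate, frame length, and test tones below are assumptions chosen so that each tone falls exactly on a frequency bin.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform, fine for a short frame."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def band_limit(spectrum, sample_rate, low_hz, high_hz):
    """Zero every bin whose frequency lies outside [low_hz, high_hz]."""
    n = len(spectrum)
    kept = []
    for k, value in enumerate(spectrum):
        # Bins above n/2 mirror the ones below, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        kept.append(value if low_hz <= freq <= high_hz else 0)
    return kept


SAMPLE_RATE = 16_000             # hypothetical sample rate
N = 160                          # 10 ms frame, bins spaced 100 Hz apart
TONES_HZ = (500, 2_000, 6_000)   # one tone below, inside, and above the band

signal = [
    sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in TONES_HZ)
    for t in range(N)
]
spectrum = dft(signal)
filtered = band_limit(spectrum, SAMPLE_RATE, 1_000, 4_000)

nonzero_before = sum(1 for v in spectrum if abs(v) > 1e-6)
nonzero_after = sum(1 for v in filtered if abs(v) > 1e-6)
print("non-zero bins before:", nonzero_before, "after:", nonzero_after)
```

Only the 2,000 Hz tone survives the cut; fewer non-zero components means less data to store, which is the trade the paragraph above describes.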

5.1.2. Lossless compression

In contrast, lossless compression squeezes data into smaller packets of information without permanently discarding any of the data. Rather than throwing information away, a lossless codec stores redundant or predictable data in a more compact form, together with a "map" with which the decoder can reconstruct the original file exactly. Lossless compression results in superior audio quality, but lower compression ratios.
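The defining property of lossless compression, an exact round trip, can be demonstrated with Python's standard `zlib` module. Note that zlib (DEFLATE) is a general-purpose lossless compressor used here only as a stand-in, not an audio codec, and the repeated sentence is far more redundant than real audio, so the savings shown are not representative.

```python
import zlib

# Highly repetitive sample data; real audio compresses much less well.
phrase = b"Now is the time for all good men to come to the aid of their country. "
original = phrase * 50

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes ->", len(compressed), "bytes")
print("identical after round trip:", restored == original)
```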

In the lossy example, our codec had some general rules for reconstructing the message -- basically to add vowels and spaces in order to form English words. It wasn't perfect because it didn't know which English words to choose, and it wasn't always sure where one word ended and the next began.

Lossless codecs, on the other hand, are perfect. To reconstruct our message perfectly, however, would mean having a much more sophisticated set of rules. A lossless text codec would have to reproduce not only words but sensible phrases. It would have to be able to break words correctly. And it would have to have a mastery of the English language's inconsistent spelling patterns. It would in fact be, as the computer scientists say, a nontrivial endeavor.

The same goes for lossless audio codecs. They are difficult to develop (and thus expensive to license), they require substantial computing power on the user's machine, and the file savings are not as great as with lossy compression. Sadly enough, it appears that for the time being, lossy compression is necessary for knocking large audio files down to Internet-appropriate size. The good news is that lossy compression schemes are becoming more advanced, and over time the differences will become less and less noticeable to the human ear.

Now that we have discussed lossy and lossless compression and the types of protocols that enable the efficient delivery of compact audio files across the Internet, let's review the audio formats available on the market. Most of these formats will be discussed in greater detail in the rest of the book.
