

Building Internet Firewalls

Chapter 2: Internet Services
 

2.5 The World Wide Web

Mail, FTP, Telnet, and Usenet news have been around since the early days of the Internet; they're actually extensions of services provided well before the Internet even existed. The World Wide Web (WWW) is a new, entirely Internet-based concept, based in part on existing services, and in part on a new protocol, the HyperText Transfer Protocol (HTTP).

Many people confuse the functions and origins of WWW, Mosaic, and HTTP, and the terminology used to refer to these three distinct entities has become muddy. Some of the muddiness was introduced intentionally; Web browsers attempt to provide a seamless interface to a wide variety of information through a wide variety of mechanisms, and blurring the distinctions makes the Web easier to use, if more difficult to comprehend. Here is a quick summary of what the individual entities are about.

The WWW is the collection of HTTP servers (see the description of HTTP below) on the Internet. The Web is responsible, in large part, for the recent explosion in Internet activity. WWW is based on concepts developed at the European Particle Physics Laboratory (CERN) in Geneva, Switzerland, by Tim Berners-Lee and others. Much of the groundbreaking work on Web clients was done at the National Center for Supercomputing Applications (NCSA) at the University of Illinois in Urbana-Champaign. There are many organizations and individuals developing Web client and server software these days, and many more using these technologies for a huge range of purposes. Nobody "controls" the Web, however, much as nobody "controls" the Internet.

The Web uses hypertext technology to link together a web of documents, which may include text, graphics, sound files, video files, and other formats. The documents can be traversed in any way, not only hierarchically, to search for information. Hypertext provides the navigation from one document to another on the Internet. Users can move freely from one document to another, regardless of where the documents are located, by simply clicking on a word or picture for which a hypertext link has been defined.

HTTP is the primary application protocol that underlies the World Wide Web: it provides users access to the files that make up the Web. As mentioned above, these files might be in many different formats (text, graphics, audio, video, etc.), but the most common format on the Web is the HyperText Markup Language (HTML). HTML is a standardized page description language for creating Web pages. It provides basic document-formatting capabilities (including the ability to include graphics), and allows you to specify hypertext links to other servers and files.
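To make the idea of hypertext links concrete, here is a minimal Python sketch that lists the link targets found in an HTML page. It is an illustration only: the sample page, the `example.com` host names, and the helper names `LinkCollector` and `extract_links` are all invented for this example, and it looks only at `<a href="...">` anchors.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target of every <a href="..."> anchor in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the link targets in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical page: one link to another HTTP server, one to an FTP server.
SAMPLE_PAGE = """
<html><body>
<h1>Example page</h1>
<p>See the <a href="http://www.example.com/docs/intro.html">introduction</a>
or fetch the <a href="ftp://ftp.example.com/pub/README">README</a>.</p>
<img src="logo.gif">
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_PAGE):
        print(link)
```

Note that the second link in the sample points at an FTP server rather than an HTTP server; a single page can send the browser to several different services.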

Mosaic, developed by Marc Andreessen and others at NCSA, is an HTTP client, which is used as a browser of the Web. NCSA Mosaic is free and runs on Windows, the Macintosh, and many different flavors of UNIX. There are many other Web browsers available, and as this book goes to press, the most popular is Netscape Navigator, a commercial product that is available free for nonprofit and educational use or for commercial evaluation. It also runs on Windows, the Macintosh, and various UNIX machines. (Other Web browsers include Lynx, Viola WWW, perl WWW, and Midas WWW.) HTTP is but one protocol spoken by Mosaic; Mosaic clients typically also speak at least the FTP, Gopher, and WAIS protocols. Netscape Navigator speaks all of those and also NNTP and SMTP. Thus, when users say "we want Mosaic" or "we want Netscape," what they really mean, from a protocol level, is that they want access to the HTTP servers that make up the WWW, as well as the associated Gopher, WAIS, and FTP servers (plus, for Netscape, NNTP and SMTP servers).
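Seen from the firewall, "we want Netscape" therefore turns into a list of services and ports. The following Python sketch maps a URL to the protocol, host, and TCP port a browser would contact. The port numbers are the well-known defaults for each protocol; the function name `service_for_url` and the `example.com` hosts are assumptions made for this illustration, and the sketch ignores complications such as FTP data connections.

```python
from urllib.parse import urlsplit

# Well-known default TCP ports for the protocols named in the text.
DEFAULT_PORTS = {
    "http": 80,
    "ftp": 21,     # control connection only; data connections use other ports
    "gopher": 70,
    "wais": 210,
    "nntp": 119,
    "smtp": 25,    # used for outgoing mail rather than named in a URL
}


def service_for_url(url):
    """Return (protocol, host, port) for a URL of the form scheme://host[:port]/..."""
    parts = urlsplit(url)
    protocol = parts.scheme.lower()
    if protocol not in DEFAULT_PORTS:
        raise ValueError("unknown protocol in URL: %r" % url)
    # An explicit port in the URL overrides the protocol's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[protocol]
    return protocol, parts.hostname, port


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com:8080/index.html",
        "gopher://gopher.example.com/",
        "ftp://ftp.example.com/pub/README",
    ):
        print(service_for_url(url))
```

The second URL in the usage example is a reminder that a server can run on any port, so a rule that permits only the default port will not cover every server users try to reach.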

Web browsers are fantastically popular, and for good reason. They provide a rich, graphical interface to an immense number of Internet resources. Information and services that were unavailable or expert-only before are now easily accessible. In Silicon Valley, you can use the Web to have dinner delivered without leaving your computer except to answer the door. It's hard to get a feel for the Web without experiencing it; it covers the full range of everything you can do with a computer, from the mundane to the sublime with a major side trip into the ridiculous.

Unfortunately, Web browsers and servers are hard to secure. The usefulness of the Web is in large part based on its flexibility, but that flexibility makes control difficult. Just as it's easier to transfer and execute the right program from a Web browser than from FTP, it's easier to transfer and execute a malicious one. Web browsers depend on external programs, generically called "viewers" (even if they play sounds instead of showing pictures), to deal with data types that the browsers themselves don't understand. (The browsers generally understand basic data types such as HTML, plain text, and JPEG and GIF graphics.) You should be very careful about which viewers you configure by default; you don't want a viewer that can do dangerous things, because it's going to be running on your computers, as if it were one of your users, taking commands from an external source. You also want to warn users not to add viewers, or change viewer configurations, based on advice from strangers.
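One way to review a viewer configuration is to look for entries that hand downloaded data to a command interpreter. The Python sketch below does this for a small table of content types and viewer commands. Everything in it is hypothetical: the table, the viewer command lines, the list of interpreter names, and the function name `risky_viewers` are assumptions chosen for illustration, not the configuration format of any particular browser.

```python
import shlex

# Hypothetical list of programs that treat their input as commands to run.
COMMAND_INTERPRETERS = {"sh", "csh", "ksh", "tcsh", "bash", "perl", "tclsh", "wish"}

# Hypothetical viewer table: content type -> command line (%s is the downloaded file).
VIEWERS = {
    "image/tiff": "xv %s",
    "audio/basic": "showaudio %s",
    "application/postscript": "ghostview %s",
    "application/x-csh": "/bin/csh %s",
}


def risky_viewers(viewers):
    """Return the content types whose viewer is a command interpreter."""
    risky = []
    for content_type, command in sorted(viewers.items()):
        words = shlex.split(command)
        if not words:
            continue  # empty entry: nothing would be run
        # Compare only the program name, so /bin/csh and csh are treated alike.
        program = words[0].rsplit("/", 1)[-1]
        if program in COMMAND_INTERPRETERS:
            risky.append(content_type)
    return risky


if __name__ == "__main__":
    for content_type in risky_viewers(VIEWERS):
        print("dangerous viewer configured for", content_type)
```

This is a crude check. It catches only the most obvious mistake, and a viewer that is not a shell can still be dangerous if the data format it displays is powerful enough.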

Because an HTML document can easily link to documents on other servers, it's easy for people to become confused about exactly who is responsible for a given document. New users may not notice when they go from internal documents at your site to external ones. This has two unfortunate consequences. First, they may trust external documents inappropriately (because they think they're internal documents). Second, they may blame the internal Web maintainers for the sins of the world. People who understand the Web tend to find this hard to believe, but it's a common misconception: the dark side of having a very smooth transition between sites.
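The distinction users miss is mechanical: it depends only on the host named in the link. As a rough sketch, the Python function below classifies a link as internal or external by comparing its host name with a site's domain. The domain `example.com` and the function name `is_internal` are placeholders invented for this example, and the sketch assumes that every internal server has a name inside that one domain.

```python
from urllib.parse import urlsplit

INTERNAL_DOMAIN = "example.com"  # hypothetical: substitute your own site's domain


def is_internal(url, internal_domain=INTERNAL_DOMAIN):
    """Return True if url points at a host inside internal_domain."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        # A relative link (no scheme, no host) stays on the server that
        # supplied the page; anything else without a host is not ours.
        return parts.scheme == ""
    host = host.rstrip(".")
    # Match the domain itself or a name ending in ".<domain>", so that
    # look-alikes such as "notexample.com" are not accepted.
    return host == internal_domain or host.endswith("." + internal_domain)


if __name__ == "__main__":
    for link in (
        "staff/phonebook.html",
        "http://www.example.com/policy.html",
        "http://www.example.com.elsewhere.org/policy.html",
    ):
        print(link, "internal" if is_internal(link) else "EXTERNAL")
```

The third link in the usage example shows why the comparison has to be anchored at the end of the host name: a name that merely contains the internal domain belongs to someone else.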

Most Web servers are reasonably secure, as shipped. However, they can also call external programs. These programs are relatively easy to write, but very difficult to secure. You should treat server-side extensions with the same caution you would treat a new server of any kind.
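To give a sense of where the difficulty lies, the Python sketch below shows one common precaution for a server-side program that passes a value from a request to an external command: accept only values that match a narrow pattern, and hand the command to the operating system as a list of arguments rather than as a shell command line. The pattern, the program path `/usr/bin/finger`, and the function name `build_command` are assumptions made for this illustration; a real extension needs checks suited to what it actually does.

```python
import re

# Hypothetical policy: short values made of letters, digits, and a few
# punctuation characters. Anything else is refused, not "cleaned up".
SAFE_VALUE = re.compile(r"[A-Za-z0-9_.@-]{1,64}")


def build_command(program, user_value):
    """Return an argument list for running program on user_value, or raise ValueError."""
    if not SAFE_VALUE.fullmatch(user_value):
        raise ValueError("rejected unsafe value: %r" % user_value)
    if user_value.startswith("-"):
        # A leading "-" would be read by the program as an option.
        raise ValueError("rejected option-like value: %r" % user_value)
    # A list of arguments can be passed to subprocess.run() without a shell,
    # so characters such as ";" and "|" are never interpreted as commands.
    return [program, user_value]


if __name__ == "__main__":
    for value in ("alice", "alice; rm -rf /", "-l"):
        try:
            print(build_command("/usr/bin/finger", value))
        except ValueError as error:
            print(error)
```

Checking the input this way reduces the risk but does not remove it; the external program itself still has to be trustworthy when given any value the pattern allows.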

(We discuss the security of Web clients and servers in more detail in Chapter 8.)