The existence of the World Wide Web (WWW) is a major factor behind the recent explosive growth of the Internet. Since the introduction of the NCSA Mosaic package (the first graphical user interface to the WWW to gain widespread acceptance) in 1993, WWW traffic on the Internet has grown far faster than any other kind of traffic (e.g., SMTP email, FTP file transfers, Telnet remote terminal sessions, etc.). You will certainly want to let your users use a browser to access WWW sites, and you will very likely want to run a site yourself, if you do anything that might benefit from publicity.
Most WWW browsers are capable of using protocols other than HTTP, which is the basic protocol of the Web. For example, these browsers are usually also Gopher and FTP clients, or are capable of using your existing Telnet and FTP clients transparently (without the user seeing that an external program is being started). Many of them are also NNTP, SMTP, and Archie clients. They use a single, consistent notation called a Uniform Resource Locator (URL) (see sidebar) to specify connections of various types.
Many of the modern information access services (notably HTTP, WAIS, and Gopher) were designed so that the servers don't have to run on a fixed well-known port on all machines. A standard well-known port was established for each of these services, but the clients and servers are all capable of using alternate ports as well. When you reference one of these servers, you can include the port number it's running on (assuming that it's not the standard port for that service) in addition to the name of the machine it's running on. For example, an HTTP URL of the form http://host.domain.net/file.html is assumed to refer to a server on the standard HTTP port (port 80); if the server were on an alternate port (port 8000, for example), the URL would be written http://host.domain.net:8000/file.html.
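As a rough illustration of the convention just described, the following Python sketch pulls the host and effective port out of an HTTP URL, falling back to port 80 when the URL names no port. The function name host_and_port is made up for this example; urlsplit comes from Python's standard library.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # the standard well-known port for HTTP


def host_and_port(url):
    """Return (host, port) for an HTTP URL, assuming port 80 if none is given."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http URLs")
    # urlsplit reports port as None when the URL does not include one.
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    return parts.hostname, port


print(host_and_port("http://host.domain.net/file.html"))
print(host_and_port("http://host.domain.net:8000/file.html"))
```

A packet filter sees only the second element of each pair, which is why an explicit port in a URL matters to the firewall even though users rarely notice it.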
The protocol designers had two good and valid reasons for designing these services this way:
The ability to provide these services on nonstandard ports has its uses, but it complicates things considerably from a packet filtering point of view. If your users wish to access a server running on a nonstandard port, you have several choices:
The good news is that the vast majority of these servers (probably much greater than 90%) use the standard port, and that the more widely used and important the server is, the more likely it is to use the standard port. Many servers that use nonstandard ports use one of a few easily recognizable substitutes (81, 800, 8000, and 8080).
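When reviewing logs or deciding which outbound ports to consider, it can help to sort destination ports into the groups mentioned above. The sketch below does that in Python; the category labels and the function name are invented for this example, and the set of substitutes is simply the list given in the text, not an authoritative registry.

```python
STANDARD_HTTP_PORT = 80
# The easily recognizable substitutes named in the text.
COMMON_SUBSTITUTES = {81, 800, 8000, 8080}


def classify_http_port(port):
    """Label a destination port as standard, a common substitute, or other."""
    if port == STANDARD_HTTP_PORT:
        return "standard"
    if port in COMMON_SUBSTITUTES:
        return "common substitute"
    return "other"


for port in (80, 8080, 4321):
    print(port, classify_http_port(port))
```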
Your firewall will probably prevent people on your internal network from setting up their own servers at nonstandard ports (you're not going to want to allow inbound connections to arbitrary ports above 1023). You could set up such servers on a bastion host, but wherever possible, it's kinder to other sites to leave your servers on the standard port.
Various HTTP clients (such as Mosaic and Netscape Navigator) transparently support various proxying schemes. Some clients support SOCKS; others support user-transparent proxying via special HTTP servers, and some support both. (See the discussion of SOCKS and proxying in general in Chapter 7.)
The CERN HTTP server, developed at the European Particle Physics Laboratory in Geneva, Switzerland, has a proxy mode in which the server handles all requests for remote documents from browsers inside the firewall. The server makes the remote connection, passing the information back to the clients transparently. See Appendix B for information about getting the CERN HTTP server.
Using the CERN HTTP server as a proxy server can provide an additional benefit, because the server can locally cache WWW pages obtained from the Internet. This caching can significantly improve client performance and reduce network bandwidth requirements. It does this by ensuring that popular WWW pages are retrieved only once at your site. The second and subsequent requests get the locally cached copy of the page, rather than a new copy each time from the original server out on the Internet.
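The caching idea can be sketched in a few lines of Python. This is a toy in-memory cache, not a description of how the CERN server stores pages; the class name PageCache and the stand-in fake_fetch function are assumptions made so the example runs without a network.

```python
class PageCache:
    """Toy in-memory page cache keyed by URL."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that retrieves a page from the origin server
        self._pages = {}         # URL -> cached page body
        self.origin_fetches = 0  # how many times we went out to the Internet

    def get(self, url):
        # Only the first request for a URL reaches the origin server.
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
            self.origin_fetches += 1
        return self._pages[url]


def fake_fetch(url):
    # Stand-in for a real network retrieval, so the example runs offline.
    return "contents of " + url


cache = PageCache(fake_fetch)
cache.get("http://host.domain.net/file.html")
cache.get("http://host.domain.net/file.html")
print(cache.origin_fetches)
```

The second request is answered from the local copy, so origin_fetches stays at 1. A real cache also has to decide when a stored page is too old to reuse, which this sketch ignores.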
The TIS FWTK also includes an HTTP proxy server, called http-gw, that can be used with any client program. Clients that support HTTP proxying can use the FWTK HTTP proxy server transparently (all you have to do is configure the client to tell it where the server is), but you must enforce custom user procedures for clients that don't support HTTP proxying. Basically, URLs have to be modified to direct the clients to the proxy server rather than the real server. URLs embedded in HTML documents that pass through the server are modified automatically, but users must know how to do it by hand for URLs they type in from scratch or obtain through other channels. Chapter 7 describes the TIS FWTK in more detail.
The following sections describe these concerns:
In most ways, the security concerns we have for an HTTP server are very similar to the security concerns we have for any other server that handles connections from the Internet, e.g., an anonymous FTP server. You want to make sure that the users of those connections can access only what you want them to access, and that they can't trick your server so they get to something they shouldn't.
There are a variety of methods to accomplish these goals, including:
HTTP servers themselves provide a limited service and don't pose major security concerns. However, there is one unique feature of HTTP servers that you need to worry about: their use of external programs, particularly ones that interact with the user via the Common Gateway Interface (CGI), which is the piece of HTTP that specifies how user information is communicated to the server and from it to external programs. Many HTTP servers are configured to run other programs to generate HTML pages on the fly. These programs are generically called CGI scripts, even if they don't use CGI and aren't scripts. For example, if someone issues a database query to an HTTP server, the HTTP server runs an external program to perform the query and generate an HTML page with the answers.
There are two things you need to worry about with these external programs:
You may want to run your HTTP server on a Macintosh, DOS, or Windows machine. These machines have good HTTP server implementations available, but don't generally have the other capabilities that would make those servers insecure. For example, they are unlikely to be running other servers, they don't have a powerful and easily available scripting facility, and they're less likely to have other data or trusted access to other machines. The downside of this is that it's hard to do interesting things on them; the easier it gets, the less secure they'll be.
Tricking external programs
The external programs run by HTTP servers are often shell scripts written by folks who have information they want to provide access to, but who know little or nothing about writing secure shell scripts (which is by no means trivial, even for an expert).
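To give a feel for why this is not trivial, here is a minimal Python sketch of two habits a careful script author might adopt: rejecting input that doesn't match a narrow pattern, and never handing user-supplied text to a shell. The lookup program path, the function names, and the particular pattern are all hypothetical; a real script would need a policy suited to its own data.

```python
import re
import subprocess

# Hypothetical policy: a search term starts with a letter or digit and then
# contains only letters, digits, spaces, and hyphens (at most 64 characters).
SAFE_TERM = re.compile(r"[A-Za-z0-9][A-Za-z0-9 -]{0,63}")


def is_safe_term(term):
    """True only if the whole term matches the narrow pattern above."""
    return SAFE_TERM.fullmatch(term) is not None


def run_lookup(term, program="/usr/local/bin/lookup"):
    """Run a (hypothetical) lookup program with the term as a single argument."""
    if not is_safe_term(term):
        raise ValueError("rejected suspicious input")
    # Passing a list avoids the shell entirely, so characters such as ';'
    # or backquotes are never interpreted as commands.
    return subprocess.run([program, term], capture_output=True, text=True)


print(is_safe_term("firewall design"))
print(is_safe_term("x; rm -rf /"))
```

Refusing a leading hyphen keeps the term from being mistaken for a command-line option by the program that receives it.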
Because it's difficult to ensure the security of the scripts themselves, about the best you can do is try to provide a secure environment (using chroot and other mechanisms) that the scripts can run in (one which, you hope, they can't get out of). There should be nothing in the environment you'd worry about being revealed to the world. Nothing should trust the machine the server is running on. If you set up the environment in this way, then even if attackers somehow manage to break out of the restricted environment and gain full access to the machine, they're not much further along towards breaking into the really interesting stuff on your internal network.
Alternatively, or in addition, if you have people who you feel sure are capable of writing secure scripts, you can have all the scripts written, or at least reviewed, by these people. Most sites don't have people like this readily available, but if you are going to be seriously involved in providing WWW service, you may want to hire one. It's still a good idea to run the scripts in a restricted environment; nobody's perfect.
Uploading external programs
In this case, the attacker might be able to upload his own script or binary to that writable directory using anonymous FTP, and then cause the HTTP server to run it.
What is your defense against things like this? Once again, your best bet is to restrict what filesystem areas each server can access (generally using chroot), and to provide a restricted environment in which each server can run.
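One small check that supports this separation is to confirm that the directory the HTTP server runs programs from and the directory writable through anonymous FTP do not contain one another. The Python sketch below compares paths component by component; the function names and example directories are hypothetical, and because it works purely on path names it does not notice symbolic links or ".." components, which a real audit would have to resolve.

```python
from pathlib import PurePosixPath


def is_inside(path, directory):
    """True if path is the directory itself or lies somewhere beneath it."""
    path = PurePosixPath(path)
    directory = PurePosixPath(directory)
    return path == directory or directory in path.parents


def areas_overlap(http_exec_dir, ftp_writable_dir):
    """True if uploads could land where the HTTP server runs programs."""
    return (is_inside(http_exec_dir, ftp_writable_dir)
            or is_inside(ftp_writable_dir, http_exec_dir))


print(areas_overlap("/home/ftp/incoming/cgi", "/home/ftp/incoming"))
print(areas_overlap("/usr/local/www/cgi-bin", "/home/ftp/incoming"))
```

Comparing path components rather than raw strings avoids treating /home/ftp/incoming2 as if it were inside /home/ftp/incoming.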
The security problems of HTTP clients are far more complex than those of HTTP servers. The basis of these client problems is that HTTP clients (like Mosaic and Netscape Navigator) are generally designed to be extensible and to run particular external programs to deal with particular data types. This extensibility can be abused by an attacker.
HTTP servers can provide data in any number of formats: plain text files, HTML files, PostScript documents, still image files (GIF and JPEG), movie files (MPEG), audio files, and so on. The servers use MIME, discussed briefly above in the section on electronic mail, to format the data and specify its type. HTTP clients generally don't attempt to understand and process all of these different data formats. They understand a few (such as HTML, plain text, and GIF), and they rely on external programs to deal with the rest. These external programs will display, play, preview, print, or do whatever is appropriate for the format.
For example, UNIX Web browsers confronted with a PostScript file will ordinarily invoke the GhostScript program, and UNIX Web browsers confronted with a JPEG file will ordinarily invoke the xv program. The user controls (generally via a configuration file) what data types the HTTP client knows about, which programs to invoke for which data types, and what arguments to pass to those programs. If the user hasn't provided his own configuration file, the HTTP client generally uses a built-in default or a systemwide default.
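The lookup a client performs can be pictured as a table from data type to command, with the user's entries overriding the defaults. The Python sketch below is only a model of that idea: the table layout, the "%s" placeholder, and the command names are illustrative assumptions, not the actual configuration format of Mosaic or any other browser.

```python
# Hypothetical built-in viewer table: MIME type -> command template.
# "%s" stands for the name of the downloaded file.
DEFAULT_VIEWERS = {
    "application/postscript": ["ghostscript", "-dSAFER", "%s"],
    "image/jpeg": ["xv", "%s"],
}


def viewer_command(mime_type, filename, user_viewers=None):
    """Return the command to run for a data type, or None if no viewer is known."""
    viewers = dict(DEFAULT_VIEWERS)
    if user_viewers:
        # Entries from the user's own configuration replace the defaults.
        viewers.update(user_viewers)
    template = viewers.get(mime_type)
    if template is None:
        return None
    return [filename if arg == "%s" else arg for arg in template]


print(viewer_command("image/jpeg", "picture.jpg"))
print(viewer_command("application/postscript", "paper.ps",
                     user_viewers={"application/postscript": ["ghostscript", "%s"]}))
```

The second call shows how easily a user-level entry can replace a safer default, simply by supplying a different argument list for the same data type.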
All of these external programs present two security concerns:
Let's consider, for example, what an HTTP client is going to do with a PostScript file. PostScript is a language for controlling printers. While primarily intended for that purpose, it is a full programming language, complete with data structures, flow of control operators, and file input/output operators. These operators ("read file", "write file", "create file", "delete file", etc.) are seldom used, except on printers with local disks for font storage, but they're there as part of the language. PostScript previewers (such as GhostScript) generally implement these operators for completeness.
Suppose that a user uses Mosaic to pull down a PostScript document. Mosaic invokes GhostScript, and it turns out that the document has PostScript commands in it that say "delete all files in the current directory." If GhostScript executes the commands, who's to blame? You can't really expect Mosaic to scan the PostScript on the way through to see if it's dangerous; that's an impossible problem. You can't really expect GhostScript not to do what it's told in valid PostScript code. You can't really expect your users not to download PostScript code, or to scan it themselves.
Current versions of GhostScript have a safer mode they run in by default. This mode disables "dangerous" operators such as those for file input/output. But what about all the other PostScript interpreters or previewers? And what about the applications to handle all the other data types? How safe are they? Who knows?
Even if you have safe versions of these auxiliary applications, how do you keep your users from changing their configuration files to add new applications, run different applications, or pass different arguments (for example, to disable the safer mode of GhostScript) to the existing applications?
Why would a user do this? Suppose that the user found something in the WWW that claimed to be something he really wanted - a game demo, a graphics file, a copy of Madonna's new song, whatever. And, suppose that this desirable something came with a note that said "Hey, before you can access this Really Cool Thing, you need to modify your Mosaic configuration, because the standard configuration doesn't know how to deal with this thing; here's what you do..." And, suppose that the instructions were something like "remove the `-dSAFER' flag from the `ghostscript' line of your .mosaicrc file," or "add this line to your .mosaicrc file."
Would your users recognize that they were being instructed to disable the safer mode in GhostScript, or to add some new data type with /bin/sh as its auxiliary program, so that whatever data of that type came down was passed as commands straight to the shell? Even if they recognized it, would they do it anyway (nice, trusting people that they are)?
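An administrator who wants to spot the two changes described above could scan viewer settings for them. The Python sketch below assumes the settings have already been read into a dictionary mapping each data type to a command given as a list of words; that representation, the list of shell names, and the function name are assumptions for illustration, and the check covers only these two patterns rather than every dangerous configuration.

```python
# Program names treated as command shells in this sketch (not exhaustive).
SHELLS = {"sh", "csh", "ksh", "bash", "tcsh"}


def risky_entries(viewers):
    """Return a list of (data_type, reason) pairs for the two patterns in the text."""
    problems = []
    for data_type, command in viewers.items():
        # Strip any directory part, so "/bin/sh" is recognized as "sh".
        program = command[0].rsplit("/", 1)[-1]
        if program in SHELLS:
            problems.append((data_type, "data is handed to a shell"))
        elif program in ("ghostscript", "gs") and "-dSAFER" not in command:
            problems.append((data_type, "GhostScript run without -dSAFER"))
    return problems


settings = {
    "application/x-really-cool": ["/bin/sh"],
    "application/postscript": ["ghostscript", "%s"],
    "image/jpeg": ["xv", "%s"],
}
for entry in risky_entries(settings):
    print(entry)
```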
Some people believe that Macintosh and PC-based versions of WWW browsers are less susceptible to some of these security problems than UNIX-based browsers. On Mac and PC machines, there is usually no shell (or only a shell of limited power, like the MS-DOS command interpreter) that an attacker can break out to, and a limited and highly unpredictable set of programs to access once they're there. Also, if any damage occurs, it can often be more easily isolated to a single machine. On the other hand, "highly unpredictable" does not mean "completely unpredictable". (For example, a very large percentage of Macs and PCs have copies of standard Microsoft applications, like Word and Excel.) Further, if your Macs and PCs are networked with AppleShare, Novell, PC-NFS, or something similar, you can't make any assumptions about damage being limited to a single machine.
What can you do?
There is no simple, foolproof defense against the type of problem we've described. At this point in time, you have to rely on a combination of carefully installed and configured client and auxiliary programs, and a healthy dose of user education and awareness training. This is an area of active research and development, and both the safeguards and the attacks will probably develop significantly over the next couple of years.
Because Mac and PC clients seem less susceptible to some of the client-side problems, some sites take the approach of allowing WWW access only from Macs or PCs. Some go even further and limit access to particular machines (often placed in easily accessible locations like libraries or cafeterias) that have been carefully configured so they have no sensitive information on them, and no access to such information. The idea is this: If anything bad happens, it will affect only this one easily rebuilt machine. The machine can't be used to access company data on other machines.
Some people have experimented, at least in UNIX environments, with running Mosaic and its auxiliary programs under the X Window System in a restricted environment - or on a "sacrificial goat" machine that has nothing else on it - with the displays directed to their workstation. This provides a certain measure of protection, but it also imposes a certain amount of inconvenience. Consider the following problems with this approach:
As discussed above in the section called "Packet Filtering Characteristics of HTTP," there is another complication of WWW clients in environments in which packet filtering is part of the firewall solution: not all HTTP servers run on port 80. To address this, you might consider using proxy servers for HTTP access. If you do this, the internal clients talk on standard ports through the packet filtering system to the proxy server, and the proxy server talks on arbitrary ports (because it's outside the packet filtering system) to the real server.
You may hear discussions of Secure HTTP and wonder how it relates to firewalls and the configuring of services. Secure HTTP is not designed to solve the kinds of problems we've been discussing in this section. It's designed to deal with privacy issues by encrypting the information that is being passed around via HTTP. A mechanism like Secure HTTP is necessary to be able to do business using HTTP so that things like credit card numbers can be passed over the Internet without fear of capture by packet sniffers. In order to distinguish between privacy issues, on the one hand, and vulnerability to malicious servers, on the other hand, people working on HTTP and similar extensible protocols usually use the word "safe" to refer to protocols that protect you from hostile servers, and the word "secure" to refer to protocols that protect you from data snooping.
Because it provides authentication as well as encryption, Secure HTTP could eventually provide some assistance with safety. If you are willing to connect only to sites that you know, that run Secure HTTP, and that authenticate themselves, you can be sure that you're not talking to a hostile site. However, even when Secure HTTP is released and in wide usage, this approach (limited connections) is unlikely to be a popular and practical one; part of the glory of the Web is being able to go to new and unexpected places.
Although people are working on HTTP-like protocols that are safe, safe HTTP is probably not a viable concept. It's not HTTP that's unsafe; it's the fact that HTTP is transferring programs in other languages. This is a major design feature of HTTP and one of the things responsible for its rapid spread.