The risks of eavesdropping affect all Internet protocols, but are of particular concern on the World Wide Web, where sensitive documents and other kinds of information, such as credit card numbers, may be transmitted. There are only two ways to protect information from eavesdropping. The first is to ensure that the information travels over a physically secure network (which the Internet is not). The second is to encrypt the information so that it can only be decrypted by the intended recipient.
Another possible form of eavesdropping is traffic analysis. In this type of eavesdropping, an attacker learns about the transactions performed by a target without actually learning their content. As we will see below, the log files kept by Web servers are particularly vulnerable to this type of attack.
It is widely believed that most commercial Web servers that seek to use encryption will use either SSL or SHTTP. Unlike the other options above, both SSL and SHTTP require special software to be running both on the Web server and in the Web browser. This software will likely be in most commercial Web browsers within a few months after this book is published, although it may not be widespread in freely available browsers until the patents on public key cryptography expire within the coming years.
When using an encrypted protocol, your security depends on several issues:
During the summer of 1995, a variety of articles were published describing failures of the encryption system used by the Netscape Navigator. In the first case, a researcher in France was able to break the encryption key used on a single message by using a network of workstations and two supercomputers. The message had been encrypted with the international version of the Netscape Navigator using a 40-bit RC4 key. In the second case, a group of students at the University of California at Berkeley discovered a flaw in the random number generator used by the UNIX-based version of Navigator. In the third case, the same group of students at Berkeley discovered a way to alter the binaries of the Navigator program as they traveled between the university's NFS server and the client workstation on which the program was actually running.
All of these attacks are highly sophisticated. Nevertheless, most of them do not seem to be having an impact on the commercial development of the Web. It is likely that within a few years the encryption keys will be lengthened to 64 bits. Netscape has improved the process by which its Navigator program seeds its random number generator. The third problem, of binaries being surreptitiously altered, will likely not be a problem for most users with improved software distribution systems that themselves use encryption and digital signatures.
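As a rough illustration of what lengthening keys from 40 to 64 bits would mean for an exhaustive key search, the arithmetic can be sketched as follows. The function names are ours, and the sketch assumes an attacker who simply tries keys one after another; it says nothing about shortcuts such as a weak random number generator.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys for a key of the given length in bits."""
    return 2 ** bits


def expected_trials(bits: int) -> int:
    """Average number of keys an exhaustive search tries before it succeeds."""
    return keyspace(bits) // 2


# How much larger the search becomes when keys grow from 40 to 64 bits.
ratio = keyspace(64) // keyspace(40)

print(f"40-bit keys: {keyspace(40):,}")
print(f"64-bit keys: {keyspace(64):,}")
print(f"A 64-bit key search is {ratio:,} times larger than a 40-bit one")
```

Each additional bit doubles the work, so the 24 extra bits multiply it by 2^24.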
Most Web servers create log files that record considerable information about each request. These log files grow without limit until they are automatically trimmed or until they fill up the computer's hard disk (usually resulting in a loss of service). By examining these files, it is possible to infer a great deal about the people who are using the Web server.
For example, the NCSA httpd server maintains the following log files in its logs/ directory:
By examining the information in these log files, you can create a very comprehensive picture of the people who are accessing a Web server, the information that they are viewing, and where they have previously been.
In Chapter 10, Auditing and Logging, we describe the fields that are stored in the access_log file. Here they are again:
Here are a few lines from an access_log file:
koriel.sun.com - - [06/Dec/1995:18:44:01 -0500] "GET /simson/resume.html HTTP/1.0" 200 8952
lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:14 -0500] "GET /simson/ HTTP/1.0" 200 2749
lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:15 -0500] "GET /icons/back.gif HTTP/1.0" 200 354
lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:16 -0500] "GET /cgi-bin/counter?file=simson HTTP/1.0" 200 545
piweba1y-ext.prodigy.com - - [06/Dec/1995:18:52:30 -0500] "GET /vineyard/history/ HTTP/1.0" 404 -
As you can tell, there is a wealth of information here. Apparently, some user at Sun Microsystems is interested in Simson's resume. A user at the University of North Carolina also seems interested in Simson. And a Prodigy user appears to want information about the history of Martha's Vineyard.
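To show how little effort this kind of inference takes, here is a minimal sketch that groups requested paths by client host. It assumes the entries follow the layout of the sample lines (host, two placeholder fields, bracketed date, quoted request, status, size); the regular expression and function names are ours, not part of any server distribution.

```python
import re

# Layout assumed from the sample entries:
#   host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<when>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    parts = entry["request"].split()
    # The request is "METHOD path PROTOCOL"; keep the path for convenience.
    entry["path"] = parts[1] if len(parts) >= 2 else ""
    return entry


def paths_by_host(lines):
    """Map each client host to the list of paths it requested, in order."""
    seen = {}
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            seen.setdefault(entry["host"], []).append(entry["path"])
    return seen


SAMPLE = [
    'koriel.sun.com - - [06/Dec/1995:18:44:01 -0500] '
    '"GET /simson/resume.html HTTP/1.0" 200 8952',
    'lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:14 -0500] '
    '"GET /simson/ HTTP/1.0" 200 2749',
    'lawsta11.oitlabs.unc.edu - - [06/Dec/1995:18:47:15 -0500] '
    '"GET /icons/back.gif HTTP/1.0" 200 354',
]

for host, paths in paths_by_host(SAMPLE).items():
    print(host, paths)
```

A few dozen lines are enough to turn a raw log into a per-machine browsing history, which is why the log files themselves deserve protection.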
None of these entries in the log file contains the username of the person conducting the Web request. But they may still be used to identify individuals. It's highly likely that koriel.sun.com is a single SPARCstation sitting on a single employee's desk at Sun. We don't know what lawsta11.oitlabs.unc.edu is, but we suspect that it is a single machine in a library. Many organizations now give distinct IP addresses to individual users for use with their PCs, or for dial-in with PPP or SLIP. Without the use of a proxy server such as SOCKS (see Chapter 22) or systems that rewrite IP addresses, Web server log files reveal those addresses.
http://www2.infoseek.com/NS/Titles?qt=unix-hater -> /unix-haters.html
http://www.intersex.com/main/ezines.html -> /awa/
http://www.jaxnet.com/~jdcarr/places.html -> /vineyard/ferry.tiny.gif
In the first line, it's clear that a search on the InfoSeek server for the string "unix-hater" turned up the Web file /unix-haters.html on the current server. This is an indication that the user was surfing the Web looking for material having to do with UNIX. In the second line, a person who had been browsing the InterSex WWW server looking for information about electronic magazines followed a link to the /awa/ directory. This possibly indicates that the user is interested in content of a sexually oriented nature. In the last example, apparently a user with the account jdcarr at www.jaxnet.com has embedded a link in his or her places.html file to a GIF file of the Martha's Vineyard ferry. This indicates that jdcarr is taking advantage of other people's network servers and Internet connections.
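The same reading of referral entries can be automated. The sketch below assumes each entry has the form "source URL -> local path", as in the three lines discussed here; the function name and the returned field names are ours.

```python
from urllib.parse import urlsplit, parse_qs


def parse_referral(line):
    """Split a 'source -> target' entry and pull out any search terms."""
    source, arrow, target = line.partition(" -> ")
    if not arrow:
        return None  # not in the assumed "source -> target" form
    url = urlsplit(source.strip())
    return {
        "site": url.netloc,            # server the user came from
        "source_path": url.path,       # page that held the link
        "query": parse_qs(url.query),  # e.g. search terms typed by the user
        "target": target.strip(),      # local file that was fetched
    }


SAMPLE = [
    "http://www2.infoseek.com/NS/Titles?qt=unix-hater -> /unix-haters.html",
    "http://www.intersex.com/main/ezines.html -> /awa/",
    "http://www.jaxnet.com/~jdcarr/places.html -> /vineyard/ferry.tiny.gif",
]

for line in SAMPLE:
    info = parse_referral(line)
    print(info["site"], info["query"], "->", info["target"])
```

Note that the search string arrives intact in the query portion of the referring URL, so the server operator learns what the user typed into another site's search form.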
By themselves, these references can be amusing. But they can also be potentially important. Lincoln Stein and Bob Bagwill note in the World Wide Web Security FAQ that a referral such as this one could indicate that a person is planning a corporate takeover:
The information in the refer_log can also be combined with information in the access_log to determine the names of the people (or at least their computers) who are following these links. Version 1.5 of the NCSA httpd Web server even has an option to store referrals in the access log.
The access_log file records the complete URL that is requested. For Web forms that use the GET method instead of the POST method, all of the arguments are included in the URL, and thus all of them end up in the access_log file.
Consider the following:
asy2.vineyard.net - - [06/Oct/1995:19:04:37 -0400] "GET /cgi-bin/vni/useracct?username=bbennett&password=leonlikesfood&cun=jayd&fun=jay+Desmond&pun=766WYRCI&add1=box+634&add2=&city=Gay+Head&state=MA&zip=02535&phone=693-9766&choice=ReallyCreateUser HTTP/1.0" 200 292
Or even this one:
mac.vineyard.net - - [07/Oct/1995:03:04:30 -0400] "GET /cgi-bin/change-password?username=bbennett&oldpassword=dearth&newpassword=flabby3 HTTP/1.0" 200 400
This is one reason why you should use the POST method in preference to the GET method!
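To make the exposure concrete, the following sketch recovers form fields from the request portion of a logged entry. It is only an illustration: the function names are ours, and treating any field whose name contains "password" as sensitive is an assumption made for the example.

```python
from urllib.parse import urlsplit, parse_qs


def logged_form_fields(request):
    """Return the form fields visible in a logged 'METHOD url PROTOCOL' string."""
    parts = request.split()
    if len(parts) < 2:
        return {}
    query = urlsplit(parts[1]).query
    fields = parse_qs(query, keep_blank_values=True)
    # Keep the first value of each field for readability.
    return {name: values[0] for name, values in fields.items()}


def exposed_secrets(request):
    """Fields whose names suggest a password (an assumption for this sketch)."""
    return {
        name: value
        for name, value in logged_form_fields(request).items()
        if "password" in name.lower()
    }


REQUEST = (
    "GET /cgi-bin/change-password"
    "?username=bbennett&oldpassword=dearth&newpassword=flabby3 HTTP/1.0"
)

print(exposed_secrets(REQUEST))
# A POST request line carries no arguments, so nothing is recovered:
print(logged_form_fields("POST /cgi-bin/change-password HTTP/1.0"))
```

With POST, the form arguments travel in the body of the request rather than in the URL, so they do not appear in the logged request line.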
NCSA_Mosaic/2.6 (X11;HP-UX A.09.05 9000/720) libwww/2.12 modified via proxy gateway CERN-HTTPD/3.0 libwww/2.17
NCSA_Mosaic/2.6 (X11;HP-UX A.09.05 9000/720) libwww/2.12 modified via proxy gateway CERN-HTTPD/3.0 libwww/2.17
Proxy gateway CERN-HTTPD/3.0 libwww/2.17
Mozilla/1.1N (X11; I; OSF1 V3.0 alpha) via proxy gateway CERN-HTTPD/3.0 libwww/2.17
Mozilla/1.2b1 (Windows; I; 16bit)
In addition to the name of the Web browser that is being used, this file reveals information about the structure of an organization's firewall. It also reveals the use of beta software within an organization.
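A short sketch shows how such entries might be mined. Two assumptions are ours: that a proxy announces itself with the words "proxy gateway" followed by its name, as in the entries above, and that a version number such as 1.2b1 (a "b" followed by a digit) marks a beta release.

```python
import re

# Assumption: proxies identify themselves as "proxy gateway NAME/VERSION".
PROXY = re.compile(r"proxy gateway (\S+)", re.IGNORECASE)
# Assumption: "b" followed by a digit in the browser version means beta.
BETA = re.compile(r"^\S+/\d+(?:\.\d+)*b\d+")


def describe_agent(line):
    """Summarize what one agent entry reveals about the client site."""
    line = line.strip()
    proxy = PROXY.search(line)
    tokens = line.split()
    # An entry that starts with the proxy banner names no browser at all.
    only_proxy = line.lower().startswith("proxy gateway")
    return {
        "browser": None if only_proxy or not tokens else tokens[0],
        "proxy": proxy.group(1) if proxy else None,
        "beta": bool(BETA.match(line)),
    }


SAMPLE = [
    "NCSA_Mosaic/2.6 (X11;HP-UX A.09.05 9000/720) libwww/2.12 modified "
    "via proxy gateway CERN-HTTPD/3.0 libwww/2.17",
    "Proxy gateway CERN-HTTPD/3.0 libwww/2.17",
    "Mozilla/1.1N (X11; I; OSF1 V3.0 alpha) via proxy gateway "
    "CERN-HTTPD/3.0 libwww/2.17",
    "Mozilla/1.2b1 (Windows; I; 16bit)",
]

for entry in SAMPLE:
    print(describe_agent(entry))
```

The word "alpha" in the third sample entry is not flagged, because the beta test looks only at the browser version at the start of the entry.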
Some servers allow you to restrict the amount of information that is logged. If you do not require information to be logged, you may wish to suppress the logging.
In summary, users of the Web should be informed that their actions are being monitored. Because many firms wish to use this information for marketing, the amount of information that Web browsers provide to servers is more likely to increase than to decrease in the future.