HTTP (Building Internet Firewalls, 2nd Edition)

15.3. HTTP

HyperText Transfer Protocol (HTTP) is the protocol that the web is based on. HTTP itself is a relatively secure protocol and a straightforward one to allow through firewalls.

15.3.1. HTTP Tunneling

HTTP itself is a simple protocol, but it can carry quite complex data. Because HTTP is simple and popular, most people let it through their firewalls. Because it can carry complex data, it's easy to use it to carry other protocols, which can be useful. For instance, as a firewall maintainer, you may prefer having audio data come in over HTTP to having to configure more open ports for audio data (your users may not, since the quality may not be very good).

On the other hand, tunneling can also allow inherently insecure protocols to cross your firewall. For this reason, it may be advantageous to use a firewall solution that does content-based checking of HTTP connections, so that you can disallow connections that are actually tunneling other protocols. This can be quite difficult to do.

Different programs use different methods of "tunneling". These range from simply running their normal protocol on port 80, to including support for HTTP proxying using the "CONNECT" method (discussed later in the section about HTTP proxying), to actually using HTTP with a data type that the client handles specially.

Some of these are much easier to filter out than others. For instance, almost any content checking, whether it's an intelligent packet filter or an HTTP-aware proxy, will get rid of people running protocols other than HTTP on port 80. Similarly, most HTTP proxies will let you control what destinations can be used with CONNECT, and you should restrict them carefully to just the destinations that you need.
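
As a rough illustration of the simplest kind of content checking, the following Python sketch tests whether the first bytes a client sends to port 80 look like an HTTP request line. The regular expression and the function name are assumptions made for this example; a real HTTP-aware proxy parses and validates far more of the stream than its first line.

```python
import re

# Assumed shape of a request line: METHOD SP target SP HTTP/x.y, then end of line.
# This is a deliberate simplification; very old clients that send a bare
# "GET /path" with no version would be rejected by it.
REQUEST_LINE = re.compile(rb"^[A-Z]+ \S+ HTTP/\d\.\d\r?\n")


def looks_like_http_request(first_bytes: bytes) -> bool:
    """Return True if the start of a client stream resembles an HTTP request line."""
    return REQUEST_LINE.match(first_bytes) is not None


if __name__ == "__main__":
    # A normal browser request.
    print(looks_like_http_request(b"GET /index.html HTTP/1.1\r\nHost: www.example\r\n\r\n"))
    # Some other protocol simply run on port 80.
    print(looks_like_http_request(b"SSH-2.0-client\r\n"))
```

A check like this catches only the first method of tunneling described above. Anything that wraps its traffic in real HTTP requests passes it unchanged.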

Tunneling that actually uses HTTP, on the other hand, is very difficult to filter out successfully. In order to get rid of it, you need to do content filtering on the HTTP stream and remove the relevant data types. Relatively few firewalls support this functionality, and it's very difficult to do successfully in any case. The problem is that if you remove only the data types that you know are being used for tunneling, you are setting up a policy that allows connections by default, which is guaranteed to leave you with a continuous stream of new problems. On the other hand, if you accept only data types that you believe to be safe, you are going to have a continuous stream of new complaints from users, because many data types are in use on the web, and they change rapidly.
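
The two policies can be contrasted in a few lines of Python. This is only a sketch: both lists of data types are hypothetical, and the single `Content-Type` check stands in for the much harder job of inspecting the HTTP stream itself.

```python
# Hypothetical lists, chosen only to illustrate the two policies.
KNOWN_TUNNEL_TYPES = {"application/x-tunnel-example"}
BELIEVED_SAFE_TYPES = {"text/html", "text/plain", "image/gif", "image/jpeg"}


def media_type(content_type_header: str) -> str:
    """Strip parameters such as '; charset=...' and normalize case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def allow_by_default(content_type_header: str) -> bool:
    """Block only the types already known to be used for tunneling."""
    return media_type(content_type_header) not in KNOWN_TUNNEL_TYPES


def deny_by_default(content_type_header: str) -> bool:
    """Accept only the types believed to be safe."""
    return media_type(content_type_header) in BELIEVED_SAFE_TYPES


if __name__ == "__main__":
    new_type = "application/x-brand-new"  # a type neither list has heard of
    print(allow_by_default(new_type), deny_by_default(new_type))
```

A data type that neither list mentions is passed by the first policy and blocked by the second, which is exactly the trade-off between new problems and new complaints.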

Fortunately, the uses for tunneling that actually uses HTTP are fairly limited. The HTTP protocol is set up to support interactions that look like normal web browsing; the client sends a query, and the server sends an answer. The client can't send any information except the initial query, which is of limited size. This model works well for tunneling some other protocols (for instance, it's fine for tunneling RealAudio) but poorly for tunneling protocols that need prolonged interaction between the client and the server. This doesn't prevent people from tunneling any protocol they like over HTTP, but it does at least make it more difficult and less efficient.

There is unfortunately no good solution to the general problem of tunneled protocols. Using proxying to make sure that connections are using HTTP, and controlling the use of CONNECT, will at least limit your exposure.

15.3.2. Special HTTP Servers

We have been discussing web servers, programs that exist purely to provide content via HTTP and related protocols. But HTTP is a straightforward and widely implemented protocol, so a number of things speak HTTP not to provide random content, but for some specialized purpose. The classic example is the administrative interface to normal HTTP servers. If you're administering a web server, you probably have a browser handy, so what's more natural than using the browser to do the administration? For a number of reasons, you don't want the administrative interface built in to the standard server (among other things, common administrative tasks involve stopping and starting the server -- stopping it while you're talking to it is one thing, but starting it again is a neat trick if it's not there to talk to). Therefore, there is often a second server that speaks the HTTP protocol but doesn't behave exactly like a normal web server reading information out of files.

These days, other programs and even hardware devices may provide HTTP interfaces. For instance, you can buy a power strip with a built-in web server, allowing you to turn its outlets on and off from a web browser. These servers do not behave like the servers we have been discussing, and the fact that they speak the HTTP protocol doesn't give you any particularly good idea of what their security vulnerabilities may be.

You will have to assess the security of each of these servers separately. Some of the questions you should ask are:

What information can the web server read? Are there any files that it might unexpectedly reveal?
How are users authenticated?
What can be done to the device via the server?

In general, you do not want to allow connections to these servers to cross a firewall.

15.3.3. Packet Filtering Characteristics of HTTP

HTTP is a TCP-based service. Clients use random ports above 1023. Most servers use port 80, but some don't. To understand why, you need some history.

Many information access services (notably HTTP, WAIS, and Gopher) were designed so that the servers don't have to run on a fixed well-known port on all machines. A standard well-known port was established for each of these services, but the clients and servers are all capable of using alternate ports as well. When you reference one of these servers, you can include the port number it's running on (assuming that it's not the standard port for that service) in addition to the name of the machine it's running on. For example, an HTTP URL of the form http://host.domain.example/file.html is assumed to refer to a server on the standard HTTP port (port 80); if the server were on an alternate port (port 8000, for example), the URL would be written http://host.domain.example:8000/file.html.
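
The port rule for URLs can be checked with Python's standard `urllib.parse` module. In this small sketch the helper name `server_port` is made up for the example; the two URLs are the ones used above.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # the standard well-known port for HTTP


def server_port(url: str) -> int:
    """Return the port an HTTP client would connect to for this URL."""
    parts = urlsplit(url)
    # urlsplit reports port as None when the URL does not name one.
    return parts.port if parts.port is not None else DEFAULT_HTTP_PORT


if __name__ == "__main__":
    print(server_port("http://host.domain.example/file.html"))
    print(server_port("http://host.domain.example:8000/file.html"))
```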

The protocol designers had two valid reasons for designing these services this way:

Doing so allows a single machine to run multiple servers for multiple data sets. You could, for example, run one HTTP server that's accessible to the world with data that you wish to make available to the public, and another that has other, nonpublic data on a different port that's restricted (via packet filtering or the authentication available in the HTTP server, for example).
Doing so allows users to run their own servers (which may be a blessing or a curse, depending on your particular security policy). Because the standard well-known ports are all in the range below 1024 that's reserved for use only by root on Unix machines, unprivileged users can't run their servers on the standard port numbers.

The ability to provide these services on nonstandard ports has its uses, but it complicates things considerably from a packet filtering point of view. If your users wish to access a server running on a nonstandard port, you have several choices:

You can tell the users they can't do it; this may or may not be acceptable, depending on your environment.
You can add a special exception for that service to your packet filtering setup. This is bad for your users because it means that they first have to recognize the problem and then wait until you've fixed it, and it's bad for you because you'll constantly have to be adding exceptions to the filter list.
You can try to convince the server's owner to move the server to the standard port. While encouraging folks to use the standard ports as much as possible is a good long-term solution, it's not likely to yield immediate results.
You can use some kind of proxied version of the client. This requires setup on your end and may restrict your choice of clients. On the other hand, both Internet Explorer and Netscape Navigator support proxying, and they are by far the most popular clients.
If you can filter on the ACK bit, you can allow all outbound connections, regardless of destination port. This opens up a wide variety of services, including passive-mode FTP. It is also a noticeable increase in your vulnerability.

The good news is that the vast majority of these servers (probably more than 90 percent of them) use the standard port, and the more widely used and important the server is, the more likely it is to use the standard port. Many servers that use nonstandard ports use one of a few easily recognizable substitutes (800 or 8000, for instance).

Some servers also use nonstandard ports to run secondary servers. Traditionally, HTTP proxies use port 8080, and administrative servers use a port number one higher than the server they're controlling (81 for administering a standard web server and 8081 for administering a proxy server).

Your firewall will probably prevent people on your internal network from setting up their own servers at nonstandard ports (you're not going to want to allow inbound connections to arbitrary ports above 1023). You could set up such servers on a bastion host, but wherever possible, it's kinder to other sites to leave your servers on the standard port.

| Direction | Source Addr. | Dest. Addr. | Protocol | Source Port | Dest. Port | ACK Set | Notes |
|---|---|---|---|---|---|---|---|
| In | Ext | Int | TCP | >1023 | 80[41] | [42] | Request, external client to internal server |
| Out | Int | Ext | TCP | 80[41] | >1023 | Yes | Response, internal server to external client |
| Out | Int | Ext | TCP | >1023 | 80[41] | [42] | Request, internal client to external server |
| In | Ext | Int | TCP | 80[41] | >1023 | Yes | Response, external server to internal client |

[41] 80 is the standard port number for HTTP servers, but some servers run on different port numbers.
[42] ACK is not set on the first packet of this type (establishing connection) but will be set on the rest.
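
The four rules in the table above can be modeled as a small default-deny rule list. The Python sketch below is an illustration only: the `Rule` class, the field names, and the `permitted` function are assumptions for this example, not the syntax of any particular packet filter, and it assumes the standard port 80 throughout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    direction: str      # "in" or "out"
    src: str            # "int" or "ext"
    dst: str            # "int" or "ext"
    src_port: str       # "80" or ">1023"
    dst_port: str       # "80" or ">1023"
    ack_required: bool  # True for "Yes"; False where ACK may be clear (note 42)


# One entry per table row, in the same order.
RULES = [
    Rule("in", "ext", "int", ">1023", "80", False),   # request to internal server
    Rule("out", "int", "ext", "80", ">1023", True),   # response from internal server
    Rule("out", "int", "ext", ">1023", "80", False),  # request to external server
    Rule("in", "ext", "int", "80", ">1023", True),    # response from external server
]


def port_matches(spec: str, port: int) -> bool:
    """Match a port against either '>1023' or an exact port number."""
    if spec == ">1023":
        return port > 1023
    return port == int(spec)


def permitted(direction: str, src: str, dst: str,
              src_port: int, dst_port: int, ack: bool) -> bool:
    """Return True if any rule allows the packet; otherwise deny by default."""
    for rule in RULES:
        if ((rule.direction, rule.src, rule.dst) == (direction, src, dst)
                and port_matches(rule.src_port, src_port)
                and port_matches(rule.dst_port, dst_port)
                and (ack or not rule.ack_required)):
            return True
    return False


if __name__ == "__main__":
    # Internal client opening a connection to an external web server.
    print(permitted("out", "int", "ext", 40000, 80, ack=False))
    # External host using source port 80 to open a new connection inward.
    print(permitted("in", "ext", "int", 80, 6000, ack=False))
```

The second call shows why the ACK column matters: without it, an attacker could reach any internal port above 1023 simply by sending from port 80.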

15.3.4. Proxying Characteristics of HTTP

Various HTTP clients (such as Internet Explorer and Netscape Navigator) transparently support various proxying schemes. Some clients support SOCKS; others support user-transparent proxying via special HTTP servers, and some support both. (See the discussion of SOCKS and proxying in general in Chapter 9, "Proxy Systems".)

HTTP proxies of various kinds are extremely common, and many incorporate caching, which can provide significant performance advantages for most sites. (A caching proxy is one that makes a copy of the requested data, so that if somebody else requests the same data, the proxy can fulfill the request with the copy instead of going back to the original server to request the data again.) In addition, many sites are worried about the content that people access via HTTP and use proxies to control accessibility (for instance, to prevent access to sites containing pornography, stock prices, or sports scores, all of which are common nonbusiness uses of the web by employees).
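
The caching idea can be sketched in a few lines of Python. Everything here is hypothetical and greatly simplified: the `CachingProxy` class and the `fetch` callback are invented for the example, and a real cache must also decide when a stored copy has become stale.

```python
from typing import Callable, Dict


class CachingProxy:
    """Keep a copy of each fetched document and reuse it for later requests."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                # function that contacts the original server
        self._cache: Dict[str, bytes] = {}
        self.origin_requests = 0           # how many times the original server was asked

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # First request for this URL: go to the original server and keep a copy.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        # Later requests are answered from the copy.
        return self._cache[url]


if __name__ == "__main__":
    proxy = CachingProxy(lambda url: b"contents of " + url.encode("ascii"))
    proxy.get("http://amusinginformation.example/foodle")
    proxy.get("http://amusinginformation.example/foodle")
    print(proxy.origin_requests)
```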

Clients that are speaking to HTTP proxy servers use HTTP, but they use slightly different commands from the ones they'd normally use. A client that wants to get the document known as "http://amusinginformation.example/foodle" without using a proxy will connect to the host amusinginformation.example and send a command much like "GET /foodle HTTP/1.1". In order to use an HTTP proxy, the client will connect to the proxy instead and issue the command as "GET http://amusinginformation.example/foodle HTTP/1.1". The proxy will then connect to amusinginformation.example and send "GET /foodle HTTP/1.1" and return the resulting page to the client.
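
The difference between the two forms of the command can be shown with a short Python sketch. The function name `request_line` is an assumption for the example; the URL is the one used above.

```python
from urllib.parse import urlsplit


def request_line(url: str, via_proxy: bool) -> str:
    """Build the GET command a client sends, directly or through an HTTP proxy."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # To a proxy the client sends the whole URL; to the server, only the path.
    target = url if via_proxy else path
    return f"GET {target} HTTP/1.1"


if __name__ == "__main__":
    url = "http://amusinginformation.example/foodle"
    print(request_line(url, via_proxy=False))
    print(request_line(url, via_proxy=True))
```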

Some HTTP proxy servers support commands that normal HTTP servers don't support. For instance, they may allow a client to issue commands like "FTP ftp://amusinginformation.example/foodle" (to have the proxy server transfer the named file via FTP and return it to the client) or "CONNECT amusinginformation.example:873" (to have the proxy server make a TCP connection to the named port and relay information between it and the client). There is no standard for these additional commands, although FTP and CONNECT are two of the most common. Most web browsers will support using an HTTP proxy server for FTP and Gopher connections, and common web proxies (for instance, Microsoft Proxy Server) will support FTP and Gopher.

Some clients that are not web browsers will allow you to use an HTTP proxy server for protocols other than HTTP, and most of them depend on using CONNECT, which makes the HTTP proxy server into a generic proxy. For instance, Lotus Notes and rsync clients both are able to use HTTP proxies to get to their servers via CONNECT.

Using an HTTP proxy server as a generic proxy in this way is convenient but not particularly secure. Few HTTP proxy servers provide any interesting control or logging on the protocols used with CONNECT. You will want to be very restrictive about what protocols you allow this way.
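
One way to be restrictive is to accept CONNECT only for an explicit list of destinations and refuse everything else. The Python sketch below shows the idea; the allowlist contents and the function names are hypothetical, and real proxies express the same policy in their own configuration languages.

```python
from typing import Tuple

# Hypothetical allowlist: the only (host, port) pairs CONNECT may reach.
ALLOWED_CONNECT_TARGETS = {
    ("amusinginformation.example", 873),
}


def parse_connect_target(request_line: str) -> Tuple[str, int]:
    """Split 'CONNECT host:port HTTP/x.y' into (host, port)."""
    method, target, _version = request_line.split()
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = target.rpartition(":")
    return host.lower(), int(port)


def connect_allowed(request_line: str) -> bool:
    """Permit a CONNECT only when its destination is on the allowlist."""
    try:
        return parse_connect_target(request_line) in ALLOWED_CONNECT_TARGETS
    except ValueError:
        # Malformed or non-CONNECT requests are refused.
        return False


if __name__ == "__main__":
    print(connect_allowed("CONNECT amusinginformation.example:873 HTTP/1.1"))
    print(connect_allowed("CONNECT mailhost.example:25 HTTP/1.1"))
```

Listing host and port together, rather than ports alone, keeps the proxy from being used to reach the same port on arbitrary machines.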

It's extremely important to prevent external users from connecting to your HTTP proxy servers. If your HTTP proxy server can make inbound connections, external users can use it as a platform to attack internal servers they would not otherwise be able to get to (this is particularly dangerous if they can use CONNECT to get to arbitrary services). Even if the proxy server can't be used this way, it can be used to attack third parties.
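
In addition to blocking such connections at the packet filter, a proxy can refuse clients whose addresses are not internal. The Python sketch below uses the standard `ipaddress` module; the network 192.168.0.0/16 and the function name are assumptions chosen for the example.

```python
import ipaddress

# Assumption for this sketch: the internal network is 192.168.0.0/16.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.168.0.0/16")]


def may_use_proxy(client_addr: str) -> bool:
    """Return True only for clients whose address is on an internal network."""
    addr = ipaddress.ip_address(client_addr)
    return any(addr in net for net in INTERNAL_NETWORKS)


if __name__ == "__main__":
    print(may_use_proxy("192.168.4.7"))   # internal client
    print(may_use_proxy("203.0.113.9"))   # external client
```

An address check inside the proxy is a second line of defense, not a substitute for filtering; source addresses can be forged, so the firewall should still prevent external hosts from reaching the proxy at all.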

People often search actively for open HTTP proxy servers. Some of these people are hostile and want to use the proxy servers as attack platforms, but some of them just want to use the proxy servers to access web sites that would otherwise be unavailable to them because of filtering rules at their site (or in a few cases, filtering imposed by national governments). Either way, it's probably not to your advantage to let them use your site. Being nice to people behind restrictive filters is tempting, but in the long run, it will merely use up your bandwidth and get you added to the list of filtered sites.

15.3.5. Network Address Translation Characteristics of HTTP

HTTP does not use embedded IP addresses as a functional part of the protocol, so network address translation will not interfere with HTTP. Web pages may contain URLs written with IP addresses instead of hostnames, and those embedded IP addresses will not be translated. You should therefore be careful about the content of web pages on servers behind network address translators.

In addition, HTTP clients may provide name and/or IP address information to servers, leaking information about your internal numbering and naming schemes. HTTP clients may provide "From:" headers, telling the server the user's email address (as the user told it to the browser), and proxies may add "Via:" headers indicating the IP addresses of proxies that a request (or response) has passed through.
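
To see what a given client or proxy reveals, you can audit the headers of an outgoing request. The Python sketch below only reports the two headers named above; the function name and the sample request are hypothetical, and whether to remove or rewrite such headers is a policy decision this example does not make.

```python
from typing import List, Tuple

# Headers named in the text as possible sources of leaked information.
REVEALING_HEADERS = {"from", "via"}


def revealing_headers(headers: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the headers that may expose user or internal network details."""
    return [(name, value) for name, value in headers
            if name.lower() in REVEALING_HEADERS]


if __name__ == "__main__":
    outgoing = [
        ("Host", "amusinginformation.example"),
        ("From", "user@internal.example"),
        ("Via", "1.1 192.168.4.1"),
    ]
    for name, value in revealing_headers(outgoing):
        print(f"{name}: {value}")
```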

15.3.6. Securing HTTP

You may hear discussions about secure versions of HTTP and wonder how they relate to firewalls and the configuring of services. Such discussions are mainly focused on the privacy issues of passing information around via HTTP. They don't really help solve the kinds of problems we've been discussing in previous sections.

Two defined protocols actually provide privacy using encryption and strong authentication for HTTP. The one that everyone knows is usually called HTTPS and is denoted by using https in the URL. The other, almost unknown, protocol is called Secure HTTP and is denoted by using shttp in the URL.

The goal of HTTPS is to protect your communication channel when retrieving or sending data. HTTPS currently uses TLS and SSL to achieve this. Chapter 14, "Intermediary Protocols", contains more technical information on TLS and SSL.

The goal of Secure HTTP is to protect individual objects rather than the communications channel. This allows, for example, individual pages on a web server to be digitally signed -- a web client can check the signature when the page is downloaded. If someone replaces the page without re-signing, then the signature check will fail, causing an alert to be displayed. Similarly, a secure form that is submitted to a web server can be a self-contained digitally signed object. This means that the object can be stored and used later to prove or dispute the transaction.

The use of Secure HTTP could have significant advantages for the consumer in the world of electronic commerce. If a company claims that it has a digitally signed object indicating your desire to purchase 2,000 rubber chickens but the digital signature doesn't match, then you can argue that you did not make the request. If the signature does match, then it can mean only one of two things: either you requested the chickens, or your private key has been stolen. In contrast, when you use HTTPS, your identity is not bound to the transaction but to the communication channel. This means that HTTPS cannot protect you from someone switching your order for rubber chickens to live ones, once it has been made, or just ordering chickens on your behalf.

15.3.6.1. Packet filtering characteristics of HTTPS and Secure HTTP

HTTPS uses a single TCP connection at port 443. Secure HTTP is designed to operate over port 80 (see the section on HTTP).

| Direction | Source Addr. | Dest. Addr. | Protocol | Source Port | Dest. Port | ACK Set | Notes |
|---|---|---|---|---|---|---|---|
| In | Ext | Int | TCP | >1023 | 443 | [43] | Request, external client to internal server |
| Out | Int | Ext | TCP | 443 | >1023 | Yes | Response, internal server to external client |
| Out | Int | Ext | TCP | >1023 | 443 | [43] | Request, internal client to external server |
| In | Ext | Int | TCP | 443 | >1023 | Yes | Response, external server to internal client |

[43] ACK is not set on the first packet of this type (establishing connection) but will be set on the rest.

15.3.6.2. Proxying characteristics of HTTPS and Secure HTTP

Because HTTPS and Secure HTTP use straightforward TCP connections, they are quite easy to proxy. Most programs that provide HTTP proxying also provide HTTPS and Secure HTTP proxying. However, both HTTPS and Secure HTTP use end-to-end encryption between the client and the server. This means that the data stream is entirely opaque to the proxy system, which cannot do any of the filtering or caching functions it does on normal HTTP connections.

Proxying for HTTPS is normally done using the CONNECT primitive (discussed earlier in the section on proxying HTTP). This allows the real client to exchange certificate information with the server, but it also serves as a generic proxy for any protocol running on the ports that the proxy allows for HTTPS. Since HTTPS is encrypted, the proxy can't do any verification on the contents of the connection. You should be cautious about the ports that you allow for HTTPS.

15.3.6.3. Network address translation characteristics of HTTPS and Secure HTTP

Like HTTP, HTTPS and Secure HTTP have no embedded IP addresses and will work without problems through a network address translation system. Because of the end-to-end encryption, it will not be possible to correct any IP addresses that occur in the body of secured pages, so you should make sure that such pages use hostnames and not IP addresses.

15.3.7. Summary of Recommendations for HTTP

If you're going to run an HTTP server, use a dedicated bastion host if possible.
If you're going to run an HTTP server, carefully configure the HTTP server to control what it has access to; in particular, watch out for ways that someone could upload a program to the system somehow (via mail or FTP, for example) and then trick the HTTP server into executing it.
Carefully control the external programs your HTTP server can access.
You can't allow internal hosts to access all HTTP servers without allowing them to access all TCP ports because some HTTP servers use nonstandard port numbers. If you don't mind allowing your users access to all TCP ports, you can use packet filtering to examine the ACK bit to allow outgoing connections to those ports (but not incoming connections from those ports). If you do mind, then either restrict your users to servers on the standard port (80), or use some form of proxying.
Proxying HTTP is easy, and a caching proxy server offers network bandwidth benefits as well as security benefits.
Do not allow external connections to HTTP proxy servers.
Configure your HTTP clients carefully and warn your users not to reconfigure them based on external advice.