How Proxying Works (Building Internet Firewalls, 2nd Edition)

9.2. How Proxying Works

The details of how proxying works differ from service to service. Some services provide proxying easily or automatically; for those services, you set up proxying by making configuration changes to normal servers. For most services, however, proxying requires appropriate proxy server software on the server side. On the client side, it needs one of the following:

Proxy-aware application software

With this approach, the software must know how to contact the proxy server instead of the real server when a user makes a request (for example, for FTP or Telnet), and how to tell the proxy server what real server to connect to.

Proxy-aware operating system software

With this approach, the operating system that the client is running on is modified so that IP connections are checked to see if they should be sent to the proxy server. This mechanism usually depends on dynamic runtime linking (the ability to supply libraries when a program is run). This mechanism does not always work and can fail in ways that are not obvious to users.

Proxy-aware user procedures

With this approach, the user runs client software that doesn't understand proxying, connects it to the proxy server, and then tells the proxy server which real server to connect to, instead of telling the client software to talk to the real server directly.

Proxy-aware router

With this approach, nothing on the client's end is modified, but a router intercepts the connection and redirects it to the proxy server or proxies the request. This requires an intelligent router in addition to the proxy software (although the routing and the proxying can co-exist on the same machine).

9.2.1. Using Proxy-Aware Application Software for Proxying

The first approach is to use proxy-aware application software for proxying. There are a few problems associated with this approach, but it is becoming easier as time goes on.

Appropriate proxy-aware application software is often available only for certain platforms. If it's not available for one of your platforms, your users are pretty much out of luck. For example, the Igateway package from Sun (written by Jim Thompson) is a proxy package for FTP and Telnet, but you can use it only on Sun machines because it provides only precompiled Sun binaries. If you're going to use proxy software, you obviously need to choose software that's available for the needed platforms.

Even if software is available for your platforms, it may not be software your users want. For example, dozens of FTP client programs are available for the Macintosh. Some of them have really impressive graphical user interfaces. Others offer different useful features; for example, they allow you to automate transfers. You're out of luck if the particular client you want to use, for whatever reason, doesn't support your particular proxy server mechanism. In some cases, you may be able to modify clients to support your proxy server, but doing so requires that you have the source code for the client, as well as the tools and the ability to recompile it. Few client programs come with support for any form of proxying.

Web browsers like Netscape, Internet Explorer, and Lynx are the happy exception to this rule. Many of these programs support proxies of various sorts (typically SOCKS and HTTP proxying). Most of these programs were written after firewalls and proxy systems had become common on the Internet; recognizing the environment they would be working in, their authors chose to support proxying by design, right from the start.

Using application changes for proxying does not make proxying completely transparent to users. The application software still needs to be configured to use the appropriate proxy server, and to use it only for connections that actually need to be proxied. Most applications provide some way of assisting the user with this problem and partially automating the process, but misconfiguration of proxy software is still one of the most common user problems at sites that use proxies.
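To make the configuration problem concrete, here is a minimal sketch of the decision a proxy-aware client has to make for every connection: connect directly, or go to the proxy. The proxy name, the internal domain suffix, the internal network, and the function names are all hypothetical values chosen for illustration; they are not taken from any particular client or proxy package.

```python
import ipaddress

# Hypothetical site settings; every value here is illustrative only.
PROXY = ("proxy.example.com", 1080)
INTERNAL_SUFFIXES = (".example.com",)
INTERNAL_NETWORKS = (ipaddress.ip_network("10.0.0.0/8"),)


def is_internal(host):
    """Return True if the host name or IP literal belongs to the internal network."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: fall back to comparing the domain suffix.
        name = host.lower().rstrip(".")
        return any(
            name == suffix.lstrip(".") or name.endswith(suffix)
            for suffix in INTERNAL_SUFFIXES
        )
    return any(addr in net for net in INTERNAL_NETWORKS)


def choose_endpoint(host, port):
    """Return the (host, port) the client should actually open a connection to."""
    if is_internal(host):
        return (host, port)  # internal destination: connect directly
    return PROXY             # external destination: hand the request to the proxy


print(choose_endpoint("ftp.greatcircle.com", 21))
print(choose_endpoint("files.example.com", 21))
```

If the suffix list or the network list is wrong, internal connections are sent to the proxy or external ones are attempted directly, which is the kind of misconfiguration described above.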

In some cases, sites will use the unchanged applications for internal connections and the proxy-aware ones only to make external connections; users need to remember to use the proxy-aware program in order to make external connections. Following procedures they've become accustomed to using elsewhere, or procedures that are written in books, may leave them mystified at apparently intermittent results as internal connections succeed and external ones fail. (Using the proxy-aware applications internally will work, but it can introduce unnecessary dependencies on the proxy server, which is why most sites avoid it.)

9.2.2. Using Proxy-Aware Operating System Software

Instead of changing the application, you can change the environment around it, so that when the application tries to make a connection, the function call is changed to automatically involve the proxy server if appropriate. This allows unmodified applications to be used in a proxied environment.

Exactly how this is implemented varies from operating system to operating system. Where dynamically linked libraries are available, you add a library; where they are not, you have to replace the network drivers, which are a more fundamental part of the operating system.
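The idea of changing the connect call underneath an unmodified application can be sketched in Python as a wrapper around a connect function. This is only an analogy: a real implementation replaces the connect routine in a dynamically linked library or in the network drivers, not in application-level code. The addresses and the names `interpose`, `real_connect`, and `announce_target` are assumptions made up for this sketch.

```python
import ipaddress

# Hypothetical values for illustration only.
PROXY_ADDR = ("192.0.2.10", 1080)
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


def interpose(real_connect, announce_target):
    """Wrap a connect function so that external destinations go via the proxy.

    real_connect(addr) opens a connection and returns it.
    announce_target(conn, addr) tells the proxy which real server is wanted.
    This sketch handles IP literals only; ip_address() raises ValueError
    if it is given a hostname.
    """
    def connect(addr):
        host, _port = addr
        if ipaddress.ip_address(host) in INTERNAL_NET:
            return real_connect(addr)     # internal: behave exactly as before
        conn = real_connect(PROXY_ADDR)   # external: reach the proxy instead
        announce_target(conn, addr)       # and pass along the real destination
        return conn
    return connect
```

The calling code keeps using `connect(addr)` and never learns that the peer changed. The failure cases listed for this approach follow from that: software that bypasses the wrapped call, or a protocol that carries addresses inside its own data, is not covered by the wrapper.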

In either case, there may be problems. If applications do unexpected things, they may go around the proxying or be disrupted by it. All of the following will cause problems:

Statically linked software
Software that provides its own dynamically linked libraries for network functions
Protocols that use embedded port numbers or IP addresses
Software that attempts to do low-level manipulation of connections

Because the proxying is relatively transparent to the user, problems with it are usually going to be mysteries to the user. The user interface for configuring this sort of proxying is also usually designed for the experienced administrator, not the naive user, further confusing the situation.

9.2.3. Using Proxy-Aware User Procedures for Proxying

With the proxy-aware procedure approach, the proxy servers are designed to work with standard client software; however, they require the users of the software to follow custom procedures. The user tells the client to connect to the proxy server and then tells the proxy server which host to connect to. Because few protocols are designed to pass this kind of information, the user needs to remember not only what the name of the proxy server is, but also what special means are used to pass the name of the other host.

How does this work? You need to teach your users specific procedures to follow for each protocol. Let's look at FTP. Imagine that Amalie Jones wants to retrieve a file from an anonymous FTP server (e.g., ftp.greatcircle.com). Here's what she does:

Using any FTP client, she connects to your proxy server (which is probably running on the bastion host -- the gateway to the Internet) instead of directly to the anonymous FTP server.
At the username prompt, in addition to specifying the name she wants to use, Amalie also specifies the name of the real server she wants to connect to. If she wants to access the anonymous FTP server on ftp.greatcircle.com, for example, then instead of simply typing "anonymous" at the prompt generated by the proxy server, she'll type "anonymous@ftp.greatcircle.com".
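The username convention in the steps above can be sketched as a pair of helper functions: one builds what the user types, and one shows how a proxy might split it apart again. The function names are hypothetical, and the choice to split on the last "@" is an assumption of this sketch, not a documented behavior of any specific proxy server.

```python
def proxy_login_name(user, real_server):
    """Build what the user types at the proxy server's username prompt."""
    return f"{user}@{real_server}"


def split_proxy_login(login):
    """Proxy side: split 'user@host' into (user, host).

    Splits on the last '@' so that a username containing '@' survives.
    Raises ValueError if either the user part or the host part is missing.
    """
    user, sep, host = login.rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"expected user@host, got {login!r}")
    return user, host


login = proxy_login_name("anonymous", "ftp.greatcircle.com")
print(login)
print(split_proxy_login(login))
```

A client that lets the user type an arbitrary username works unchanged with this scheme. With Python's standard `ftplib`, for example, the same string could be passed as the `user` argument of `FTP.login()` after connecting to the proxy host.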

Just as using proxy-aware software requires some modification of user procedures, using proxy-aware procedures places limitations on which clients you can use. Some clients automatically try to do anonymous FTP; they won't know how to go through the proxy server. Other clients interfere in simpler ways; for example, a graphical user interface may not allow you to type a username long enough to hold both the username and the hostname.

The main problem with using custom procedures, however, is that you have to teach them to your users. If your user base is small and technically adept, it may not be a problem. However, if you have 10,000 users spread across four continents, it's going to be a problem. On the one side, you have hundreds of books, thousands of magazine articles, and tens of thousands of Usenet news postings, not to mention whatever previous training or experience the users might have had, all of which attempt to teach users the standard way to use basic Internet services like FTP. On the other side is your tiny voice, telling them how to use a procedure that is at odds with all the other information they're getting. On top of that, your users will have to remember the name of your gateway and the details of how to use it. In any organization of a reasonable size, this approach can't be relied upon.

9.2.4. Using a Proxy-Aware Router

With a proxy-aware router, clients attempt to make connections the same way they normally would, but the packets are intercepted and directed to a proxy server instead. In some cases, this is handled by having the proxy server claim to be a router. In others, a separate router looks at packets and decides whether to send them to their destination, drop them, or send them to the proxy server. This is often called hybrid proxying (because, like packet filtering, it involves working with packets) or transparent proxying (because it's not visible to clients).
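The three-way decision made by a separate router (send the packet on, drop it, or hand it to the proxy server) can be sketched as a small rule function. The network range, the port sets, and the `Packet` type are hypothetical and stand in for whatever a site's policy actually says; a real router also makes this decision on much more than the destination port.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical policy values for illustration only.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
PROXIED_PORTS = {21, 80}   # services the proxy server handles
DIRECT_PORTS = {25}        # services allowed straight through


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int


def route(packet):
    """Return 'forward', 'redirect', or 'drop' for a packet from an internal client."""
    if ipaddress.ip_address(packet.dst) in INTERNAL_NET:
        return "forward"               # internal traffic is left alone
    if packet.dst_port in PROXIED_PORTS:
        return "redirect"              # send to the proxy server instead
    if packet.dst_port in DIRECT_PORTS:
        return "forward"               # explicitly allowed without proxying
    return "drop"                      # anything else is refused


print(route(Packet("10.0.0.5", "198.51.100.7", 80)))
```

The sketch refuses anything it does not recognize. It also shows why the router has to identify the protocol from the packet alone: here the only clue it uses is the destination port.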

A proxy-aware router of some sort (like the one shown in Figure 9-2) is the solution that's easiest for the users; they don't have to configure anything or learn anything. All of the work is done by whatever device is intercepting the packets, and by the administrator who configures it.

Figure 9-2. A proxy-aware router redirecting connections

On the good side, this is the most transparent of the options. In general, it's only noticeable to the user when it doesn't work (or when it does work, but the user is trying to do something that the proxy system does not allow). From the user's point of view, it combines the advantages of packet filtering (you don't have to worry about it, it's automatic) and proxying (the proxy can do caching, for instance).

From the administrator's point of view, it combines the disadvantages of packet filtering with those of proxying:

It's easy for accidents or hostile actions to make connections that don't go through the system.
You need to be able to identify the protocol based on the packets in order to do the redirection, so you can't support protocols that don't work with packet filtering. But you also need to be able to make the actual connection from the proxy server, so you can't support protocols that don't work with proxying.
All internal hosts need to be able to translate all external hostnames into addresses in order to try to connect to them.