

Apache: The Definitive Guide, 3rd Edition

9.5. Setup

The cache directory for the proxy server has to be set up rather carefully with owner webuser and group webgroup, since it will be accessed by that insignificant person (see Chapter 2).
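A minimal Python sketch of preparing such a directory, assuming the webuser and webgroup accounts from Chapter 2. The 0o750 mode is our assumption rather than something the proxy requires, and the path you pass in should be whatever directory your CacheRoot directive names. It must run as root to hand the directory to another user.

```python
import os
import shutil
import stat


def prepare_cache_dir(path, user="webuser", group="webgroup", mode=0o750):
    """Create the proxy cache directory and hand it to the Apache user and group.

    user and group may be names or numeric ids. mode=0o750 is an assumption:
    the owner (the server) can read and write, the group can read, and
    everyone else is shut out. Run as root to give the directory away.
    """
    os.makedirs(path, exist_ok=True)
    shutil.chown(path, user=user, group=group)
    os.chmod(path, mode)  # set explicitly, since makedirs is subject to the umask
    return stat.S_IMODE(os.stat(path).st_mode)
```

The function returns the permission bits it ended up with, so a setup script can check them before starting the proxy.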

You now have to tell your browser that you are going to be accessing the Web via a proxy. For example, in Netscape choose Edit > Preferences > Advanced > Proxies tab > Manual Proxy Configuration. Click View, and in the HTTP box enter the IP address of our proxy, which is on the same network (192.168.123) as our copy of Netscape:

192.168.123.2

Enter 8000 in the Port box.

For Microsoft Internet Explorer, select View > Options > Connection tab, check the Proxy Server checkbox, then click the Settings button and set up the HTTP proxy as described previously. That is all there is to setting up a real proxy server.
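Any HTTP client can be pointed at the proxy in the same way as the browser. A small Python sketch using the standard library's urllib; the address 192.168.123.2 and port 8000 are the ones the walkthrough below uses, so substitute your own.

```python
import urllib.request

# Proxy host and port from this walkthrough; adjust for your own LAN.
PROXY_URL = "http://192.168.123.2:8000"


def build_proxied_opener(proxy_url=PROXY_URL):
    """Return an opener that sends every http:// request to the proxy,
    which is what the browser's manual proxy setting does."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)
```

With this opener, `build_proxied_opener().open("http://192.168.124.1/")` would hand the whole URL to the proxy instead of trying to connect to 192.168.124.1 itself.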

You might want to set up a simulation to watch it in action, as we did, before you do the real thing. However, it is not that easy to simulate a proxy server on one desktop, and when we do simulate it, the elements play different roles from those they played in our earlier demonstrations. We end up with four elements:

  • Netscape running on a Windows 95 machine. Normally this is a person out there on the Web trying to get at our sales site; now, it simulates a Butterthlies member trying to get out.

  • An imaginary firewall.

  • A copy of Apache (site: ... /site.proxy/proxy) running on the FreeBSD machine as a proxy server to the Butterthlies site.

  • Another copy of Apache, also running on FreeBSD (site: ... /site.proxy/real ) that simulates another web site "out there" that we are trying to access. We have to imagine that the illimitable wastes of the Web separate it from us.

The configuration in ... /site.proxy/proxy is as shown earlier. Since the proxy server is running on a machine notionally on the other side of the Web from the machine running ... /site.proxy/real, we need to put it on another port, traditionally 8000.

The configuration file in ... /site.proxy/real is:

User webuser
Group webgroup
ServerName www.faraway.com

Listen www.faraway.com:80
DocumentRoot /usr/www/APACHE3/site.proxy/real/htdocs

On this site, we use the more compendious Listen with the server name and port number combined.

Normally www.faraway.com would be a site out on the Web. In our case we dummied it up on the same machine.

In ... /site.proxy/real/htdocs there is a file containing the message:

I am a web site far, far out there.

Also in /etc/hosts there is an entry:

192.168.124.1 www.faraway.com

simulating a proper DNS registration for this far-off site. Note that it is on a different network (192.168.124) from the one we normally use (192.168.123), so that when we try to access it over our LAN, we can't without help.
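The point about networks can be checked with Python's ipaddress module. This sketch assumes both networks use a 255.255.255.0 (/24) netmask, which the text does not state explicitly.

```python
import ipaddress

# Our LAN as described in the text, assuming a /24 netmask.
LAN = ipaddress.ip_network("192.168.123.0/24")


def on_local_network(address, lan=LAN):
    """True if the address is on our own LAN, so no router or proxy is needed."""
    return ipaddress.ip_address(address) in lan
```

`on_local_network("192.168.123.2")` is true, while `on_local_network("192.168.124.1")` is false: the browser has no direct route to www.faraway.com.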

The file /usr/www/lan_setup on the FreeBSD machine is now:

ifconfig ep0 192.168.123.2
ifconfig ep0 192.168.123.3 alias netmask 0xFFFFFFFF
ifconfig ep0 192.168.124.1 alias
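The second line gives its alias a netmask of 0xFFFFFFFF, a host-only mask; this is the usual FreeBSD convention for an alias on the same subnet as the interface's primary address. The third alias is on a different network and is given no explicit mask. A Python sketch that decodes such hex masks; the 0xFFFFFF00 mask in the usage below is our illustration and does not appear in lan_setup.

```python
import ipaddress


def alias_network(address, hex_netmask):
    """Turn an ifconfig-style hex netmask (e.g. 0xFFFFFFFF) into a network.

    strict=False accepts a host address rather than the network address.
    """
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    return ipaddress.ip_network(f"{address}/{mask}", strict=False)
```

`alias_network("192.168.123.3", "0xFFFFFFFF")` is a /32 covering a single address, whereas `alias_network("192.168.123.2", "0xFFFFFF00")` is the whole 192.168.123.0/24 network.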

Now for the action: go to ... /site.proxy/real and start the server with ./go; then go to ... /site.proxy/proxy and start it with ./go. In your browser, access http://192.168.124.1/. You should see the following:

Index of /
  Parent Directory
  message

If we select message, we see:

I am a web site far, far out there.

Fine, but are we fooling ourselves? Go to the browser's proxy settings, and disable the HTTP proxy by removing the IP address:

192.168.123.2

Then reaccess http://192.168.124.1/. You should get some sort of network error.

What happened? We asked the browser to retrieve http://192.168.124.1/. Since the browser is on network 192.168.123, it cannot reach this address directly; instead, because we had configured a proxy, it sent the request to the proxy server at port 8000 on 192.168.123.2:[33]

[33] This can be recognized as a proxy request by the full URL, beginning http:, in the request line.

GET http://192.168.124.1/ HTTP/1.0

The copy of Apache running on the FreeBSD machine, listening on port 8000, was offered this morsel and accepted it. Since that copy of Apache had been told to service proxy requests, it retransmitted the request to the destination we intended all along, 192.168.124.1 (which it can reach since that address is on the same machine):

GET / HTTP/1.0
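The rewriting step the proxy performs can be sketched in Python: take the absolute URL from the request line, work out which host and port to connect to, and forward only the path. This is a simplified model that handles plain http: URLs and ignores headers; mod_proxy does considerably more.

```python
from urllib.parse import urlsplit


def to_origin_request(request_line):
    """Turn a proxy request line such as 'GET http://host/ HTTP/1.0' into
    (host, port, request line to send to that host)."""
    method, target, version = request_line.split()
    parts = urlsplit(target)
    if parts.scheme != "http":
        raise ValueError("not an http proxy request: " + target)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, parts.port or 80, f"{method} {path} {version}"
```

Applied to the request above, `to_origin_request("GET http://192.168.124.1/ HTTP/1.0")` gives `("192.168.124.1", 80, "GET / HTTP/1.0")`, matching the two request lines shown.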

In real life, things are simpler: you only have to carry out steps two and three, and you can ignore the theology. When you have finished with all this, remember to remove the HTTP proxy IP address from your browser setup.

9.5.1. Reverse Proxy

This section explains a configuration setup for proxying your backend mod_perl servers when you need to use virtual hosts. See perl.apache.org/guide/scenario.html, from which we have quoted freely. While you are better off getting it right in the first place (i.e. using different URLs for the different servers), there are at least three reasons you might want to rewrite:

  1. Because you didn't think of it in the first place and you are now fighting fires.

  2. Because you want to save page size by using relative URLs instead of full ones.

  3. Because you want to improve performance by, for instance, caching the results of expensive CGIs.

The term virtual host refers to the practice of maintaining more than one server on one machine, as differentiated by their apparent hostname. For example, it is often desirable for companies sharing a web server to have their own domains, with web servers accessible as www.company1.com and www.company2.com, without requiring the user to know any extra path information.
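In outline, a name-based server picks the virtual host from the Host header the browser sends, falling back to the first virtual host listed for that address when nothing matches. A simplified Python model, using the host names from the frontend configuration in this section (wildcard aliases and other details are ignored):

```python
from typing import List, Optional, Tuple

# Each entry: (ServerName, [ServerAlias ...]) in configuration-file order.
VirtualHost = Tuple[str, List[str]]

FRONTEND_VHOSTS: List[VirtualHost] = [
    ("www.example.com", ["example.com"]),
    ("foo.example.com", []),
]


def select_vhost(host_header: Optional[str], vhosts: List[VirtualHost] = FRONTEND_VHOSTS) -> str:
    """Match the Host header (minus any :port) against ServerName and
    ServerAlias; fall back to the first virtual host listed."""
    if host_header:
        name = host_header.split(":", 1)[0].lower()
        for server_name, aliases in vhosts:
            if name == server_name or name in aliases:
                return server_name
    return vhosts[0][0]
```

So `select_vhost("example.com")` returns `"www.example.com"` through its alias, and a request for an unknown name lands on the first virtual host.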

One approach is to give each virtual host its own port on the backend server, so that the frontend can proxy each host's requests to a port such as localhost:1234 while using name-based virtual hosts itself (though any virtual-hosting technique on the frontend will do).

If you run the frontend and the backend servers on the same machine, you can prevent any direct outside connections to the backend server if you bind tightly to address 127.0.0.1 (localhost), as you will see in the following configuration example.
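Binding to the loopback address means the kernel only accepts connections that originate on the same machine. A Python sketch of the idea, equivalent to an Apache Listen directive that names the address 127.0.0.1 as well as the port; the default port 0 (any free port) is only for demonstration, where the backend would use 4077 or 4078.

```python
import socket


def open_backend_listener(port=0):
    """Listen only on the loopback interface; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))  # not "" or "0.0.0.0", which mean every interface
    sock.listen()
    return sock
```

A frontend on the same host can connect to this socket, but a client elsewhere on the network has no address through which to reach it.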

This is the frontend (light) server configuration:

NameVirtualHost 10.10.10.10
<VirtualHost 10.10.10.10>
  ServerName www.example.com
  ServerAlias example.com
  RewriteEngine On
  RewriteOptions 'inherit'
  RewriteRule \.(gif|jpg|png|txt|html)$ - [last]
  RewriteRule ^/(.*)$ http://localhost:4077/$1 [proxy]
</VirtualHost>
<VirtualHost 10.10.10.10>
  ServerName foo.example.com
  RewriteEngine On
  RewriteOptions 'inherit'
  RewriteRule \.(gif|jpg|png|txt|html)$ - [last]
  RewriteRule ^/(.*)$ http://localhost:4078/$1 [proxy]
</VirtualHost>

This frontend configuration handles two virtual hosts: www.example.com and foo.example.com. The two setups are almost identical.

The frontend server will handle files with the extensions .gif, .jpg, .png, .txt, and .html internally; the rest will be proxied to be handled by the backend server.

The only difference between the two virtual-host settings is that the former rewrites requests to port 4077 at the backend machine and the latter to port 4078.

If your server is configured to run traditional CGI scripts (under mod_cgi) as well as mod_perl CGI programs, it would be beneficial to configure the frontend server to run the traditional CGI scripts directly. You can do this by adding |cgi to the end of the gif|jpg|png|txt|html RewriteRule pattern, if all your mod_cgi scripts have the .cgi extension, or by adding a new rule to handle all /cgi-bin/* locations locally.
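To test patterns outside Apache, the decision the two frontend RewriteRules make for each request can be modeled in Python. The BACKEND_PORTS table and the STATIC_AND_CGI variant (the |cgi change just described) are our illustrations; query strings and the rest of mod_rewrite's behavior are not modeled.

```python
import re
from typing import Optional

# Hypothetical table: the backend port each frontend virtual host proxies to.
BACKEND_PORTS = {"www.example.com": 4077, "foo.example.com": 4078}

# Same pattern as: RewriteRule \.(gif|jpg|png|txt|html)$ - [last]
STATIC = re.compile(r"\.(gif|jpg|png|txt|html)$")
# Variant with |cgi appended, so mod_cgi scripts also stay on the frontend.
STATIC_AND_CGI = re.compile(r"\.(gif|jpg|png|txt|html|cgi)$")


def route(vhost: str, path: str, static: re.Pattern = STATIC) -> Optional[str]:
    """Return None if the frontend serves path itself; otherwise the backend URL
    that RewriteRule ^/(.*)$ http://localhost:PORT/$1 [proxy] would produce."""
    if static.search(path):
        return None  # the "-" substitution plus [last]: leave the URL alone
    m = re.match(r"^/(.*)$", path)
    if m is None:
        return None  # not a URL-path the rule can match
    return f"http://localhost:{BACKEND_PORTS[vhost]}/{m.group(1)}"
```

For example, `route("foo.example.com", "/cgi-bin/old.cgi")` is proxied to port 4078, but passing `STATIC_AND_CGI` keeps the same request on the frontend.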

Here is the backend (heavy) server configuration:

Port 80

PerlPostReadRequestHandler My::ProxyRemoteAddr

Listen 127.0.0.1:4077
<VirtualHost localhost:4077>
  ServerName www.example.com
  DocumentRoot /home/httpd/docs/www.example.com
  DirectoryIndex index.shtml index.html
</VirtualHost>

Listen 127.0.0.1:4078
<VirtualHost localhost:4078>
  ServerName foo.example.com
  DocumentRoot /home/httpd/docs/foo.example.com
  DirectoryIndex index.shtml index.html
</VirtualHost>

The backend server can tell which virtual host a request is for by checking the port number to which it was proxied, and it uses the matching virtual host section to handle it.

We set Port 80 so that any redirects use 80 as the port for the URL, rather than the port on which the backend server is actually running.
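A simplified Python model of these two points: the port a request arrives on selects the backend virtual host, and the Port setting, not the real listening port, goes into self-referencing URLs such as a trailing-slash redirect. The function and table names are hypothetical.

```python
# Backend virtual hosts keyed by the loopback port the request arrived on.
BACKEND_VHOSTS = {4077: "www.example.com", 4078: "foo.example.com"}

# Port 80: the port advertised in self-referencing URLs such as redirects.
ADVERTISED_PORT = 80


def redirect_url(arrival_port, path, advertised_port=ADVERTISED_PORT):
    """Build the absolute URL a backend redirect would carry.

    The host name comes from the virtual host chosen by arrival_port, but the
    port is the advertised one, so the browser is sent back through the
    frontend rather than to localhost:4077."""
    host = BACKEND_VHOSTS[arrival_port]
    port = "" if advertised_port == 80 else f":{advertised_port}"
    return f"http://{host}{port}{path}"
```

Without the Port 80 setting, the equivalent of `redirect_url(4078, "/a/", 4078)` would produce `http://foo.example.com:4078/a/`, a URL the outside world cannot reach.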

To recover the real remote IP addresses from the proxy, we use the My::ProxyRemoteAddr handler, which relies on the mod_proxy_add_forward Apache module running on the frontend. Prior to mod_perl 1.22, this directive had to be set in each virtual host, since it was not inherited by the virtual hosts.
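The header involved is X-Forwarded-For, which mod_proxy_add_forward adds on the frontend. A Python model of the logic a handler of this kind typically applies; the mod_perl guide's handler is written in Perl, and this is an illustration of the idea, not that code. Header names are matched case-sensitively here for simplicity.

```python
import re
from typing import Mapping

PROXY_IP = "127.0.0.1"  # the frontend, as seen by the backend


def real_remote_ip(connection_ip: str, headers: Mapping[str, str]) -> str:
    """Return the client address the backend should log and use.

    X-Forwarded-For is trusted only when the connection really comes from our
    own frontend; the last address in the list is the one that frontend added."""
    forwarded = headers.get("X-Forwarded-For")
    if connection_ip != PROXY_IP or not forwarded:
        return connection_ip
    m = re.search(r"([^,\s]+)\s*$", forwarded)
    return m.group(1) if m else connection_ip
```

Ignoring the header on connections from anywhere else matters: otherwise any client could claim an arbitrary address by sending its own X-Forwarded-For.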

The following configuration is another useful example, which works the other way around: it specifies what is to be proxied, and the frontend serves everything else:

  RewriteEngine     on
  RewriteLogLevel   0
  RewriteRule       ^/(perl.*)$  http://127.0.0.1:8052/$1   [P,L]
  NoCache           *
  ProxyPassReverse  /  http://www.example.com/

This way we don't need a rule telling the frontend to serve static objects itself, as we did in the previous example to handle files with the extensions .gif, .jpg, .png, .txt, and .html internally.
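A Python model of this configuration: which paths the RewriteRule proxies, and, roughly, what ProxyPassReverse does to a Location header coming back from the backend. frontend_base is a hypothetical parameter standing for the frontend's own URL; mod_proxy's real handling also covers the Content-Location and URI headers and other cases.

```python
import re
from typing import Optional

# RewriteRule ^/(perl.*)$ http://127.0.0.1:8052/$1 [P,L]
PERL_RULE = re.compile(r"^/(perl.*)$")
BACKEND = "http://127.0.0.1:8052/"


def proxy_target(path: str) -> Optional[str]:
    """Return the backend URL for /perl... paths, or None to serve locally."""
    m = PERL_RULE.match(path)
    return BACKEND + m.group(1) if m else None


def reverse_location(location: str, frontend_base: str,
                     prefix: str = "http://www.example.com/", mount: str = "/") -> str:
    """Rough model of ProxyPassReverse / http://www.example.com/ : a Location
    header starting with prefix is re-pointed at the frontend under mount."""
    if location.startswith(prefix):
        return frontend_base.rstrip("/") + mount + location[len(prefix):]
    return location
```

Note that the pattern matches any path beginning with /perl, so /perlstatus is proxied just like /perl/report.pl.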




Copyright © 2003 O'Reilly & Associates. All rights reserved.