Now that we have the server running with a basic configuration, we
can start to explore more sophisticated possibilities in greater
detail. Fortunately, the differences between the Windows and Unix
versions of Apache fade as we get past the initial setup and
configuration, so it's easier to focus on the
details of making a web site work.
3.1. More and Better Web Sites: site.simple
We are now in a position to start creating real(ish) web sites, which
can be found in the sample code at the web site for the book,
http://oreilly.com/catalog/apache3/. For the
sake of a little extra realism, we will base the site loosely round a
simple web business, Butterthlies, Inc., that creates and sells
picture postcards. We need to give it some web addresses, but since
we don't yet want to venture into the outside world,
they should be variants on your own network ID. This way, all the
machines in the network realize that they don't have
to go out on the Web to make contact. For instance, we edited the
\windows\hosts file on the Windows 95 machine
running the browser and the /etc/hosts file on
the Unix machine running the server to read as follows:
127.0.0.1 localhost
192.168.123.2 www.butterthlies.com
192.168.123.2 sales.butterthlies.com
192.168.123.3 sales-IP.butterthlies.com
192.168.124.1 www.faraway.com
localhost is obligatory, so we left it in, but
you should not make any server requests to it since the results are
likely to be confusing.
You probably need to consult your network manager to make similar
arrangements.
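To confirm that a hosts file really maps each name to the address you intended, a small lookup helper can be handy. The following is a minimal sketch, not part of the book's sample code: the function name lookup_host is hypothetical, and it assumes the usual hosts layout of an IP address followed by one or more names on each line. It reads only the file you give it, so it says nothing about DNS or any other name service the machine may also consult.

```shell
# lookup_host FILE NAME: print the first IP address that FILE maps NAME to.
# (Hypothetical helper; assumes "IP name [name ...]" on each line.)
lookup_host() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                # skip comment lines
        {
            for (i = 2; i <= NF; i++)      # field 1 is the IP, the rest are names
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}

# Example call on the Unix machine:
#   lookup_host /etc/hosts sales.butterthlies.com
```

If the name is not listed, the function prints nothing, which is itself a useful hint that the browser will try to go out on the Web to find it.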
site.simple is site.toddle
with a few small changes. The script
go will work anywhere. To get started, do the
following, depending on your operating environment:

Unix:
test -d logs || mkdir logs
httpd -d `pwd` -f `pwd`/conf/httpd.conf

Open an MS-DOS window and from the command line, type:
c>cd \program files\apache group\apache
c>apache -k start
Apache/1.3.26 (Win32) running ...

To stop Apache, open a second MS-DOS window:
c>apache -k stop
c>cd logs
c>edit error.log

This will be true of each site in the demonstration setup, so we will
not mention it again.
From here on, there will be minimal differences between the server
setups necessary for Win32 and those for Unix. Unless one or the
other is specifically mentioned, you should assume that the text
refers to both.
It would be nice to have a log of what goes on. In the first edition
of this book, we found that a file access_log
was created automatically in
...site.simple/logs. In a rather bizarre move
since then, the Apache Group has broken backward compatibility and
now requires you to mention the log file explicitly in the Config
file using the TransferLog directive.
The ... /conf/httpd.conf file now contains the
following:
User webuser
Group webgroup
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/APACHE3/site.simple/htdocs
TransferLog logs/access_log
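Since the log file now has to be named explicitly, it can be worth checking that a given Config file actually carries the directive before you start the server. The sketch below is one way to do that; conf_value is a hypothetical helper name, and it assumes one directive per line with the name as the first word. It does not follow Include files or understand container sections.

```shell
# conf_value FILE DIRECTIVE: print the argument(s) of the first uncommented
# occurrence of DIRECTIVE in FILE. Names are compared without regard to case.
conf_value() {
    awk -v want="$2" '
        /^[ \t]*#/ { next }                # skip comment lines
        tolower($1) == tolower(want) {
            $1 = ""                        # drop the directive name
            sub(/^[ \t]+/, "")             # and the gap that followed it
            print
            exit
        }
    ' "$1"
}

# Example call from the site directory:
#   conf_value conf/httpd.conf TransferLog
```

An empty result means the directive is missing, in which case no access log will be written.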
In ... /htdocs we have, as before,
1.txt :
hullo world from site.simple again!
Type ./go on the server. Become the client, and
retrieve http://www.butterthlies.com. You should
see:
Index of /
. Parent Directory
. 1.txt
Click on 1.txt for an inspirational message as
before.
This all seems satisfactory, but there is a hidden mystery. We get
the same result if we connect to
http://sales.butterthlies.com. Why is this? Why,
since we have not mentioned either of these URLs or their IP
addresses in the configuration file on
site.simple, do we get any response at all?
The answer is that when we configured the machine on which the server
runs, we told the network interface to respond to any of these IP addresses:
192.168.123.2
192.168.123.3
By default Apache listens to all IP addresses belonging to the
machine and responds in the same way to all of them. If there are
virtual hosts
configured (which there aren't, in this case),
Apache runs through them, looking for an IP name that corresponds to
the incoming connection. Apache uses that configuration if it is
found, or the main configuration if it is not. Later in this chapter,
we look at more definite control with the directives
BindAddress, Listen, and
<VirtualHost>.
It has to be said that working like this (that is, switching rapidly
between different configurations) seemed to get
Netscape or
Internet Explorer into a rare muddle. To be sure that the server was
functioning properly while using Netscape as a browser, it was
usually necessary to reload the file under examination by holding
down the Control key while clicking on Reload. In extreme cases, it
was necessary to disable caching by going to Edit > Preferences >
Advanced > Cache. Set
memory and disk cache to 0, and set cache comparison to Every Time.
In Internet Explorer, set Cache Compares to Every Time. If you
don't, the browser tends to display a jumble of
several different responses from the server. This occurs because we
are doing what no user or administrator would normally do, namely,
flipping around between different versions of the same site with
different versions of the same file. Whenever we flip from a newer
version to an older version, Netscape is led to believe that its
cached version is up-to-date.
Back on the server, stop Apache with ^C, and look
at the log files. In ... /logs/access_log, you
should see something like this:
192.168.123.1 - - [<date-time>] "GET / HTTP/1.1" 200 177
200 is the response code (meaning
"OK, cool, fine"), and
177 is the number of bytes transferred. In
... /logs/error_log, there should be nothing
because nothing went wrong. However, it is a good habit to look there
from time to time, though you have to make sure that the date and
time logged correspond to the problem you are investigating. It is
easy to fool yourself with some long-gone drama.
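Once the access log holds more than a handful of lines, counting requests by response code is a quick way to see whether anything is going wrong. The following is a rough sketch under the assumption that each line follows the layout shown above, with the response code as the second-to-last field and the byte count as the last; summarize_status is a hypothetical name, not something supplied with Apache.

```shell
# summarize_status FILE: print each response code found in an access log,
# followed by the number of requests that returned it, sorted by code.
summarize_status() {
    awk '
        { count[$(NF - 1)]++ }             # second-to-last field is the code
        END { for (code in count) print code, count[code] }
    ' "$1" | sort
}

# Example call from the site directory:
#   summarize_status logs/access_log
```

A sudden crop of 404 or 500 entries is a cue to open error_log and look at the matching date and time.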
Life being what it is, things can go wrong, and the client can ask
for something the server can't provide. It makes
sense to allow for this with the ErrorDocument
directive.
3.1.1. ErrorDocument
The ErrorDocument directive lets you specify what
happens when a client asks for a nonexistent document.
ErrorDocument error-code "document    (with a closing " in Apache v2)
Server config, virtual host, directory, .htaccess
In the event of a problem or error, Apache can be configured to do
one of four things:
1. Output a simple hardcoded error message.
2. Output a customized message.
3. Redirect to a local URL to handle the problem/error.
4. Redirect to an external URL to handle the problem/error.
The first option is the default, whereas options 2 through 4 are
configured using the ErrorDocument directive,
which is followed by the HTTP response code and a message or URL.
Messages in this context begin with a double quotation mark
("), which does not form part of the message
itself. Apache will sometimes offer additional information regarding
the problem or error.
URLs can be local URLs beginning with a
slash (/ ) or full URLs that the client can resolve. For example:
ErrorDocument 500 http://foo.example.com/cgi-bin/tester
ErrorDocument 404 /cgi-bin/bad_urls.pl
ErrorDocument 401 /subscription_info.html
ErrorDocument 403 "Sorry can't allow you access today"
Note that when you specify an ErrorDocument that
points to a remote URL (i.e., anything with a method such as
"http" in front of it), Apache will
send a redirect to the client to tell it where to find the document,
even if the document ends up being on the same server. This has
several implications, the most important being that if you use an
ErrorDocument 401 directive, it
must refer to a local document. This results from the nature of the
HTTP basic authentication scheme.
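When an ErrorDocument points at a local script, as with /cgi-bin/bad_urls.pl above, the script has to produce the page the client sees. Here is a minimal sketch of what such a handler might do, written as a shell function rather than Perl. The name error_page is hypothetical, and the sketch assumes that Apache passes the original status and URL to the handler in the REDIRECT_STATUS and REDIRECT_URL environment variables; it falls back to placeholder values if they are not set.

```shell
# error_page: write a CGI response describing the request that failed.
# Assumes REDIRECT_STATUS and REDIRECT_URL are supplied by the server.
error_page() {
    status=${REDIRECT_STATUS:-404}
    # Escape &, < and > so the requested URL cannot inject markup.
    url=$(printf '%s' "${REDIRECT_URL:-unknown}" |
          sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
    printf 'Content-Type: text/html\r\n\r\n'      # CGI header, then blank line
    printf '<html><body><h1>Error %s</h1>\n' "$status"
    printf '<p>Sorry, we could not serve %s.</p></body></html>\n' "$url"
}
```

A real handler script would define the function and then call error_page as its last line. Escaping the URL matters because it comes straight from the client.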