1.4. Web Server Configuration
Before you can run CGI
programs on your server, certain parameters in the server
configuration files must be modified. Throughout this book, we will
use the Apache web server on a Unix platform in
our examples. Apache is by far the most popular web server available,
plus it's open source and available for free. Apache is derived
from the NCSA web server, so many configuration details for it are
similar to those for other web servers that are also derived from the
NCSA server, such as those sold by iPlanet (formerly Netscape).
We assume that you already have access to a working web server, so we
won't cover how to install and initially configure Apache. That
lengthy discussion would be well beyond the scope of this book, and
that information is already available in another fine book,
Apache: The Definitive Guide, by Ben and Peter
Laurie (O'Reilly & Associates, Inc.).
Apache is not always installed in the same place on all systems.
Throughout this book, we will use the default installation path, which
places everything beneath /usr/local/apache.
Apache's subdirectories are:
$ cd /usr/local/apache
$ ls -F
bin/ cgi-bin/ conf/ htdocs/ icons/ include/ libexec/ logs/ man/ proxy/
Depending on how Apache was configured during installation, you may
not have some directories, such as libexec or proxy; this is fine.
With some popular Unix and Unix-compatible distributions that include
Apache (e.g., some Linux distributions), the subdirectories above may
be distributed across the system instead. For example, on RedHat Linux,
the subdirectories are remapped, as shown in Table 1-1.
Table 1-1. Alternative Paths to Important Apache Directories
Default Installation Path | Alternative Path (RedHat Linux)
/usr/local/apache/cgi-bin | /home/httpd/cgi-bin
/usr/local/apache/htdocs | /home/httpd/html
/usr/local/apache/conf | /etc/httpd/conf
/usr/local/apache/logs | /var/log/httpd
If this is the case, you will need to translate our instructions to
the paths on your system. If Apache is installed on your system, and
its directories are not at either of these locations, then ask your
system administrator or refer to your system documentation to locate
them.
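As a rough aid for locating these directories, the following shell sketch prints the first path from a list of candidates that actually exists. The function name first_existing_dir is our own invention for illustration; it is not part of Apache, and the candidate paths are simply the two layouts from Table 1-1.

```shell
# Hypothetical helper (not part of Apache): print the first candidate
# path that is an existing directory, or fail if none of them exist.
first_existing_dir() {
    for candidate in "$@"; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1   # none of the candidates exist
}

# Example: look for the conf directory in the two layouts from Table 1-1.
# first_existing_dir /usr/local/apache/conf /etc/httpd/conf
```

If neither layout matches, the function returns a nonzero status, which is the cue to consult your system documentation.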
You configure Apache by modifying the configuration files found in
the conf directory. These files contain
directives that Apache reads when it starts. Older versions of Apache
included three files: httpd.conf,
srm.conf, and access.conf.
However, using the latter two files was never required, and recent
distributions of Apache include all of the directives in
httpd.conf. This allows you to manage the full
configuration in one location without bouncing between files. It also
avoids situations where your configuration between files does not
match, which can create security problems.
Many sites still use all three configuration files, if only because
they have not bothered to combine them. Therefore, here and
throughout the book, whenever we discuss Apache configuration, we
will specify the alternative name of the file you need to edit if you
are using all three files.
Finally, remember that Apache must be told to reread its
configuration files whenever you make changes to them. You do not
need to do a full server restart, although that also works. If your
system has the apachectl command (part of the standard install), you
can tell Apache to reread its configuration while it is running with
this command:
$ apachectl graceful
This may require superuser (i.e., root) privileges.
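It can be convenient to check the configuration for syntax errors before asking Apache to reread it. The sketch below relies on the configtest subcommand of apachectl and wraps both steps in one function. The function name reload_apache and the APACHECTL variable are our own assumptions, added so that the path to apachectl can be overridden.

```shell
# Hypothetical wrapper: check the configuration, then ask Apache to
# reread it. APACHECTL is an assumed override for the apachectl path.
reload_apache() {
    ctl=${APACHECTL:-apachectl}
    if "$ctl" configtest; then
        "$ctl" graceful
    else
        echo "configuration test failed; not reloading" >&2
        return 1
    fi
}

# Example (may require root privileges):
# reload_apache
```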
1.4.1. Configuring CGI Scripts
Enabling CGI execution with Apache is very simple, although there is
a good way to do it and a less good way to do it. Let's start with the
good way, which involves creating a special directory for our CGI
scripts.
1.4.1.1. Configuring by directory
The ScriptAlias
directive tells the web server to map a
virtual path (the path in a URL) to a
directory on the disk and execute any files it finds there as CGI
scripts.
To enable CGI scripts for our web server, place this directive in
httpd.conf:
ScriptAlias /cgi /usr/local/apache/cgi-bin
For example, if a user accesses the URL:
http://your_host.com/cgi/my_script.cgi
then the local program:
/usr/local/apache/cgi-bin/my_script.cgi
will be executed by the server. Note that the cgi path in the URL does
not need to be the same as the name of the filesystem directory,
cgi-bin. Whether you map the CGI directory to the virtual path called
cgi, cgi-bin, or anything else for that matter, is strictly your own
preference. You can also have multiple directories hold CGI scripts if
you need that feature:
ScriptAlias /cgi /usr/local/apache/cgi-bin
ScriptAlias /cgi2 /usr/local/apache/alt-cgi-bin
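To make the mapping from virtual path to directory concrete, here is a small shell sketch that imitates the prefix substitution described above. It is only an illustration: the function name map_script_alias is ours, and Apache's real matching involves more rules than this simple prefix test.

```shell
# Illustrative only: mimic how a ScriptAlias-style prefix mapping turns
# a URL path into a filesystem path. Apache's own logic is more involved.
map_script_alias() {
    url_path=$1   # e.g. /cgi/my_script.cgi
    prefix=$2     # virtual path, e.g. /cgi
    target=$3     # directory on disk, e.g. /usr/local/apache/cgi-bin
    case $url_path in
        "$prefix"/*)
            # Strip the virtual prefix and append the rest to the target.
            printf '%s/%s\n' "$target" "${url_path#"$prefix"/}"
            ;;
        *)
            return 1   # URL is not under the aliased prefix
            ;;
    esac
}

# Example:
# map_script_alias /cgi/my_script.cgi /cgi /usr/local/apache/cgi-bin
```

A URL path that does not begin with the virtual path, such as /index.html, makes the function return a nonzero status, much as such a request would fall through to the document root.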
The directory that holds CGI scripts must be outside the
server's document root. In a standard Apache install, the document
root maps to the htdocs directory. All files beneath this
directory are browsable. By default, the cgi-bin
directory is not beneath htdocs, so if we were
to disable our ScriptAlias directive, for example,
there would be no way to access the CGI scripts. There is a very good
reason for this, and it is not simply to protect yourself from
someone accidentally deleting the ScriptAlias
directive.
Here is an example why you should not place your CGI script directory
within the document root. Say you do decide that you want to have
multiple directories for CGI scripts throughout your web site within
the document root. You might decide that it would be nice to have a
directory for each of your major applications. Say that you have an
online widget store that you put in
/usr/local/apache/htdocs/widgets and the CGI
script directory at
/usr/local/apache/htdocs/widgets/cgi. You then
add the following directive:
ScriptAlias /widgets/cgi /usr/local/apache/htdocs/widgets/cgi
If you were to do this and test it, it would work fine. However,
suppose that your company later expands to sell woozles in addition
to widgets, so the store needs a more general name. You rename the
widgets directory to store, update the ScriptAlias directive to map
/store/cgi to the renamed directory, update all related HTML links,
and create a symbolic link from widgets to store in order
to support those users who bookmarked the old name. Sounds like a
good plan, right?
Unfortunately, that last step, the symbolic link, just
created a large security hole. The problem
is that it is now possible to access your CGI scripts via two
different URLs. For example, you may have a CGI script called
purchase.cgi that can be accessed either of
these two ways:
http://localhost/store/cgi/purchase.cgi
http://localhost/widgets/cgi/purchase.cgi
The first URL will be handled by the ScriptAlias
directive; the second will not, because it reaches the same file
through the symbolic link in the document root and no ScriptAlias
covers that path. If users attempt to access the second
URL, instead of being greeted by a web page, they will be greeted
with the source code of your CGI script. If you're lucky,
someone will send you an email notifying you of the problem. If
you're not, a mischievous user may start poking around your
scripts to find security holes to break into your system to get at
more valuable information (like database passwords or credit card
numbers).
Any symbolic link above a directory containing CGI scripts allows
this security hole.[1] The scenario about renaming a
directory and providing a link to its old name is simply one example
of a situation when this may occur innocently. If you place your CGI
scripts outside of your server's document root, you never have
to worry about someone accidentally exposing your scripts this way.
[1] It is possible to configure
Apache to not follow symbolic links, which provides an alternative
solution. However, symbolic links in general can be quite useful, and
they are enabled by default. The problem in this situation is not
with the symbolic link; it is with having the CGI scripts in a
browsable location.
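One way to guard against this kind of accident is to check, with symbolic links resolved, whether a script directory ends up inside the document root. The following shell sketch does that. The function name is_inside is hypothetical, and the paths in the example are the default installation paths used in this book.

```shell
# Hypothetical check: succeed if directory $1, with symbolic links
# resolved, lies inside (or is) directory $2. Returns 2 if either
# directory cannot be entered.
is_inside() {
    dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd -P) || return 2
    root=$(CDPATH= cd -- "$2" 2>/dev/null && pwd -P) || return 2
    case $dir/ in
        "$root"/*) return 0 ;;   # script directory is browsable
        *)         return 1 ;;   # script directory is outside the root
    esac
}

# Example: warn if the CGI directory is reachable through the document root.
# is_inside /usr/local/apache/cgi-bin /usr/local/apache/htdocs &&
#     echo "warning: CGI directory is inside the document root" >&2
```

Because pwd -P reports the physical path, a directory reached through a link such as widgets is compared by its real location.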
You may wonder why revealing your source code is such a problem. CGI
scripts have certain characteristics that make them quite different
from other forms of executables from a security standpoint. They
allow remote, anonymous users to run programs on your system. Thus,
security should always be an important consideration, and your code
must be flawless if you are willing to allow potential attackers to
review your source code. Although security through obscurity is not
good protection in and of itself, it certainly doesn't hurt
when combined with other forms of security. We will discuss security
in much greater detail in Chapter 8, "Security".
1.4.1.2. Configuring by extension
The alternative to configuring CGI scripts via a common directory is
to distribute them throughout your document tree and have your web
server recognize them by their filename extension, such as .cgi. This
is a very bad idea, from the standpoint of both architecture and
security.
From an architectural standpoint, you should not do this because
having a common directory for all of your CGI scripts helps you
manage them. As web sites grow, it may be difficult to keep track of
all of the CGI scripts that your site uses. Placing them under a
common directory makes them easier to find and promotes creating CGI
scripts that are general solutions to multiple problems instead of
handfuls of single-use scripts. You can then create subdirectories
beneath the main /cgi directory to organize your
scripts.
There are two reasons why configuring CGI scripts by extension is
insecure. First, it allows anyone who has permissions to update HTML
files to create CGI scripts. As we said, CGI scripts require
particular security considerations, and you should not allow novice
programmers to create scripts on production web servers. Second, it
increases the likelihood that someone can view the source code to
your CGI scripts. Many text editors create backup files while you are
editing a file; some of them create these files in the same directory
where you are working. For example, if you were editing a file called
top_secret.cgi with emacs, the editor would typically create a backup
file called top_secret.cgi~. If this second file makes it
onto the production web server and someone with a lucky hunch
attempts to request that file, the web server will not recognize the
extension and will simply return the raw source code.
Of course, your text editor ideally should delete these files when
you finish working on them, and you really should not be editing
files directly on a production web server. But files like this do get
left around sometimes, and they might make it to the production web
server. Files also get renamed manually sometimes. A developer may
wish to make changes to a file but save a backup of this file by
making a copy and renaming it with a .bak
extension. If a backup file were in a directory configured with
ScriptAlias, it would not be displayed; it would be
treated like any other CGI script and executed, which is a much safer
alternative.
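Stray backup files are easy to search for before you publish a directory. The sketch below lists files matching the two patterns mentioned above, a trailing ~ and a .bak extension. The function name find_backup_files is our own, and the pattern list is only a starting point, since other editors use other naming schemes.

```shell
# Hypothetical audit: list editor backups and manual copies under the
# directory given as $1. Only the *~ and *.bak patterns from the text
# are checked; extend the list for other editors.
find_backup_files() {
    find "$1" -type f \( -name '*~' -o -name '*.bak' \) -print
}

# Example: audit the document root before deploying.
# find_backup_files /usr/local/apache/htdocs
```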
So, if your web server happens to be configured to allow CGI scripts
anywhere, here is how to fix it. The following line tells the web
server to execute any file ending with a .cgi
suffix:
AddHandler cgi-script .cgi
You can comment it out by preceding it with #, just like in Perl.
Without this directive,
Apache will treat .cgi files as unknown files
and return them according to the default media type -- typically
plain text. So be sure that you move all of your CGI scripts outside
the document root before you remove this directive.
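To find out whether this directive is active on your server, you can search the configuration file for uncommented occurrences. The following shell sketch prints matching lines with their line numbers. The function name active_cgi_handlers is hypothetical, and the sketch only examines the single file you give it, so it will not see directives in other configuration files.

```shell
# Hypothetical check: print active (uncommented) AddHandler cgi-script
# lines, with line numbers, from the configuration file given as $1.
# The match is case-insensitive; status is nonzero if nothing matches.
active_cgi_handlers() {
    grep -n -i -E '^[[:space:]]*AddHandler[[:space:]]+cgi-script' "$1"
}

# Example:
# active_cgi_handlers /usr/local/apache/conf/httpd.conf
```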
You may also turn off the CGI execute permissions for particular
directories by disabling the ExecCGI option. The line to enable it
looks like this:
<Directory "/usr/local/apache/htdocs">
.
.
Options Indexes FollowSymLinks ExecCGI
.
.
</Directory>
There are probably many other lines above and below the
Options directive, and the
Options directive on your system may differ. If
you remove ExecCGI, then even with the CGI handler
directive enabled above, Apache will not execute CGI scripts in the
location to which this Options directive applies, in this case the
document root, /usr/local/apache/htdocs. Users will instead get
a "403 Forbidden" error page telling them that they do not have
permission to access the script.
Now that we have our web server set up, and we have gotten a chance
to see what CGI can do, we can investigate CGI in more detail. We
start the next chapter by reviewing HTTP, the language of the Web
and the foundation of CGI.
Copyright © 2001 O'Reilly & Associates. All rights reserved.