Web Server Configuration (CGI Programming with Perl)

1.4.1. Configuring CGI Scripts

Enabling CGI execution with Apache is very simple, although there is a good way to do it and a less good way to do it. Let's start with the good way, which involves creating a special directory for our CGI scripts.

1.4.1.1. Configuring by directory

The ScriptAlias directive tells the web server to map a virtual path (the path in a URL) to a directory on the disk and execute any files it finds there as CGI scripts.

To enable CGI scripts for our web server, place this directive in httpd.conf:

ScriptAlias          /cgi        /usr/local/apache/cgi-bin

For example, if a user accesses the URL:

http://your_host.com/cgi/my_script.cgi

then the local program:

/usr/local/apache/cgi-bin/my_script.cgi

will be executed by the server. Note that the cgi path in the URL does not need to be the same as the name of the filesystem directory, cgi-bin. Whether you map the CGI directory to a virtual path called cgi, cgi-bin, or anything else is strictly your own preference. You can also have multiple directories hold CGI scripts if you need that feature:

ScriptAlias          /cgi        /usr/local/apache/cgi-bin
ScriptAlias          /cgi2       /usr/local/apache/alt-cgi-bin
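On newer servers a ScriptAlias line may not be enough by itself. The fragment below is a minimal sketch that assumes Apache 2.4 or later, where the stock httpd.conf denies access to the whole filesystem by default and each served directory is opened up explicitly. Require is 2.4 syntax; older releases use Order and Allow instead, so treat this as an illustration rather than something to paste in unchanged.

```apache
# Assumption: Apache 2.4+, with a global "Require all denied" in effect.
# Grants access to the directory that the /cgi alias points at.
<Directory "/usr/local/apache/cgi-bin">
    AllowOverride None
    Options None
    Require all granted
</Directory>
```

A second block of the same shape would be needed for /usr/local/apache/alt-cgi-bin.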

The directory that holds CGI scripts should be outside the server's document root. In a standard Apache install, the document root maps to the htdocs directory, and all files beneath this directory are browsable. By default, the cgi-bin directory is not beneath htdocs, so if we were to disable our ScriptAlias directive, for example, there would be no way to access the CGI scripts. There is a very good reason for this, and it is not simply to protect yourself from someone accidentally deleting the ScriptAlias directive.

Here is an example of why you should not place your CGI script directory within the document root. Say you decide that you want multiple directories for CGI scripts throughout your web site, within the document root. You might decide that it would be nice to have a directory for each of your major applications. Say that you have an online widget store that you put in /usr/local/apache/htdocs/widgets, with its CGI script directory at /usr/local/apache/htdocs/widgets/cgi. You then add the following directive:

ScriptAlias     /widgets/cgi   /usr/local/apache/htdocs/widgets/cgi

If you were to do this and test it, it would work fine. However, suppose that your company later expands to sell woozles in addition to widgets, so the store needs a more general name. You rename the widgets directory to store, update the ScriptAlias directive, update all related HTML links, and create a symbolic link from widgets to store in order to support those users who bookmarked the old name. Sounds like a good plan, right?

Unfortunately, that last step, the symbolic link, just created a large security hole. The problem is that it is now possible to access your CGI scripts via two different URLs. For example, you may have a CGI script called purchase.cgi that can be accessed in either of these two ways:

http://localhost/store/cgi/purchase.cgi

http://localhost/widgets/cgi/purchase.cgi

The first URL will be handled by the ScriptAlias directive; the second will not. If users attempt to access the second URL, instead of being greeted by a web page, they will be greeted with the source code of your CGI script. If you're lucky, someone will send you an email notifying you of the problem. If you're not, a mischievous user may start poking around your scripts for security holes that let them break into your system and get at more valuable information (like database passwords or credit card numbers).

Any symbolic link above a directory containing CGI scripts allows this security hole.[1] The scenario about renaming a directory and providing a link to its old name is simply one example of a situation when this may occur innocently. If you place your CGI scripts outside of your server's document root, you never have to worry about someone accidentally exposing your scripts this way.

[1]It is possible to configure Apache to not follow symbolic links, which provides an alternative solution. However, symbolic links in general can be quite useful, and they are enabled by default. The problem in this situation is not with the symbolic link; it is with having the CGI scripts in a browsable location.
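As a rough sketch of the alternative this footnote mentions, the fragment below leaves FollowSymLinks out of the Options line for the document root. It assumes no other Directory block or .htaccess file turns the option back on further down the tree.

```apache
<Directory "/usr/local/apache/htdocs">
  # FollowSymLinks is deliberately absent, so under these Options
  # Apache will not serve files reached through a symbolic link.
  Options Indexes
</Directory>
```

Apache also offers SymLinksIfOwnerMatch, a narrower option that follows a link only when the link and its target have the same owner.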

You may wonder why revealing your source code is such a problem. CGI scripts have certain characteristics that make them quite different from other forms of executables from a security standpoint. They allow remote, anonymous users to run programs on your system. Thus, security should always be an important consideration, and your code must be flawless if you are willing to allow potential attackers to review your source code. Although security through obscurity is not good protection in and of itself, it certainly doesn't hurt when combined with other forms of security. We will discuss security in much greater detail in Chapter 8, "Security".

1.4.1.2. Configuring by extension

The alternative to configuring CGI scripts via a common directory is to distribute them throughout your document tree and have your web server recognize them by their filename extension, such as .cgi. This is a very bad idea, from the standpoint of both architecture and security.

From an architectural standpoint, you should not do this because having a common directory for all of your CGI scripts helps you manage them. As web sites grow, it may be difficult to keep track of all of the CGI scripts that your site uses. Placing them under a common directory makes them easier to find and promotes creating CGI scripts that are general solutions to multiple problems instead of handfuls of single-use scripts. You can then create subdirectories beneath the main /cgi directory to organize your scripts.
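To illustrate that layout, a single ScriptAlias is enough to cover subdirectories too, because the directive maps everything beneath the virtual path. The store and admin subdirectories and the report.cgi script are hypothetical names chosen only for this sketch.

```apache
# One alias covers the whole tree under cgi-bin, subdirectories included.
ScriptAlias    /cgi    /usr/local/apache/cgi-bin

# With the alias above, requests would map like this:
#   /cgi/store/purchase.cgi  ->  /usr/local/apache/cgi-bin/store/purchase.cgi
#   /cgi/admin/report.cgi    ->  /usr/local/apache/cgi-bin/admin/report.cgi
```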

There are two reasons why configuring CGI scripts by extension is insecure. First, it allows anyone who has permissions to update HTML files to create CGI scripts. As we said, CGI scripts require particular security considerations, and you should not allow novice programmers to create scripts on production web servers. Second, it increases the likelihood that someone can view the source code to your CGI scripts. Many text editors create backup files while you are editing a file; some of them create these files in the same directory where you are working. For example, if you were editing a file called top_secret.cgi with emacs, it typically creates a backup file called top_secret.cgi~. If this second file makes it onto the production web server and someone with a lucky hunch attempts to request that file, the web server will not recognize the extension and will simply return the raw source code.

Of course, your text editor ideally should delete these files when you finish working on them, and you really should not be editing files directly on a production web server. But files like this do get left around sometimes, and they might make it to the production web server. Files also get renamed manually sometimes. A developer may wish to make changes to a file but save a backup of it by making a copy and renaming it with a .bak extension. If a backup file were in a directory configured with ScriptAlias, it would not be displayed; it would be treated like any other CGI script and executed, which is a much safer alternative.

So, if your web server happens to be configured to allow CGI scripts anywhere, here is how to fix it. The following line tells the web server to execute any file ending with a .cgi suffix:

AddHandler    cgi-script    .cgi

You can comment it out by preceding it with #, just like in Perl. Without this directive, Apache will treat .cgi files as unknown files and return them according to the default media type -- typically plain text. So be sure that you move all of your CGI scripts outside the document root before you remove this directive.
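Commented out, the directive would look like the sketch below. Apache ignores lines that begin with #.

```apache
# AddHandler    cgi-script    .cgi
```

One difference from Perl is worth keeping in mind: Apache does not allow a comment on the same line after a directive, so the # has to start the line.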

You may also turn off the CGI execute permissions for particular directories by disabling the ExecCGI option. The line to enable it looks like this:

<Directory "/usr/local/apache/htdocs">
  .
  .
  Options Indexes FollowSymLinks ExecCGI
  .
  .
</Directory>

There are probably many other lines above and below the Options directive, and the Options directive on your system may differ. If you remove ExecCGI, then even with the CGI handler directive above enabled, Apache will not execute CGI scripts in the location to which this Options directive applies -- in this case, the document root, /usr/local/apache/htdocs. Users will instead get a "403 Forbidden" error page telling them they do not have permission to access the script.
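With ExecCGI removed, the block might read as follows. This is only a sketch: the elided lines from the example above are left out, and the remaining options are simply the ones shown there, which may not match your system.

```apache
<Directory "/usr/local/apache/htdocs">
  # ExecCGI has been dropped; Indexes and FollowSymLinks are unchanged.
  Options Indexes FollowSymLinks
</Directory>
```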

Now that we have our web server set up, and we have gotten a chance to see what CGI can do, we can investigate CGI in more detail. We start the next chapter by reviewing HTTP, the language of the Web and the foundation of CGI.
