home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Practical mod_perlPractical mod_perlSearch this book

Chapter 4. mod_perl Configuration

The next step after building and installing a mod_perl-enabled Apache server is to configure it. This is done in two distinct steps: getting the server running with a standard Apache configuration, and then applying mod_perl-specific configuration directives to get the full benefit out of it.

For readers who haven't previously been exposed to the Apache web server, our discussion begins with standard Apache directives and then continues with mod_perl-specific material.

The startup.pl file can be used in many ways to improve performance. We will talk about all these issues later in the book. In this chapter, we discuss the configuration possibilities that the startup.pl file gives us.

<Perl>sections are a great time saver if you have complex configuration files. We'll talk about <Perl>sections in this chapter.

Another important issue we'll cover in this chapter is how to validate the configuration file. This is especially important on a live production server. If we break something and don't validate it, the server won't restart. This chapter discusses techniques to prevent validation problems.

At the end of this chapter, we discuss various tips and tricks you may find useful for server configuration, talk about a few security concerns related to server configuration, and finally look at a few common pitfalls people encounter when they misconfigure their servers.

4.1. Apache Configuration

Apache configuration can be confusing. To minimize the number of things that can go wrong, it's a good idea to first configure Apache itself without mod_perl. So before we go into mod_perl configuration, let's look at the basics of Apache itself.

4.1.1. Configuration Files

Prior to Version 1.3.4, the default Apache installation used three configuration files: httpd.conf, srm.conf, and access.conf. Although there were historical reasons for having three separate files (dating back to the NCSA server), it stopped mattering which file you used for what a long time ago, and the Apache team finally decided to combine them. Apache Versions 1.3.4 and later are distributed with the configuration directives in a single file, httpd.conf. Therefore, whenever we mention a configuration file, we are referring to httpd.conf.

By default, httpd.conf is installed in the conf directory under the server root directory. The default server root is /usr/local/apache/ on many Unix platforms, but it can be any directory of your choice (within reason). Users new to Apache and mod_perl will probably find it helpful to keep to the directory layouts we use in this book.

There is also a special file called .htaccess, used for per-directory configuration. When Apache tries to access a file on the filesystem, it will first search for .htaccess files in the requested file's parent directories. If found, Apache scans .htaccess for further configuration directives, which it then applies only to that directory in which the file was found and its subdirectories. The name .htaccess is confusing, because it can contain almost any configuration directives, not just those related to resource access control. Note that if the following directive is in httpd.conf:

<Directory />
    AllowOverride None
</Directory>

Apache will not look for .htaccess at all unless AllowOverride is set to a value other than None in a more specific <Directory>section.

.htaccess can be renamed by using the AccessFileName directive. The following example configures Apache to look in the target directory for a file called .acl instead of .htaccess:

AccessFileName .acl

However, you must also make sure that this file can't be accessed directly from the Web, or else you risk exposing your configuration. This is done automatically for .ht* files by Apache, but for other files you need to use:

<Files .acl>
    Order Allow,Deny
    Deny from all
</Files>

Another often-mentioned file is the startup file, usually named startup.pl. This file contains Perl code that will be executed at server startup. We'll discuss the startup.pl file in greater detail later in this chapter, in Section 4.3.

Beware of editing httpd.conf without understanding all the implications. Modifying the configuration file and adding new directives can introduce security problems and have performance implications. If you are going to modify anything, read through the documentation beforehand. The Apache distribution comes with an extensive configuration manual. In addition, each section of the distributed configuration file includes helpful comments explaining how each directive should be configured and what the default values are.

If you haven't moved Apache's directories around, the installation program will configure everything for you. You can just start the server and test it. To start the server, use the apachectl utility bundled with the Apache distribution. It resides in the same directory as httpd, the Apache server itself. Execute:

panic% /usr/local/apache/bin/apachectl start

Now you can test the server, for example by accessing http://localhost/ from a browser running on the same host.

4.1.3. <Directory>, <Location>, and <Files> Sections

Let's discuss the basics of the <Directory>, <Location>, and <Files>sections. Remember that there is more to know about them than what we list here, and the rest of the information is available in the Apache documentation. The information we'll present here is just what is important for understanding mod_perl configuration.

Apache considers directories and files on the machine it runs on as resources. A particular behavior can be specified for each resource; that behavior will apply to every request for information from that particular resource.

Directives in <Directory> sections apply to specific directories on the host machine, and those in <Files> sections apply only to specific files (actually, groups of files with names that have something in common). <Location> sections apply to specific URIs. Locations are given relative to the document root, whereas directories are given as absolute paths starting from the filesystem root (/). For example, in the default server directory layout where the server root is /usr/local/apache and the document root is /usr/local/apache/htdocs, files under the /usr/local/apache/htdocs/pub directory can be referred to as:

<Directory /usr/local/apache/htdocs/pub>
</Directory>

or alternatively (and preferably) as:

<Location /pub>
</Location>

Exercise caution when using <Location> under Win32. The Windows family of operating systems are case-insensitive. In the above example, configuration directives specified for the location /pub on a case-sensitive Unix machine will not be applied when the request URI is /Pub. When URIs map to existing files, such as Apache::Registryscripts, it is safer to use the <Directory> or <Files> directives, which correctly canonicalize filenames according to local filesystem semantics.

It is up to you to decide which directories on your host machine are mapped to which locations. This should be done with care, because the security of the server may be at stake. In particular, essential system directories such as /etc/ shouldn't be mapped to locations accessible through the web server. As a general rule, it might be best to organize everything accessed from the Web under your ServerRoot, so that it stays organized and you can keep track of which directories are actually accessible.

Locations do not necessarily have to refer to existing physical directories, but may refer to virtual resources that the server creates upon a browser request. As you will see, this is often the case for a mod_perl server.

When a client (browser) requests a resource (URI plus optional arguments) from the server, Apache determines from its configuration whether or not to serve the request, whether to pass the request on to another server, what (if any) authentication and authorization is required for access to the resource, and which module(s) should be invoked to generate the response.

For any given resource, the various sections in the configuration may provide conflicting information. Consider, for example, a <Directory>section that specifies that authorization is required for access to the resource, and a <Files>section that says that it is not. It is not always obvious which directive takes precedence in such cases. This can be a trap for the unwary.

4.1.3.1. <Directory directoryPath> ... </Directory>

Scope: Can appear in server and virtual host configurations.

<Directory> and </Directory> are used to enclose a group of directives that will apply to only the named directory and its contents, including any subdirectories. Any directive that is allowed in a directory context (see the Apache documentation) may be used.

The path given in the <Directory> directive is either the full path to a directory, or a string containing wildcard characters (also called globs). In the latter case, ? matches any single character, * matches any sequence of characters, and [ ] matches character ranges. These are similar to the wildcards used by sh and similar shells. For example:

<Directory /home/httpd/docs/foo[1-2]>
    Options Indexes
</Directory>

will match /home/httpd/docs/foo1 and /home/httpd/docs/foo2. None of the wildcards will match a / character. For example:

<Directory /home/httpd/docs>
    Options Indexes
</Directory>

matches /home/httpd/docs and applies to all its subdirectories.

Matching a regular expression is done by using the <DirectoryMatch regex> ... </DirectoryMatch> or <Directory ~ regex> ... </Directory>syntax. For example:

<DirectoryMatch /home/www/.*/public>
    Options Indexes
</DirectoryMatch>

will match /home/www/foo/public but not /home/www/foo/private. In a regular expression, .* matches any character (represented by .) zero or more times (represented by *). This is entirely different from the shell-style wildcards used by the <Directory> directive. They make it easy to apply a common configuration to a set of public directories. As regular expressions are more flexible than globs, this method provides more options to the experienced user.

If multiple (non-regular expression) <Directory>sections match the directory (or its parents) containing a document, the directives are applied in the order of the shortest match first, interspersed with the directives from any .htaccess files. Consider the following configuration:

<Directory />
    AllowOverride None
</Directory>

<Directory /home/httpd/docs/>
    AllowOverride FileInfo
</Directory>

Let us detail the steps Apache goes through when it receives a request for the file /home/httpd/docs/index.html:

  1. Apply the directive AllowOverride None (disabling .htaccess files).

  2. Apply the directive AllowOverride FileInfo for the directory /home/httpd/docs/ (which now enables .htaccess in /home/httpd/docs/ and its subdirectories).

  3. Apply any directives in the group FileInfo, which control document types (AddEncoding, AddLanguage, AddType, etc.—see the Apache documentation for more information) found in /home/httpd/docs/.htaccess.

4.1.3.2. <Files filename > ... </Files>

Scope: Can appear in server and virtual host configurations, as well as in .htaccess files.

The <Files> directive provides access control by filename and is comparable to the <Directory> and <Location> directives. <Files>should be closed with the corresponding </Files>. The directives specified within this section will be applied to any object with a basename matching the specified filename. (A basename is the last component of a path, generally the name of the file.)

<Files>sections are processed in the order in which they appear in the configuration file, after the <Directory>sections and .htaccess files are read, but before <Location>sections. Note that <Files> can be nested inside <Directory>sections to restrict the portion of the filesystem to which they apply. However, <Files> cannot be nested inside <Location>sections.

The filename argument should include a filename or a wildcard string, where ? matches any single character and * matches any sequence of characters, just as with <Directory>sections. Extended regular expressions can also be used, placing a tilde character (~) between the directive and the regular expression. The regular expression should be in quotes. The dollar symbol ($) refers to the end of the string. The pipe character (|) indicates alternatives, and parentheses (()) can be used for grouping. Special characters in extended regular expressions must be escaped with backslashes (\). For example:

<Files ~ "\.(pl|cgi)$">
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options +ExecCGI
</Files>

would match all the files ending with the .pl or .cgi extension (most likely Perl scripts). Alternatively, the <FilesMatch regex> ... </FilesMatch>syntax can be used.

Regular Expressions

There is much more to regular expressions than what we have shown you here. As a Perl programmer, learning to use regular expressions is very important, and what you can learn there will be applicable to your Apache configuration too.

See the perlretut manpage and the book Mastering Regular Expressions by Jeffrey E. F. Friedl (O'Reilly) for more information.

4.1.3.3. <Location URI> ... </Location>

Scope: Can appear in server and virtual host configurations.

The <Location> directive provides for directive scope limitation by URI. It is similar to the <Directory> directive and starts a section that is terminated with the </Location> directive.

<Location>sections are processed in the order in which they appear in the configuration file, after the <Directory>sections, .htaccess files, and <Files>sections have been interpreted.

The <Location>section is the directive that is used most often with mod_perl.

Note that URIs do not have to refer to real directories or files within the filesystem at all; <Location> operates completely outside the filesystem. Indeed, it may sometimes be wise to ensure that <Location>s do not match real paths, to avoid confusion.

The URI may use wildcards. In a wildcard string, ? matches any single character, * matches any sequences of characters, and [ ] groups characters to match. For regular expression matches, use the <LocationMatch regex> ... </LocationMatch>syntax.

The <Location> functionality is especially useful when combined with the SetHandler directive. For example, to enable server status requests (via mod_status) but allow them only from browsers at *.example.com, you might use:

<Location /status>
    SetHandler server-status
    Order Deny,Allow
    Deny from all
    Allow from .example.com
</Location>

As you can see, the /status path does not exist on the filesystem, but that doesn't matter because the filesystem isn't consulted for this request—it's passed on directly to mod_status.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.