

Writing Apache Modules with Perl and C
By:   Lincoln Stein and Doug MacEachern
Published:   O'Reilly & Associates, Inc.  - March 1999

Copyright © 1999 by O'Reilly & Associates, Inc.


 



Chapter 4 - Content Handlers
Content Handlers as File Processors

In this section...

Introduction
Adding a Canned Footer to Pages
A Server-Side Include System
Converting Image Formats
A Dynamic Navigation Bar
Handling If-Modified-Since
Sending Static Files

Introduction


Early web servers were designed as engines for transmitting physical files from the host machine to the browser. Apache now does much more, but the file-oriented legacy remains. Files can be sent to the browser unmodified or passed through content handlers that transform them before they are sent on. Even though many of the documents that you produce with modules have no corresponding physical files, some parts of Apache still behave as if they did.

When Apache receives a request, the URI is passed through any URI translation handlers that may be installed (see Chapter 7, Other Request Phases, for information on how to roll your own), transforming it into a file path. The mod_alias translation handler (compiled in by default) will first process any Alias, ScriptAlias, Redirect, or other mod_alias directives. If none applies, the http_core default translator will simply prepend the DocumentRoot directory to the beginning of the URI.

Next, Apache attempts to divide the file path into two parts: a "filename" part which usually (but not always) corresponds to a physical file on the host's filesystem, and an "additional path information" part corresponding to additional stuff that follows the filename. Apache divides the path using a very simple-minded algorithm. It steps through the path components from left to right until it finds something that doesn't correspond to a directory on the host machine. The part of the path up to and including this component becomes the filename, and everything that's left over becomes the additional path information.

Consider a site with a document root of /home/www that has just received a request for URI /abc/def/ghi. The way Apache splits the file path into filename and path information parts depends on what directories it finds in the document root:

Physical Directory       Translated Filename      Additional Path Information
/home/www                /home/www/abc            /def/ghi
/home/www/abc            /home/www/abc/def        /ghi
/home/www/abc/def        /home/www/abc/def/ghi    empty
/home/www/abc/def/ghi    /home/www/abc/def/ghi    empty

Note that the presence of any actual files in the path is irrelevant to this process. The division between the filename and the path information depends only on what directories are present.
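The splitting rule described above can be modeled in a few lines. What follows is a minimal Python sketch, not Apache's actual C code: the function name split_path and the is_dir parameter are invented for illustration, and the sketch ignores details such as trailing slashes. Passing in the directory test makes it possible to try the rule without touching a real filesystem.

```python
import os
from typing import Callable, Tuple


def split_path(doc_root: str, uri: str,
               is_dir: Callable[[str], bool] = os.path.isdir) -> Tuple[str, str]:
    """Split a translated path into (filename, path_info).

    Walk the URI components left to right, starting at doc_root. The first
    component that is not a directory ends the filename; whatever follows
    it becomes the additional path information.
    """
    filename = doc_root.rstrip("/")
    components = [c for c in uri.split("/") if c]
    for i, component in enumerate(components):
        filename = filename + "/" + component
        if not is_dir(filename):
            rest = components[i + 1:]
            path_info = "/" + "/".join(rest) if rest else ""
            return filename, path_info
    # Every component was a directory, so nothing is left over.
    return filename, ""
```

Calling split_path("/home/www", "/abc/def/ghi", is_dir) with a directory test that recognizes only /home/www and /home/www/abc yields the second row of the table: a filename of /home/www/abc/def and path information of /ghi.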

Once Apache has decided where the file is in the path, it determines what MIME type it might be. This is again one of the places where you can intervene to alter the process with a custom type handler. The default type handler (mod_mime) just compares the filename's extension to a table of MIME types. If there's a match, this becomes the MIME type. If no match is found, then the MIME type is undefined. Again, note that this mapping from filename to MIME type occurs even when there's no actual file there.
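As a rough illustration of the extension lookup, here is a minimal Python sketch. It is a simplified model rather than mod_mime itself: the table contents and the name guess_type are assumptions made for the example, and only the last extension of the filename is considered. Note that the function never checks whether the file exists.

```python
from typing import Dict, Optional

# Illustrative subset of an extension-to-type table (hypothetical contents).
MIME_TYPES: Dict[str, str] = {
    "html": "text/html",
    "txt": "text/plain",
    "gif": "image/gif",
    "jpg": "image/jpeg",
}


def guess_type(filename: str,
               table: Dict[str, str] = MIME_TYPES) -> Optional[str]:
    """Return the MIME type for filename's extension, or None if unknown."""
    basename = filename.rsplit("/", 1)[-1]
    if "." not in basename:
        return None  # no extension, so the type stays undefined
    # Lowercasing the extension is a simplification chosen for this sketch.
    extension = basename.rsplit(".", 1)[1].lower()
    return table.get(extension)
```

With this table, /home/www/index.html maps to text/html, while /home/www/abc/def/ghi has no extension and is left without a type.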

There are two special cases. If the last component of the filename happens to be a physical directory, then Apache internally assigns it a "magic" MIME type, defined by the DIR_MAGIC_TYPE constant as httpd/unix-directory. This is used by the directory module to generate automatic directory listings. The second special case occurs when you have the optional mod_mime_magic module installed and the file actually exists. In this case Apache will peek at the first few bytes of the file's contents to determine what type of file it might be. Chapter 7 shows you how to write your own MIME type checker handlers to implement more sophisticated MIME type determination schemes.

After Apache has determined the name and type of the file referenced by the URI, it decides what to do about it. One way is to use information hard-wired into the module's static data structures. The module's handler_rec table, which we describe in detail in Chapter 10, C API Reference Guide, Part I, declares the module's willingness to handle one or more magic MIME types and associates a content handler with each one. For example, the mod_cgi module associates MIME type application/x-httpd-cgi with its cgi_handler() handler subroutine. When Apache detects that a filename is of type application/x-httpd-cgi it invokes cgi_handler() and passes it information about the file. A module can also declare its desire to handle an ordinary MIME type, such as video/quicktime, or even a wildcard type, such as video/*. In this case, all requests for URIs with matching MIME types will be passed through the module's content handler unless some other module registers a more specific type.
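The matching rule in this paragraph (an exact type wins over a wildcard type) can be sketched as follows. This is a hedged Python model of the idea only; the real handler_rec table is a C structure covered in Chapter 10, and the registry, the handler functions other than the cgi_handler name, and find_handler itself are hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def cgi_handler(filename: str) -> str:
    return "cgi_handler ran " + filename


def video_handler(filename: str) -> str:
    return "generic video handler ran " + filename


def quicktime_handler(filename: str) -> str:
    return "quicktime handler ran " + filename


# Hypothetical registrations, keyed by MIME type or wildcard type.
HANDLERS: Dict[str, Handler] = {
    "application/x-httpd-cgi": cgi_handler,
    "video/*": video_handler,
    "video/quicktime": quicktime_handler,
}


def find_handler(content_type: str,
                 handlers: Dict[str, Handler] = HANDLERS) -> Optional[Handler]:
    """Pick a handler: try the exact type first, then the 'major/*' wildcard."""
    if content_type in handlers:
        return handlers[content_type]
    major = content_type.split("/", 1)[0]
    return handlers.get(major + "/*")
```

In this model a request of type video/quicktime reaches quicktime_handler, a video/mpeg request falls back to the video/* entry, and a type with no registration at all gets no handler from the table.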

Newer modules use a more flexible method in which content handlers are associated with files at runtime using explicit names. When this method is used, the module declares one or more content handler names in its handler_rec array instead of, or in addition to, MIME types. Some examples of content handler names you might have seen include cgi-script, server-info, server-parsed, imap-file, and perl-script. Handler names can be associated with files using either AddHandler or SetHandler directives. AddHandler associates a handler with a particular file extension. For example, a typical configuration file will contain this line to associate .shtml files with the server-side include handler:

AddHandler server-parsed .shtml

Now, the server-parsed handler defined by mod_include will be called on to process all files ending in ".shtml" regardless of their MIME type.

SetHandler is used within <Directory>, <Location>, and <Files> sections to associate a particular handler with an entire section of the site's URI space. In the two examples that follow, the <Location> section attaches the server-parsed handler to all files within the virtual directory /shtml, while the <Files> section attaches imap-file to all files that begin with the prefix "map-":

<Location /shtml>
 SetHandler server-parsed
</Location>
<Files map-*>
 SetHandler imap-file
</Files>

Surprisingly, the AddHandler and SetHandler directives are not actually implemented in the Apache core. They are implemented by the standard mod_mime module, which is compiled into the server by default. In Chapter 7, we show how to reimplement mod_mime using the Perl API.

You'll probably want to use explicitly named content handlers in your modules rather than hardcoded MIME types. Explicit handler names make configuration files cleaner and easier to understand. Plus, you don't have to invent a new magic MIME type every time you add a handler.

Things are slightly different for mod_perl users because two directives are needed to assign a content handler to a directory or file. The reason for this is that the only real content handler defined by mod_perl is its internal perl-script handler. You use SetHandler to assign perl-script the responsibility for a directory or partial URI, and then use a PerlHandler directive to tell the perl-script handler which Perl module to execute. Directories supervised by Perl API content handlers will look something like this:

<Location /graph>
  SetHandler  perl-script
  PerlHandler Apache::Graph
</Location>

Don't try to assign perl-script to a file extension using something like AddHandler perl-script .pl; this is generally useless because you'd need to set PerlHandler too. If you'd like to associate a Perl content handler with an extension, you should use the <Files> directive. Here's an example:

<Files ~ "\.graph$">
  SetHandler  perl-script
  PerlHandler Apache::Graph
</Files>

There is no UnSetHandler directive to undo the effects of SetHandler. However, should you ever need to restore a subdirectory's handler to the default, you can do it with the directive SetHandler default-handler, as follows:

<Location /graph/tutorial>
  SetHandler default-handler
</Location>