

Writing Apache Modules with Perl and C
By:   Lincoln Stein and Doug MacEachern
Published:   O'Reilly & Associates, Inc.  - March 1999

Copyright © 1999 by O'Reilly & Associates, Inc.


 



Chapter 8 - Customizing the Apache Configuration Process / The Apache Configuration Directive API
Reimplementing mod_mime in Perl

As a full example of creating custom configuration directives, we're going to reimplement the standard mod_mime module in Perl. It has a total of seven directives, which between them use more than one argument syntax. In addition to showing you how to handle a complex configuration setup, this example will show you in detail what goes on behind the scenes as mod_mime associates a content handler with each URI request.

This module replaces the standard mod_mime module. You do not have to remove mod_mime from the standard compiled-in modules in order to test this module. However, if you wish to remove mod_mime anyway in order to convince yourself that the replacement actually works, the easiest way to do this is to compile mod_mime as a dynamically loaded module and then comment out the lines in httpd.conf that load it. In either case, install Apache::MIME as the default MIME-checking phase handler by putting this line in perl.conf or one of the other configuration files:

PerlTypeHandler Apache::MIME

Like the previous example, the configuration information is contained in two files. Makefile.PL (Example 8-3) describes the directives, and Apache/MIME.pm (Example 8-4) defines the callbacks for processing the directives at runtime. In order to reimplement mod_mime, we need to reimplement a total of seven directives, including SetHandler, AddHandler, AddType, and AddEncoding.

Makefile.PL defines the seven directives using the anonymous hash method. All but one of the directives are set to use the OR_FILEINFO context, which allows the directives to appear anywhere in the main configuration files, as well as in .htaccess files, provided that AllowOverride FileInfo is also set. The exception, TypesConfig, is the directive that indicates where the default table of MIME types is to be found. It only makes sense to process this directive during server startup, so its context is given as RSRC_CONF, limiting the directive to the body of any of the .conf files. We don't specify the args_how key for the directives; instead, we allow command_table() to figure out the syntax for us by looking at the function prototypes in MIME.pm.
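To make the prototype-based lookup concrete, here is a minimal sketch in plain Perl. It is not how command_table() is implemented; it only shows that Perl's built-in prototype() function can recover a callback's prototype, which can then be mapped to a syntax name. The %SYNTAX_FOR table and the syntax_for() helper are hypothetical, and the table lists only the two mappings discussed in this section.

```perl
use strict;
use warnings;
no warnings 'illegalproto';   # plain Perl warns about '$$@;@'; silence that here

# Hypothetical lookup table: only the two prototypes used in this section.
my %SYNTAX_FOR = (
    '$$$'   => 'TAKE1',
    '$$@;@' => 'ITERATE2',
);

# Stand-ins for two of the directive callbacks (bodies omitted).
sub TypesConfig ($$$)   { }
sub AddType     ($$@;@) { }

# Return the syntax name implied by a callback's prototype, or undef.
sub syntax_for {
    my $code  = shift;
    my $proto = prototype($code);
    return defined $proto ? $SYNTAX_FOR{$proto} : undef;
}

for my $cb ([TypesConfig => \&TypesConfig], [AddType => \&AddType]) {
    printf "%-12s => %s\n", $cb->[0], syntax_for($cb->[1]);
}
```

Run outside the server, this reports TAKE1 for TypesConfig and ITERATE2 for AddType, matching the syntaxes described for those callbacks later in this section.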

Running perl Makefile.PL will now create a .xs file, which will be compiled into a loadable object file during make.

Example 8-3. Makefile.PL for Apache::MIME

package Apache::MIME;
# File: Makefile.PL
use ExtUtils::MakeMaker;
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
use Apache::src ();
use Apache::ExtUtils qw(command_table);
my @directives = (
   { name         => 'SetHandler',
     errmsg       => 'a handler name',
     req_override => 'OR_FILEINFO' },
   { name         => 'AddHandler',
     errmsg       => 'a handler name followed by one or more file extensions',
     req_override => 'OR_FILEINFO' },
   { name         => 'ForceType',
     errmsg       => 'a MIME type',
     req_override => 'OR_FILEINFO' },
   { name         => 'AddType',
     errmsg       => 'a mime type followed by one or more file extensions',
     req_override => 'OR_FILEINFO' },
   { name         => 'AddLanguage',
     errmsg       => 'a language (e.g., fr), followed by one or more file extensions',
     req_override => 'OR_FILEINFO' },
   { name         => 'AddEncoding',
     errmsg       => 'an encoding (e.g., gzip), followed by one or more file extensions',
     req_override => 'OR_FILEINFO' },
   { name         => 'TypesConfig',
     errmsg       => 'the MIME types config file',
     req_override => 'RSRC_CONF' },
);
command_table \@directives;
WriteMakefile(
   'NAME'     => __PACKAGE__,
   'VERSION_FROM' => 'MIME.pm',
   'INC'      => Apache::src->new->inc,
);
__END__

Turning to Example 8-4, we start by bringing in the DynaLoader and Apache::ModuleConfig modules as we did in the overview example at the beginning of this section:

package Apache::MIME;
# File: Apache/MIME.pm
use strict;
use vars qw($VERSION @ISA);
use LWP::MediaTypes qw(read_media_types guess_media_type add_type add_encoding);
use DynaLoader ();
use Apache ();
use Apache::ModuleConfig ();
use Apache::Constants qw(:common DIR_MAGIC_TYPE DECLINE_CMD);
@ISA = qw(DynaLoader);
$VERSION = '0.01';
if($ENV{MOD_PERL}) {
  no strict;
  @ISA = qw(DynaLoader);
  __PACKAGE__->bootstrap($VERSION);
}

We also bring in Apache, Apache::Constants, and an LWP library called LWP::MediaTypes. The Apache and Apache::Constants libraries will be used within the handler() subroutine, while the LWP library provides utilities for guessing MIME types, languages, and encodings from file extensions. As before, Apache::MIME needs to call bootstrap() immediately after loading other modules in order to bring in its compiled .xs half. Notice that we have to explicitly import the DIR_MAGIC_TYPE and DECLINE_CMD constants from Apache::Constants, as these are not exported by default.

Let's skip over handler() for the moment and look at the seven configuration callbacks: TypesConfig(), AddType(), AddEncoding(), and so on.

sub TypesConfig ($$$) {
   my($cfg, $parms, $file) = @_;
   my $types_config = Apache->server_root_relative($file);
   read_media_types($types_config);
   #to co-exist with mod_mime.c
   return DECLINE_CMD if Apache->module("mod_mime.c");
}

TypesConfig() has a function prototype of ($$$), indicating a directive syntax of TAKE1. It will be called with the name of the file holding the MIME types table as its third argument. The callback retrieves the filename, turns it into a server-relative path, and stores the path into a lexical variable. The callback then calls the LWP function read_media_types() to parse the file and add the MIME types found there to an internal table maintained by LWP::MediaTypes. When the LWP::MediaTypes function guess_media_type() is called subsequently, this table will be consulted. Note that there is no need, in this case, to store the configuration information into the $cfg hash reference because the information is only needed at the time the configuration directive is processed.

Another important detail is that the TypesConfig handler will return DECLINE_CMD if the mod_mime module is installed. This gives mod_mime a chance to also read the TypesConfig file. If mod_mime isn't given this opportunity, it will complain bitterly and abort server startup. However, we don't allow any of the other directive handlers to fall through to mod_mime in this way, effectively cutting mod_mime out of the loop.
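For readers who have not looked inside a MIME types table, the following sketch parses the conventional mime.types layout: one MIME type per line followed by zero or more extensions, with # introducing a comment. It is a simplified stand-in for illustration, not the LWP::MediaTypes implementation; parse_mime_types() and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Parse mime.types-style text into an extension => type hash reference.
sub parse_mime_types {
    my $text = shift;
    my %type_for;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                     # strip comments
        my ($type, @exts) = split ' ', $line; # type first, then extensions
        next unless defined $type;            # skip blank lines
        $type_for{lc $_} = $type for @exts;
    }
    return \%type_for;
}

my $table = parse_mime_types(<<'END');
# MIME type         extensions
text/html           html htm
image/jpeg          jpeg jpg jpe
application/pdf     pdf
application/x-tar                # no extensions listed
END

print "$_ => $table->{$_}\n" for sort keys %$table;
```

A type listed without extensions, such as the last sample line, contributes nothing to the extension table in this sketch.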

sub AddType ($$@;@) {
   my($cfg, $parms, $type, $ext) = @_;
   add_type($type, $ext);
}

The AddType() directive callback is even shorter. Its function prototype is ($$@;@), indicating an ITERATE2 syntax. This means that if the AddType directive looks like this:

AddType application/x-chicken-feed .corn .barley .oats

the function will be called three times. Each time the callback is invoked its third argument will be application/x-chicken-feed and the fourth argument will be successively set to .corn, .barley, and .oats. The function recovers the third and fourth parameters and passes them to the LWP::MediaTypes function add_type(). This simply adds the file type and extension to LWP's internal table.
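The calling pattern can be imitated outside the server. The sketch below is only a model of the behavior described above: dispatch_iterate2() is a hypothetical helper, not part of Apache or mod_perl, and the stand-in callback records its arguments instead of calling add_type().

```perl
use strict;
use warnings;

my @calls;    # records each (type, extension) pair the callback receives

# Stand-in for the AddType callback.
sub add_type_callback {
    my ($cfg, $parms, $type, $ext) = @_;
    push @calls, [$type, $ext];
}

# Hypothetical dispatcher: invoke the callback once per remaining argument,
# repeating the first argument each time (the ITERATE2 pattern).
sub dispatch_iterate2 {
    my ($callback, $cfg, $parms, $first, @rest) = @_;
    $callback->($cfg, $parms, $first, $_) for @rest;
}

# Models: AddType application/x-chicken-feed .corn .barley .oats
dispatch_iterate2(\&add_type_callback, {}, undef,
                  'application/x-chicken-feed', qw(.corn .barley .oats));

print join(' ', @$_), "\n" for @calls;
```

Each printed line pairs the same MIME type with one of the three extensions, which is exactly what the real callback sees on its three invocations.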

sub AddEncoding ($$@;@) {
   my($cfg, $parms, $enc, $ext) = @_;
   add_encoding($enc, $ext);
}

AddEncoding() is similar to AddType() but uses the LWP::MediaTypes add_encoding() function to associate a series of file extensions with a MIME encoding.

More interesting are the SetHandler() and AddHandler() callbacks:

sub SetHandler ($$$) {
   my($cfg, $parms, $handler) = @_;
   $cfg->{'handler'} = $handler;
}
sub AddHandler ($$@;@) {
   my($cfg, $parms, $handler, $ext) = @_;
   $cfg->{'handlers'}->{$ext} = $handler;
}

The job of the SetHandler directive is to force requests for the specified path to be passed to the indicated content handler, no questions asked. AddHandler(), in contrast, adds a series of file extensions to the table consulted by the MIME type checker when it attempts to choose the proper content handler for the request. In both cases, the configuration information is needed again at request time, so we have to keep it in long-term storage within the $cfg hash.

SetHandler() is again a TAKE1 type of callback. It recovers the content handler name from its third argument and stores it in the $cfg data structure under the key handler. AddHandler() is an ITERATE2 callback which receives the name of a content handler and a file extension as its third and fourth arguments. The callback stuffs this information into an anonymous hash maintained in $cfg under the handlers key.
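To see what ends up in the configuration hash, the sketch below calls copies of the two callbacks by hand on an ordinary hash. The directive values are invented for illustration, and the prototypes are dropped because the subs are called directly rather than by the server. The extensions are written without a leading dot because the callback stores them exactly as given.

```perl
use strict;
use warnings;

# Copies of the two callbacks from the text, minus their prototypes.
sub SetHandler {
    my ($cfg, $parms, $handler) = @_;
    $cfg->{'handler'} = $handler;
}
sub AddHandler {
    my ($cfg, $parms, $handler, $ext) = @_;
    $cfg->{'handlers'}->{$ext} = $handler;
}

my %cfg;

# Models: AddHandler cgi-script cgi pl   (one call per extension)
AddHandler(\%cfg, undef, 'cgi-script', $_) for qw(cgi pl);

# Models: SetHandler perl-script
SetHandler(\%cfg, undef, 'perl-script');

print "handler: $cfg{handler}\n";
print "handlers{$_}: $cfg{handlers}{$_}\n" for sort keys %{ $cfg{handlers} };
```

The single handler key and the per-extension handlers hash live side by side, which is why handler() can consult both at request time.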

sub ForceType ($$$) {
   my($cfg, $parms, $type) = @_;
   $cfg->{'type'} = $type;
}

The ForceType directive is used to force all documents in a path to be a particular MIME type, regardless of their file extensions. It's often used within a <Directory> section to force the type of all documents contained within and is helpful for dealing with legacy documents that don't have informative file extensions. The ForceType() callback uses a TAKE1 syntax in which the required argument is a MIME type. The callback recovers the MIME type and stores it in the $cfg hash reference under the key type.

sub AddLanguage ($$@;@) {
   my($cfg, $parms, $language, $ext) = @_;
   $ext =~ s/^\.//;
   $cfg->{'language_types'}->{$ext} = lc $language;
}

The last directive handler, AddLanguage(), implements the AddLanguage directive, in which a series of file extensions are associated with a language code (e.g., "fr" for French, "en" for English). It is an ITERATE2 callback and works just like AddHandler(), except that the dot is stripped off the file extension before storing it into the $cfg hash. This is because of an old inconsistency in the way that mod_mime works, in which the AddLanguage directive expects dots in front of the file extensions, while the AddType and AddHandler directives do not.

Now we turn our attention to the handler() subroutine itself. This code will be called at request time during the MIME type checking phase. It has five responsibilities:

  1. Guess the MIME content type for the requested document.
  2. Guess the content encoding for the requested document.
  3. Guess the content language for the requested document.
  4. Set the content handler for the request.
  5. If the requested document is a directory, initiate special directory processing.

Items 1 through 3 are important but not critical. The content type, encoding, and language may well be changed during the response phase by the content handler. In particular, the MIME type is very frequently changed (e.g., by CGI scripts). Item 4, however, is crucial since it determines what code will be invoked to respond to the request. It is also necessary to detect and treat requests for directory names specially, using a pseudo-MIME type to initiate Apache's directory handling.

sub handler {
   my $r = shift;
   if(-d $r->finfo) {
      $r->content_type(DIR_MAGIC_TYPE);
      return OK;
   }

handler() begins by shifting the Apache request object off the subroutine stack. The subroutine now does a series of checks on the requested document. First, it checks whether $r->finfo() refers to a directory. If so, then handler() sets the request content type to a pseudo-MIME type defined by the constant DIR_MAGIC_TYPE and exits with a status of OK. Setting the content type to DIR_MAGIC_TYPE signals Apache that the user requested a directory, causing the server to pass control to any content handlers that list this constant among the MIME types they handle. mod_dir and mod_autoindex are two of the standard modules that are capable of generating directory listings.

   my($type, @encoding) = guess_media_type($r->filename);
   $r->content_type($type) if $type;
   unshift @encoding, $r->content_encoding if $r->content_encoding;
   $r->content_encoding(join ", ", @encoding) if @encoding;

If the file is not a directory, then we try to guess its MIME type and encoding. We call on the LWP::MediaTypes function guess_media_type() to do the work, passing it the filename and receiving a MIME type and list of encodings in return. Although unusual, it is theoretically possible for a file to have multiple encodings, and LWP::MediaTypes allows this. The returned type is immediately used to set the MIME type of the requested document by calling the request object's content_type() method. Likewise, the list of encodings is added to the request using content_encoding() after joining them together into a comma-delimited string. The only subtlety here is that we honor any previously defined encoding for the requested document by adding it to the list of encodings returned by guess_media_type(). This is in case the handler for a previous phase happened to add some content encoding.
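The merging step is easy to try in isolation. In this sketch, merge_encodings() is a hypothetical helper that performs the same unshift-and-join on plain values, with no request object involved.

```perl
use strict;
use warnings;

# Combine an encoding set by an earlier phase with the guessed encodings,
# returning the comma-delimited value (or undef if there are none).
sub merge_encodings {
    my ($existing, @guessed) = @_;
    unshift @guessed, $existing if $existing;
    return @guessed ? join(', ', @guessed) : undef;
}

print merge_encodings(undef, 'gzip'), "\n";          # prints "gzip"
print merge_encodings('x-compress', 'gzip'), "\n";   # prints "x-compress, gzip"
```

Placing the earlier encoding first keeps the list in the order the encodings were added to the request.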

Now comes some processing that depends on the values in the configuration hash, so we recover the $cfg variable by calling Apache::ModuleConfig's get() method:

    my $cfg = Apache::ModuleConfig->get($r);

The next task is to parse out the requested file's extensions and use them to set the file's MIME type and/or language.

   for my $ext (LWP::MediaTypes::file_exts($r->filename)) {
      if(my $type = $cfg->{'language_types'}->{$ext}) {
         my $ltypes = $r->content_languages;
         push @$ltypes, $type;
         $r->content_languages($ltypes);
      }

Using the LWP::MediaTypes function file_exts(), we split out all the extensions in the requested document's filename and loop through them. This allows a file named travel.html.fr to be recognized and dealt with appropriately.
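As a rough illustration of the splitting, here is a stand-in written with the core File::Basename module. split_exts() is hypothetical and is not the LWP::MediaTypes code; in particular, it returns the extensions left to right, which is an assumption of this sketch rather than a statement about the order file_exts() uses.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical stand-in for file_exts(): every dot-separated part of the
# base name after the first is treated as an extension.
sub split_exts {
    my $path = shift;
    my ($stem, @exts) = split /\./, basename($path);
    return @exts;
}

print join(' ', split_exts('/usr/local/apache/htdocs/travel.html.fr')), "\n";
# prints "html fr"
```

Note that the extensions come back without their dots, which is why the lookups in the loop use bare extensions as hash keys.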

We first check whether the extension matches one of the extensions in the configuration object's language_types key. If so, we use the extension to set the language code for the document. Although it is somewhat unusual, the HTTP specification allows a document to specify multiple languages in its Content-Language field, so we go to some lengths to merge multiple language codes into one long list which we then set with the request object's content_languages() method.

      if(my $type = $cfg->{'handlers'}->{$ext} and !$r->proxyreq) {
         $r->handler($type);
      }
   }

While still in the loop, we deal with the content handler for the request. We check whether the extension is among the ones defined in the configuration variable's handlers hash. If so, we call the request object's handler() method to set the content handler to the indicated value. The only catch is that if the current transaction is a proxy request, we do not want to alter the content handler because another module may have set the content handler during the URI translation phase.

   $r->content_type($cfg->{'type'}) if $cfg->{'type'};
   $r->handler($cfg->{'handler'}) if $cfg->{'handler'};

After looping through the file extensions, we handle the ForceType and SetHandler directives, which have the effect of overriding file extensions. If the configuration key type is nonempty, we use it to force the MIME type to the specified value. Likewise, if the configuration key handler is nonempty, we again call the request object's handler() method, replacing whatever content handler was there before.

    return OK;
}

At the end of handler(), we return OK to tell Apache that the MIME type checking phase has been handled successfully.
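The order of precedence in handler() can be modeled without a running server. The following sketch is a deliberately simplified, pure-Perl imitation: resolve() is hypothetical, a plain hash stands in for LWP's type table, and the directory check, encodings, and proxy test are left out. It only demonstrates that ForceType and SetHandler win over anything chosen by file extension.

```perl
use strict;
use warnings;

# Hypothetical model of the decisions made by handler().
# %$types maps extensions to MIME types, standing in for LWP's table.
sub resolve {
    my ($cfg, $filename, $types) = @_;
    my %result = (languages => []);
    my ($stem, @exts) = split /\./, $filename;

    for my $ext (@exts) {
        $result{type} = $types->{$ext} if $types->{$ext};
        if (my $lang = $cfg->{language_types}{$ext}) {
            push @{ $result{languages} }, $lang;
        }
        if (my $handler = $cfg->{handlers}{$ext}) {
            $result{handler} = $handler;
        }
    }

    # ForceType and SetHandler override anything chosen by extension.
    $result{type}    = $cfg->{type}    if $cfg->{type};
    $result{handler} = $cfg->{handler} if $cfg->{handler};
    return \%result;
}

my %types = (html => 'text/html');
my %cfg   = (
    language_types => { fr   => 'fr' },
    handlers       => { html => 'server-parsed' },
);

my $plain  = resolve(\%cfg, 'travel.html.fr', \%types);
my $forced = resolve({ %cfg, type => 'text/plain', handler => 'perl-script' },
                     'travel.html.fr', \%types);

print "$plain->{type} $plain->{handler} @{ $plain->{languages} }\n";
print "$forced->{type} $forced->{handler}\n";
```

With only extension mappings in place, the file resolves to text/html, server-parsed, and French; adding the type and handler keys replaces both the MIME type and the content handler.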

Although this module was presented mainly as an exercise, with minimal work it can be used to improve on mod_mime. For example, you might have noticed that the standard mod_mime has no ForceEncoding or ForceLanguage directives that allow you to override the file extension mappings in the way that you can with ForceType. This is easy enough to fix in Apache::MIME by adding the appropriate directive definitions and callbacks.
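One way such an extension might look is sketched below. Everything here is hypothetical: the directive entries, the ForceEncoding() and ForceLanguage() callbacks, and the encoding and language keys are invented for illustration and have not been tested inside a server. The callbacks are exercised on a plain hash so the fragment runs on its own.

```perl
use strict;
use warnings;

# Hypothetical additions to @directives in Makefile.PL.
my @extra_directives = (
    { name         => 'ForceEncoding',
      errmsg       => 'an encoding (e.g., gzip)',
      req_override => 'OR_FILEINFO' },
    { name         => 'ForceLanguage',
      errmsg       => 'a language (e.g., fr)',
      req_override => 'OR_FILEINFO' },
);

# Hypothetical TAKE1 callbacks for MIME.pm, modeled on ForceType().
sub ForceEncoding ($$$) {
    my ($cfg, $parms, $enc) = @_;
    $cfg->{'encoding'} = $enc;
}
sub ForceLanguage ($$$) {
    my ($cfg, $parms, $language) = @_;
    $cfg->{'language'} = lc $language;
}

# In handler(), after the extension loop, the overrides might be applied as:
#   $r->content_encoding($cfg->{'encoding'})    if $cfg->{'encoding'};
#   $r->content_languages([$cfg->{'language'}]) if $cfg->{'language'};

# Exercise the callbacks outside the server with a plain hash.
my %cfg;
ForceEncoding(\%cfg, undef, 'gzip');
ForceLanguage(\%cfg, undef, 'FR');
print "$cfg{encoding} $cfg{language}\n";   # prints "gzip fr"
```

Lowercasing the language code mirrors what AddLanguage() does; whether the forced values should replace or be merged with values set by earlier phases is a design decision left open here.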

Example 8-4. Apache::MIME Reimplements the Standard mod_mime Module

package Apache::MIME;
# File: Apache/MIME.pm
use strict;
use vars qw($VERSION @ISA);
use LWP::MediaTypes qw(read_media_types guess_media_type add_type add_encoding);
use DynaLoader ();
use Apache ();
use Apache::ModuleConfig ();
use Apache::Constants qw(:common DIR_MAGIC_TYPE DECLINE_CMD);
@ISA = qw(DynaLoader);
$VERSION = '0.01';
if($ENV{MOD_PERL}) {
  no strict;
  @ISA = qw(DynaLoader);
  __PACKAGE__->bootstrap($VERSION);
}
sub handler {
   my $r = shift;
   if(-d $r->finfo) {
      $r->content_type(DIR_MAGIC_TYPE);
      return OK;
   }
   my($type, @encoding) = guess_media_type($r->filename);
   $r->content_type($type) if $type;
   unshift @encoding, $r->content_encoding if $r->content_encoding;
   $r->content_encoding(join ", ", @encoding) if @encoding;
   my $cfg = Apache::ModuleConfig->get($r);
   for my $ext (LWP::MediaTypes::file_exts($r->filename)) {
      if(my $type = $cfg->{'language_types'}->{$ext}) {
         my $ltypes = $r->content_languages;
         push @$ltypes, $type;
         $r->content_languages($ltypes);
      }
      if(my $type = $cfg->{'handlers'}->{$ext} and !$r->proxyreq) {
         $r->handler($type);
      }
   }
   $r->content_type($cfg->{'type'}) if $cfg->{'type'};
   $r->handler($cfg->{'handler'}) if $cfg->{'handler'};
   return OK;
}
sub TypesConfig ($$$) {
   my($cfg, $parms, $file) = @_;
   my $types_config = Apache->server_root_relative($file);
   read_media_types($types_config);
   #to co-exist with mod_mime.c
   return DECLINE_CMD if Apache->module("mod_mime.c");
}
sub AddType ($$@;@) {
   my($cfg, $parms, $type, $ext) = @_;
   add_type($type, $ext);
}
sub AddEncoding ($$@;@) {
   my($cfg, $parms, $enc, $ext) = @_;
   add_encoding($enc, $ext);
}
sub SetHandler ($$$) {
   my($cfg, $parms, $handler) = @_;
   $cfg->{'handler'} = $handler;
}
sub AddHandler ($$@;@) {
   my($cfg, $parms, $handler, $ext) = @_;
   $cfg->{'handlers'}->{$ext} = $handler;
}
sub ForceType ($$$) {
   my($cfg, $parms, $type) = @_;
   $cfg->{'type'} = $type;
}
sub AddLanguage ($$@;@) {
   my($cfg, $parms, $language, $ext) = @_;
   $ext =~ s/^\.//;
   $cfg->{'language_types'}->{$ext} = lc $language;
}
1;
__END__
