


Chapter 21. Writing Apache Modules

One of the great things about Apache is that if you don't like what it does, you can change it. Now, this is actually true for any package with source code available, but Apache makes this easier. It has a generalized interface to modules that extends the functionality of the base package. In fact, when you download Apache, you get far more than just the base package, which is barely capable of serving files at all. You get all the modules the Apache Group considers vital to a web server. You also get modules that are useful enough to most people to be worth the effort of the Group to maintain them. In this chapter, we explore the intricacies of programming modules for Apache.[1] We expect you to be thoroughly conversant with C and Unix (or Win32), because we are not going to explain anything about them. Refer to Chapter 20 or your Unix/Win32 manuals for information about functions used in the examples. We start out by explaining how to write a module for both Apache 1.3 and 2.0. We also explain how to port a 1.3 module to Apache v2.0.

21.1 Overview

Perhaps the most important part of an Apache module is the module structure. This is defined in http_config.h, so all modules should start (apart from copyright notices, etc.) with the following lines:

#include "httpd.h"
#include "http_config.h"

Note that httpd.h is required for all Apache source code.

What is the module structure for? Simple: it provides the glue between the Apache core and the module's code. It contains pointers (to functions, lists, and so on) that are used by components of the core at the correct moments. The core knows about the various module structures because they are listed in modules.c, which is generated by the Configure script from the Configuration file.[2]

Traditionally, each module ends with its module structure. Here is a particularly trivial example, from mod_asis.c (1.3):

module asis_module = {
   STANDARD_MODULE_STUFF,
   NULL,                          /* initializer */
   NULL,                          /* create per-directory config structure */
   NULL,                          /* merge per-directory config structures */
   NULL,                          /* create per-server config structure */
   NULL,                          /* merge per-server config structures */
   NULL,                          /* command table */
   asis_handlers,                 /* handlers */
   NULL,                          /* translate_handler */
   NULL,                          /* check_user_id */
   NULL,                          /* check auth */
   NULL,                          /* check access */
   NULL,                          /* type_checker */
   NULL,                          /* prerun fixups */
   NULL,                          /* logger */
   NULL,                          /* header parser */
   NULL,                          /* child_init */
   NULL,                          /* child_exit */
   NULL                           /* post read request */
};

The first entry, STANDARD_MODULE_STUFF, must appear in all module structures. It initializes some structure elements that the core uses to manage modules. Currently, these are the API version number,[3] the index of the module in various vectors, the name of the module (actually, its filename), and a pointer to the next module structure in a linked list of all modules.[4]

The only other entry is for handlers. We will look at this in more detail further on. Suffice it to say, for now, that this entry points to a list of strings and functions that define the relationship between MIME or handler types and the functions that handle them. All the other entries are defined to NULL, which simply means that the module does not use those particular hooks.

The equivalent structure in 2.0 looks like this:

static void register_hooks(apr_pool_t *p)
{
    ap_hook_handler(asis_handler,NULL,NULL,APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA asis_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,			/* create per-directory config structure */
    NULL,			/* merge per-directory config structures */
    NULL,			/* create per-server config structure */
    NULL,			/* merge per-server config structures */
    NULL,			/* command apr_table_t */
    register_hooks			/* register hooks */
};

Note that we have to show the register_hooks( ) function to match the functionality of the 1.3 module structure. Once more, STANDARD20_MODULE_STUFF is required for all module structures, and the register_hooks( ) function replaces most of the rest of the old 1.3 structure. How this works is explained in detail in the next section.
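The handlers entry in the 1.3 structure pairs type strings with the functions that serve them. The following is a minimal standalone sketch of that idea, not code from Apache: the toy_ types, the table contents, and toy_find_handler( ) are all hypothetical stand-ins, and the real core's lookup is more involved than a simple string comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Apache's request and handler types. */
typedef struct { const char *content_type; } toy_request;
typedef int (*toy_handler_fn)(toy_request *r);

typedef struct {
    const char *type;        /* MIME type or handler name */
    toy_handler_fn handler;  /* function that serves that type */
} toy_handler_rec;

/* Placeholder handler; a real one would send the response. */
static int toy_asis_handler(toy_request *r) { (void)r; return 0; }

/* The list ends with a NULL entry, in the style of a 1.3 handler list.
 * The type strings here are illustrative. */
static const toy_handler_rec toy_asis_handlers[] = {
    { "httpd/send-as-is", toy_asis_handler },
    { "send-as-is",       toy_asis_handler },
    { NULL, NULL }
};

/* Return the function registered for the request's type, or NULL. */
static toy_handler_fn toy_find_handler(const toy_handler_rec *table,
                                       const toy_request *r)
{
    assert(table != NULL && r != NULL && r->content_type != NULL);
    for (; table->type != NULL; ++table) {
        if (strcmp(table->type, r->content_type) == 0)
            return table->handler;
    }
    return NULL;
}
```

A request whose type appears in the table gets toy_asis_handler( ) back; any other type yields NULL, which a caller would treat as "this module has nothing to offer."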

21.2 Status Codes

The HTTP 1.1 standard defines many status codes that can be returned as a response to a request. Most of the functions involved in processing a request return OK, DECLINED, or a status code. DECLINED generally means that the module is not interested in processing the request; OK means it did process it, or that it is happy for the request to proceed, depending on which function was called. Generally, a status code is simply returned to the user agent, together with any headers defined in the request structure's headers_out table. At the time of writing, the status codes predefined in httpd.h were as follows:

#define HTTP_CONTINUE                      100
#define HTTP_SWITCHING_PROTOCOLS           101
#define HTTP_OK                            200
#define HTTP_CREATED                       201
#define HTTP_ACCEPTED                      202
#define HTTP_NON_AUTHORITATIVE             203
#define HTTP_NO_CONTENT                    204
#define HTTP_RESET_CONTENT                 205
#define HTTP_PARTIAL_CONTENT               206
#define HTTP_MULTIPLE_CHOICES              300
#define HTTP_MOVED_PERMANENTLY             301
#define HTTP_MOVED_TEMPORARILY             302
#define HTTP_SEE_OTHER                     303
#define HTTP_NOT_MODIFIED                  304
#define HTTP_USE_PROXY                     305
#define HTTP_BAD_REQUEST                   400
#define HTTP_UNAUTHORIZED                  401
#define HTTP_PAYMENT_REQUIRED              402
#define HTTP_FORBIDDEN                     403
#define HTTP_NOT_FOUND                     404
#define HTTP_METHOD_NOT_ALLOWED            405
#define HTTP_NOT_ACCEPTABLE                406
#define HTTP_PROXY_AUTHENTICATION_REQUIRED 407
#define HTTP_REQUEST_TIME_OUT              408
#define HTTP_CONFLICT                      409
#define HTTP_GONE                          410
#define HTTP_LENGTH_REQUIRED               411
#define HTTP_PRECONDITION_FAILED           412
#define HTTP_REQUEST_ENTITY_TOO_LARGE      413
#define HTTP_REQUEST_URI_TOO_LARGE         414
#define HTTP_UNSUPPORTED_MEDIA_TYPE        415
#define HTTP_INTERNAL_SERVER_ERROR         500
#define HTTP_NOT_IMPLEMENTED               501
#define HTTP_BAD_GATEWAY                   502
#define HTTP_SERVICE_UNAVAILABLE           503
#define HTTP_GATEWAY_TIME_OUT              504
#define HTTP_VERSION_NOT_SUPPORTED         505
#define HTTP_VARIANT_ALSO_VARIES           506

For backward compatibility, these are also defined:

#define DOCUMENT_FOLLOWS    HTTP_OK
#define PARTIAL_CONTENT     HTTP_PARTIAL_CONTENT
#define MULTIPLE_CHOICES    HTTP_MULTIPLE_CHOICES
#define MOVED               HTTP_MOVED_PERMANENTLY
#define REDIRECT            HTTP_MOVED_TEMPORARILY
#define USE_LOCAL_COPY      HTTP_NOT_MODIFIED
#define BAD_REQUEST         HTTP_BAD_REQUEST
#define AUTH_REQUIRED       HTTP_UNAUTHORIZED
#define FORBIDDEN           HTTP_FORBIDDEN
#define NOT_FOUND           HTTP_NOT_FOUND
#define METHOD_NOT_ALLOWED  HTTP_METHOD_NOT_ALLOWED
#define NOT_ACCEPTABLE      HTTP_NOT_ACCEPTABLE
#define LENGTH_REQUIRED     HTTP_LENGTH_REQUIRED
#define PRECONDITION_FAILED HTTP_PRECONDITION_FAILED
#define SERVER_ERROR        HTTP_INTERNAL_SERVER_ERROR
#define NOT_IMPLEMENTED     HTTP_NOT_IMPLEMENTED
#define BAD_GATEWAY         HTTP_BAD_GATEWAY
#define VARIANT_ALSO_VARIES HTTP_VARIANT_ALSO_VARIES

Details of the meaning of these codes are left to the HTTP 1.1 specification, but there are a couple worth mentioning here. HTTP_OK (formerly known as DOCUMENT_FOLLOWS) should not normally be used, because it aborts further processing of the request. HTTP_MOVED_TEMPORARILY (formerly known as REDIRECT) causes the browser to go to the URL specified in the Location header. HTTP_NOT_MODIFIED (formerly known as USE_LOCAL_COPY) is used in response to a header that makes a GET conditional (e.g., If-Modified-Since).
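The OK/DECLINED/status convention can be illustrated with a small standalone model. This is only a sketch under stated assumptions: the toy_ functions and the dispatcher are hypothetical, and it models just the "first answer that is not DECLINED wins" style of processing, whereas the real core treats return values differently depending on which function was called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OK              0    /* the function dealt with the request */
#define DECLINED       -1    /* the function is not interested */
#define HTTP_FORBIDDEN 403

typedef int (*toy_hook_fn)(const char *uri);

/* Never interested in anything. */
static int toy_decline_all(const char *uri) { (void)uri; return DECLINED; }

/* Refuses anything under /private and declines everything else. */
static int toy_forbid_private(const char *uri)
{
    return strncmp(uri, "/private", 8) == 0 ? HTTP_FORBIDDEN : DECLINED;
}

/* Happy for any request to proceed. */
static int toy_accept(const char *uri) { (void)uri; return OK; }

/* Call each function in order and stop at the first result that is
 * not DECLINED; that result is OK or a status code. */
static int toy_run_first(const toy_hook_fn *hooks, size_t n, const char *uri)
{
    size_t i;
    assert(hooks != NULL && uri != NULL);
    for (i = 0; i < n; ++i) {
        int rc = hooks[i](uri);
        if (rc != DECLINED)
            return rc;
    }
    return DECLINED;
}
```

With the three functions registered in the order shown, a URI beginning with /private comes back as HTTP_FORBIDDEN before toy_accept( ) is ever consulted, while any other URI falls through to OK.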

21.3 The Module Structure

Now we will look in detail at each entry in the module structure. We examine the entries in the order in which they are used, which is not the order in which they appear in the structure, and we also show how they are used in the standard Apache modules. We will also note the differences between versions 1.3 and 2.0 of Apache as we go along.

Create Per-Server Config Structure  

void *module_create_svr_config(pool *pPool, server_rec *pServer)
 

This function creates the per-server configuration structure for the module. It is called once for the main server and once per virtual host. It allocates and initializes the memory for the per-server configuration and returns a pointer to it. pServer points to the server_rec for the current server. See Example 21-1 (1.3) for an excerpt from mod_cgi.c.


Example 21-1. mod_cgi.c
#define DEFAULT_LOGBYTES 10385760
#define DEFAULT_BUFBYTES 1024

typedef struct {
    char *logname;
    long logbytes;
    int bufbytes;
} cgi_server_conf;

static void *create_cgi_config(pool *p, server_rec *s)
{
    cgi_server_conf *c =
    (cgi_server_conf *) ap_pcalloc(p, sizeof(cgi_server_conf));

    c->logname = NULL;
    c->logbytes = DEFAULT_LOGBYTES;
    c->bufbytes = DEFAULT_BUFBYTES;

    return c;
}

All this code does is allocate and initialize a copy of cgi_server_conf, which gets filled in during configuration.

The only changes for 2.0 in this are that pool becomes apr_pool_t and ap_pcalloc( ) becomes apr_pcalloc( ).

Create Per-Directory Config Structure  

void *module_create_dir_config(pool *pPool,char *szDir)
 

This function is called once per module, with szDir set to NULL, when the main host's configuration is initialized, and again for each <Directory>, <Location>, or <Files> section in the Config files containing a directive from this module, with szDir set to the directory. Any per-directory directives found outside <Directory>, <Location>, or <Files> sections end up in the NULL configuration. It is also called when .htaccess files are parsed, with the name of the directory in which they reside. Because this function is used for .htaccess files, it may also be called after the initializer is called. Also, the core caches per-directory configurations arising from .htaccess files for the duration of a request, so this function is called only once per directory with an .htaccess file.

If a module does not support per-directory configuration, any directives that appear in a <Directory> section override the per-server configuration unless precautions are taken. The usual way to avoid this is to set the req_override member appropriately in the command table; see later in this section.

The purpose of this function is to allocate and initialize the memory required for any per-directory configuration. It returns a pointer to the allocated memory. See Example 21-2 (1.3) for an excerpt from mod_rewrite.c.


Example 21-2. mod_rewrite.c
static void *config_perdir_create(pool *p, char *path)
{
    rewrite_perdir_conf *a;

    a = (rewrite_perdir_conf *)ap_pcalloc(p, sizeof(rewrite_perdir_conf));

    a->state           = ENGINE_DISABLED;
    a->options         = OPTION_NONE;
    a->baseurl         = NULL;
    a->rewriteconds    = ap_make_array(p, 2, sizeof(rewritecond_entry));
    a->rewriterules    = ap_make_array(p, 2, sizeof(rewriterule_entry));

    if (path == NULL) {
        a->directory = NULL;
    }
    else {
        /* make sure it has a trailing slash */
        if (path[strlen(path)-1] == '/') {
            a->directory = ap_pstrdup(p, path);
        }
        else {
            a->directory = ap_pstrcat(p, path, "/", NULL);
        }
    }

    return (void *)a;
}

This function allocates memory for a rewrite_perdir_conf structure (defined elsewhere in mod_rewrite.c) and initializes it. Since this function is called for every <Directory> section, regardless of whether it contains any rewriting directives, the initialization makes sure the engine is disabled unless specifically enabled later.

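The changes for 2.0 in this are that pool becomes apr_pool_t and the 1.3 function names take their APR forms: ap_pcalloc( ) becomes apr_pcalloc( ), ap_make_array( ) becomes apr_array_make( ), ap_pstrdup( ) becomes apr_pstrdup( ), and ap_pstrcat( ) becomes apr_pstrcat( ).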

Pre-Config (2.0)  

int module_pre_config(apr_pool_t *pconf,apr_pool_t *plog,apr_pool_t *ptemp)
 

This is nominally called before configuration starts, though in practice the directory and server creators are first called once each (for the default server and directory). A typical use of this function is, naturally enough, for initialization. Example 21-3 shows what mod_headers.c uses to initialize a hash.


Example 21-3. mod_headers.c
static void register_format_tag_handler(apr_pool_t *p, char *tag,
                                        void *tag_handler, int def)
{
    const void *h = apr_palloc(p, sizeof(h));
    h = tag_handler;
    apr_hash_set(format_tag_hash, tag, 1, h);
}
static int header_pre_config(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp)
{
    format_tag_hash = apr_hash_make(p);
    register_format_tag_handler(p, "D", (void*) header_request_duration, 0);
    register_format_tag_handler(p, "t", (void*) header_request_time, 0);
    register_format_tag_handler(p, "e", (void*) header_request_env_var, 0);

    return OK;
}

Per-Server Merger  

void *module_merge_server(pool *pPool, void *base_conf, void *new_conf)
 

Once the Config files have been read, this function is called once for each virtual host, with base_conf pointing to the main server's configuration (for this module) and new_conf pointing to the virtual host's configuration. This gives you the opportunity to inherit any unset options in the virtual host from the main server or to merge the main server's entries into the virtual server, if appropriate. It returns a pointer to the new configuration structure for the virtual host (or it just returns new_conf, if appropriate).

It is possible that future changes to Apache will allow merging of hosts other than the main one, so don't rely on base_conf pointing to the main server. See Example 21-4 (1.3) for an excerpt from mod_cgi.c.


Example 21-4. mod_cgi.c
static void *merge_cgi_config(pool *p, void *basev, void *overridesv)
{
    cgi_server_conf *base = (cgi_server_conf *) basev;
    cgi_server_conf *overrides = (cgi_server_conf *) overridesv;

    return overrides->logname ? overrides : base;
}

Although this example is exceedingly trivial, a per-server merger can, in principle, do anything a per-directory merger does — it's just that in most cases it makes more sense to do things per-directory, so the interesting examples can be found there. This example does serve to illustrate a point of confusion — often the overriding configuration is called overrides (or some variant thereof), which to our ears implies the exact opposite precedence to that desired.

Again, the only change in 2.0 is that pool has become apr_pool_t.
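A merger that inherits unset options field by field, rather than choosing one whole structure as Example 21-4 does, might look roughly like the following. This is a standalone sketch, not Apache code: toy_server_conf, the TOY_UNSET sentinel, and toy_merge_server( ) are hypothetical, and malloc( ) stands in for pool allocation so that the fragment compiles without the Apache headers.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_UNSET (-1L)      /* hypothetical marker for "not configured" */

typedef struct {
    const char *logname;     /* NULL means "not set" */
    long logbytes;           /* TOY_UNSET means "not set" */
} toy_server_conf;

/* Build a new configuration for the virtual host: values it set itself
 * win, and anything left unset is inherited from the base server.
 * Neither input is modified. The caller frees the result. */
static toy_server_conf *toy_merge_server(const toy_server_conf *base,
                                         const toy_server_conf *virt)
{
    toy_server_conf *merged;

    assert(base != NULL && virt != NULL);
    merged = malloc(sizeof(*merged));
    if (merged == NULL)
        return NULL;

    merged->logname  = virt->logname ? virt->logname : base->logname;
    merged->logbytes = virt->logbytes != TOY_UNSET ? virt->logbytes
                                                   : base->logbytes;
    return merged;
}
```

The sentinel values are what make inheritance possible: the creator function has to initialize each field to "unset" so that the merger can tell a deliberate setting from a default.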

Per-Directory Merger  

void *module_dir_merge(pool *pPool, void *base_conf, void *new_conf)
 

Like the per-server merger, this is called once for each virtual host (not for each directory). It is handed the per-server document root per-directory Config (that is, the one that was created with a NULL directory name).

Whenever a request is processed, this function merges all relevant <Directory> sections and then merges .htaccess files (interleaved, starting at the root and working downward), then <Files> and <Location> sections, in that order.

Unlike the per-server merger, per-directory merger is called as the server runs, possibly with different combinations of directory, location, and file configurations for each request, so it is important that it copies the configuration (in new_conf) if it is going to change it.

Now the reason we chose mod_rewrite.c for the per-directory creator becomes apparent, as it is a little more interesting than most. See Example 21-5.


Example 21-5. mod_rewrite.c
static void *config_perdir_merge(pool *p, void *basev, void *overridesv)
{
    rewrite_perdir_conf *a, *base, *overrides;
    a     = (rewrite_perdir_conf *)ap_pcalloc(p, sizeof(rewrite_perdir_conf));
    base  = (rewrite_perdir_conf *)basev;
    overrides = (rewrite_perdir_conf *)overridesv;

    a->state           = overrides->state;
    a->options         = overrides->options;
    a->directory       = overrides->directory;
    a->baseurl         = overrides->baseurl;
    if (a->options & OPTION_INHERIT) {
        a->rewriteconds = ap_append_arrays(p, overrides->rewriteconds,
                                           base->rewriteconds);
        a->rewriterules = ap_append_arrays(p, overrides->rewriterules,
                                           base->rewriterules);
    }
    else {
        a->rewriteconds = overrides->rewriteconds;
        a->rewriterules = overrides->rewriterules;
    }
    return (void *)a;
}

As you can see, this merges the configuration from the base conditionally, depending on whether the new configuration specified an INHERIT option.

Once more, the only change in 2.0 is that pool has become apr_pool_t. See Example 21-6 for an excerpt from mod_env.c.

Example 21-6. mod_env.c
static void *merge_env_dir_configs(pool *p, void *basev, void *addv)
{
    env_dir_config_rec *base = (env_dir_config_rec *) basev;
    env_dir_config_rec *add = (env_dir_config_rec *) addv;
    env_dir_config_rec *new =
    (env_dir_config_rec *) ap_palloc(p, sizeof(env_dir_config_rec));
    table *new_table;
    table_entry *elts;
    array_header *arr;
    int i;
    const char *uenv, *unset;

    new_table = ap_copy_table(p, base->vars);

    arr = ap_table_elts(add->vars);
    elts = (table_entry *)arr->elts;

    for (i = 0; i < arr->nelts; ++i) {
        ap_table_setn(new_table, elts[i].key, elts[i].val);
    }

    unset = add->unsetenv;
    uenv = ap_getword_conf(p, &unset);
    while (uenv[0] != '\0') {
        ap_table_unset(new_table, uenv);
        uenv = ap_getword_conf(p, &unset);
    }

    new->vars = new_table;

    new->vars_present = base->vars_present || add->vars_present;

    return new;
}

This function creates a new configuration into which it then copies the base vars table (a table of environment variable names and values). It then runs through the individual entries of the addv vars table, setting them in the new table. It does this rather than use overlay_tables( ) because overlay_tables( ) does not deal with duplicated keys. Then the addv configuration's unsetenv (which is a space-separated list of environment variables to unset) unsets any variables specified to be unset for addv's server.

The 2.0 version of this function has a number of alterations, but on close inspection is actually very much the same, allowing for differences in function names and some rather radical restructuring:

static void *merge_env_dir_configs(apr_pool_t *p, void *basev, void *addv)
{
    env_dir_config_rec *base = basev;
    env_dir_config_rec *add = addv;
    env_dir_config_rec *res = apr_palloc(p, sizeof(*res));
    const apr_table_entry_t *elts;
    const apr_array_header_t *arr;
    int i;

    res->vars = apr_table_copy(p, base->vars);
    res->unsetenv = NULL;

    arr = apr_table_elts(add->unsetenv);
    elts = (const apr_table_entry_t *)arr->elts;

    for (i = 0; i < arr->nelts; ++i) {
        apr_table_unset(res->vars, elts[i].key);
    }

    arr = apr_table_elts(add->vars);
    elts = (const apr_table_entry_t *)arr->elts;

    for (i = 0; i < arr->nelts; ++i) {
        apr_table_setn(res->vars, elts[i].key, elts[i].val);
    }

    return res;
}
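
The copy, unset, then set sequence of the 2.0 listing can be tried out in isolation with a toy table. The following is a sketch only: toy_table and its helper functions are hypothetical, fixed-size replacements for APR tables, written so that the fragment compiles without the Apache headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VARS 8

typedef struct { const char *key; const char *val; } toy_entry;
typedef struct { toy_entry e[TOY_MAX_VARS]; size_t n; } toy_table;

/* Look up a key; returns NULL if it is absent. */
static const char *toy_get(const toy_table *t, const char *key)
{
    size_t i;
    for (i = 0; i < t->n; ++i)
        if (strcmp(t->e[i].key, key) == 0)
            return t->e[i].val;
    return NULL;
}

/* Set a key, replacing any existing value so keys are never duplicated. */
static void toy_set(toy_table *t, const char *key, const char *val)
{
    size_t i;
    for (i = 0; i < t->n; ++i) {
        if (strcmp(t->e[i].key, key) == 0) {
            t->e[i].val = val;
            return;
        }
    }
    assert(t->n < TOY_MAX_VARS);
    t->e[t->n].key = key;
    t->e[t->n].val = val;
    t->n++;
}

/* Remove a key if present by moving the last entry into its slot. */
static void toy_unset(toy_table *t, const char *key)
{
    size_t i;
    for (i = 0; i < t->n; ++i) {
        if (strcmp(t->e[i].key, key) == 0) {
            t->e[i] = t->e[t->n - 1];
            t->n--;
            return;
        }
    }
}

/* Copy the base table, apply the unsets, then overlay the added
 * variables. The inputs are left untouched. */
static toy_table toy_merge_env(const toy_table *base, const toy_table *add,
                               const char *const *unset, size_t n_unset)
{
    toy_table res = *base;
    size_t i;
    for (i = 0; i < n_unset; ++i)
        toy_unset(&res, unset[i]);
    for (i = 0; i < add->n; ++i)
        toy_set(&res, add->e[i].key, add->e[i].val);
    return res;
}
```

Working on a copy matters here for the reason given earlier for per-directory mergers: the same base configuration may be merged with different sections on later requests, so it must not be altered.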
Command Table  

command_rec aCommands[]
 

This entry points to an array of directives that configure the module. Each element of the array names a directive, specifies a function that will handle the command, and specifies which AllowOverride directives must be in force for the command to be permitted. Each element then specifies how the directive's arguments are to be parsed and supplies an error message in case of syntax errors (such as the wrong number of arguments, or a directive used where it shouldn't be).

The definition of command_rec can be found in http_config.h:

typedef struct command_struct {
  const char *name;          /* Name of this command */
  const char *(*func)( );     /* Function invoked */
  void *cmd_data;            /* Extra data, for functions that
                              * implement multiple commands...
                              */
  int req_override;          /* What overrides need to be allowed to
                              * enable this command
                              */
  enum cmd_how args_how;     /* What the command expects as arguments */
  
  const char *errmsg;        /* 'usage' message, in case of syntax errors */
} command_rec;

Note that in 2.0 this definition is still broadly correct, but there's also a variant for compilers that allow designated initializers to permit the type-safe initialization of command_recs.

cmd_how is defined as follows:

enum cmd_how {
  RAW_ARGS,                     /* cmd_func parses command line itself */
  TAKE1,                        /* one argument only */
  TAKE2,                        /* two arguments only */
  ITERATE,                      /* one argument, occurring multiple times
                                 * (e.g., IndexIgnore)
                                 */
  ITERATE2,                     /* two arguments, 2nd occurs multiple times
                                 * (e.g., AddIcon)
                                 */
  FLAG,                         /* One of 'On' or 'Off' */
  NO_ARGS,                      /* No args at all, e.g. </Directory> */
  TAKE12,                       /* one or two arguments */
  TAKE3,                        /* three arguments only */
  TAKE23,                       /* two or three arguments */
  TAKE123,                      /* one, two, or three arguments */
  TAKE13                        /* one or three arguments */
};

These options determine how the function func is called when the matching directive is found in a Config file, but first we must look at one more structure, cmd_parms:

typedef struct {
    void *info;                 /* Argument to command from cmd_table */
    int override;               /* Which allow-override bits are set */
    int limited;                /* Which methods are <Limit>ed */

    configfile_t *config_file;  /* Config file structure from pcfg_openfile( ) */

    ap_pool *pool;              /* Pool to allocate new storage in */
    struct pool *temp_pool;     /* Pool for scratch memory; persists during
                                 * configuration, but wiped before the first
                                 * request is served...
                                 */
    server_rec *server;         /* Server_rec being configured for */
    char *path;                 /* If configuring for a directory,
                                 * pathname of that directory.
                                 * NOPE!  That's what it meant previous to the
                                 * existance of <Files>, <Location> and regex
                                 * matching.  Now the only usefulness that can
                                 * be derived from this field is whether a command
                                 * is being called in a server context (path == NULL)
                                 * or being called in a dir context (path != NULL).
                                 */
    const command_rec *cmd;     /* configuration command */
    const char *end_token;      /* end token required to end a nested section */
    void *context;              /* per_dir_config vector passed 
                                 * to handle_command */
} cmd_parms;

This structure is filled in and passed to the function associated with each directive. Note that cmd_parms.info is filled in with the value of command_rec.cmd_data, allowing arbitrary extra information to be passed to the function. The function is also passed its per-directory configuration structure, if there is one, shown in the following function definitions as mconfig. The per-server configuration can be accessed by a call similar to:

ap_get_module_config(parms->server->module_config, &module_struct)

replacing module_struct with your own module's module structure. Extra information may also be passed, depending on the value of args_how :

RAW_ARGS

func(cmd_parms *parms, void *mconfig, char *args)

args is simply the rest of the line (that is, excluding the directive).

NO_ARGS

func(cmd_parms *parms, void *mconfig)

TAKE1

func(cmd_parms *parms, void *mconfig, char *w)

w is the single argument to the directive.

TAKE2, TAKE12

func(cmd_parms *parms, void *mconfig, char *w1, char *w2)

w1 and w2 are the two arguments to the directive. TAKE12 means the second argument is optional. If absent, w2 is NULL.

TAKE3, TAKE13, TAKE23, TAKE123

func(cmd_parms *parms, void *mconfig, char *w1, char *w2, char *w3)

w1, w2, and w3 are the three arguments to the directive. TAKE13, TAKE23, and TAKE123 mean that the directive takes one or three, two or three, and one, two, or three arguments, respectively. Missing arguments are NULL.

ITERATE

func(cmd_parms *parms, void *mconfig, char *w)

func is called repeatedly, once for each argument following the directive.

ITERATE2

func(cmd_parms *parms, void *mconfig, char *w1, char *w2)

There must be at least two arguments. func is called once for each argument, starting with the second. The first is passed to func every time.

FLAG

func(cmd_parms *parms, void *mconfig, int f)

The argument must be either On or Off. If On, then f is nonzero; if Off, f is zero.

In 2.0 each of the previous has its own macro to define it, to allow for type-safe initialization where the compiler supports it. So instead of directly using the flag ITERATE, for example, you would instead use the macro AP_INIT_ITERATE to fill in the command_rec structure.
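To make the ITERATE2 calling pattern concrete, here is a standalone model of how a parser might drive such a directive function. It is a sketch under stated assumptions, not the core's code: toy_dispatch_iterate2( ) and toy_count_pair( ) are hypothetical, the cmd_parms argument is omitted for brevity, and the sketch assumes that a directive function returns NULL on success or an error message otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified directive function: per-directory config plus two words. */
typedef const char *(*toy_take2_fn)(void *mconfig,
                                    const char *w1, const char *w2);

/* Call func once for each argument from the second onward, passing the
 * first argument every time. Returns NULL on success or an error string. */
static const char *toy_dispatch_iterate2(toy_take2_fn func, void *mconfig,
                                         const char *const *argv,
                                         size_t argc)
{
    size_t i;
    assert(func != NULL && argv != NULL);
    if (argc < 2)
        return "requires at least two arguments";
    for (i = 1; i < argc; ++i) {
        const char *err = func(mconfig, argv[0], argv[i]);
        if (err != NULL)
            return err;      /* stop at the first failure */
    }
    return NULL;
}

/* Example directive function: counts how many times it is called.
 * mconfig is assumed to point at an int counter. */
static const char *toy_count_pair(void *mconfig,
                                  const char *w1, const char *w2)
{
    (void)w1;
    (void)w2;
    ++*(int *)mconfig;
    return NULL;
}
```

For a line such as AddType text/html .html .htm, the three arguments would produce two calls, one pairing text/html with .html and one pairing it with .htm.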

req_override can be any combination of the following (ORed together):

#define OR_NONE 0
#define OR_LIMIT 1
#define OR_OPTIONS 2
#define OR_FILEINFO 4
#define OR_AUTHCFG 8
#define OR_INDEXES 16
#define OR_UNSET 32
#define ACCESS_CONF 64
#define RSRC_CONF 128
#define OR_ALL (OR_LIMIT|OR_OPTIONS|OR_FILEINFO|OR_AUTHCFG|OR_INDEXES)

2.0 adds one extra option:

#define EXEC_ON_READ 256     /**< force directive to execute a command 
                             which would modify the configuration (like including
                             another file, or IFModule */

This flag defines the circumstances under which a directive is permitted. The logical AND of this field and the current override state must be nonzero for the directive to be allowed. In configuration files, the current override state is:

RSRC_CONF|OR_OPTIONS|OR_FILEINFO|OR_INDEXES

when outside a <Directory> section, and it is:

ACCESS_CONF|OR_LIMIT|OR_OPTIONS|OR_FILEINFO|OR_AUTHCFG|OR_INDEXES

when inside a <Directory> section.

In .htaccess files, the state is determined by the AllowOverride directive. See Example 21-7 (1.3) for an excerpt from mod_mime.c.
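The permission rule is a single bitwise test. The sketch below repeats the flag values listed above so that it compiles on its own; toy_directive_allowed( ) and toy_override_examples( ) are hypothetical names, and the two TOY_STATE_ macros simply restate the override states given in the text.

```c
#include <assert.h>

#define OR_LIMIT     1
#define OR_OPTIONS   2
#define OR_FILEINFO  4
#define OR_AUTHCFG   8
#define OR_INDEXES   16
#define ACCESS_CONF  64
#define RSRC_CONF    128

/* Override state outside and inside a <Directory> section. */
#define TOY_STATE_SERVER \
    (RSRC_CONF | OR_OPTIONS | OR_FILEINFO | OR_INDEXES)
#define TOY_STATE_DIRECTORY \
    (ACCESS_CONF | OR_LIMIT | OR_OPTIONS | OR_FILEINFO | OR_AUTHCFG | OR_INDEXES)

/* A directive is allowed when its req_override shares at least one bit
 * with the current override state. */
static int toy_directive_allowed(int req_override, int current_override)
{
    return (req_override & current_override) != 0;
}

/* A few worked cases, written as assertions. */
static void toy_override_examples(void)
{
    /* An RSRC_CONF directive is for server context only. */
    assert(toy_directive_allowed(RSRC_CONF, TOY_STATE_SERVER));
    assert(!toy_directive_allowed(RSRC_CONF, TOY_STATE_DIRECTORY));
    /* An OR_FILEINFO directive is accepted in both places. */
    assert(toy_directive_allowed(OR_FILEINFO, TOY_STATE_SERVER));
    assert(toy_directive_allowed(OR_FILEINFO, TOY_STATE_DIRECTORY));
}
```

For an .htaccess file, the second argument would instead be whatever set of OR_ bits the AllowOverride directive has enabled for that directory.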


Example 21-7. mod_mime.c
static const command_rec mime_cmds[] =
{
    {"AddType", add_type, NULL, OR_FILEINFO, ITERATE2,
     "a mime type followed by one or more file extensions"},
    {"AddEncoding", add_encoding, NULL, OR_FILEINFO, ITERATE2,
     "an encoding (e.g., gzip), followed by one or more file extensions"},
    {"AddCharset", add_charset, NULL, OR_FILEINFO, ITERATE2,
     "a charset (e.g., iso-2022-jp), followed by one or more file extensions"},
    {"AddLanguage", add_language, NULL, OR_FILEINFO, ITERATE2,
     "a language (e.g., fr), followed by one or more file extensions"},
    {"AddHandler", add_handler, NULL, OR_FILEINFO, ITERATE2,
     "a handler name followed by one or more file extensions"},
    {"ForceType", ap_set_string_slot_lower, 
     (void *)XtOffsetOf(mime_dir_config, type), OR_FILEINFO, TAKE1, 
     "a media type"},
    {"RemoveHandler", remove_handler, NULL, OR_FILEINFO, ITERATE,
     "one or more file extensions"},
    {"RemoveEncoding", remove_encoding, NULL, OR_FILEINFO, ITERATE,
     "one or more file extensions"},
    {"RemoveType", remove_type, NULL, OR_FILEINFO, ITERATE,
     "one or more file extensions"},
    {"SetHandler", ap_set_string_slot_lower, 
     (void *)XtOffsetOf(mime_dir_config, handler), OR_FILEINFO, TAKE1, 
     "a handler name"},
    {"TypesConfig", set_types_config, NULL, RSRC_CONF, TAKE1,
     "the MIME types config file"},
    {"DefaultLanguage", ap_set_string_slot,
     (void*)XtOffsetOf(mime_dir_config, default_language), OR_FILEINFO, TAKE1,
     "language to use for documents with no other language file extension" },
    {NULL}
};

Note the use of ap_set_string_slot( ) and ap_set_string_slot_lower( ). These standard functions use the offset stored in cmd_data, computed with XtOffsetOf, to set a char* in the per-directory configuration of the module. See Example 21-8 (2.0) for an excerpt from mod_mime.c.

Example 21-8. mod_mime.c
static const command_rec mime_cmds[] =
{
AP_INIT_ITERATE2("AddCharset", add_extension_info, 
         (void *)APR_XtOffsetOf(extension_info, charset_type), OR_FILEINFO,
     "a charset (e.g., iso-2022-jp), followed by one or more file extensions"),
AP_INIT_ITERATE2("AddEncoding", add_extension_info, 
         (void *)APR_XtOffsetOf(extension_info, encoding_type), OR_FILEINFO,
     "an encoding (e.g., gzip), followed by one or more file extensions"),
AP_INIT_ITERATE2("AddHandler", add_extension_info, 
         (void *)APR_XtOffsetOf(extension_info, handler), OR_FILEINFO,
     "a handler name followed by one or more file extensions"),
AP_INIT_ITERATE2("AddInputFilter", add_extension_info, 
         (void *)APR_XtOffsetOf(extension_info, input_filters), OR_FILEINFO,
     "input filter name (or ; delimited names) followed by one or more file extensions"),
AP_INIT_ITERATE2("AddLanguage", add_extension_info, 
         (void *)APR_XtOffsetOf(extension_info, language_type), OR_FILEINFO,
     "a language (e.g., fr), followed by one or more file extensions"),
AP_INIT_ITERATE2("AddOutputFilter", add_extension_info, 
         (void *)APR_XtOffsetOf(extension_info, output_filters), OR_FILEINFO, 
     "output filter name (or ; delimited names) followed by one or more file extensions"),
AP_INIT_ITERATE2("AddType", add_extension_info, 
         (void *)APR_XtOffsetOf(extension_info, forced_type), OR_FILEINFO, 
     "a mime type followed by one or more file extensions"),
AP_INIT_TAKE1("DefaultLanguage", ap_set_string_slot,
       (void*)APR_XtOffsetOf(mime_dir_config, default_language), OR_FILEINFO,
     "language to use for documents with no other language file extension"),
AP_INIT_ITERATE("MultiviewsMatch", multiviews_match, NULL, OR_FILEINFO,
     "NegotiatedOnly (default), Handlers and/or Filters, or Any"),
AP_INIT_ITERATE("RemoveCharset", remove_extension_info, 
        (void *)APR_XtOffsetOf(extension_info, charset_type), OR_FILEINFO,
     "one or more file extensions"),
AP_INIT_ITERATE("RemoveEncoding", remove_extension_info, 
        (void *)APR_XtOffsetOf(extension_info, encoding_type), OR_FILEINFO,
     "one or more file extensions"),
AP_INIT_ITERATE("RemoveHandler", remove_extension_info, 
        (void *)APR_XtOffsetOf(extension_info, handler), OR_FILEINFO,
     "one or more file extensions"),
AP_INIT_ITERATE("RemoveInputFilter", remove_extension_info, 
        (void *)APR_XtOffsetOf(extension_info, input_filters), OR_FILEINFO,
     "one or more file extensions"),
AP_INIT_ITERATE("RemoveLanguage", remove_extension_info, 
        (void *)APR_XtOffsetOf(extension_info, language_type), OR_FILEINFO,
     "one or more file extensions"),
AP_INIT_ITERATE("RemoveOutputFilter", remove_extension_info, 
        (void *)APR_XtOffsetOf(extension_info, output_filters), OR_FILEINFO,
     "one or more file extensions"),
AP_INIT_ITERATE("RemoveType", remove_extension_info, 
        (void *)APR_XtOffsetOf(extension_info, forced_type), OR_FILEINFO,
     "one or more file extensions"),
AP_INIT_TAKE1("TypesConfig", set_types_config, NULL, RSRC_CONF,
     "the MIME types config file"),
    {NULL}
};

As you can see, this uses the macros to initialize the structure. Also note that set_string_slot( ) has become ap_set_string_slot( ).

Initializer  

void module_init(server_rec *pServer, pool *pPool) [1.3]
int module_post_config(apr_pool_t *pPool, apr_pool_t *pLog, apr_pool_t *pTemp, 
                       server_rec *pServer) [2.0]
 

In 1.3 this is the init hook, but in 2.0 it has been renamed, more accurately, to post_config.

In 2.0 the three pools provided are, in order: pPool, a pool that lasts until the configuration is changed, corresponding to pPool in 1.3; pLog, a pool that is cleared after each read of the configuration file (remember that it is read twice for each reconfiguration), intended for log files; and pTemp, a temporary pool that is cleared after configuration is complete (and perhaps more often than that).

This function is called after the server configuration files have been read but before any requests are handled. Like the configuration functions, it is called each time the server is reconfigured, so care must be taken to make sure it behaves correctly on the second and subsequent calls. This is the last function to be called before Apache forks the request-handling children. pServer is a pointer to the server_rec for the main host. pPool is a pool that persists until the server is reconfigured. Note that, at least in the current version of Apache:

pServer->server_hostname

may not yet be initialized. If the module is going to add to the version string with ap_add_version_component( ), then this is a good place to do it.

It is possible to iterate through all the server configurations by following the next member of pServer, as in the following:

for( ; pServer ; pServer=pServer->next)
    ;
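
As a minimal sketch of that traversal, the following counts the configured hosts. The toy_server_rec type and count_servers( ) are our own illustrative stand-ins with just the members needed here; they are not Apache's definitions:

```c
#include <stddef.h>

/* Hypothetical stand-in for Apache's server_rec: only the two
 * members this sketch needs. */
typedef struct toy_server_rec {
    const char *server_hostname;
    struct toy_server_rec *next;   /* next server configuration, or NULL */
} toy_server_rec;

/* Walk the list exactly as in the loop above, starting at the main
 * server, and count how many configurations it contains. */
static int count_servers(const toy_server_rec *pServer)
{
    int n = 0;

    for ( ; pServer; pServer = pServer->next)
        ++n;
    return n;
}
```

A real initializer would do its per-host work inside the loop body instead of simply counting.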

See Example 21-9 (1.3) for an excerpt from mod_mime.c.

Example

Example 21-9. mod_mime.c
#define MIME_HASHSIZE (32)
#define hash(i) (ap_tolower(i) % MIME_HASHSIZE)

static table *hash_buckets[MIME_HASHSIZE];

static void init_mime(server_rec *s, pool *p)
{
    configfile_t *f;
    char l[MAX_STRING_LEN];
    int x;
    char *types_confname = ap_get_module_config(s->module_config, &mime_module);

    if (!types_confname)
        types_confname = TYPES_CONFIG_FILE;

    types_confname = ap_server_root_relative(p, types_confname);

    if (!(f = ap_pcfg_openfile(p, types_confname))) {
        ap_log_error(APLOG_MARK, APLOG_ERR, s,
"could not open mime types log file %s.", types_confname);
        exit(1);
    }

    for (x = 0; x < MIME_HASHSIZE; x++)
        hash_buckets[x] = ap_make_table(p, 10);

    while (!(ap_cfg_getline(l, MAX_STRING_LEN, f))) {
        const char *ll = l, *ct;

        if (l[0] == '#')
            continue;
        ct = ap_getword_conf(p, &ll);

        while (ll[0]) {
            char *ext = ap_getword_conf(p, &ll);
            ap_str_tolower(ext);   /* ??? */
            ap_table_setn(hash_buckets[hash(ext[0])], ext, ct);
        }
    }
    ap_cfg_closefile(f);
}

The 2.0 version of this function in mod_mime.c uses a hash provided by APR instead of building its own, as shown in Example 21-10.

Example 21-10. mod_mime.c
static apr_hash_t *mime_type_extensions;

static int mime_post_config(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *s)
{
    ap_configfile_t *f;
    char l[MAX_STRING_LEN];
    const char *types_confname = ap_get_module_config(s->module_config, &mime_module);
    apr_status_t status;

    if (!types_confname)
        types_confname = AP_TYPES_CONFIG_FILE;

    types_confname = ap_server_root_relative(p, types_confname);

    if ((status = ap_pcfg_openfile(&f, ptemp, types_confname)) != APR_SUCCESS) {
        ap_log_error(APLOG_MARK, APLOG_ERR, status, s,
		     "could not open mime types config file %s.", types_confname);
        return HTTP_INTERNAL_SERVER_ERROR;
    }

    mime_type_extensions = apr_hash_make(p);

    while (!(ap_cfg_getline(l, MAX_STRING_LEN, f))) {
        const char *ll = l, *ct;

        if (l[0] == '#')
            continue;
        ct = ap_getword_conf(p, &ll);

        while (ll[0]) {
            char *ext = ap_getword_conf(p, &ll);
            ap_str_tolower(ext);   /* ??? */
            apr_hash_set(mime_type_extensions, ext, APR_HASH_KEY_STRING, ct);
        }
    }
    ap_cfg_closefile(f);
    return OK;
}

Child Initialization

static void module_child_init(server_rec *pServer, pool *pPool) [1.3]
static void module_child_init(apr_pool_t *pPool, server_rec *pServer) [2.0]
 

An Apache server may consist of many processes (on Unix, for example) or a single process with many threads (on Win32) or, in the future, a combination of the two. module_child_init( ) is called once for each instance of a heavyweight process, that is, whatever level of execution corresponds to a separate address space, file handles, etc. In the case of Unix, this is once per child process, but on Win32 it is called only once in total, not once per thread. This is because threads share address space and other resources. There is not currently a corresponding per-thread call, but there may be in the future. There is a corresponding call for child exit, described later in this chapter.

See Example 21-11 (1.3) for an excerpt from mod_unique_id.c.

Example

Example 21-11. mod_unique_id.c
static void unique_id_child_init(server_rec *s, pool *p)
{
    pid_t pid;
#ifndef NO_GETTIMEOFDAY
    struct timeval tv;
#endif

    pid = getpid( );
    cur_unique_id.pid = pid;

    if (cur_unique_id.pid != pid) {
        ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_CRIT, s,
                    "oh no! pids are greater than 32-bits!  I'm broken!");
    }

    cur_unique_id.in_addr = global_in_addr;

#ifndef NO_GETTIMEOFDAY
    if (gettimeofday(&tv, NULL) == -1) {
        cur_unique_id.counter = 0;
    }
    else {
        cur_unique_id.counter = tv.tv_usec / 10;
    }
#else
    cur_unique_id.counter = 0;
#endif

    cur_unique_id.pid = htonl(cur_unique_id.pid);
    cur_unique_id.counter = htons(cur_unique_id.counter);
}

mod_unique_id.c's purpose in life is to provide an ID for each request that is unique across all web servers everywhere (or at least at a particular site). To do this, it uses various bits of uniqueness, including the process ID of the child and the time at which it was forked, which is why it uses this hook.

The same function in 2.0 is a little simpler, because APR takes away the platform dependencies:

static void unique_id_child_init(apr_pool_t *p, server_rec *s)
{
    pid_t pid;
    apr_time_t tv;

    pid = getpid( );
    cur_unique_id.pid = pid;
    if ((pid_t)cur_unique_id.pid != pid) {
        ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_CRIT, 0, s,
                    "oh no! pids are greater than 32-bits!  I'm broken!");
    }
    cur_unique_id.in_addr = global_in_addr;
    tv = apr_time_now( );
    cur_unique_id.counter = (unsigned short)(tv % APR_USEC_PER_SEC / 10);
    cur_unique_id.pid = htonl(cur_unique_id.pid);
    cur_unique_id.counter = htons(cur_unique_id.counter);
}

Post Read Request

static int module_post_read_request(request_rec *pReq)
 

This function is called immediately after the request headers have been read or, in the case of an internal redirect, synthesized. It is not called for subrequests. It can return OK, DECLINED, or a status code. If something other than DECLINED is returned, no further modules are called. This can be used to make decisions based purely on the header content. Currently, the only standard Apache module to use this hook is the proxy module.
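
The rule that the first module not to decline ends the phase can be pictured with a small stand-alone sketch. Everything here (the toy_ names, the hook array, the three sample hooks) is hypothetical and only imitates the behavior described above; it is not how the Apache core itself is written:

```c
#include <stddef.h>

/* Values used by this sketch only; real modules take OK and DECLINED
 * from httpd.h. */
#define TOY_OK        0
#define TOY_DECLINED (-1)

typedef struct toy_request {
    const char *uri;
} toy_request;

typedef int (*toy_hook_fn)(toy_request *r);

static int toy_calls;   /* how many hooks have been invoked */

/* Three sample hooks: one declines, one answers with a status code
 * (403 here), and one would accept the request if it were reached. */
static int hook_declines(toy_request *r) { (void)r; ++toy_calls; return TOY_DECLINED; }
static int hook_forbids(toy_request *r)  { (void)r; ++toy_calls; return 403; }
static int hook_accepts(toy_request *r)  { (void)r; ++toy_calls; return TOY_OK; }

/* Call each hook in turn; the first result other than TOY_DECLINED
 * ends the walk and is returned to the caller. */
static int toy_run_first(toy_hook_fn *hooks, size_t nhooks, toy_request *r)
{
    size_t i;

    for (i = 0; i < nhooks; ++i) {
        int rv = hooks[i](r);
        if (rv != TOY_DECLINED)
            return rv;
    }
    return TOY_DECLINED;
}
```

With the hooks registered in the order shown, hook_accepts( ) is never called, because hook_forbids( ) has already answered.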

See Example 21-12 for an excerpt from mod_proxy.c.

Example

Example 21-12. mod_proxy.c
static int proxy_detect(request_rec *r)
{
    void *sconf = r->server->module_config;
    proxy_server_conf *conf;

    conf = (proxy_server_conf *) ap_get_module_config(sconf, &proxy_module);

    if (conf->req && r->parsed_uri.scheme) {
        /* but it might be something vhosted */
       if (!(r->parsed_uri.hostname
            && !strcasecmp(r->parsed_uri.scheme, ap_http_method(r))
            && ap_matches_request_vhost(r, r->parsed_uri.hostname,
               r->parsed_uri.port_str ? r->parsed_uri.port : ap_default_port(r)))) {
            r->proxyreq = STD_PROXY;
            r->uri = r->unparsed_uri;
            r->filename = ap_pstrcat(r->pool, "proxy:", r->uri, NULL);
            r->handler = "proxy-server";
        }
    }
    /* We need special treatment for CONNECT proxying: it has no scheme part */
    else if (conf->req && r->method_number == M_CONNECT
             && r->parsed_uri.hostname
             && r->parsed_uri.port_str) {
            r->proxyreq = STD_PROXY;
            r->uri = r->unparsed_uri;
            r->filename = ap_pstrcat(r->pool, "proxy:", r->uri, NULL);
            r->handler = "proxy-server";
    }
    return DECLINED;
}

This code checks for a request that includes a hostname that does not match the current virtual host (which, since it will have been chosen on the basis of the hostname in the request, means it doesn't match any virtual host) or a CONNECT method (which only proxies use). If either of these conditions is true, the handler is set to proxy-server, and the filename is set to proxy:uri so that the later phases will be handled by the proxy module.

Apart from minor differences in naming of constants, this function is identical in 2.0.

Quick Handler (2.0)  

int module_quick_handler(request_rec *r, int lookup_uri)
 

This function is intended to provide content from a URI-based cache. If lookup_uri is set, then it should simply return OK if the URI exists, but not provide the content.
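
The effect of lookup_uri is easiest to see in isolation. The following is only a sketch under our own assumptions: the one-entry "cache", the toy_ names, and the toy_sent_body variable that records what would have been sent are all invented for illustration and have nothing to do with mod_cache's real data structures:

```c
#include <stddef.h>
#include <string.h>

/* Values used by this sketch only. */
#define TOY_OK        0
#define TOY_DECLINED (-1)

/* Hypothetical one-entry cache standing in for a real URI cache. */
static const char *toy_cached_uri  = "/index.html";
static const char *toy_cached_body = "<html>cached</html>";

/* Body "sent" by the last call, or NULL if nothing was sent. */
static const char *toy_sent_body;

static int toy_quick_handler(const char *uri, int lookup_uri)
{
    toy_sent_body = NULL;

    if (strcmp(uri, toy_cached_uri) != 0)
        return TOY_DECLINED;             /* not cached: later phases run */

    if (!lookup_uri)
        toy_sent_body = toy_cached_body; /* real request: serve the content */

    return TOY_OK;                       /* hit, whether or not content was sent */
}
```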

The only example of this in 2.0 is in an experimental module, mod_cache.c, as shown in Example 21-13.

Example

Example 21-13. mod_cache.c
static int cache_url_handler(request_rec *r, int lookup)
{
    apr_status_t rv;
    const char *cc_in, *pragma, *auth;
    apr_uri_t uri = r->parsed_uri;
    char *url = r->unparsed_uri;
    apr_size_t urllen;
    char *path = uri.path;
    const char *types;
    cache_info *info = NULL;
    cache_request_rec *cache;
    cache_server_conf *conf = 
        (cache_server_conf *) ap_get_module_config(r->server->module_config, 
                                                   &cache_module);

    if (r->method_number != M_GET) return DECLINED;

    if (!(types = ap_cache_get_cachetype(r, conf, path))) {
        return DECLINED;
    }
    ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, r->server,
                 "cache: URL %s is being handled by %s", path, types);

    urllen = strlen(url);
    if (urllen > MAX_URL_LENGTH) {
        ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, r->server,
                     "cache: URL exceeds length threshold: %s", url);
        return DECLINED;
    }
    if (url[urllen-1] == '/') {
        return DECLINED;
    }

    cache = (cache_request_rec *) ap_get_module_config(r->request_config, 
                                                       &cache_module);
    if (!cache) {
        cache = ap_pcalloc(r->pool, sizeof(cache_request_rec));
        ap_set_module_config(r->request_config, &cache_module, cache);
    }

    cache->types = types;

    cc_in = apr_table_get(r->headers_in, "Cache-Control");
    pragma = apr_table_get(r->headers_in, "Pragma");
    auth = apr_table_get(r->headers_in, "Authorization");

    if (conf->ignorecachecontrol_set == 1 && conf->ignorecachecontrol == 1 && 
        auth == NULL) {
        ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, r->server,
            "incoming request is asking for a uncached version of %s, "
            "but we know better and are ignoring it", url);
    }
    else {
        if (ap_cache_liststr(cc_in, "no-store", NULL) ||
            ap_cache_liststr(pragma, "no-cache", NULL) || (auth != NULL)) {
            /* delete the previously cached file */
            cache_remove_url(r, cache->types, url);

            ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, r->server,
                        "cache: no-store forbids caching of %s", url);
            return DECLINED;
        }
    }

    rv = cache_select_url(r, cache->types, url);
    if (DECLINED == rv) {
        if (!lookup) {
           ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, r->server,
                         "cache: no cache - add cache_in filter and DECLINE");
            ap_add_output_filter("CACHE_IN", NULL, r, r->connection);
        }
        return DECLINED;
    }
    else if (OK == rv) {
        if (cache->fresh) {
            apr_bucket_brigade *out;
            conn_rec *c = r->connection;

            if (lookup) {
                return OK;
            }
            ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, r->server,
                         "cache: fresh cache - add cache_out filter and "
                         "handle request");

            ap_run_insert_filter(r);
            ap_add_output_filter("CACHE_OUT", NULL, r, r->connection);
            out = apr_brigade_create(r->pool, c->bucket_alloc);
            if (APR_SUCCESS != (rv = ap_pass_brigade(r->output_filters, out))) {
                ap_log_error(APLOG_MARK, APLOG_ERR, rv, r->server,
                             "cache: error returned while trying to return %s "
                             "cached data", 
                             cache->type);
                return rv;
            }
            return OK;
        }
        else {
            if (lookup) {
                return DECLINED;
            }

            ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, r->server,
                         "cache: stale cache - test conditional");
            if (ap_cache_request_is_conditional(r)) {
                ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, 
                             r->server,
                             "cache: conditional - add cache_in filter and "
                             "DECLINE");

                ap_add_output_filter("CACHE_IN", NULL, r, r->connection);

                return DECLINED;
            }
           else {
                if (info && info->etag) {
                    ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, 
                                 r->server,
                                 "cache: nonconditional - fudge conditional "
                                 "by etag");
                    apr_table_set(r->headers_in, "If-None-Match", info->etag);
                }
                else if (info && info->lastmods) {
                    ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, 
                                 r->server,
                                 "cache: nonconditional - fudge conditional "
                                 "by lastmod");
                    apr_table_set(r->headers_in, 
                                  "If-Modified-Since", 
                                  info->lastmods);
                }
                else {
                    ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, 
                                 r->server,
                                 "cache: nonconditional - no cached "
                                 "etag/lastmods - add cache_in and DECLINE");

                    ap_add_output_filter("CACHE_IN", NULL, r, r->connection);

                    return DECLINED;
                }
                ap_log_error(APLOG_MARK, APLOG_DEBUG | APLOG_NOERRNO, 0, 
                             r->server,
                             "cache: nonconditional - add cache_conditional and"
                             " DECLINE");
                ap_add_output_filter("CACHE_CONDITIONAL", 
                                     NULL, 
                                     r, 
                                     r->connection);

                return DECLINED;
            }
        }
    }
    else {
        ap_log_error(APLOG_MARK, APLOG_ERR, rv, 
                     r->server,
                     "cache: error returned while checking for cached file by "
                     "%s cache", 
                     cache->type);
        return DECLINED;
    }
}

This is quite complex but interesting: note the use of filters both to fill the cache and to generate the cached content for cache hits.

Translate Name  

int module_translate(request_rec *pReq)
 

This function's task is to translate the URL in a request into a filename. The end result of its deliberations should be placed in pReq->filename. It should return OK, DECLINED, or a status code. The first module that doesn't return DECLINED is assumed to have done the job, and no further modules are called. Since the order in which modules are called is not defined, it is a good thing if the URLs handled by the modules are mutually exclusive. If all modules return DECLINED, a configuration error has occurred. Obviously, the function is likely to use the per-directory and per-server configurations (but note that at this stage, the per-directory configuration refers to the root configuration of the current server) to determine whether it should handle the request, as well as the URL itself (in pReq->uri). If a status is returned, the appropriate headers for the response should also be set in pReq->headers_out.

Naturally enough, Example 21-14 (1.3 and 2.0) comes from mod_alias.c:

Example

Example 21-14. mod_alias.c
static char *try_alias_list(request_rec *r, array_header *aliases, int doesc, int *status)
{
    alias_entry *entries = (alias_entry *) aliases->elts;
    regmatch_t regm[10];
    char *found = NULL;
    int i;

    for (i = 0; i < aliases->nelts; ++i) {
        alias_entry *p = &entries[i];
        int l;

        if (p->regexp) {
            if (!ap_regexec(p->regexp, r->uri, p->regexp->re_nsub + 1, regm, 0)) {
                if (p->real) {
                    found = ap_pregsub(r->pool, p->real, r->uri,
                                       p->regexp->re_nsub + 1, regm);
                    if (found && doesc) {
                        found = ap_escape_uri(r->pool, found);
                    }
                }
                else {
                    /* need something non-null */
                    found = ap_pstrdup(r->pool, "");
                }
            }
        }
        else {
            l = alias_matches(r->uri, p->fake);

            if (l > 0) {
                if (doesc) {
                    char *escurl;
                    escurl = ap_os_escape_path(r->pool, r->uri + l, 1);

                    found = ap_pstrcat(r->pool, p->real, escurl, NULL);
                }
                else
                    found = ap_pstrcat(r->pool, p->real, r->uri + l, NULL);
            }
        }

        if (found) {
            if (p->handler) {	/* Set handler, and leave a note for mod_cgi */
                r->handler = p->handler;
                ap_table_setn(r->notes, "alias-forced-type", r->handler);
            }

            *status = p->redir_status;

            return found;
        }
    }

    return NULL;
}

static int translate_alias_redir(request_rec *r)
{
    void *sconf = r->server->module_config;
    alias_server_conf *serverconf =
    (alias_server_conf *) ap_get_module_config(sconf, &alias_module);
    char *ret;
    int status;

    if (r->uri[0] != '/' && r->uri[0] != '\0')
	return DECLINED;

    if ((ret = try_alias_list(r, serverconf->redirects, 1, &status)) != NULL) {
        if (ap_is_HTTP_REDIRECT(status)) {
            /* include QUERY_STRING if any */
            if (r->args) {
                ret = ap_pstrcat(r->pool, ret, "?", r->args, NULL);
            }
            ap_table_setn(r->headers_out, "Location", ret);
        }
        return status;
    }

    if ((ret = try_alias_list(r, serverconf->aliases, 0, &status)) != NULL) {
        r->filename = ret;
        return OK;
    }

    return DECLINED;
}

First of all, this example tries to match a Redirect directive. If it does, the Location header is set in headers_out, and REDIRECT is returned. If not, it translates into a filename. Note that it may also set a handler (in fact, the only handler it can possibly set is cgi-script, which it does if the alias was created by a ScriptAlias directive). An interesting feature is that it sets a note for mod_cgi.c, namely alias-forced-type. This is used by mod_cgi.c to determine whether the CGI script is invoked via a ScriptAlias, in which case Options ExecCGI is not needed.[5] For completeness, here is the code from mod_cgi.c that makes the test:

int is_scriptaliased (request_rec *r)
{
    char *t = table_get (r->notes, "alias-forced-type");
    return t && (!strcmp (t, "cgi-script"));
}
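
The pattern at work here, one phase leaving a note that a later phase reads, can be sketched without any Apache headers. The single-slot toy_notes type and the toy_ functions below are hypothetical stand-ins for the request's notes table and for the two modules involved:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-slot stand-in for the request's notes table. */
typedef struct toy_notes {
    const char *key;
    const char *val;
} toy_notes;

static void toy_note_set(toy_notes *t, const char *key, const char *val)
{
    t->key = key;
    t->val = val;
}

static const char *toy_note_get(const toy_notes *t, const char *key)
{
    if (t->key && strcmp(t->key, key) == 0)
        return t->val;
    return NULL;
}

/* Producer side (the translate phase): record the forced handler,
 * if the matching alias had one. */
static void toy_alias_found(toy_notes *notes, const char *handler)
{
    if (handler)
        toy_note_set(notes, "alias-forced-type", handler);
}

/* Consumer side (a later phase): the same test as is_scriptaliased( ). */
static int toy_is_scriptaliased(const toy_notes *notes)
{
    const char *t = toy_note_get(notes, "alias-forced-type");
    return t && !strcmp(t, "cgi-script");
}
```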

An Interjection

At this point, the filename is known as well as the URL, and Apache reconfigures itself to hand subsequent module functions the relevant per-directory configuration (actually composed of all matching directory, location, and file configurations, merged with each other via the per-directory merger, in that order).[6]

Map to Storage (2.0)  

int module_map_to_storage(request_rec *r)
 

This function allows modules to set the request_rec's per_dir_config according to their own view of the world, if desired. It is also used to respond to contextless requests (such as TRACE). It should return DONE or an HTTP return code if a contextless request was fulfilled, OK if the module mapped it, or DECLINED if not. The core will handle this by doing a standard directory walk on the filename if no other module does. See Example 21-15.

Example

Example 21-15. http_protocol.c
AP_DECLARE_NONSTD(int) ap_send_http_trace(request_rec *r)
{
    int rv;
    apr_bucket_brigade *b;
    header_struct h;

    if (r->method_number != M_TRACE) {
        return DECLINED;
    }

    /* Get the original request */
    while (r->prev) {
        r = r->prev;
    }

    if ((rv = ap_setup_client_block(r, REQUEST_NO_BODY))) {
        return rv;
    }

    ap_set_content_type(r, "message/http");

    /* Now we recreate the request, and echo it back */

    b = apr_brigade_create(r->pool, r->connection->bucket_alloc);
    apr_brigade_putstrs(b, NULL, NULL, r->the_request, CRLF, NULL);
    h.pool = r->pool;
    h.bb = b;
    apr_table_do((int (*) (void *, const char *, const char *))
                 form_header_field, (void *) &h, r->headers_in, NULL);
    apr_brigade_puts(b, NULL, NULL, CRLF);
    ap_pass_brigade(r->output_filters, b);

    return DONE;
}

This is the code that handles the TRACE method. Also, the following is from mod_proxy.c:

static int proxy_map_location(request_rec *r)
{
    int access_status;

    if (!r->proxyreq || strncmp(r->filename, "proxy:", 6) != 0)
        return DECLINED;

    /* Don't let the core or mod_http map_to_storage hooks handle this,
     * We don't need directory/file_walk, and we want to TRACE on our own.
     */
    if ((access_status = proxy_walk(r))) {
        ap_die(access_status, r);
        return access_status;
    }

    return OK;
}

Header Parser

int module_header_parser(request_rec *pReq)
 

This routine is similar in intent to the post_read_request phase. It can return OK, DECLINED, or a status code. If something other than DECLINED is returned, no further modules are called. The intention was to make decisions based on the headers sent by the client. However, its use has (in most cases) been superseded by post_read_request. Because it occurs after the per-directory configuration merge has been done, it is still useful in some cases.

The only standard module that uses it is mod_setenvif.c, as shown in Example 21-16.

Example

Example 21-16. mod_setenvif.c
static int match_headers(request_rec *r)
{
    sei_cfg_rec *sconf;
    sei_entry *entries;
    table_entry *elts;
    const char *val;
    int i, j;
    int perdir;
    char *last_name;

    perdir = (ap_table_get(r->notes, SEI_MAGIC_HEIRLOOM) != NULL);
    if (! perdir) {
        ap_table_set(r->notes, SEI_MAGIC_HEIRLOOM, "post-read done");
        sconf  = (sei_cfg_rec *) ap_get_module_config(r->server->module_config,
                                                      &setenvif_module);
    }
    else {
        sconf = (sei_cfg_rec *) ap_get_module_config(r->per_dir_config,
                                                     &setenvif_module);
    }
    entries = (sei_entry *) sconf->conditionals->elts;
    last_name = NULL;
    val = NULL;
    for (i = 0; i < sconf->conditionals->nelts; ++i) {
        sei_entry *b = &entries[i];

        /* Optimize the case where a bunch of directives in a row use the
         * same header.  Remember we don't need to strcmp the two header
         * names because we made sure the pointers were equal during
         * configuration.
         */
        if (b->name != last_name) {
            last_name = b->name;
            switch (b->special_type) {
            case SPECIAL_REMOTE_ADDR:
                val = r->connection->remote_ip;
                break;
            case SPECIAL_REMOTE_HOST:
                val =  ap_get_remote_host(r->connection, r->per_dir_config,
                                          REMOTE_NAME);
                break;
            case SPECIAL_REMOTE_USER:
                val = r->connection->user;
                break;
            case SPECIAL_REQUEST_URI:
                val = r->uri;
                break;
            case SPECIAL_REQUEST_METHOD:
                val = r->method;
                break;
            case SPECIAL_REQUEST_PROTOCOL:
                val = r->protocol;
                break;
            case SPECIAL_NOT:
                val = ap_table_get(r->headers_in, b->name);
                if (val == NULL) {
                    val = ap_table_get(r->subprocess_env, b->name);
                }
                break;
            }
        }

        /*
         * A NULL value indicates that the header field or special entity
         * wasn't present or is undefined.  Represent that as an empty string
         * so that REs like "^$" will work and allow envariable setting
         * based on missing or empty field.
         */
        if (val == NULL) {
            val = "";
        }

        if (!ap_regexec(b->preg, val, 0, NULL, 0)) {
            array_header *arr = ap_table_elts(b->features);
            elts = (table_entry *) arr->elts;

            for (j = 0; j < arr->nelts; ++j) {
                if (!strcmp(elts[j].val, "!")) {
                    ap_table_unset(r->subprocess_env, elts[j].key);
                }
                else {
                    ap_table_setn(r->subprocess_env, elts[j].key, elts[j].val);
                }
            }
        }
    }

    return DECLINED;
}

Interestingly, this module hooks both post_read_request and header_parser to the same function, so it can set variables before and after the directory merge. (This is because other modules often use the environment variables to control their function.)

The function doesn't do anything particularly fascinating, except for a rather dubious use of the notes table in the request record. It uses a note, SEI_MAGIC_HEIRLOOM, to tell it whether it's in post_read_request or header_parser (by virtue of post_read_request coming first); in our view it should simply have hooked two different functions and passed a flag instead. The rest of the function simply checks various fields in the request and conditionally sets environment variables for subprocesses.

This function is virtually identical in both 1.3 and 2.0.
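
The alternative we have in mind, two thin hook functions that pass an explicit flag to a shared worker, might look something like the following. This is a sketch only: toy_request and its members are invented stand-ins for the request record and the two configurations, and none of the names come from mod_setenvif.c:

```c
#include <stddef.h>

/* Value used by this sketch only. */
#define TOY_DECLINED (-1)

typedef struct toy_request {
    const char *server_rule;   /* stands in for the per-server config */
    const char *perdir_rule;   /* stands in for the per-directory config */
    const char *applied;       /* config chosen by the last hook to run */
} toy_request;

/* Shared worker: the phase is passed in explicitly rather than
 * inferred from a note left in the request. */
static int toy_match_headers(toy_request *r, int perdir)
{
    r->applied = perdir ? r->perdir_rule : r->server_rule;
    return TOY_DECLINED;
}

/* Hooked at post_read_request: use the per-server configuration. */
static int toy_post_read_request(toy_request *r)
{
    return toy_match_headers(r, 0);
}

/* Hooked at header_parser: use the per-directory configuration. */
static int toy_header_parser(toy_request *r)
{
    return toy_match_headers(r, 1);
}
```

Each wrapper is registered for its own phase, so nothing needs to be stored in the request to tell the two calls apart.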

Check Access  

int module_check_access(request_rec *pReq)
 

This routine checks access, in the allow/deny sense. It can return OK, DECLINED, or a status code. All modules are called until one of them returns something other than DECLINED or OK. If all modules return DECLINED, it is considered a configuration error. At this point, the URL and the filename (if relevant) are known, as are the client's address, user agent, and so forth. All of these are available through pReq. As long as everything says DECLINED or OK, the request can proceed.

The only example available in the standard modules is, unsurprisingly, mod_access.c; see Example 21-17 for an excerpt.

Example

Example 21-17. mod_access.c
static int find_allowdeny(request_rec *r, array_header *a, int method)
{
    allowdeny *ap = (allowdeny *) a->elts;
    int mmask = (1 << method);
    int i;
    int gothost = 0;
    const char *remotehost = NULL;

    for (i = 0; i < a->nelts; ++i) {
        if (!(mmask & ap[i].limited))
            continue;

        switch (ap[i].type) {
        case T_ENV:
            if (ap_table_get(r->subprocess_env, ap[i].x.from)) {
                return 1;
            }
            break;

        case T_ALL:
            return 1;

        case T_IP:
            if (ap[i].x.ip.net != INADDR_NONE
                && (r->connection->remote_addr.sin_addr.s_addr
                    & ap[i].x.ip.mask) == ap[i].x.ip.net) {
                return 1;
            }
            break;

        case T_HOST:
            if (!gothost) {
                remotehost = ap_get_remote_host(r->connection, r->per_dir_config,
                                                REMOTE_DOUBLE_REV);

                if ((remotehost == NULL) || is_ip(remotehost))
                    gothost = 1;
                else
                    gothost = 2;
            }

            if ((gothost == 2) && in_domain(ap[i].x.from, remotehost))
                return 1;
            break;

        case T_FAIL:
            /* do nothing? */
            break;
        }
    }

    return 0;
}

static int check_dir_access(request_rec *r)
{
    int method = r->method_number;
    access_dir_conf *a =
    (access_dir_conf *)
    ap_get_module_config(r->per_dir_config, &access_module);
    int ret = OK;

    if (a->order[method] == ALLOW_THEN_DENY) {
        ret = FORBIDDEN;
        if (find_allowdeny(r, a->allows, method))
            ret = OK;
        if (find_allowdeny(r, a->denys, method))
            ret = FORBIDDEN;
    }
    else if (a->order[method] == DENY_THEN_ALLOW) {
        if (find_allowdeny(r, a->denys, method))
            ret = FORBIDDEN;
        if (find_allowdeny(r, a->allows, method))
            ret = OK;
    }
    else {
        if (find_allowdeny(r, a->allows, method)
            && !find_allowdeny(r, a->denys, method))
            ret = OK;
        else
            ret = FORBIDDEN;
    }

    if (ret == FORBIDDEN
        && (ap_satisfies(r) != SATISFY_ANY || !ap_some_auth_required(r))) {
        ap_log_rerror(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, r,
                      "client denied by server configuration: %s",
                      r->filename);
    }

    return ret;
}

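Pretty straightforward stuff. IP addresses are compared inline against each entry's network and mask; is_ip( ) checks whether the string returned for the remote host is merely a numeric IP address, and in_domain( ) checks whether the client's hostname falls within the given domain.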

The only difference in 2.0 is that the return value FORBIDDEN has become HTTP_FORBIDDEN.
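
The ordering logic in check_dir_access( ) is compact enough to restate as a pure function, which makes the effect of each order easy to check. This is a sketch under our own assumptions: the toy_ names are invented, TOY_MUTUAL_FAILURE is simply our label for the case the excerpt handles in its final else branch, and the two flags stand for the results of the two find_allowdeny( ) calls:

```c
/* Values used by this sketch only. */
#define TOY_OK         0
#define TOY_FORBIDDEN  403

enum toy_order {
    TOY_DENY_THEN_ALLOW,
    TOY_ALLOW_THEN_DENY,
    TOY_MUTUAL_FAILURE     /* our label for the final else branch */
};

/* allowed and denied say whether the client matched any allow or
 * deny entry; the order decides which list gets the last word. */
static int toy_check_order(enum toy_order order, int allowed, int denied)
{
    int ret = TOY_OK;

    switch (order) {
    case TOY_ALLOW_THEN_DENY:
        ret = TOY_FORBIDDEN;            /* default is to refuse */
        if (allowed)
            ret = TOY_OK;
        if (denied)
            ret = TOY_FORBIDDEN;        /* deny list is consulted last */
        break;
    case TOY_DENY_THEN_ALLOW:
        if (denied)                     /* default is to accept */
            ret = TOY_FORBIDDEN;
        if (allowed)
            ret = TOY_OK;               /* allow list is consulted last */
        break;
    case TOY_MUTUAL_FAILURE:
        /* must match an allow entry and no deny entry */
        ret = (allowed && !denied) ? TOY_OK : TOY_FORBIDDEN;
        break;
    }
    return ret;
}
```

A client that matches both lists is therefore refused under allow-then-deny but admitted under deny-then-allow, and a client that matches neither list gets the opposite pair of answers.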

Check User ID  

int module_check_user_id(request_rec *pReq)
 

This function is responsible for acquiring and checking a user ID. The user ID should be stored in pReq->connection->user. The function should return OK, DECLINED, or a status code. Of particular interest is HTTP_UNAUTHORIZED (formerly known as AUTH_REQUIRED), which should be returned if the authorization fails (either because the user agent presented no credentials or because those presented were not correct). All modules are polled until one returns something other than DECLINED. If all decline, a configuration error is logged, and an error is returned to the user agent. When HTTP_UNAUTHORIZED is returned, an appropriate header should be set to inform the user agent of the type of credentials to present when it retries. Currently, the appropriate header is WWW-Authenticate (see the HTTP 1.1 specification for details). Unfortunately, Apache's modularity is not quite as good as it might be in this area. So this hook usually provides alternate ways of accessing the user/password database, rather than changing the way authorization is actually done, as evidenced by the fact that the protocol side of authorization is currently dealt with in http_protocol.c, rather than in the module. Note that this function checks the validity of the username and password and not whether the particular user has permission to access the URL.

An obvious user of this hook is mod_auth.c, as shown in Example 21-18.

Example 21-18. mod_auth.c
static int authenticate_basic_user(request_rec *r)
{
    auth_config_rec *sec =
    (auth_config_rec *) ap_get_module_config(r->per_dir_config, &auth_module);
    conn_rec *c = r->connection;
    const char *sent_pw;
    char *real_pw;
    char *invalid_pw;
    int res;

    if ((res = ap_get_basic_auth_pw(r, &sent_pw)))
        return res;

    if (!sec->auth_pwfile)
        return DECLINED;

    if (!(real_pw = get_pw(r, c->user, sec->auth_pwfile))) {
        if (!(sec->auth_authoritative))
            return DECLINED;
        ap_log_rerror(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, r,
                      "user %s not found: %s", c->user, r->uri);
        ap_note_basic_auth_failure(r);
        return AUTH_REQUIRED;
    }
    invalid_pw = ap_validate_password(sent_pw, real_pw);
    if (invalid_pw != NULL) {
        ap_log_rerror(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, r,
                      "user %s: authentication failure for \"%s\": %s",
                      c->user, r->uri, invalid_pw);
        ap_note_basic_auth_failure(r);
        return AUTH_REQUIRED;
    }
    return OK;
}

This function is essentially the same for 2.0, except that AUTH_REQUIRED has become HTTP_UNAUTHORIZED.

Check Auth  

int module_check_auth(request_rec *pReq)
 

This hook is called to check whether the authenticated user (found in pReq->connection->user) is permitted to access the current URL. It normally uses the per-directory configuration (remembering that this is actually the combined directory, location, and file configuration) to determine this. It must return OK, DECLINED, or a status code. Again, the usual status to return is HTTP_UNAUTHORIZED if access is denied, thus giving the user a chance to present new credentials. Modules are polled until one returns something other than DECLINED.

Again, the natural example to use is from mod_auth.c, as shown in Example 21-19.

Example 21-19. mod_auth.c
int check_user_access (request_rec *r) {
    auth_config_rec *sec =
      (auth_config_rec *)ap_get_module_config (r->per_dir_config, &auth_module);
    char *user = r->connection->user;
    int m = r->method_number;
    int method_restricted = 0;
    register int x;
    char *t, *w;
    table *grpstatus;
    array_header *reqs_arr = requires (r);
    require_line *reqs;

    if (!reqs_arr)
        return (OK);
    reqs = (require_line *)reqs_arr->elts;

    if(sec->auth_grpfile)
        grpstatus = groups_for_user (r->pool, user, sec->auth_grpfile);
    else
        grpstatus = NULL;

    for(x=0; x < reqs_arr->nelts; x++) {

        if (! (reqs[x].method_mask & (1 << m))) continue;

        method_restricted = 1;

        t = reqs[x].requirement;
        w = getword(r->pool, &t, ' ');
        if(!strcmp(w,"valid-user"))
            return OK;
        if(!strcmp(w,"user")) {
            while(t[0]) {
                w = getword_conf (r->pool, &t);
                if(!strcmp(user,w))
                    return OK;
            }
        }
        else if(!strcmp(w,"group")) {
            if(!grpstatus) 
                return DECLINED;        /* DBM group?  Something else? */
            
            while(t[0]) {
                w = getword_conf(r->pool, &t);
                if(table_get (grpstatus, w))
                    return OK;
            }
        }
    }

    if (!method_restricted)
        return OK;

    note_basic_auth_failure (r);

    return AUTH_REQUIRED;
}

Again, this function is essentially the same in 2.0.
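
Stripped of Apache's pools and tables, the heart of the loop above is a test of one requirement line against the authenticated username. Here is a rough stand-alone sketch of that test. The function name sketch_requirement_allows is hypothetical, only the valid-user and user forms are handled, and words are split on spaces only (the real getword_conf( ) also copes with quoting).

```c
#include <string.h>

/* Return 1 if `user` satisfies one requirement line, else 0.
 * Handles only "valid-user" and "user name1 name2 ...". */
static int sketch_requirement_allows(const char *requirement,
                                     const char *user)
{
    char buf[256];
    char *word;

    if (strlen(requirement) >= sizeof buf)
        return 0;                   /* too long for this sketch */
    strcpy(buf, requirement);       /* strtok() modifies its input */

    word = strtok(buf, " ");
    if (word == NULL)
        return 0;
    if (strcmp(word, "valid-user") == 0)
        return 1;                   /* any authenticated user will do */
    if (strcmp(word, "user") != 0)
        return 0;                   /* "group" etc. are not modelled */

    /* Compare the user against each listed name. */
    while ((word = strtok(NULL, " ")) != NULL) {
        if (strcmp(word, user) == 0)
            return 1;
    }
    return 0;
}
```

Note that a 0 from this sketch means only "this line did not grant access"; the real hook goes on to try the remaining lines before giving up.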

Type Checker  

int module_type_checker(request_rec *pReq)
 

At this stage, we have almost finished processing the request. All that is left to decide is who actually handles it. This is done in two stages: first, by converting the URL or filename into a MIME type or handler string, language, and encoding; and second, by calling the appropriate function for the type. This hook deals with the first part. If it generates a MIME type, it should be stored in pReq->content_type. Alternatively, if it generates a handler string, it should be stored in pReq->handler. The languages go in pReq->content_languages, and the encoding in pReq->content_encoding. Note that there is no defined way of generating a unique handler string. Furthermore, handler strings and MIME types are matched to the request handler through the same table, so the handler string should probably not be a MIME type.[7]

One obvious place that this must go on is in mod_mime.c. See Example 21-20.

Example 21-20. mod_mime.c
int find_ct(request_rec *r)
{
    char *fn = strrchr(r->filename, '/');
    mime_dir_config *conf =
      (mime_dir_config *)ap_get_module_config(r->per_dir_config, &mime_module);
    char *ext, *type, *orighandler = r->handler;

    if (S_ISDIR(r->finfo.st_mode)) {
        r->content_type = DIR_MAGIC_TYPE;
        return OK;
    }

    if(fn == NULL) fn = r->filename;

    /* Parse filename extensions, which can be in any order */
    while ((ext = getword(r->pool, &fn, '.')) && *ext) {
      int found = 0;

      /* Check for Content-Type */
      if ((type = table_get (conf->forced_types, ext))
          || (type = table_get (hash_buckets[hash(*ext)], ext))) {
          r->content_type = type;
          found = 1;
      }

      /* Check for Content-Language */
      if ((type = table_get (conf->language_types, ext))) {
          r->content_language = type;
          found = 1;
      }

      /* Check for Content-Encoding */
      if ((type = table_get (conf->encoding_types, ext))) {
          if (!r->content_encoding)
              r->content_encoding = type;
          else
              r->content_encoding = pstrcat(r->pool, r->content_encoding,
                                            ", ", type, NULL);
          found = 1;
      }

      /* Check for a special handler, but not for proxy request */
      if ((type = table_get (conf->handlers, ext)) && !r->proxyreq) {
          r->handler = type;
          found = 1;
      }

      /* This is to deal with cases such as foo.gif.bak, which we want
       * to not have a type. So if we find an unknown extension, we
       * zap the type/language/encoding and reset the handler.
       */

      if (!found) {
        r->content_type = NULL;
        r->content_language = NULL;
        r->content_encoding = NULL;
        r->handler = orighandler;
      }
    }

    /* Check for overrides with ForceType/SetHandler */

    if (conf->type && strcmp(conf->type, "none"))
        r->content_type = pstrdup(r->pool, conf->type);
    if (conf->handler && strcmp(conf->handler, "none"))
        r->handler = pstrdup(r->pool, conf->handler);

    if (!r->content_type) return DECLINED;

    return OK;
}

Another example can be found in mod_negotiation.c, but it is rather more complicated than is needed to illustrate the point.

Although the 2.0 version of the example is rather different, the differences aren't really because of changes in the hook and are more concerned with the complication of determining MIME types with filters in place, so we won't bother to show the 2.0 version here.
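
To make the extension-walking logic of the 1.3 example concrete without any Apache machinery, here is a simplified stand-alone sketch. It models only the content type (not language, encoding, or handler), it skips the base name rather than treating it as an extension, and the mapping table and all sketch_ names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct sketch_ext_type {
    const char *ext;
    const char *type;
};

/* Hypothetical mapping; a real server reads this from configuration. */
static const struct sketch_ext_type sketch_types[] = {
    { "html", "text/html" },
    { "gif",  "image/gif" },
    { "txt",  "text/plain" },
};

/* Look up an extension of the given length; NULL if unknown. */
static const char *sketch_lookup_ext(const char *ext, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof sketch_types / sizeof sketch_types[0]; i++) {
        if (strlen(sketch_types[i].ext) == len
            && strncmp(sketch_types[i].ext, ext, len) == 0)
            return sketch_types[i].type;
    }
    return NULL;
}

/* Walk the dot-separated extensions of the last path component.
 * A known extension sets the type; an unknown one clears it again,
 * so foo.gif.bak ends up with no type at all. */
static const char *sketch_find_type(const char *filename)
{
    const char *p = strrchr(filename, '/');
    const char *type = NULL;

    p = p ? p + 1 : filename;
    p = strchr(p, '.');                 /* skip the base name */
    while (p != NULL) {
        const char *start = p + 1;
        const char *end = strchr(start, '.');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        type = sketch_lookup_ext(start, len);
        p = end;
    }
    return type;
}
```

Because the last extension always has the final say in this sketch, foo.bak.gif is still typed as a GIF, while foo.gif.bak is not.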

Prerun Fixups  

int module_fixups(request_rec *pReq)
 

Nearly there! This is your last chance to do anything that might be needed before the request is finally handled. At this point, all processing that is going to be done before the request is handled has been completed, the request is going to be satisfied, and all that is left to do is anything the request handler won't do. Examples of what you might do here include setting environment variables for CGI scripts, adding headers to pReq->headers_out, or even setting something to modify the behavior of another module's handler in pReq->notes. Things you probably shouldn't do at this stage are many, but, most importantly, you should leave anything security-related alone, including (but certainly not limited to) the URL, the filename, and the username. Most modules won't use this hook because they do their real work elsewhere.

As an example, we will set the environment variables for a shell script. Example 21-21 shows where it's done in mod_env.c.

Example 21-21. mod_env.c
static int fixup_env_module(request_rec *r)
{
    table *e = r->subprocess_env;
    env_dir_config_rec *sconf = ap_get_module_config(r->per_dir_config,
                                                     &env_module);
    table *vars = sconf->vars;

    if (!sconf->vars_present)
        return DECLINED;

    r->subprocess_env = ap_overlay_tables(r->pool, e, vars);

    return OK;
}

Notice that this doesn't directly set the environment variables; that would be pointless because a subprocess's environment variables are created anew from pReq->subprocess_env. Also notice that, as is often the case in computing, considerably more effort is spent in processing the configuration for mod_env.c than is spent at the business end.

Handlers  

handler_rec aModuleHandlers[]; [1.3]
 

The definition of a handler_rec can be found in http_config.h (1.3):

typedef struct {
    char *content_type;
    int (*handler)(request_rec *);
} handler_rec;

In 2.0, the handlers are simply registered with a hook in the usual way and are responsible for checking the content type (or anything else they want to check) in the hook.

Finally, we are ready to handle the request. The core now searches through the modules' handler entries, looking for an exact match for either the handler type or the MIME type, in that order (that is, if a handler type is set, that is used; otherwise, the MIME type is used). When a match is found, the corresponding handler function is called. This will do the actual business of serving the user's request. Often you won't want to do this, because you'll have done the work of your module earlier, but this is the place to run your Java, translate to Swedish, or whatever you might want to do to serve actual content to the user. Most handlers either send some kind of content directly (in which case, they must remember to call ap_send_http_header( ) before sending the content) or use one of the internal redirect methods (e.g., internal_redirect( )).

mod_status.c only implements a handler; Example 21-22 (1.3) shows the handler's table.

Example 21-22. mod_status.c
handler_rec status_handlers[] =
{
{ STATUS_MAGIC_TYPE, status_handler },
{ "server-status", status_handler },
{ NULL }
};

We don't show the actual handler here, because it's big and boring. All it does is trawl through the scoreboard (which records details of the various child processes) and generate a great deal of HTML. The user invokes this handler with either a SetHandler or an AddHandler; however, since the handler makes no use of a file, SetHandler is the more natural way to do it. Notice the reference to STATUS_MAGIC_TYPE. This is a "magic" MIME type, the use of which is now deprecated, but we must retain it for backward compatibility in this particular module.
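
The 1.3 lookup order described earlier (handler string if one is set, otherwise the MIME type) can be modelled in a few lines. This is only a sketch under simplifying assumptions: it searches a single table for an exact match, whereas the real core consults every module's table, and all the sketch_ names and the sample entries are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*sketch_handler_fn)(void);

/* Same shape as handler_rec, but with a no-argument handler. */
struct sketch_handler_rec {
    const char *content_type;       /* handler string or MIME type */
    sketch_handler_fn handler;
};

/* Two dummy handlers that just return distinct values. */
static int sketch_serve_status(void) { return 1; }
static int sketch_serve_html(void)   { return 2; }

/* A NULL-terminated table, like status_handlers[] above. */
static const struct sketch_handler_rec sketch_table[] = {
    { "server-status", sketch_serve_status },
    { "text/html",     sketch_serve_html },
    { NULL, NULL }
};

/* Use the handler string as the key if set, else the MIME type,
 * and return the first exact match (NULL if there is none). */
static sketch_handler_fn
sketch_find_handler(const struct sketch_handler_rec *table,
                    const char *handler, const char *content_type)
{
    const char *key = handler ? handler : content_type;

    if (key == NULL)
        return NULL;
    for (; table->content_type != NULL; table++) {
        if (strcmp(table->content_type, key) == 0)
            return table->handler;
    }
    return NULL;
}
```

A request with SetHandler server-status in force finds sketch_serve_status even if its MIME type is text/html, because the handler string takes precedence.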

The same example in 2.0 has a hook instead of an array of handler_recs:

static void register_hooks(apr_pool_t *p)
{
    ap_hook_handler(status_handler, NULL, NULL, APR_HOOK_MIDDLE);
    ...
}

and, as discussed, status_handler( ) checks the content type itself:

static int status_handler(request_rec *r)
{
...
    if (strcmp(r->handler, STATUS_MAGIC_TYPE) && 
        strcmp(r->handler, "server-status")) {
        return DECLINED;
    }
...

Logger  

int module_logger(request_rec *pRec)
 

Now that the request has been processed and the dust has settled, you may want to log the request in some way. Here's your chance to do that. Although the core stops running the logger function as soon as a module returns something other than OK or DECLINED, that is rarely done, as there is no way to know whether another module needs to log something.

Although mod_log_agent.c is more or less out of date since mod_log_config.c was introduced, it makes a nice, compact example. See Example 21-23.

Example 21-23. mod_log_agent.c
int agent_log_transaction(request_rec *orig)
{
    agent_log_state *cls = ap_get_module_config (orig->server->module_config,
                                              &agent_log_module);
    char str[HUGE_STRING_LEN];
    char *agent;
    request_rec *r;

    if(cls->agent_fd <0)
      return OK;

    for (r = orig; r->next; r = r->next)
        continue;
    if (*cls->fname == '\0')    /* Don't log agent */
        return DECLINED;

    agent = table_get(orig->headers_in, "User-Agent");
    if(agent != NULL) 
      {
        sprintf(str, "%s\n", agent);
        write(cls->agent_fd, str, strlen(str));
      }

    return OK;
}

This is not a good example of programming practice. With its fixed-size buffer str, it leaves a gaping security hole. It wouldn't be enough simply to split the write into two parts to avoid this problem. Because the log file is shared among all server processes, the write must be atomic, or the log file could get mangled by overlapping writes. mod_log_config.c carefully avoids this problem.
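
For comparison, here is one way the two problems could be avoided in a stand-alone program. This is our own sketch, not what mod_log_config.c does: the line is formatted with a length limit (over-long values are truncated but still end in a newline) and then handed to write( ) in a single call. It assumes the descriptor was opened in append mode; the sketch_ names and the 256-byte limit are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_LINE_MAX 256

/* Format "agent\n" into buf without overflowing it. If the agent
 * string is too long, it is cut short and a newline is forced in.
 * Returns the number of bytes to write (0 on failure). */
static size_t sketch_format_agent(char *buf, size_t size,
                                  const char *agent)
{
    int n;

    if (size < 2)
        return 0;
    n = snprintf(buf, size, "%s\n", agent);
    if (n < 0)
        return 0;
    if ((size_t)n >= size) {        /* output was truncated */
        buf[size - 2] = '\n';
        buf[size - 1] = '\0';
        return size - 1;
    }
    return (size_t)n;
}

/* Log one line with a single write() call, so that the line is
 * never split across two calls. Returns 0 on success, -1 on error. */
static int sketch_log_agent(int fd, const char *agent)
{
    char line[SKETCH_LINE_MAX];
    size_t len = sketch_format_agent(line, sizeof line, agent);

    if (len == 0)
        return -1;
    return write(fd, line, len) == (ssize_t)len ? 0 : -1;
}
```

Truncating is a design choice made for brevity; a production logger might prefer to allocate a buffer large enough for the whole line.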

Unfortunately, mod_log_agent.c has been axed in 2.0; but if it were still there, it would look pretty much the same.

Child Exit  

void child_exit(server_rec *pServer,pool *pPool) [1.3]
 

This function is called immediately before a particular child exits. See Child Initialization, earlier in this chapter, for an explanation of what "child" means in this context. Typically, this function will be used to release resources that are persistent between connections, such as database or file handles.

In 2.0 there is no child_exit hook — instead one registers a cleanup function with the pool passed in the init_child hook.

See Example 21-24 for an excerpt from mod_log_config.c.

Example 21-24. mod_log_config.c
static void flush_all_logs(server_rec *s, pool *p)
{
    multi_log_state *mls;
    array_header *log_list;
    config_log_state *clsarray;
    int i;

    for (; s; s = s->next) {
        mls = ap_get_module_config(s->module_config, &config_log_module);
        log_list = NULL;
        if (mls->config_logs->nelts) {
            log_list = mls->config_logs;
        }
        else if (mls->server_config_logs) {
            log_list = mls->server_config_logs;
        }
        if (log_list) {
            clsarray = (config_log_state *) log_list->elts;
            for (i = 0; i < log_list->nelts; ++i) {
                flush_log(&clsarray[i]);
            }
        }
    }
}

This routine is only used when BUFFERED_LOGS is defined. Predictably enough, it flushes all the buffered logs, which would otherwise be lost when the child exited.

In 2.0, the same function is used, but it is registered via the init_child hook:

static void init_child(apr_pool_t *p, server_rec *s)
{
#ifdef BUFFERED_LOGS
    /* Now register the last buffer flush with the cleanup engine */
    apr_pool_cleanup_register(p, s, flush_all_logs, flush_all_logs);
#endif
}
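
If the idea of registering a cleanup with a pool is unfamiliar, the following toy model may help. It is emphatically not APR: the sketch_ names are hypothetical, the pool holds nothing but a fixed-size list of callbacks, and running them in reverse order of registration is simply the choice made for this sketch.

```c
#include <stddef.h>

#define SKETCH_MAX_CLEANUPS 8

typedef int (*sketch_cleanup_fn)(void *data);

/* A toy "pool" that only remembers cleanup callbacks. */
struct sketch_pool {
    sketch_cleanup_fn fns[SKETCH_MAX_CLEANUPS];
    void *data[SKETCH_MAX_CLEANUPS];
    size_t count;
};

/* Remember fn(data) for later; returns -1 if the list is full. */
static int sketch_cleanup_register(struct sketch_pool *p, void *data,
                                   sketch_cleanup_fn fn)
{
    if (p->count >= SKETCH_MAX_CLEANUPS)
        return -1;
    p->fns[p->count] = fn;
    p->data[p->count] = data;
    p->count++;
    return 0;
}

/* Run every registered cleanup, most recent first, then forget them. */
static void sketch_pool_destroy(struct sketch_pool *p)
{
    while (p->count > 0) {
        p->count--;
        p->fns[p->count](p->data[p->count]);
    }
}

/* Example cleanup: counts how often it was run. */
static int sketch_count_flush(void *data)
{
    ++*(int *)data;
    return 0;
}
```

The point to take away is that nothing happens at registration time; the callback fires only when the pool itself is torn down, which for the child pool means when the child exits.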

21.4 A Complete Example

We spent some time trying to think of an example of a module that uses all the available hooks. At the same time, we spent considerable effort tracking through the innards of Apache to find out what happened when. Then we suddenly thought of writing a module to show what happened when. And, presto, mod_reveal.c was born. This is not a module you'd want to include in a live Apache without modification, since it prints stuff to the standard error output (which ends up in the error log, for the most part). But rather than obscure the main functionality by including code to switch the monitoring on and off, we thought it best to keep it simple. Besides, even in this form the module is very useful; it's presented and explained in this section.

21.4.1 Overview

The module implements two commands, RevealServerTag and RevealTag. RevealServerTag names a server section and is stored in the per-server configuration. RevealTag names a directory (or location or file) section and is stored in the per-directory configuration. When per-server or per-directory configurations are merged, the resulting configuration is tagged with a combination of the tags of the two merged sections. The module also implements a handler, which generates HTML with interesting information about a URL.

No self-respecting module starts without a copyright notice:

/*
Reveal the order in which things are done.

Copyright (C) 1996, 1998 Ben Laurie
*/

Note that the included http_protocol.h is only needed for the request handler; the other two are required by almost all modules:

#include "httpd.h"
#include "http_config.h"
#include "http_protocol.h"
#include "http_request.h" [2.0]
#include "apr_strings.h" [2.0]
#include "http_connection.h" [2.0]
#include "http_log.h" [2.0]
#include "http_core.h" [2.0]
#include "scoreboard.h" [2.0]
#include <unistd.h> [2.0]

The per-directory configuration structure is:

typedef struct
    {
    char *szDir;
    char *szTag;
    } SPerDir;

And the per-server configuration structure is:

typedef struct
    {
    char *szServer;
    char *szTag;
    } SPerServer;

There is an unavoidable circular reference in most modules; the module structure is needed to access the per-server and per-directory configurations in the hook functions. But in order to construct the module structure, we need to know the hook functions. Since there is only one module structure and a lot of hook functions, it is simplest to forward reference the module structure:

extern module reveal_module;

If a string is NULL, it may crash printf( ) on some systems, so we define a function to give us a stand-in for NULL strings:

static const char *None(const char *szStr)
    {
    if(szStr)
        return szStr;
    return "(none)";
    }

Since the server names and port numbers are often not known when the per-server structures are created, but are filled in by the time the initialization function is called, we rename them in the init function. Note that we have to iterate over all the servers, since init is only called with the "main" server structure. As we go, we print the old and new names so we can see what is going on. Just for completeness, we add a module version string to the server version string. Note that you would not normally do this for such a minor module:

static void SubRevealInit(server_rec *pServer,pool *pPool)
    {
    SPerServer *pPerServer=ap_get_module_config(pServer->module_config,
                                                &reveal_module);

    if(pServer->server_hostname &&
       (!strncmp(pPerServer->szServer,"(none):",7)
        || !strcmp(pPerServer->szServer+strlen(pPerServer->szServer)
                   -2,":0")))
    {
        char szPort[20];

        fprintf(stderr,"Init        : update server name from %s\n",
                pPerServer->szServer);
        sprintf(szPort,"%d",pServer->port);
        pPerServer->szServer=ap_pstrcat(pPool,pServer->server_hostname,":",
                                        szPort,NULL);
    }
    fprintf(stderr,"Init        : host=%s port=%d server=%s tag=%s\n",
            pServer->server_hostname,pServer->port,pPerServer->szServer,
            None(pPerServer->szTag));
    }

static void RevealInit(server_rec *pServer,pool *pPool)
    {
    ap_add_version_component("Reveal/0.0");
    for( ; pServer ; pServer=pServer->next)
        SubRevealInit(pServer,pPool);
    fprintf(stderr,"Init        : done\n");
    }

Here we create the per-server configuration structure. Since this is called as soon as the server is created, pServer->server_hostname and pServer->port may not have been initialized, so their values must be taken with a pinch of salt (but they get corrected later):

static void *RevealCreateServer(pool *pPool,server_rec *pServer)
    {
    SPerServer *pPerServer=ap_palloc(pPool,sizeof *pPerServer);
    const char *szServer;
    char szPort[20];

    szServer=None(pServer->server_hostname);
    sprintf(szPort,"%d",pServer->port);

    pPerServer->szTag=NULL;
    pPerServer->szServer=ap_pstrcat(pPool,szServer,":",szPort,NULL);

    fprintf(stderr,"CreateServer: server=%s:%s\n",szServer,szPort);
    return pPerServer;
    }

Here we merge two per-server configurations. The merged configuration is tagged with the names of the two configurations from which it is derived (or the string (none) if they weren't tagged). Note that we create a new per-server configuration structure to hold the merged information (this is the standard thing to do):

static void *RevealMergeServer(pool *pPool,void *_pBase,void *_pNew)
    {
    SPerServer *pBase=_pBase;
    SPerServer *pNew=_pNew;
    SPerServer *pMerged=ap_palloc(pPool,sizeof *pMerged);

    fprintf(stderr,
          "MergeServer : pBase: server=%s tag=%s pNew: server=%s tag=%s\n",
          pBase->szServer,None(pBase->szTag),
          pNew->szServer,None(pNew->szTag));

    pMerged->szServer=ap_pstrcat(pPool,pBase->szServer,"+",pNew->szServer,
                                 NULL);
    pMerged->szTag=ap_pstrcat(pPool,None(pBase->szTag),"+",
                              None(pNew->szTag),NULL);

    return pMerged;
    }

Now we create a per-directory configuration structure. If szDir is NULL, we change it to (none) to ensure that later merges have something to merge! Of course, szDir is NULL once for each server. Notice that we don't log which server this was created for; that's because there is no legitimate way to find out. It is also worth mentioning that this will only be called for a particular directory (or location or file) if a RevealTag directive occurs in that section:

static void *RevealCreateDir(pool *pPool,char *_szDir)
    {
    SPerDir *pPerDir=ap_palloc(pPool,sizeof *pPerDir);
    const char *szDir=None(_szDir);

    fprintf(stderr,"CreateDir   : dir=%s\n",szDir);

    pPerDir->szDir=ap_pstrdup(pPool,szDir);
    pPerDir->szTag=NULL;

    return pPerDir;
    }

Next we merge the per-directory structures. Again, we have no clue which server we are dealing with. In practice, you'll find this function is called a great deal:

static void *RevealMergeDir(pool *pPool,void *_pBase,void *_pNew)
    {
    SPerDir *pBase=_pBase;
    SPerDir *pNew=_pNew;
    SPerDir *pMerged=ap_palloc(pPool,sizeof *pMerged);

    fprintf(stderr,"MergeDir    : pBase: dir=%s tag=%s "
            "pNew: dir=%s tag=%s\n",pBase->szDir,None(pBase->szTag),
            pNew->szDir,None(pNew->szTag));
    pMerged->szDir=ap_pstrcat(pPool,pBase->szDir,"+",pNew->szDir,NULL);
    pMerged->szTag=ap_pstrcat(pPool,None(pBase->szTag),"+",
                              None(pNew->szTag),NULL);

    return pMerged;
    }
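
The tagging rule used by both merge functions (join the two tags with a "+", substituting (none) for a missing one) can be tried out without Apache. In this sketch the result goes into a caller-supplied buffer instead of pool memory, and the names sketch_none and sketch_merge_tags are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Same idea as None() above: never hand printf a NULL string. */
static const char *sketch_none(const char *s)
{
    return s ? s : "(none)";
}

/* Write "base+new" into out. Returns 0 on success, or -1 if the
 * result did not fit (out then holds a truncated string). */
static int sketch_merge_tags(char *out, size_t size,
                             const char *base, const char *new_tag)
{
    int n = snprintf(out, size, "%s+%s",
                     sketch_none(base), sketch_none(new_tag));

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Merging MainDir with H1Main therefore gives MainDir+H1Main, and merging an untagged base with Revealer gives (none)+Revealer.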

Here is a helper function used by most of the other hooks to show the per-server and per-directory configurations currently in use. Although it caters to the situation in which there is no per-directory configuration, that should never happen:[8]

static void ShowRequestStuff(request_rec *pReq)
    {
    SPerDir *pPerDir=ap_get_module_config(pReq->per_dir_config,
               &reveal_module); [1.3]
    SPerDir *pPerDir=pReq->per_dir_config ?
      ap_get_module_config(pReq->per_dir_config,&reveal_module) : NULL; [2.0]
    SPerServer *pPerServer=ap_get_module_config(pReq->server->
               module_config,&reveal_module);
    SPerDir none={"(null)","(null)"};
    SPerDir noconf={"(no per-dir config)","(no per-dir config)"};

    if(!pReq->per_dir_config)
        pPerDir=&noconf;
    else if(!pPerDir)
        pPerDir=&none;

    fprintf(stderr," server=%s tag=%s dir=%s tag=%s\n",
            pPerServer->szServer,pPerServer->szTag,pPerDir->szDir,
               pPerDir->szTag);
    }

None of the following hooks does anything more than trace itself:

static int RevealTranslate(request_rec *pReq)
    {
    fprintf(stderr,"Translate   : uri=%s",pReq->uri);
    ShowRequestStuff(pReq);
    return DECLINED;
    }

static int RevealCheckUserID(request_rec *pReq)
    {
    fprintf(stderr,"CheckUserID :");
    ShowRequestStuff(pReq);
    return DECLINED;
    }

static int RevealCheckAuth(request_rec *pReq)
    {
    fprintf(stderr,"CheckAuth   :");
    ShowRequestStuff(pReq);
    return DECLINED;
    }

static int RevealCheckAccess(request_rec *pReq)
    {
    fprintf(stderr,"CheckAccess :");
    ShowRequestStuff(pReq);
    return DECLINED;
    }

static int RevealTypeChecker(request_rec *pReq)
    {
    fprintf(stderr,"TypeChecker :");
    ShowRequestStuff(pReq);
    return DECLINED;
    }

static int RevealFixups(request_rec *pReq)
    {
    fprintf(stderr,"Fixups      :");
    ShowRequestStuff(pReq);
    return DECLINED;
    }

static int RevealLogger(request_rec *pReq)
    {
    fprintf(stderr,"Logger      :");
    ShowRequestStuff(pReq);
    return DECLINED;
    }

static int RevealHeaderParser(request_rec *pReq)
    {
    fprintf(stderr,"HeaderParser:");
    ShowRequestStuff(pReq);

    return DECLINED;
    }

Next comes the child-initialization function. This extends the server tag to include the PID of the particular server instance in which it exists. Note that, like the init function, it must iterate through all the server instances — also, in 2.0, it must register the child exit handler:

static void RevealChildInit(server_rec *pServer, pool *pPool)
    {
    char szPID[20];

    fprintf(stderr,"Child Init  : pid=%d\n",(int)getpid( ));

    sprintf(szPID,"[%d]",(int)getpid( ));
    for( ; pServer ; pServer=pServer->next)
        {
        SPerServer *pPerServer=ap_get_module_config(pServer->module_config,
                                                    &reveal_module);
        pPerServer->szServer=ap_pstrcat(pPool,pPerServer->szServer,szPID,
                                        NULL);
        }
    apr_pool_cleanup_register(pPool,pServer,RevealChildExit,RevealChildExit);[2.0]
    }

Then the last two hooks are simply logged. Note, however, that RevealChildExit( ) is declared completely differently for 1.3 and 2.0. Also, in 2.0 RevealChildExit( ) has to come before RevealChildInit( ) to avoid compiler errors:

(1.3)
static void RevealChildExit(server_rec *pServer, pool *pPool)
    {
    fprintf(stderr,"Child Exit  : pid=%d\n",(int)getpid( ));
    }
(2.0)
static apr_status_t RevealChildExit(void *p)
    {
    fprintf(stderr,"Child Exit  : pid=%d\n",(int)getpid( ));

    return OK;
    }

static int RevealPostReadRequest(request_rec *pReq)
    {
    fprintf(stderr,"PostReadReq : method=%s uri=%s protocol=%s",
            pReq->method,pReq->unparsed_uri,pReq->protocol);
    ShowRequestStuff(pReq);

    return DECLINED;
    }

The following is the handler for the RevealTag directive. If more than one RevealTag appears in a section, they are glued together with a "-" separating them. A NULL is returned to indicate that there was no error:

static const char *RevealTag(cmd_parms *cmd, SPerDir *pPerDir, char *arg)
    {
    SPerServer *pPerServer=ap_get_module_config(cmd->server->module_config,
                                                &reveal_module);

    fprintf(stderr,"Tag         : new=%s dir=%s server=%s tag=%s\n",
            arg,pPerDir->szDir,pPerServer->szServer,
            None(pPerServer->szTag));

    if(pPerDir->szTag)
        pPerDir->szTag=ap_pstrcat(cmd->pool,pPerDir->szTag,"-",arg,NULL);
    else
        pPerDir->szTag=ap_pstrdup(cmd->pool,arg);

    return NULL;
    }

This code handles the RevealServerTag directive. Again, if more than one RevealServerTag appears in a server section, they are glued together with "-" in between:

static const char *RevealServerTag(cmd_parms *cmd, SPerDir *pPerDir,
                                   char *arg)
    {
    SPerServer *pPerServer=ap_get_module_config(cmd->server->module_config,
                                                &reveal_module);

    fprintf(stderr,"ServerTag   : new=%s server=%s stag=%s\n",arg,
            pPerServer->szServer,None(pPerServer->szTag));

    if(pPerServer->szTag)
        pPerServer->szTag=ap_pstrcat(cmd->pool,pPerServer->szTag,"-",arg,
                                     NULL);
    else
        pPerServer->szTag=ap_pstrdup(cmd->pool,arg);

    return NULL;
    }
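
Both directive handlers follow the same append-or-copy pattern. Here it is in a form you can compile on its own. The function name sketch_append_tag is hypothetical, and because there is no pool in this sketch, it uses malloc( ) and leaves the caller to free the result, which the real handlers never have to do.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated "old-arg", or a copy of arg when old is
 * NULL. Returns NULL if allocation fails. The caller frees it. */
static char *sketch_append_tag(const char *old, const char *arg)
{
    size_t len = strlen(arg) + 1;       /* arg plus terminator */
    char *out;

    if (old)
        len += strlen(old) + 1;         /* old plus the "-" */
    out = malloc(len);
    if (out == NULL)
        return NULL;

    if (old) {
        strcpy(out, old);
        strcat(out, "-");
        strcat(out, arg);
    }
    else {
        strcpy(out, arg);
    }
    return out;
}
```

Calling it first with NULL and H1, then with the result and Main, yields H1 and then H1-Main.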

Here we bind the directives to their handlers. Note that RevealTag uses ACCESS_CONF|OR_ALL as its req_override so that it is legal wherever a <Directory> section occurs. RevealServerTag only makes sense outside <Directory> sections, so it uses RSRC_CONF:

(1.3)
static command_rec aCommands[]=
    {
{ "RevealTag", RevealTag, NULL, ACCESS_CONF|OR_ALL, TAKE1,
  "a tag for this section" },
{ "RevealServerTag", RevealServerTag, NULL, RSRC_CONF, TAKE1,
  "a tag for this server" },
{ NULL }
    };
(2.0)
static command_rec aCommands[]=
    {
    AP_INIT_TAKE1("RevealTag", RevealTag, NULL, ACCESS_CONF|OR_ALL,
                  "a tag for this section"),
    AP_INIT_TAKE1("RevealServerTag", RevealServerTag, NULL, RSRC_CONF,
                  "a tag for this server" ),
    { NULL }
    };

These two helper functions simply output things as a row in a table:

static void TShow(request_rec *pReq,const char *szHead,const char *szItem)
    {
    ap_rprintf(pReq,"<TR><TH>%s<TD>%s\n",szHead,szItem);
    }

static void TShowN(request_rec *pReq,const char *szHead,int nItem)
    {
    ap_rprintf(pReq,"<TR><TH>%s<TD>%d\n",szHead,nItem);
    }

The following code is the request handler; it generates HTML describing the configurations that handle the URI:

static int RevealHandler(request_rec *pReq)
    {
    SPerDir *pPerDir=ap_get_module_config(pReq->per_dir_config,
               &reveal_module);
    SPerServer *pPerServer=ap_get_module_config(pReq->server->
               module_config,&reveal_module);

    pReq->content_type="text/html";
    ap_send_http_header(pReq);

    ap_rputs("<CENTER><H1>Revelation of ",pReq);
    ap_rputs(pReq->uri,pReq);
    ap_rputs("</H1></CENTER><HR>\n",pReq);
    ap_rputs("<TABLE>\n",pReq);
    TShow(pReq,"URI",pReq->uri);
    TShow(pReq,"Filename",pReq->filename);
    TShow(pReq,"Server name",pReq->server->server_hostname);
    TShowN(pReq,"Server port",pReq->server->port);
    TShow(pReq,"Server config",pPerServer->szServer);
    TShow(pReq,"Server config tag",pPerServer->szTag);
    TShow(pReq,"Directory config",pPerDir->szDir);
    TShow(pReq,"Directory config tag",pPerDir->szTag);
    ap_rputs("</TABLE>\n",pReq);

    return OK;
    }

Here we associate the request handler with the handler string (1.3):

static handler_rec aHandlers[]=
    {
{ "reveal", RevealHandler },
{ NULL },
    };

And finally, in 1.3, there is the module structure:

module reveal_module = {
   STANDARD_MODULE_STUFF,
   RevealInit,                  /* initializer */
   RevealCreateDir,             /* dir config creater */
   RevealMergeDir,              /* dir merger --- default is to override */
   RevealCreateServer,          /* server config */
   RevealMergeServer,           /* merge server configs */
   aCommands,                   /* command table */
   aHandlers,                   /* handlers */
   RevealTranslate,             /* filename translation */
   RevealCheckUserID,           /* check_user_id */
   RevealCheckAuth,             /* check auth */
   RevealCheckAccess,           /* check access */
   RevealTypeChecker,           /* type_checker */
   RevealFixups,                /* fixups */
   RevealLogger,                /* logger */
   RevealHeaderParser,          /* header parser */
   RevealChildInit,             /* child init */
   RevealChildExit,             /* child exit */
   RevealPostReadRequest,       /* post read request */
};

In 2.0, we have the hook-registering function and the module structure:

static void RegisterHooks(apr_pool_t *pPool)
    {
    ap_hook_post_config(RevealInit,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_handler(RevealHandler,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_translate_name(RevealTranslate,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_check_user_id(RevealCheckUserID,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_auth_checker(RevealCheckAuth,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_access_checker(RevealCheckAccess,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_type_checker(RevealTypeChecker,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_fixups(RevealFixups,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_log_transaction(RevealLogger,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_header_parser(RevealHeaderParser,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_child_init(RevealChildInit,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_post_read_request(RevealPostReadRequest,NULL,NULL,APR_HOOK_MIDDLE);
    }

module reveal_module = {
   STANDARD20_MODULE_STUFF,
   RevealCreateDir,             /* dir config creator */
   RevealMergeDir,              /* dir merger --- default is to override */
   RevealCreateServer,          /* server config */
   RevealMergeServer,           /* merge server configs */
   aCommands,                   /* command table */
   RegisterHooks                /* hook registration */
};
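
To make the role of the ordering argument (APR_HOOK_MIDDLE above) more concrete, here is a minimal standalone sketch of a hook table that is kept sorted as hooks are registered. It does not use the Apache headers: hook_entry, register_hook( ), run_all_hooks( ), and the ORDER_ constants are hypothetical names of our own, only loosely modelled on the real scheme, and the sketch ignores the two NULL arguments, which in the real calls name modules that must run before or after this one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering constants, loosely modelled on APR_HOOK_FIRST,
   APR_HOOK_MIDDLE and APR_HOOK_LAST. */
enum { ORDER_FIRST = 0, ORDER_MIDDLE = 10, ORDER_LAST = 20 };

typedef void (*hook_fn)(void);

typedef struct {
    const char *szName;   /* label, for inspection only */
    hook_fn fn;
    int nOrder;
} hook_entry;

#define MAX_HOOKS 16
static hook_entry aHooks[MAX_HOOKS];
static size_t nHooks = 0;

/* Insert so that the table stays sorted by nOrder; entries with equal
   order keep their registration order. Returns 0, or -1 if full. */
int register_hook(const char *szName, hook_fn fn, int nOrder)
{
    size_t i;

    assert(fn != NULL);
    if (nHooks == MAX_HOOKS)
        return -1;
    for (i = nHooks; i > 0 && aHooks[i - 1].nOrder > nOrder; i--)
        aHooks[i] = aHooks[i - 1];
    aHooks[i].szName = szName;
    aHooks[i].fn = fn;
    aHooks[i].nOrder = nOrder;
    nHooks++;
    return 0;
}

/* Call every registered hook, in table order. */
void run_all_hooks(void)
{
    size_t i;

    for (i = 0; i < nHooks; i++)
        aHooks[i].fn();
}

/* A trivial hook for demonstration: counts how often it runs. */
static int nCalls = 0;
void CountingHook(void)
{
    nCalls++;
}
```

Because every hook in mod_reveal registers with the same middle priority, its position relative to other modules' hooks is left to the core; a module that must run early or late would pass a different ordering value.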

The module can be included in Apache by specifying:

AddModule modules/extra/mod_reveal.o

in Configuration. You might like to try it on your favorite server: just pepper the httpd.conf file with RevealTag and RevealServerTag directives. Because of the huge amount of logging this produces, it would be unwise to use it on a live server!

21.4.2 Example Output

To illustrate mod_reveal.c in use, we used the following configuration:

Listen 9001
Listen 9000

TransferLog /home/ben/www/APACHE3/book/logs/access_log
ErrorLog /home/ben/www/APACHE3/book/logs/error_log
RevealTag MainDir
RevealServerTag MainServer
<LocationMatch /.reveal>
RevealTag Revealer
SetHandler reveal
</LocationMatch>

<VirtualHost *:9001>
DocumentRoot /home/ben/www/APACHE3/docs
RevealTag H1Main
RevealServerTag H1
<Directory /home/ben/www/APACHE3/docs/protected>
 RevealTag H1ProtectedDirectory
</Directory>
<Location /protected>
 RevealTag H1ProtectedLocation
</Location>
</VirtualHost>

<VirtualHost *:9000>
DocumentRoot /home/camilla/www/APACHE3/docs
RevealTag H2Main
RevealServerTag H2
</VirtualHost>

Note that the <Directory> and <Location> sections in the first virtual host actually refer to the same place. This is to illustrate the order in which the sections are combined. Also note that the <LocationMatch> section doesn't have to correspond to a real file; requesting any location whose path matches the pattern /.reveal (for example, one ending in /.reveal) will invoke mod_reveal.c's handler. Starting the server produces this on the screen:

bash$ httpd -d ~/www/APACHE3/book/
CreateServer: server=(none):0
CreateDir   : dir=(none)
PreConfig [2.0]
Tag         : new=MainDir dir=(none) server=(none):0 tag=(none)
ServerTag   : new=MainServer server=(none):0 stag=(none)
CreateDir   : dir=/.reveal
Tag         : new=Revealer dir=/.reveal server=(none):0 tag=MainServer
CreateDir   : dir=(none)
CreateServer: server=(none):9001
Tag         : new=H1Main dir=(none) server=(none):9001 tag=(none)
ServerTag   : new=H1 server=(none):9001 stag=(none)
CreateDir   : dir=/home/ben/www/APACHE3/docs/protected
Tag         : new=H1ProtectedDirectory dir=/home/ben/www/APACHE3/docs/protected
              server=(none):9001 tag=H1
CreateDir   : dir=/protected
Tag         : new=H1ProtectedLocation dir=/protected server=(none):9001
              tag=H1
CreateDir   : dir=(none)
CreateServer: server=(none):9000
Tag         : new=H2Main dir=(none) server=(none):9000 tag=(none)
ServerTag   : new=H2 server=(none):9000 stag=(none)
MergeServer : pBase: server=(none):0 tag=MainServer pNew: server=(none):9000
              tag=H2
MergeDir    : pBase: dir=(none) tag=MainDir pNew: dir=(none) tag=H2Main
MergeServer : pBase: server=(none):0 tag=MainServer pNew: server=(none):9001
              tag=H1
MergeDir    : pBase: dir=(none) tag=MainDir pNew: dir=(none) tag=H1Main

Notice that in 2.0, the pre_config hook actually comes slightly after configuration has started!

Notice that the <Location> and <LocationMatch> sections are treated as directories as far as the code is concerned. At this point, stderr is switched to the error log, and the following is logged:

OpenLogs         : server=(none):0 tag=MainServer [2.0]
Init             : update server name from (none):0
Init             : host=scuzzy.ben.algroup.co.uk port=0
                   server=scuzzy.ben.algroup.co.uk:0 tag=MainServer
Init             : update server name from (none):0+(none):9000
Init             : host=scuzzy.ben.algroup.co.uk port=9000
                   server=scuzzy.ben.algroup.co.uk:9000 tag=MainServer+H2
Init             : update server name from (none):0+(none):9001
Init             : host=scuzzy.ben.algroup.co.uk port=9001
                   server=scuzzy.ben.algroup.co.uk:9001 tag=MainServer+H1
Init             : done

At this point, the first-pass initialization is complete, and Apache destroys the configurations and starts again (this double initialization is required because directives may change things such as the location of the initialization files):[9]

CreateServer: server=(none):0
CreateDir   : dir=(none)
Tag         : new=MainDir dir=(none) server=(none):0 tag=(none)
ServerTag   : new=MainServer server=(none):0 stag=(none)
CreateDir   : dir=/.reveal
Tag         : new=Revealer dir=/.reveal server=(none):0 tag=MainServer
CreateDir   : dir=(none)
CreateServer: server=(none):9001
Tag         : new=H1Main dir=(none) server=(none):9001 tag=(none)
ServerTag   : new=H1 server=(none):9001 stag=(none)
CreateDir   : dir=/home/ben/www/APACHE3/docs/protected
Tag         : new=H1ProtectedDirectory dir=/home/ben/www/APACHE3/docs/protected
              server=(none):9001 tag=H1
CreateDir   : dir=/protected
Tag         : new=H1ProtectedLocation dir=/protected server=(none):9001
              tag=H1
CreateDir   : dir=(none)
CreateServer: server=(none):9000
Tag         : new=H2Main dir=(none) server=(none):9000 tag=(none)
ServerTag   : new=H2 server=(none):9000 stag=(none)

Now we've created all the server and directory sections, and the top-level server is merged with the virtual hosts:

MergeServer : pBase: server=(none):0 tag=MainServer pNew: server=(none):9000
              tag=H2
MergeDir    : pBase: dir=(none) tag=MainDir pNew: dir=(none) tag=H2Main
MergeServer : pBase: server=(none):0 tag=MainServer pNew: server=(none):9001
              tag=H1
MergeDir    : pBase: dir=(none) tag=MainDir pNew: dir=(none) tag=H1Main

Now the init functions are called (which rename the servers now that their "real" names are known):

Init        : update server name from (none):0
Init        : host=freeby.ben.algroup.co.uk port=0
              server=freeby.ben.algroup.co.uk:0 tag=MainServer
Init        : update server name from (none):0+(none):9000
Init        : host=freeby.ben.algroup.co.uk port=9000
              server=freeby.ben.algroup.co.uk:9000 tag=MainServer+H2
Init        : update server name from (none):0+(none):9001
Init        : host=freeby.ben.algroup.co.uk port=9001
              server=freeby.ben.algroup.co.uk:9001 tag=MainServer+H1
Init        : done

Apache logs its startup message:

[Sun Jul 12 13:08:01 1998] [notice] Apache/1.3.1-dev (Unix) Reveal/0.0 configured -- resuming normal operations

Child inits are called:

Child Init  : pid=23287
Child Init  : pid=23288
Child Init  : pid=23289
Child Init  : pid=23290
Child Init  : pid=23291

And Apache is ready to start handling requests. First, we request http://host:9001/:

CreateConnection : server=scuzzy.ben.algroup.co.uk:0[78348] tag=MainServer conn_id=0 
[2.0]
PreConnection    : keepalive=0 double_reverse=0 [2.0]
ProcessConnection: keepalive=0 double_reverse=0 [2.0]
CreateRequest    : server=scuzzy.ben.algroup.co.uk:9001[78348] tag=MainServer+H1 
dir=(no per-dir config) tag=(no per-dir config) [2.0]
PostReadReq : method=GET uri=/ protocol=HTTP/1.0
              server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
QuickHandler     : lookup_uri=0 server=scuzzy.ben.algroup.co.uk:9001[78348] 
tag=MainServer+H1 dir=(none)+(none) tag=MainDir+H1Main [2.0]
Translate   : uri=/ server=freeby.ben.algroup.co.uk:9001[23287]
              tag=MainServer+H1 dir=(none)+(none) tag=MainDir+H1Main
MapToStorage     : server=scuzzy.ben.algroup.co.uk:9001[78348] tag=MainServer+H1 
dir=(none)+(none) tag=MainDir+H1Main [2.0]
HeaderParser: server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
CheckAccess : server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
TypeChecker : server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main [1.3]
Fixups      : server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main

Because / is a directory, Apache attempts to use /index.html instead (in this case, it didn't exist, but Apache still goes through the motions):

CreateRequest    : server=scuzzy.ben.algroup.co.uk:9001[78348] tag=MainServer+H1 
dir=(none)+(none) tag=MainDir+H1Main [2.0]
QuickHandler     : lookup_uri=1 server=scuzzy.ben.algroup.co.uk:9001[78348] 
tag=MainServer+H1 dir=(none)+(none) tag=MainDir+H1Main [2.0]
Translate   : uri=/index.html server=freeby.ben.algroup.co.uk:9001[23287]
              tag=MainServer+H1 dir=(none)+(none) tag=MainDir+H1Main

At this point, 1.3 and 2.0 diverge fairly radically. In 1.3:

CheckAccess : server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
TypeChecker : server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
Fixups      : server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
Logger      : server=freeby.ben.algroup.co.uk:9001[23287] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
Child Init  : pid=23351

Pretty straightforward, but note that the configurations used are the merge of the main server's and the first virtual host's. Also notice the Child init at the end: this is because Apache decided the load warranted starting another child to handle it.

But 2.0 is rather more complex:

MapToStorage     : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1 
dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=/index.html
Fixups           : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1 
dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=/index.html
InsertFilter     : server=scuzzy.ben.algroup.co.uk:9001[79410]  tag=MainServer+H1 
dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=/

Up to this point, we're checking for /index.html and then continuing with /. From here, we get lots of extra stuff caused by mod_autoindex using internal requests to construct the URLs for the index page:

CreateRequest    : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=(null)
MapToStorage     : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=/protected/
MergeDir         : pBase: dir=(none)+(none) tag=MainDir+H1Main
                   pNew: dir=/home/ben/www5/docs/protected/ tag=H1ProtectedDirectory
CheckAccess      : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none)+/home/ben/www5/docs/protected/
                   tag=MainDir+H1Main+H1ProtectedDirectory unparsed_uri=/protected/
Fixups           : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none)+/home/ben/www5/docs/protected/
                   tag=MainDir+H1Main+H1ProtectedDirectory unparsed_uri=/protected/
CreateRequest    : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=(null)
QuickHandler     : lookup_uri=1 server=scuzzy.ben.algroup.co.uk:9001[79410]
                   tag=MainServer+H1 dir=(none)+(none) tag=MainDir+H1Main
                   unparsed_uri=/protected/index.html
MergeDir         : pBase: dir=(none)+(none) tag=MainDir+H1Main
                   pNew: dir=/protected tag=H1ProtectedLocation
Translate        : uri=/protected/index.html
                   server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none)+/protected tag=MainDir+H1Main+H1ProtectedLocation
                   unparsed_uri=/protected/index.html
MapToStorage     : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none) tag=MainDir+H1Main
                   unparsed_uri=/protected/index.html
MergeDir         : pBase: dir=(none)+(none) tag=MainDir+H1Main
                   pNew: dir=/home/ben/www5/docs/protected/ tag=H1ProtectedDirectory
MergeDir         : pBase: dir=(none)+(none)+/home/ben/www5/docs/protected/
                   tag=MainDir+H1Main+H1ProtectedDirectory
                   pNew: dir=/protected tag=H1ProtectedLocation
CheckAccess      : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none)+/home/ben/www5/docs/protected/+/protected
                   tag=MainDir+H1Main+H1ProtectedDirectory+H1ProtectedLocation
                   unparsed_uri=/protected/index.html
Fixups           : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1
                   dir=(none)+(none)+/home/ben/www5/docs/protected/+/protected
                   tag=MainDir+H1Main+H1ProtectedDirectory+H1ProtectedLocation
                   unparsed_uri=/protected/index.html

And now normal programming is resumed:

Logger           : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1 
dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=/

And finally, a request is created in anticipation of the next request on the same connection:

CreateRequest    : server=scuzzy.ben.algroup.co.uk:9001[79410] tag=MainServer+H1 
dir=(no per-dir config) tag=(no per-dir config) unparsed_uri=(null)

At this point, 2.0 is finished.

Rather than go on at length, here's the most complicated request we can make: http://host:9001/protected/.reveal:

CreateConnection : server=scuzzy.ben.algroup.co.uk:0[84997] tag=MainServer conn_id=0 [2.0]
PreConnection    : keepalive=0 double_reverse=0 [2.0]
ProcessConnection: keepalive=0 double_reverse=0 [2.0]
CreateRequest    : server=scuzzy.ben.algroup.co.uk:9001[84997] tag=MainServer+H1 
dir=(no per-dir config) tag=(no per-dir config) unparsed_uri=(null) [2.0]
PostReadReq : method=GET uri=/protected/.reveal protocol=HTTP/1.0
              server=freeby.ben.algroup.co.uk:9001[23288] tag=MainServer+H1
              dir=(none)+(none) tag=MainDir+H1Main
QuickHandler     : lookup_uri=0 server=scuzzy.ben.algroup.co.uk:9001[84997] tag=MainServer+H1 
dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=/protected/.reveal [2.0]

After the post_read_request phase, some merging is done on the basis of location (1.3):

MergeDir    : pBase: dir=(none)+(none) tag=MainDir+H1Main pNew: dir=/.reveal
              tag=Revealer
MergeDir    : pBase: dir=(none)+(none)+/.reveal tag=MainDir+H1Main+Revealer
              pNew: dir=/protected tag=H1ProtectedLocation

Essentially the same thing happens in 2.0, but in a different order:

MergeDir         : pBase: dir=/.reveal tag=Revealer pNew: dir=/protected 
tag=H1ProtectedLocation
MergeDir         : pBase: dir=(none)+(none) tag=MainDir+H1Main pNew: dir=/.reveal+/protected 
tag=Revealer+H1ProtectedLocation

Of course, this illustrates the need to make sure your directory and server mergers behave sensibly despite ordering changes. Note that the end product of these two different orderings is, in fact, identical.
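
One way to get this property is to make the merge operation associative, as the tag concatenation seen in the logs happens to be. The following standalone sketch uses plain malloc( ) rather than Apache pools, and merge_tags( ) is a hypothetical helper of our own, not the module's actual merger. It joins two tags with a +, so grouping the merges either way yields the same string:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated "base+add" string, or NULL on allocation
   failure. The caller frees the result. */
char *merge_tags(const char *base, const char *add)
{
    size_t nBase, nAdd;
    char *out;

    assert(base != NULL && add != NULL);
    nBase = strlen(base);
    nAdd = strlen(add);
    out = malloc(nBase + 1 + nAdd + 1);       /* base, '+', add, NUL */
    if (out == NULL)
        return NULL;
    memcpy(out, base, nBase);
    out[nBase] = '+';
    memcpy(out + nBase + 1, add, nAdd + 1);   /* copies the NUL too */
    return out;
}
```

Merging MainDir+H1Main with Revealer and then with H1ProtectedLocation (the 1.3 order) gives the same result as first merging Revealer with H1ProtectedLocation and then merging MainDir+H1Main with that (the 2.0 order). In a real module the same idea would allocate from a pool, for instance with apr_pstrcat( ), instead of malloc( ).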

Then the URL is translated into a filename, using the newly merged directory configuration:

Translate     : uri=/protected/.reveal
                server=freeby.ben.algroup.co.uk:9001[23288] tag=MainServer+H1
                dir=(none)+(none)+/.reveal+/protected
                tag=MainDir+H1Main+Revealer+H1ProtectedLocation
MapToStorage  : server=scuzzy.ben.algroup.co.uk:9001[84997] tag=MainServer+H1 
                dir=(none)+(none) tag=MainDir+H1Main unparsed_uri=/protected/.reveal 
                [2.0]

Now that the filename is known, even more merging can be done. Notice that this time the section tagged as H1ProtectedDirectory is pulled in, too:

MergeDir    : pBase: dir=(none)+(none) tag=MainDir+H1Main pNew: dir=/home/
              ben/www/APACHE3/docs/protected tag=H1ProtectedDirectory
MergeDir    : pBase: dir=(none)+(none)+/home/ben/www/APACHE3/docs/protected
              tag=MainDir+H1Main+H1ProtectedDirectory pNew: dir=/.reveal
              tag=Revealer [1.3]
MergeDir    : pBase: dir=(none)+(none)+/home/ben/www/APACHE3/docs/protected+/.reveal
              tag=MainDir+H1Main+H1ProtectedDirectory+Revealer pNew: dir=/
              protected tag=H1ProtectedLocation [1.3]
MergeDir    : pBase: dir=(none)+(none)+/home/ben/www5/docs/protected/ 
              tag=MainDir+H1Main+H1ProtectedDirectory pNew: dir=/.reveal+/protected 
              tag=Revealer+H1ProtectedLocation [2.0]

Note that 2.0 cunningly reuses an earlier merge and does the job in one fewer step.

And finally the request proceeds as usual:

HeaderParser  : server=freeby.ben.algroup.co.uk:9001[23288] tag=MainServer+H1
                dir=(none)+(none)+/home/ben/www/APACHE3/docs/protected+/.reveal+/
                protected tag=MainDir+H1Main+H1ProtectedDirectory+ 
                Revealer+H1ProtectedLocation
CheckAccess   : server=freeby.ben.algroup.co.uk:9001[23288] tag=MainServer+H1
                dir=(none)+(none)+/home/ben/www/APACHE3/docs/protected+/.reveal+/
                protected tag=MainDir+H1Main+H1ProtectedDirectory+
                Revealer+H1ProtectedLocation
TypeChecker   : server=freeby.ben.algroup.co.uk:9001[23288] tag=MainServer+H1
                dir=(none)+(none)+/home/ben/www/APACHE3/docs/protected+/.reveal+/
                protected tag=MainDir+H1Main+H1ProtectedDirectory+
                Revealer+H1ProtectedLocation
Fixups        : server=freeby.ben.algroup.co.uk:9001[23288] tag=MainServer+H1
                dir=(none)+(none)+/home/ben/www/APACHE3/docs/protected+/.reveal+/
                protected tag=MainDir+H1Main+H1ProtectedDirectory+
                Revealer+H1ProtectedLocation
InsertFilter  : server=scuzzy.ben.algroup.co.uk:9001[84997] tag=MainServer+H1 
                dir=(none)+(none)+/home/ben/www5/docs/protected/+/.reveal+/protected 
                tag=MainDir+H1Main+H1ProtectedDirectory+Revealer+H1ProtectedLocation 
                unparsed_uri=/protected/.reveal [2.0]
Logger        : server=freeby.ben.algroup.co.uk:9001[23288] tag=MainServer+H1
                dir=(none)+(none)+/home/ben/www/APACHE3/docs/protected+/.reveal+/
                protected tag=MainDir+H1Main+H1ProtectedDirectory+
                Revealer+H1ProtectedLocation
CreateRequest : server=scuzzy.ben.algroup.co.uk:9001[84997] tag=MainServer+H1
                dir=(no per-dir config) tag=(no per-dir config) unparsed_uri=(null)
                [2.0]

And there we have it. Although the merging of directories, locations, files, and so on gets rather hairy, Apache deals with it all for you, presenting you with a single server and directory configuration on which to base your code's decisions.

21.5 General Hints

Apache 2.0 may well be multithreaded (depending on the MPM in use), and, of course, the Win32 version always is. If you want your module to stand the test of time, you should avoid global variables, if at all possible. If not possible, put some thought into how they will be used by a multithreaded server. Don't forget that you can use the notes table in the request record to store any per-request data you may need to pass between hooks.

Never use a fixed-length buffer. Many of the security holes found in Internet software have fixed-length buffers at their root. The pool mechanism provides a rich set of tools you can use to avoid the need for fixed-length buffers.
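
As an illustration of the principle outside Apache, the following standalone sketch formats a string into a buffer whose size is decided at runtime, by asking vsnprintf( ) for the required length first. format_alloc( ) is a hypothetical helper of our own; inside a module you would more likely let the pool do the work, for instance with apr_psprintf( ).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a heap buffer that is exactly as large as needed.
   Returns NULL on error; the caller frees the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    int nNeeded;
    char *buf;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);                       /* each pass consumes one copy */
    nNeeded = vsnprintf(NULL, 0, fmt, ap);  /* C99: returns length required */
    va_end(ap);
    if (nNeeded < 0) {
        va_end(ap2);
        return NULL;
    }
    buf = malloc((size_t)nNeeded + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)nNeeded + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```

The point of the two-pass approach is that no input, however long, can overrun the buffer, because the buffer is sized from the input rather than guessed in advance.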

Remember that your module is just one of a random set an Apache user may configure into his server. Don't rely on anything that may be peculiar to your own setup. And don't do anything that might interfere with other modules (a tall order, we know, but do your best!).

21.6 Porting to Apache 2.0

In addition to the earlier discussion on how to write a module from scratch for Apache 2.0, which is broadly the same as for 1.x, we'll show how to port one.

First of all, it is probably easiest to compile the module using apxs (although we are not keen on this approach, it is definitely the easiest, sadly). You'll need to have configured Apache like this:

./configure --enable-so

Then compiling mod_reveal is easy:

apxs -c mod_reveal.c

This will, once it's working, yield .libs/mod_reveal.so (use the -i option, and apxs will obligingly install it in /usr/local/apache2/lib). However, compiling the Apache 1.x version of mod_reveal produces a large number of errors (note that you might save yourself some agony by adding -Wc,-Wall and -Wc,-Werror to the command line). The first problem is that some headers have been split up and moved around. So, we had to add:

#include "http_request.h"

to get the definition for server_rec.

Also, many data structures and functions in Apache 1.3 had names that could cause conflict with other libraries. So, they have all been prefixed in an attempt to make them unique. The prefixes are ap_, apr_, and apu_ depending on whether they belong to Apache, APR, or APR-util. If they are data structures, they typically have also had _t appended. So, pool has become apr_pool_t. Many functions have also moved from ap_ to apr_; for example, ap_pstrcat( ) has become apr_pstrcat( ) and now needs the header apr_strings.h.

Functions that didn't take pool arguments now do. For example:

ap_add_version_component("Reveal/0.0");

becomes:

ap_add_version_component(pPool,"Reveal/0.0");

The command structure is now typesafe and uses special macros for each type of command, depending on the number of parameters it takes. For example:

static command_rec aCommands[]=
    {
{ "RevealTag", RevealTag, NULL, ACCESS_CONF|OR_ALL, TAKE1, "a tag for this section"},
{ "RevealServerTag", RevealServerTag, NULL, RSRC_CONF, TAKE1, "a tag for this server" },
{ NULL }
    };

becomes:

static command_rec aCommands[]=
    {
    AP_INIT_TAKE1("RevealTag", RevealTag, NULL, ACCESS_CONF|OR_ALL,
                  "a tag for this section"),
    AP_INIT_TAKE1("RevealServerTag", RevealServerTag, NULL, RSRC_CONF,
                  "a tag for this server" ),
    { NULL }
    };

As a consequence of the type-safety, some fast and loose trickery we played is no longer acceptable. For example:

static const char *RevealServerTag(cmd_parms *cmd, SPerDir *pPerDir,
                                   char *arg)
    {

becomes:

static const char *RevealServerTag(cmd_parms *cmd, void *_pPerDir,
                                   const char *arg)
    {
    SPerDir *pPerDir=_pPerDir;
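
The reason for the void * parameter is that every directive handler in a table must share one prototype if the compiler is to check the table. The standalone sketch below shows the pattern without the Apache headers; SPerDirSketch, cmd_entry, SetTag( ), and dispatch( ) are hypothetical names of our own, and the table is far simpler than the real command_rec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the module's per-directory config. */
typedef struct {
    const char *szTag;
} SPerDirSketch;

/* Every directive handler shares this one prototype, so the table
   below can be type-checked by the compiler. */
typedef const char *(*take1_fn)(void *config, const char *arg);

typedef struct {
    const char *szName;
    take1_fn fn;
} cmd_entry;

/* The handler recovers its own config type from the generic pointer.
   Returns NULL on success or an error message. */
const char *SetTag(void *_pPerDir, const char *arg)
{
    SPerDirSketch *pPerDir = _pPerDir;   /* void * converts implicitly in C */

    assert(pPerDir != NULL);
    if (arg == NULL || *arg == '\0')
        return "tag must not be empty";
    pPerDir->szTag = arg;
    return NULL;
}

static const cmd_entry aSketchCommands[] = {
    { "RevealTag", SetTag },
    { NULL, NULL }
};

/* Look up a directive by name and invoke its handler. */
const char *dispatch(const char *szName, void *config, const char *arg)
{
    const cmd_entry *p;

    assert(szName != NULL);
    for (p = aSketchCommands; p->szName != NULL; p++)
        if (strcmp(p->szName, szName) == 0)
            return p->fn(config, arg);
    return "unknown directive";
}
```

The cost of this arrangement is the one extra line at the top of each handler; the benefit is that a handler with the wrong number or type of arguments no longer compiles silently.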

Handlers have changed completely and are now done via hooks. So, instead of:

static int RevealHandler(request_rec *pReq)
    {
    SPerDir *pPerDir=ap_get_module_config(pReq->per_dir_config,
               &reveal_module);
    SPerServer *pPerServer=ap_get_module_config(pReq->server->
               module_config,&reveal_module);
.
.
.
static handler_rec aHandlers[]=
    {
    { "reveal", RevealHandler },
    { NULL },
    };

we now have:

static int RevealHandler(request_rec *pReq)
    {
    SPerDir *pPerDir;
    SPerServer *pPerServer;

    if(strcmp(pReq->handler,"reveal"))
        return DECLINED;

    pPerDir=ap_get_module_config(pReq->per_dir_config, &reveal_module);
    pPerServer=ap_get_module_config(pReq->server->module_config, &reveal_module);
.
.
.

and an ap_hook_handler( ) entry in the RegisterHooks( ) function mentioned later in this section.
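
The strcmp( ) test at the top of the 2.0 handler matters because a handler hook is offered requests that may belong to someone else, and must decline those. The following standalone sketch models that behaviour with a cut-down request record; request_sketch, invoke( ), and the two handlers are hypothetical names of our own, and the NULL check is a defensive assumption of the sketch rather than a statement about what Apache passes in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OK 0
#define DECLINED (-1)

/* Hypothetical, cut-down request record: only the handler string. */
typedef struct {
    const char *handler;
} request_sketch;

typedef int (*handler_fn)(request_sketch *pReq);

/* Accept only requests whose handler string is "reveal". */
int RevealSketchHandler(request_sketch *pReq)
{
    assert(pReq != NULL);
    if (pReq->handler == NULL || strcmp(pReq->handler, "reveal") != 0)
        return DECLINED;
    return OK;
}

/* Fallback that accepts whatever nobody else claimed. */
int DefaultSketchHandler(request_sketch *pReq)
{
    (void)pReq;
    return OK;
}

/* Offer the request to each handler in turn; report which one took it
   through *pTaken (index into aChain, or -1 if none did). */
int invoke(request_sketch *pReq, const handler_fn *aChain, size_t n,
           int *pTaken)
{
    size_t i;

    *pTaken = -1;
    for (i = 0; i < n; i++) {
        int rc = aChain[i](pReq);
        if (rc != DECLINED) {
            *pTaken = (int)i;
            return rc;
        }
    }
    return DECLINED;
}
```

A handler that forgot the check and returned OK unconditionally would claim every request it was offered, which is exactly the kind of interference with other modules that the hints earlier in this chapter warn against.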

Obviously, we haven't covered all the API changes. But the Apache 2.0 API, unlike the 1.x API, is thoroughly documented, both in the headers and, using the doxygen documentation tool, on the Web (and, of course, in the distribution). The web-based documentation for APR and APR-util can be found here: http://apr.apache.org/. Documentation for everything that's documented can also be generated by typing:

make dox

at the top of the httpd-2.0 tree, though at the time of writing you do have to tweak docs/doxygen.conf slightly by hand. Sadly, there is no better way, at the moment, to figure out API changes than to dredge through these. The grep utility is extremely useful.

Once the API changes have been dealt with, the next problem is to switch to the new hooking scheme. In 1.3, we had this:

module reveal_module = {
   STANDARD_MODULE_STUFF,
   RevealInit,                  /* initializer */
   RevealCreateDir,             /* dir config creator */
   RevealMergeDir,              /* dir merger --- default is to override */
   RevealCreateServer,          /* server config */
   RevealMergeServer,           /* merge server configs */
   aCommands,                   /* command table */
   aHandlers,                   /* handlers */
   RevealTranslate,             /* filename translation */
   RevealCheckUserID,           /* check_user_id */
   RevealCheckAuth,             /* check auth */
   RevealCheckAccess,           /* check access */
   RevealTypeChecker,           /* type_checker */
   RevealFixups,                /* fixups */
   RevealLogger,                /* logger */
   RevealHeaderParser,          /* header parser */
   RevealChildInit,             /* child init */
   RevealChildExit,             /* child exit */
   RevealPostReadRequest,       /* post read request */
};

In 2.0, this gets a lot shorter, as all the hooks are now initialized in a single function. All this is explained in more detail in the previous chapter, but here's what this becomes:

static void RegisterHooks(apr_pool_t *pPool)
    {
    ap_hook_post_config(RevealInit,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_handler(RevealHandler,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_translate_name(RevealTranslate,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_check_user_id(RevealCheckUserID,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_auth_checker(RevealCheckAuth,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_access_checker(RevealCheckAccess,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_type_checker(RevealTypeChecker,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_fixups(RevealFixups,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_log_transaction(RevealLogger,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_header_parser(RevealHeaderParser,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_child_init(RevealChildInit,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_post_read_request(RevealPostReadRequest,NULL,NULL,APR_HOOK_MIDDLE);
    }

module reveal_module = {
   STANDARD20_MODULE_STUFF,
   RevealCreateDir,             /* dir config creator */
   RevealMergeDir,              /* dir merger --- default is to override */
   RevealCreateServer,          /* server config */
   RevealMergeServer,           /* merge server configs */
   aCommands,                   /* command table */
   RegisterHooks                /* hook registration */
};

One minor glitch this revealed was that:

static void RevealChildInit(server_rec *pServer,apr_pool_t *pPool)

should now be:

static void RevealChildInit(apr_pool_t *pPool,server_rec *pServer)

And rather more frighteningly:

static void RevealInit(server_rec *pServer,apr_pool_t *pPool)

becomes:

static int RevealInit(apr_pool_t *pPool,apr_pool_t *pLog,apr_pool_t *pTemp,
      server_rec *pServer)

returning a value of OK, which is fine in our case. Also note that we no longer have a child_exit hook; that can be done with a pool-cleanup function.
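
To show the idea behind a pool cleanup without the APR headers, here is a standalone sketch of a miniature pool that does nothing but remember cleanup functions and run them when it is destroyed. pool_sketch, pool_cleanup_register( ), pool_destroy( ), and MarkChildExited( ) are hypothetical names of our own; in a real 2.0 module the registration would be made with apr_pool_cleanup_register( ) against the pool handed to the child_init hook.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*cleanup_fn)(void *data);

typedef struct cleanup {
    cleanup_fn fn;
    void *data;
    struct cleanup *next;
} cleanup;

/* Hypothetical miniature pool: it only tracks cleanups. */
typedef struct {
    cleanup *cleanups;
} pool_sketch;

/* Register fn(data) to run when the pool is destroyed.
   Returns 0 on success, -1 if out of memory. */
int pool_cleanup_register(pool_sketch *p, cleanup_fn fn, void *data)
{
    cleanup *c;

    assert(p != NULL && fn != NULL);
    c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;   /* push on the front: last registered runs first */
    p->cleanups = c;
    return 0;
}

/* Run and free every cleanup, most recently registered first. */
void pool_destroy(pool_sketch *p)
{
    assert(p != NULL);
    while (p->cleanups != NULL) {
        cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Example cleanup standing in for child-exit work: sets a flag. */
void MarkChildExited(void *data)
{
    *(int *)data = 1;
}
```

Tying the exit work to the lifetime of the child's pool means it runs whenever that pool goes away, so no separate hook is needed.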

For this module at least, that's it! All that has to be done now is to load it with an appropriate LoadModule directive:

LoadModule reveal_module .../mod_reveal.so

and it behaves just like the Apache 1.3 version.

[1]  For more on Apache modules, see Writing Apache Modules with Perl and C, by Lincoln Stein and Doug MacEachern (O'Reilly, 1999).

[2]  This means, of course, that one should not edit modules.c by hand. Rather, the Configuration file should be edited; see Chapter 1.

[3]  This is used, in theory, to adapt to old precompiled modules that used an earlier version of the API. We say "in theory" because it is not used this way in practice.

[4]  The head of this list is top_module. This is occasionally useful to know. The list is actually set up at runtime.

[5]  This is a backward-compatibility feature.

[6]  In fact, some of this is done before the Translate Name phase, and some after, since the location information can be used before name translation is done, but filename information obviously cannot be. If you really want to know exactly what is going on, probe the behavior with mod_reveal.c.

[7]  Old hands may recall that earlier versions of Apache used "magic" MIME types to cause certain request handlers to be invoked, such as the CGI handler. Handler strings were invented to remove this kludge.

[8]  It happened while we were writing the module because of a bug in the Apache core. We fixed the bug.

[9]  You could argue that this procedure could lead to an infinite sequence of reinitializations. Well, in theory, it could, but in real life, Apache initializes twice, and that is that.
