home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


CONTENTS

Chapter 20. The Apache API

Apache provides an Application Programming Interface (API) to modules to insulate them from the mechanics of the HTTP protocol and from each other. In this chapter, we explore the main concepts of the API and provide a detailed listing of the functions available to the module author.

In previous editions of this book, we described the Apache 1.x API. As you know, things have moved on since then, and Apache 2.x is upon us. The facilities in 2.x include some radical and exciting improvements over 1.x, and furthermore, 1.x has been frozen, apart from maintenance. So we decided that, unlike the rest of the book, we would document only the new API. (Appendix A provides some coverage of the 1.x API.)

Also, in previous editions, we had an API reference section. Because Apache 2.0 has substantially improved API documentation of its own, and because the API is still moving around as we write, we have decided to concentrate on the concepts and examples and refer you to the Web for the API reference. Part of the work we have done while writing this chapter is to help ensure that the online documentation does actually cover all the important APIs.

In this chapter, we will cover the important concepts needed to understand the API and point you to appropriate documentation. In the next chapter, we will illustrate the use of the API through a variety of example modules.

20.1 Documentation

In Apache 2.0 the Apache Group has gone to great lengths to try to document the API properly. Included in the headers is text that can by used to generate online documentation. Currently it expects to be processed by doxygen, a system similar to javadoc, only designed for use with C and C++. Doxygen can be found at http://www.stack.nl/~dimitri/doxygen/. Doxygen produces a variety of formats, but the only one we actively support is HTML. This format can be made simply by typing:

make dox

in the top Apache directory. The older target "docs" attempts to use scandoc instead of doxygen, but it doesn't work very well.

We do not reproduce information available in the online documentation here, but rather try to present a broader picture. We did consider including a copy of the documentation in the book, but decided against it because it is still changing quite frequently, and anyway it works much better as HTML documents than printed text.

20.2 APR

APR is the Apache Portable Runtime. This is a new library, used extensively in 2.0, that abstracts all the system-dependent parts of Apache. This includes file handling, sockets, pipes, threads, locking mechanisms (including file locking, interprocess locking, and interthread locking), and anything else that may vary according to platform.

Although APR is designed to fulfill Apache's needs, it is an entirely independent standalone library with its own development team. It can also be used in other projects that have nothing to do with Apache.

20.3 Pools

One of the most important thing to understand about the Apache API is the idea of a pool. This is a grouped collection of resources (i.e., file handles, memory, child programs, sockets, pipes, and so on) that are released when the pool is destroyed. Almost all resources used within Apache reside in pools, and their use should only be avoided after careful thought.

An interesting feature of pool resources is that many of them can be released only by destroying the pool. Pools may contain subpools, and subpools may contain subsubpools, and so on. When a pool is destroyed, all its subpools are destroyed with it.

Naturally enough, Apache creates a pool at startup, from which all other pools are derived. Configuration information is held in this pool (so it is destroyed and created anew when the server is restarted with a kill). The next level of pool is created for each connection Apache receives and is destroyed at the end of the connection. Since a connection can span several requests, a new pool is created (and destroyed) for each request. In the process of handling a request, various modules create their own pools, and some also create subrequests, which are pushed through the API machinery as if they were real requests. Each of these pools can be accessed through the corresponding structures (i.e., the connect structure, the request structure, and so on).

With this in mind, we can more clearly state when you should not use a pool: when the lifetime of the resource in question does not match the lifetime of a pool. If you need temporary storage (or files, etc.), you can create a subpool of an appropriate pool (the request pool is the most likely candidate) and destroy it when you are done, so lifetimes that are shorter than the pool's are easily handled. The only example we could think of where there was no appropriate pool in Apache 1.3 was the code for handling listeners (copy_listeners( ) and close_unused_listeners( ) in http_main.c), which had a lifetime longer than the topmost pool! However, the introduction in 2.x of pluggable process models has changed this: there is now an appropriate pool, the process pool, which lives in process_rec, which is documented in include/httpd.h.

All is not lost, however — Apache 2.0 gives us both a new example and a new excuse for not using pools. The excuse is where using a pool would cause either excessive memory consumption or excessive amounts of pool creation and destruction,[1] and the example is bucket brigades (or, more accurately, buckets), which are documented later.

There are a number of advantages to the pool approach, the most obvious being that modules can use resources without having to worry about when and how to release them. This is particularly useful when Apache handles an error condition. It simply bails out, destroying the pool associated with the erroneous request, confident that everything will be neatly cleaned up. Since each instance of Apache may handle many requests, this functionality is vital to the reliability of the server. Unsurprisingly, pools come into almost every aspect of Apache's API, as we shall see in this chapter. Their type is apr_pool_t, defined in srclib/apr/include/apr_pools.h .

Like many other aspects of Apache, pools are configurable, in the sense that you can add your own resource management to a pool, mainly by registering cleanup functions (see the pool API in srclib/apr/include/apr_pools.h).

20.4 Per-Server Configuration

Since a single instance of Apache may be called on to handle a request for any of the configured virtual hosts (or the main host), a structure is defined that holds the information related to each host. This structure, server_rec, is defined in include/httpd.h:

struct server_rec {
    /** The process this server is running in */
    process_rec *process;
    /** The next server in the list */
    server_rec *next;

    /** The name of the server */
    const char *defn_name;
    /** The line of the config file that the server was defined on */
    unsigned defn_line_number;

    /* Contact information */

    /** The admin's contact information */
    char *server_admin;
    /** The server hostname */
    char *server_hostname;
    /** for redirects, etc. */
    apr_port_t port;

    /* Log files --- note that transfer log is now in the modules... */

    /** The name of the error log */
    char *error_fname;
    /** A file descriptor that references the error log */
    apr_file_t *error_log;
    /** The log level for this server */
    int loglevel;

    /* Module-specific configuration for server, and defaults... */

    /** true if this is the virtual server */
    int is_virtual;
    /** Config vector containing pointers to modules' per-server config 
     *  structures. */
    struct ap_conf_vector_t *module_config; 
    /** MIME type info, etc., before we start checking per-directory info */
    struct ap_conf_vector_t *lookup_defaults;

    /* Transaction handling */

    /** I haven't got a clue */
    server_addr_rec *addrs;
    /** Timeout, in seconds, before we give up */
    int timeout;
    /** Seconds we'll wait for another request */
    int keep_alive_timeout;
    /** Maximum requests per connection */
    int keep_alive_max;
    /** Use persistent connections? */
    int keep_alive;

    /** Pathname for ServerPath */
    const char *path;
    /** Length of path */
    int pathlen;

    /** Normal names for ServerAlias servers */
    apr_array_header_t *names;
    /** Wildcarded names for ServerAlias servers */
    apr_array_header_t *wild_names;

    /** limit on size of the HTTP request line    */
    int limit_req_line;
    /** limit on size of any request header field */
    int limit_req_fieldsize;
    /** limit on number of request header fields  */
    int limit_req_fields; 
};

Most of this structure is used by the Apache core, but each module can also have a per-server configuration, which is accessed via the module_config member, using ap_get_module_config( ). Each module creates this per-module configuration structure itself, so it has complete control over its size and contents. This can be seen in action in the case filter example that follows. Here are excerpts from modules/experimental/mod_case_filter.c showing how it is used:

typedef struct
    {
    int bEnabled;
    } CaseFilterConfig;

Here we define a structure to hold the per-server configuration. Obviously, a module can put whatever it likes in this structure:

static void *CaseFilterCreateServerConfig(apr_pool_t *p,server_rec *s)
    {
    CaseFilterConfig *pConfig=apr_pcalloc(p,sizeof *pConfig);

    pConfig->bEnabled=0;

    return pConfig;
    }

This function is linked in the module structure (see later) in the create_server_config slot. It is called once for each server (i.e., a virtual host or main host) by the core. The function must allocate the storage for the per-server configuration and initialize it. (Note that because apr_pcalloc( ) zero-fills the memory it allocates, there's no need to actually initialize the structure, but it is done for the purpose of clarity.) The return value must be the per-server configuration structure:

static const char *CaseFilterEnable(cmd_parms *cmd, void *dummy, int arg)
    {
    CaseFilterConfig *pConfig=ap_get_module_config(cmd->server->module_config,
                                                   &case_filter_module);
    pConfig->bEnabled=arg;

    return NULL;
    }

This function sets the flag in the per-server configuration structure, having first retrieved it using ap_get_module_config( ). Note that you have to pass the right thing as the first argument, i.e., the module_config element of the server structure. The second argument is the address of the module's module structure, which is used to work out which configuration to retrieve. Note that per-directory configuration is done differently:

static const command_rec CaseFilterCmds[] = 
    {
    AP_INIT_FLAG("CaseFilter", CaseFilterEnable, NULL, RSRC_CONF,
                 "Run a case filter on this host"),
    { NULL }
    };

This command invokes the function CaseFilterEnable( ). The RSRC_CONF flag is what tells the core that it is a per-server command (see the include/httpd_config.h documentation for more information).

To access the configuration at runtime, all that is needed is a pointer to the relevant server structure, as shown earlier. This can usually be obtained from the request, as seen in this example:

static void CaseFilterInsertFilter(request_rec *r)
    {
    CaseFilterConfig *pConfig=ap_get_module_config(r->server->module_config,
                                                   &case_filter_module);

    if(!pConfig->bEnabled)
        return;

    ap_add_output_filter(s_szCaseFilterName,NULL,r,r->connection);
    }

One subtlety that isn't needed by every module is configuration merging. This occurs when the main configuration has directives for a module, but so has the relevant virtual host section. Then the two are merged. The default way this is done is for the virtual host to simply override the main config, but it is possible to supply a merging function in the module structure. If you do, then the two configs are passed to it, and it creates a new config that is the two merged. How it does this is entirely up to you, but here's an example from modules/metadata/mod_headers.c:

static void *merge_headers_config(apr_pool_t *p, void *basev, void *overridesv)
{
    headers_conf *newconf = apr_pcalloc(p, sizeof(*newconf));
    headers_conf *base = basev;
    headers_conf *overrides = overridesv;

    newconf->fixup_in = apr_array_append(p, base->fixup_in, overrides->fixup_in);
    newconf->fixup_out = apr_array_append(p, base->fixup_out, overrides->fixup_out);

    return newconf;
}

In this case the merging is done by combining the two sets of configuration (which are stored in a standard APR array).

20.5 Per-Directory Configuration

It is also possible for modules to be configured on a per-directory, per-URL, or per-file basis. Again, each module optionally creates its own per-directory configuration (the same structure is used for all three cases). This configuration is made available to modules either directly (during configuration) or indirectly (once the server is running), through the request_rec structure, which is detailed in the next section.

Note that the module doesn't care how the configuration has been set up in terms of servers, directories, URLs, or file matches — the core of the server works out the appropriate configuration for the current request before modules are called by merging the appropriate set of configurations.

The method differs from per-server configuration, so here's an example, taken this time from the standard module, modules/metadata/mod_expires.c:

typedef struct {
    int active;
    char *expiresdefault;
    apr_table_t *expiresbytype;
} expires_dir_config;

First we have a per-directory configuration structure:

static void *create_dir_expires_config(apr_pool_t *p, char *dummy)
{
    expires_dir_config *new =
    (expires_dir_config *) apr_pcalloc(p, sizeof(expires_dir_config));
    new->active = ACTIVE_DONTCARE;
    new->expiresdefault = "";
    new->expiresbytype = apr_table_make(p, 4);
    return (void *) new;
}

This is the function that creates it, which will be linked from the module structure, as usual. Note that the active member is set to a default that can't be set by directives — this is used later on in the merging function.

static const char *set_expiresactive(cmd_parms *cmd, void *in_dir_config, int arg)
{
    expires_dir_config *dir_config = in_dir_config;

    /* if we're here at all it's because someone explicitly
     * set the active flag
     */
    dir_config->active = ACTIVE_ON;
    if (arg == 0) {
        dir_config->active = ACTIVE_OFF;
    };
    return NULL;
}
static const char *set_expiresbytype(cmd_parms *cmd, void *in_dir_config,
                                     const char *mime, const char *code)
{
    expires_dir_config *dir_config = in_dir_config;
    char *response, *real_code;

    if ((response = check_code(cmd->pool, code, &real_code)) == NULL) {
        apr_table_setn(dir_config->expiresbytype, mime, real_code);
        return NULL;
    };
    return apr_pstrcat(cmd->pool,
                 "'ExpiresByType ", mime, " ", code, "': ", response, NULL);
}

static const char *set_expiresdefault(cmd_parms *cmd, void *in_dir_config,
                                      const char *code)
{
    expires_dir_config * dir_config = in_dir_config;
    char *response, *real_code;

    if ((response = check_code(cmd->pool, code, &real_code)) == NULL) {
        dir_config->expiresdefault = real_code;
        return NULL;
    };
    return apr_pstrcat(cmd->pool,
                   "'ExpiresDefault ", code, "': ", response, NULL);
}

static const command_rec expires_cmds[] =
{
    AP_INIT_FLAG("ExpiresActive", set_expiresactive, NULL, DIR_CMD_PERMS,
                 "Limited to 'on' or 'off'"),
    AP_INIT_TAKE2("ExpiresBytype", set_expiresbytype, NULL, DIR_CMD_PERMS,
                  "a MIME type followed by an expiry date code"),
    AP_INIT_TAKE1("ExpiresDefault", set_expiresdefault, NULL, DIR_CMD_PERMS,
                  "an expiry date code"),
    {NULL}
};

This sets the various options — nothing particularly out of the ordinary there — but note a few features. First, we've omitted the function check_code( ), which does some complicated stuff we don't really care about here. Second, unlike per-server config, we don't have to find the config ourselves. It is passed to us as the second argument of each function — the DIR_CMD_PERMS (which is #defined earlier to be OR_INDEX) is what tells the core it is per-directory and triggers this behavior:

static void *merge_expires_dir_configs(apr_pool_t *p, void *basev, void *addv)
{
    expires_dir_config *new = (expires_dir_config *) apr_pcalloc(p, sizeof(expires_
dir_config));
    expires_dir_config *base = (expires_dir_config *) basev;
    expires_dir_config *add = (expires_dir_config *) addv;

    if (add->active == ACTIVE_DONTCARE) {
        new->active = base->active;
    }
    else {
        new->active = add->active;
    };

    if (add->expiresdefault[0] != '\0') {
        new->expiresdefault = add->expiresdefault;
    }
    else {
	new->expiresdefault = base->expiresdefault;
    }

    new->expiresbytype = apr_table_overlay(p, add->expiresbytype,
                                        base->expiresbytype);
    return new;
}

Here we have a more complex example of a merging function — the active member is set by the overriding config (here called addv) if it was set there at all, or it comes from the base. expiresdefault is set similarly but expiresbytype is the combination of the two sets:

static int add_expires(request_rec *r)
{
    expires_dir_config *conf;
...
    conf = (expires_dir_config *) 
           ap_get_module_config(r->per_dir_config, &expires_module);

This code snippet shows how the configuration is found during request processing:

static void register_hooks(apr_pool_t *p)
{
    ap_hook_fixups(add_expires,NULL,NULL,APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA expires_module =
{
    STANDARD20_MODULE_STUFF,
    create_dir_expires_config,  /* dir config creater */
    merge_expires_dir_configs,  /* dir merger --- default is to override */
    NULL,                       /* server config */
    NULL,                       /* merge server configs */
    expires_cmds,               /* command apr_table_t */
    register_hooks		/* register hooks */
};

Finally, the hook registration function and module structure link everything together.

20.6 Per-Request Information

The core ensures that the right information is available to the modules at the right time. It does so by matching requests to the appropriate virtual server and directory information before invoking the various functions in the modules. This, and other information, is packaged in a request_rec structure, defined in httpd.h:

/** A structure that represents the current request */
struct request_rec {
    /** The pool associated with the request */
    apr_pool_t *pool;
    /** The connection over which this connection has been read */
    conn_rec *connection;
    /** The virtual host this request is for */
    server_rec *server;

    /** If we wind up getting redirected, pointer to the request we 
     *  redirected to.  */
    request_rec *next;
    /** If this is an internal redirect, pointer to where we redirected 
     *  *from*.  */
    request_rec *prev;

    /** If this is a sub_request (see request.h) pointer back to the 
     *  main request.  */
    request_rec *main;

    /* Info about the request itself... we begin with stuff that only
     * protocol.c should ever touch...
     */
    /** First line of request, so we can log it */
    char *the_request;
    /** HTTP/0.9, "simple" request */
    int assbackwards;
    /** A proxy request (calculated during post_read_request/translate_name)
     *  possible values PROXYREQ_NONE, PROXYREQ_PROXY, PROXYREQ_REVERSE
     */
    int proxyreq;
    /** HEAD request, as opposed to GET */
    int header_only;
    /** Protocol, as given to us, or HTTP/0.9 */
    char *protocol;
    /** Number version of protocol; 1.1 = 1001 */
    int proto_num;
    /** Host, as set by full URI or Host: */
    const char *hostname;

    /** When the request started */
    apr_time_t request_time;

    /** Status line, if set by script */
    const char *status_line;
    /** In any case */
    int status;

    /* Request method, two ways; also, protocol, etc..  Outside of protocol.c,
     * look, but don't touch.
     */

    /** GET, HEAD, POST, etc. */
    const char *method;
    /** M_GET, M_POST, etc. */
    int method_number;

    /**
     *  allowed is a bitvector of the allowed methods.
     *
     *  A handler must ensure that the request method is one that
     *  it is capable of handling.  Generally modules should DECLINE
     *  any request methods they do not handle.  Prior to aborting the
     *  handler like this the handler should set r->allowed to the list
     *  of methods that it is willing to handle.  This bitvector is used
     *  to construct the "Allow:" header required for OPTIONS requests,
     *  and HTTP_METHOD_NOT_ALLOWED and HTTP_NOT_IMPLEMENTED status codes.
     *
     *  Since the default_handler deals with OPTIONS, all modules can
     *  usually decline to deal with OPTIONS.  TRACE is always allowed,
     *  modules don't need to set it explicitly.
     *
     *  Since the default_handler will always handle a GET, a
     *  module which does *not* implement GET should probably return
     *  HTTP_METHOD_NOT_ALLOWED.  Unfortunately this means that a Script GET
     *  handler can't be installed by mod_actions.
     */
    int allowed;
    /** Array of extension methods */
    apr_array_header_t *allowed_xmethods; 
    /** List of allowed methods */
    ap_method_list_t *allowed_methods; 

    /** byte count in stream is for body */
    int sent_bodyct;
    /** body byte count, for easy access */
    long bytes_sent;
    /** Time the resource was last modified */
    apr_time_t mtime;

    /* HTTP/1.1 connection-level features */

    /** sending chunked transfer-coding */
    int chunked;
    /** multipart/byteranges boundary */
    const char *boundary;
    /** The Range: header */
    const char *range;
    /** The "real" content length */
    apr_off_t clength;

    /** bytes left to read */
    apr_size_t remaining;
    /** bytes that have been read */
    long read_length;
    /** how the request body should be read */
    int read_body;
    /** reading chunked transfer-coding */
    int read_chunked;
    /** is client waiting for a 100 response? */
    unsigned expecting_100;

    /* MIME header environments, in and out.  Also, an array containing
     * environment variables to be passed to subprocesses, so people can
     * write modules to add to that environment.
     *
     * The difference between headers_out and err_headers_out is that the
     * latter are printed even on error, and persist across internal redirects
     * (so the headers printed for ErrorDocument handlers will have them).
     *
     * The 'notes' apr_table_t is for notes from one module to another, with no
     * other set purpose in mind...
     */

    /** MIME header environment from the request */
    apr_table_t *headers_in;
    /** MIME header environment for the response */
    apr_table_t *headers_out;
    /** MIME header environment for the response, printed even on errors and
     * persist across internal redirects */
    apr_table_t *err_headers_out;
    /** Array of environment variables to be used for sub processes */
    apr_table_t *subprocess_env;
    /** Notes from one module to another */
    apr_table_t *notes;

    /* content_type, handler, content_encoding, content_language, and all
     * content_languages MUST be lowercased strings.  They may be pointers
     * to static strings; they should not be modified in place.
     */
    /** The content-type for the current request */
    const char *content_type;	/* Break these out --- we dispatch on 'em */
    /** The handler string that we use to call a handler function */
    const char *handler;	/* What we *really* dispatch on           */

    /** How to encode the data */
    const char *content_encoding;
    /** for back-compat. only -- do not use */
    const char *content_language;
    /** array of (char*) representing the content languages */
    apr_array_header_t *content_languages;

    /** variant list validator (if negotiated) */
    char *vlist_validator;
    
    /** If an authentication check was made, this gets set to the user name. */
    char *user;	
    /** If an authentication check was made, this gets set to the auth type. */
    char *ap_auth_type;

    /** This response is non-cache-able */
    int no_cache;
    /** There is no local copy of this response */
    int no_local_copy;

    /* What object is being requested (either directly, or via include
     * or content-negotiation mapping).
     */

    /** the uri without any parsing performed */
    char *unparsed_uri;	
    /** the path portion of the URI */
    char *uri;
    /** The filename on disk that this response corresponds to */
    char *filename;
    /** The path_info for this request if there is any. */
    char *path_info;
    /** QUERY_ARGS, if any */
    char *args;	
    /** ST_MODE set to zero if no such file */
    apr_finfo_t finfo;
    /** components of uri, dismantled */
    apr_uri_components parsed_uri;

    /* Various other config info which may change with .htaccess files
     * These are config vectors, with one void* pointer for each module
     * (the thing pointed to being the module's business).
     */

    /** Options set in config files, etc. */
    struct ap_conf_vector_t *per_dir_config;
    /** Notes on *this* request */
    struct ap_conf_vector_t *request_config;

/**
 * a linked list of the configuration directives in the .htaccess files
 * accessed by this request.
 * N.B. always add to the head of the list, _never_ to the end.
 * that way, a sub request's list can (temporarily) point to a parent's list
 */
    const struct htaccess_result *htaccess;

    /** A list of output filters to be used for this request */
    struct ap_filter_t *output_filters;
    /** A list of input filters to be used for this request */
    struct ap_filter_t *input_filters;
    /** A flag to determine if the eos bucket has been sent yet */
    int eos_sent;

/* Things placed at the end of the record to avoid breaking binary
 * compatibility.  It would be nice to remember to reorder the entire
 * record to improve 64bit alignment the next time we need to break
 * binary compatibility for 

some other reason.
 */
};

20.7 Access to Configuration and Request Information

All this sounds horribly complicated, and, to be honest, it is. But unless you plan to mess around with the guts of Apache (which this book does not encourage you to do), all you really need to know is that these structures exist and that your module can access them at the appropriate moments. Each function exported by a module gets access to the appropriate structure to enable it to function. The appropriate structure depends on the function, of course, but it is typically either a server_rec, the module's per-directory configuration structure (or two), or a request_rec. As we saw earlier, if you have a server_rec, you can get access to your per-server configuration, and if you have a request_rec, you can get access to both your per-server and your per-directory configurations.

20.8 Hooks, Optional Hooks, and Optional Functions

In Apache 1.x modules hooked into the appropriate "phases" of the main server by putting functions into appropriate slots in the module structure. This process is known as "hooking." This has been revised in Apache 2.0 — instead a single function is called at startup in each module, and this registers the functions that need to be called. The registration process also permits the module to specify how it should be ordered relative to other modules for each hook. (In Apache 1.x this was only possible for all hooks in a module instead of individually and also had to be done in the configuration file, rather than being done by the module itself.)

This approach has various advantages. First, the list of hooks can be extended arbitrarily without causing each function to have a huge unwieldy list of NULL entries. Second, optional modules can export their own hooks, which are only invoked when the module is present, but can be registered regardless — and this can be done without modification of the core code.

Another feature of hooks that we think is pretty cool is that, although they are dynamic, they are still typesafe — that is, the compiler will complain if the type of the function registered for a hook doesn't match the hook (and each hook can use a different type of function).[2] They are also extremely efficient.

So, what exactly is a hook? Its a point at which a module can request to be called. So, each hook specifies a function prototype, and each module can specify one (or more in 2.0) function that gets called at the appropriate moment. When the moment arrives, the provider of the hook calls all the functions in order.[3] It may terminate when particular values are returned — the hook functions can return either "declined" or "ok" or an error. In the first case all are called until an error is returned (if one is, of course); in the second, functions are called until either an error or "ok" is returned. A slight complication in Apache 2.0 is that because each hook function can define the return type, it must also define how "ok," "decline," and errors are returned (in 1.x, the return type was fixed, so this was easier).

Although you are unlikely to want to define a hook, it is useful to know how to go about it, so you can understand them when you come across them (plus, advanced module writers may wish to define optional hooks or optional functions).

Before we get started, it is worth noting that Apache hooks are defined in terms of APR hooks — but the only reason for that is to provide namespace separation between Apache and some other package linked into Apache that also uses hooks.

20.8.1 Hooks

A hook comes in five parts: a declaration (in a header, of course), a hook structure, an implementation (where the hooked functions get called), a call to the implementation, and a hooked function. The first four parts are all provided by the author of the hook, and the last by its user. They are documented in .../include/ap_config.h. Let's cover them in order. First, the declaration. This consists of the return type, the name of the hook, and an argument list. Notionally, it's just a function declaration with commas in strange places. So, for example, if a hook is going to a call a function that looks like:

int some_hook(int,char *,struct x);

then the hook would be declared like this:

AP_DECLARE_HOOK(int,some_hook,(int,char *,struct x))

Note that you really do have to put brackets around the arguments (even if there's only one) and no semicolon at the end (there's only so much we can do with macros!). This declares everything a module using a hook needs, and so it would normally live in an appropriate header.

The next thing you need is the hook structure. This is really just a place that the hook machinery uses to store stuff. You only need one for a module that provides hooks, even if it provides more than one hook. In the hook structure you provide a link for each hook:

APR_HOOK_STRUCT(
	APR_HOOK_LINK(some_hook)
	APR_HOOK_LINK(some_other_hook)
)

Once you have the declaration and the hook structure, you need an implementation for the hook — this calls all the functions registered for the hook and handles their return values. The implementation is actually provided for you by a macro, so all you have to do is invoke the macro somewhere in your source (it can't be implemented generically because each hook can have different arguments and return types). Currently, there are three different ways a hook can be implemented — all of them, however, implement a function called ap_run_name( ). If it returns no value (i.e., it is a void function), then implement it as follows:

AP_IMPLEMENT_HOOK_VOID(some_hook,(char *a,int b),(a,b))

The first argument is the name of the hook, and the second is the declaration of the hook's arguments. The third is how those arguments are used to call a function (that is, the hook function looks like void some_hook(char *a,int b) and calling it looks like some_hook(a,b)). This implementation will call all functions registered for the hook.

If the hook returns a value, there are two variants on the implementation — one calls all functions until one returns something other than "ok" or "decline" (returning something else normally signifies an error, which is why we stop at that point). The second runs functions until one of them returns something other than "decline." Note that the actual values of "ok" and "decline" are defined by the implementor and will, of course, have values appropriate to the return type of the hook. Most functions return ints and use the standard values OK and DECLINE as their return values. Many return an HTTP error value if they have an error. An example of the first variant is as follows:

AP_IMPLEMENT_HOOK_RUN_ALL(int,some_hook,(int x),(x),OK,DECLINE)

The arguments are, respectively, the return type of the hook, the hook's name, the arguments it takes, the way the arguments are used in a function call, the "ok" value, and the "decline" value. By the way, the reason this is described as "run all" rather than "run until the first thing that does something other than OK or DECLINE" is that the normal (i.e., nonerror) case will run all the registered functions.

The second variant looks like this:

AP_IMPLEMENT_HOOK_RUN_FIRST(char *,some_hook,(int k,const char *s),(k,s),NULL)

The arguments are the return type of the hook, the hook name, the hook's arguments, the way the arguments are used, and the "decline" value.

The final part is the way you register a function to be called by the hook. The declaration of the hook defines a function that does the registration, called ap_hook_name( ). This is normally called by a module from its hook-registration function, which, in turn, is pointed at by an element of the module structure. This function always takes four arguments, as follows:

ap_hook_some_hook(my_hook_function,pre,succ,APR_HOOK_MIDDLE);

Note that since this is not a macro, it actually has a semicolon at the end! The first argument is the function the module wants called by the hook. One of the pieces of magic that the hook implementation does is to ensure that the compiler knows the type of this function, so if it has the wrong arguments or return type, you should get an error. The second and third arguments are NULL-terminated arrays of module names that must precede or follow (respectively) this module in the order of registered hook functions. This is to provide fine-grained control of execution order (which, in Apache 1.x could only be done in a very ham-fisted way). If there are no such constraints, then NULL can be passed instead of a pointer to an empty array. The final argument provides a coarser mechanism for ordering — the possibilities being APR_HOOK_FIRST, APR_HOOK_MIDDLE, and APR_HOOK_LAST. Most modules should use APR_HOOK_MIDDLE. Note that this ordering is always overridden by the finer-grained mechanism provided by pre and succ.

You might wonder what kind of hooks are available. Well, a list can be created by running the Perl script .../support/list_hooks.pl. Each hook should be documented in the online Apache documentation.

20.8.2 Optional Hooks

Optional hooks are almost exactly like standard hooks, except that they have the property that they do not actually have to be implemented — that sounds a little confusing, so let's start with what optional hooks are used for, and all will be clear. Consider an optional module — it may want to export a hook, but what happens if some other module uses that hook and the one that exports it is not present? With a standard hook Apache would just fail to build. Optional hooks allow you to export hooks that may not actually be there at runtime. Modules that use the hooks work fine even when the hook isn't there — they simply don't get called. There is a small runtime penalty incurred by optional hooks, which is the main reason all hooks are not optional.

An optional hook is declared in exactly the same way as a standard hook, using AP_DECLARE_HOOK as shown earlier.

There is no hook structure at all; it is maintained dynamically by the core. This is less efficient than maintaining the structure, but is required to make the hooks optional.

The implementation differs from a standard hook implementation, but only slightly — instead of using AP_IMPLEMENT_HOOK_RUN_ALL and friends, you use AP_IMPLEMENT_OPTIONAL_HOOK_RUN_ALL and so on.

Registering to use an optional hook is again almost identical to a standard hook, except you use a macro to do it: instead of ap_hook_name(...) you use AP_OPTIONAL_HOOK(name,...). Again, this is because of their dynamic nature.

The call to your hook function from an optional hook is the same as from a standard one — except that it may not happen at all, of course!

20.8.3 Optional Hook Example

Here's a complete example of an optional hook (with comments following after the lines to which they refer). This can be found in .../modules/experimental. It comprises three files, mod_optional_hook_export.h, mod_optional_hook_export.c, and mod_optional_hook_import.c. What it actually does is call the hook, at logging time, with the request string as an argument.

First we start with the header, mod_optional_hook_export.h.

#include "ap_config.h"

This header declares the various macros needed for hooks.

AP_DECLARE_HOOK(int,optional_hook_test,(const char *))

Declare the optional hook (i.e., a function that looks like int optional_hook_test(const char *)). And that's all that's needed in the header.

Next is the implementation file, mod_optional_hook_export.c.

#include "httpd.h"
#include "http_config.h"
#include "mod_optional_hook_export.h"
#include "http_protocol.h"

Start with the standard includes — but we also include our own declaration header (although this is always a good idea, in this case it is a requirement, or other things won't work).

AP_IMPLEMENT_OPTIONAL_HOOK_RUN_ALL(int,optional_hook_test,(const char *szStr),
                                   (szStr),OK,DECLINED)

Then we go to the implementation of the optional hook — in this case it makes sense to call all the hooked functions, since the hook we are implementing is essentially a logging hook. We could have declared it void, but even logging can go wrong, so we give the opportunity to say so.

static int ExportLogTransaction(request_rec *r)
{
    return ap_run_optional_hook_test(r->the_request);
}

This is the function that will actually run the hook implementation, passing the request string as its argument.

static void ExportRegisterHooks(apr_pool_t *p)
{
    ap_hook_log_transaction(ExportLogTransaction,NULL,NULL,APR_HOOK_MIDDLE);
}

Here we hook the log_transaction hook to get hold of the request string in the logging phase (this is, of course, an example of the use of a standard hook).

module optional_hook_export_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    ExportRegisterHooks
};

Finally, the module structure — the only thing we do in this module structure is to add hook registration.

Finally, an example module that uses the optional hook, optional_hook_import.c.

#include "httpd.h"
#include "http_config.h"
#include "http_log.h"
#include "mod_optional_hook_export.h"

Again, the standard stuff, but also the optional hooks declaration (note that you always have to have the code available for the optional hook, or at least its header, to build with).

static int ImportOptionalHookTestHook(const char *szStr)
{
    ap_log_error(APLOG_MARK,APLOG_ERR,OK,NULL,"Optional hook test said: %s",
                 szStr);

    return OK;
}

This is the function that gets called by the hook. Since this is just a test, we simply log whatever we're given. If optional_hook_export.c isn't linked in, then we'll log nothing, of course.

static void ImportRegisterHooks(apr_pool_t *p)
{
    AP_OPTIONAL_HOOK(optional_hook_test,ImportOptionalHookTestHook,NULL,
                     NULL,APR_HOOK_MIDDLE);
}

Here's where we register our function with the optional hook.

module optional_hook_import_module=
{
    STANDARD20_MODULE_STUFF,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    ImportRegisterHooks
};

And finally, the module structure, once more with only the hook registration function in it.

20.8.4 Optional Functions

For much the same reason as optional hooks are desirable, it is also nice to be able to call a function that may not be there. You might think that DSOs provide the answer,[4] and you'd be half right. But they don't quite, for two reasons — first, not every platform supports DSOs, and second, when the function is not missing, it may be statically linked. Forcing everyone to use DSOs for all modules just to support optional functions is going too far. Particularly since we have a better plan!

An optional function is pretty much what it sounds like. It is a function that may turn out, at runtime, not to be implemented (or not to exist at all, more to the point). So, there are five parts to an optional function: a declaration, an implementation, a registration, a retrieval, and a call. The export of the optional function declares it:

APR_DECLARE_OPTIONAL_FN(int,some_fn,(const char *thing))

This is pretty much like a hook declaration: you have the return type, the name of the function, and the argument declaration. Like a hook declaration, it would normally appear in a header.

Next it has to be implemented:

int some_fn(const char *thing)
{
    /* do stuff */
}

Note that the function name must be the same as in the declaration.

The next step is to register the function (note that optional functions are a bit like optional hooks in a distorting mirror — some parts switch role from the exporter of the function to the importer, and this is one of them):

APR_REGISTER_OPTIONAL_FN(some_fn);

Again, the function name must be the same as the declaration. This is normally called in the hook registration process.[5]

Next, the user of the function must retrieve it. Because it is registered during hook registration, it can't be reliably retrieved at that point. However, there is a hook for retrieving optional functions (called, obviously enough, optional_fn_retrieve). Or it can be done by keeping a flag that says whether it has been retrieved and retrieving it when it is needed. (Although it is tempting to use the pointer to function as the flag, it is a bad idea — if it is not registered, then you will attempt to retrieve it every time instead of just once). In either case, the actual retrieval looks like this:

APR_OPTIONAL_FN_TYPE(some_fn) *pfn;

pfn=APR_RETRIEVE_OPTIONAL_FN(some_fn);

From there on in, pfn gets used just like any other pointer to a function. Remember that it may be NULL, of course!

20.8.5 Optional Function Example

As with optional hooks, this example consists of three files which can be found in .../modules/experimental: mod_optional_fn_export.c, mod_optional_fn_export.h and mod_optional_fn_import.c. (Note that comments for this example follow the code line(s) to which they refer.)

First the header, mod_optional_fn_export.h:

#include "apr_optional.h"

Get the optional function support from APR.

APR_DECLARE_OPTIONAL_FN(int,TestOptionalFn,(const char *));

And declare our optional function, which really looks like int TestOptionalFn(const char *).

Now the exporting file, mod_optional_fn_export.c:

#include "httpd.h"
#include "http_config.h"
#include "http_log.h"
#include "mod_optional_fn_export.h"

As always, we start with the headers, including our own.

static int TestOptionalFn(const char *szStr)
{
    ap_log_error(APLOG_MARK,APLOG_ERR,OK,NULL,
                 "Optional function test said: %s",szStr);

    return OK;
}

This is the optional function — all it does is log the fact that it was called.

static void ExportRegisterHooks(apr_pool_t *p)
{
    APR_REGISTER_OPTIONAL_FN(TestOptionalFn);
}

During hook registration we register the optional function.

module optional_fn_export_module=
{
    STANDARD20_MODULE_STUFF,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    ExportRegisterHooks
};

And finally, we see the module structure containing just the hook registration function.

Now the module that uses the optional function, mod_optional_fn_import.c:

#include "httpd.h"
#include "http_config.h"
#include "mod_optional_fn_export.h"
#include "http_protocol.h"

These are the headers. Of course, we have to include the header that declares the optional function.

static APR_OPTIONAL_FN_TYPE(TestOptionalFn) *pfn;

We declare a pointer to the optional function — note that the macro APR_OPTIONAL_FN_TYPE gets us the type of the function from its name.

static int ImportLogTransaction(request_rec *r)
{
    if(pfn)
        return pfn(r->the_request);
    return DECLINED;
}

Further down we will hook the log_transaction hook, and when it gets called we'll then call the optional function — but only if its present, of course!

static void ImportFnRetrieve(void)
{
    pfn=APR_RETRIEVE_OPTIONAL_FN(TestOptionalFn);
}

We retrieve the function here — this function is called by the optional_fn_retrieve hook (also registered later), which happens at the earliest possible moment after hook registration.

static void ImportRegisterHooks(apr_pool_t *p)
{
    ap_hook_log_transaction(ImportLogTransaction,NULL,NULL,APR_HOOK_MIDDLE);
    ap_hook_optional_fn_retrieve(ImportFnRetrieve,NULL,NULL,APR_HOOK_MIDDLE);
}

And here's where we register our hooks.

module optional_fn_import_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    ImportRegisterHooks
};

And, once more, the familiar module structure.

20.9 Filters, Buckets, and Bucket Brigades

A new feature of Apache 2.0 is the ability to create filters, as described in Chapter 6. These are modules (or parts of modules) that modify the output or input of other modules in some way. Over the course of Apache's development, it has often been said that these could only be done in a threaded server, because then you can make the process look just like reading and writing files. Early attempts to do it without threading met the argument that the required "inside out" model would be too hard for most module writers to handle. So, when Apache 2.0 came along with threading as a standard feature, there was much rejoicing. But wait! Unfortunately, even in 2.0, there are platforms that don't handle threading and process models that don't use it even if the platform supports it. So, we were back at square one. But, strangely, a new confidence in the ability of module writers meant that people suddenly believed that they could handle the "inside out" programming model.[6] And so, bucket brigades were born.

The general concept is that each "layer" in the filter stack can talk to the next layer up (or down, depending on whether it is an input filter or an output filter) and deal with the I/O between them by handing up (or down) "bucket brigades," which are a list of "buckets." Each bucket can contain some data, which should be dealt with in order by the filter, which, in turn, generates new bucket brigades and buckets.

Of course, there is an obvious asymmetry between input filters and output filters. Despite its obviousness, it takes a bit of getting used to when writing filters. An output filter is called with a bucket brigade and told "here, deal with the contents of this." In turn, it creates new bucket brigades and hands them on to the downstream filters. In contrast, an input filter gets asked "could you please fill this brigade?" and must, in turn, call lower-level filters to seed the input.

Of course, there are special cases for the ends of brigades — the "bottom" end will actually receive or send data (often through a special bucket) and the "top" end will consume or generate data without any higher (for output) or lower (for input) filter feeding it.

Why do we have buckets and bucket brigades? Why not pass buckets between the filters and dispense with brigades? The simple answer is that it is likely that filters will generate more than one bucket from time to time and would then have to store the "extra" ones until needed. Why make each one do that — why not have a standard mechanism? Once that's agreed, it is then natural to hand the brigade between layers instead of the buckets — it reduces the number of calls that have to be made without increasing complexity at all.

20.9.1 Bucket Interface

The bucket interface is documented in srclib/apr-util/include/apr_buckets.h.

Buckets come in various flavors — currently there are file, pipe, and socket buckets. There are buckets that are simply data in memory, but even these have various types — transient, heap, pool, memory-mapped, and immortal. There are also special EOS (end of stream) and flush buckets. Even though all buckets provide a way to read the bucket data (or as much as is currently available) via apr_bucket_read( ) — which is actually more like a peek interface — it is still necessary to consume the data somehow, either by destroying the bucket, reducing it in size, or splitting it. The read can be chosen to be either blocking or nonblocking — in either case, if data is available, it will all be returned.

Note that because the data is not destroyed by the read operation, it may be necessary for the bucket to change type and/or add extra buckets to the brigade — for example, consider a socket bucket: when you read it, it will read whatever is currently available from the socket and replace itself with a memory bucket containing that data. It will also add a new socket bucket following the memory bucket. (It can't simply insert the memory bucket before the socket bucket — that way, you'd have no way to find the pointer to the memory bucket, or even know it had been created.) So, although the current bucket pointer remains valid, it may change type as a result of a read, and the contents of the brigade may also change.

Although one cannot destructively read from a brigade, one can write to one — there are lots of functions to do that, ranging from apr_brigade_putc( ) to apr_brigade_printf( ).

EOS buckets indicate the end of the current stream (e.g., the end of a request), and flush buckets indicate that the filter should flush any stored data (assuming it can, of course). It is vital to obey such instructions (and pass them on), as failure will often cause deadlocks.

20.9.2 Output Filters

An output filter is given a bucket brigade, does whatever it does, and hands a new brigade (or brigades) down to the next filter in the output filter stack. To be used at all, a filter must first be registered. This is normally done in the hook registering function by calling ap_register_output_filter( ), like so:

ap_register_output_filter("filter name",filter_function,AP_FTYPE_RESOURCE);

where the first parameter is the name of the filter — this can be used in the configuration file to specify when a filter should be used. The second is the actual filter function, and the third says what type of filter it is (the possible types being AP_FTYPE_RESOURCE, AP_FTYPE_CONTENT_SET, AP_FTYPE_PROTOCOL, AP_FTYPE_TRANSCODE, AP_FTYPE_CONNECTION or AP_FTYPE_NETWORK). In reality, all the type does is determine where in the stack the filter appears. The filter function is called by the filter above it in the stack, which hands it its filter structure and a bucket brigade.

Once the filter is registered, it can be invoked either by configuration, or for more complex cases, the module can decide whether to insert it in the filter stack. If this is desired, the thing to do is to hook the "insert filter" hook, which is called when the filter stack is being set up. A typical hook would look like this:

ap_hook_insert_filter(filter_inserter,NULL,NULL,APR_HOOK_MIDDLE);

where filter_inserter( ) is a function that decides whether to insert the filter, and if so, inserts it. To do the insertion of the filter, you call:

ap_add_output_filter("filter name",ctx,r,r->connection);

where "filter name" is the same name as was used to register the filter in the first place and r is the request structure. The second parameter, ctx in this example, is an optional pointer to a context structure to be set in the filter structure. This can contain arbitrary information that the module needs the filter function to know in the usual way. The filter can retrieve it from the filter structure it is handed on each invocation:

static apr_status_t filter_function(ap_filter_t *f,apr_bucket_brigade *pbbIn)
    {
    filter_context *ctx=f->ctx;

where filter_context is a type you can choose freely (but had better match the type of the context variable you passed to ap_add_output_filter( )). The third and fourth parameters are the request and connection structures — the connection structure is always required, but the request structure is only needed if the filter applies to a single request rather than the whole connection.

As an example, I have written a complete output filter. This one is pretty frivolous — it simply converts the output to all uppercase. The current source should be available in modules/experimental/mod_case_filter.c. (Note that the comments to this example fall after the line(s) to which they refer.)

#include "httpd.h"
#include "http_config.h"
#include "apr_general.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "http_request.h"

First, we include the necessary headers.

static const char s_szCaseFilterName[]="CaseFilter";

Next, we declare the filter name — this registers the filter and later inserts it to declare it as a const string.

module case_filter_module;

This is simply a forward declaration of the module structure.

typedef struct
    {
    int bEnabled;
    } CaseFilterConfig;

The module allows us to enable or disable the filter in the server configuration — if it is disabled, it doesn't get inserted into the output filter chain. Here's the structure where we store that info.

static void *CaseFilterCreateServerConfig(apr_pool_t *p,server_rec *s)
    {
    CaseFilterConfig *pConfig=apr_pcalloc(p,sizeof *pConfig);

    pConfig->bEnabled=0;

    return pConfig;
    }

This creates the server configuration structure (note that this means it must be a per-server option, not a location-dependent one). All modules that need per-server configuration must do this.

static void CaseFilterInsertFilter(request_rec *r)
    {
    CaseFilterConfig *pConfig=ap_get_module_config(r->server->module_config,
                                                   &case_filter_module);

    if(!pConfig->bEnabled)
        return;

    ap_add_output_filter(s_szCaseFilterName,NULL,r,r->connection);
    }

This function inserts the output filter into the filter stack — note that it does this purely by the name of the filter. It is also possible to insert the filter automatically by using the AddOutputFilter or SetOutputFilter directives.

static apr_status_t CaseFilterOutFilter(ap_filter_t *f,
                                        apr_bucket_brigade *pbbIn)
    {
    apr_bucket *pbktIn;
    apr_bucket_brigade *pbbOut;

    pbbOut=apr_brigade_create(f->r->pool);

Since we are going to pass on data every time, we need to create a brigade to which to add the data.

    APR_BRIGADE_FOREACH(pbktIn,pbbIn)
        {

Now loop over each of the buckets passed into us.

        const char *data;
        apr_size_t len;
        char *buf;
        apr_size_t n;
        apr_bucket *pbktOut;

        if(APR_BUCKET_IS_EOS(pbktIn))
            {
            apr_bucket *pbktEOS=apr_bucket_eos_create( );
            APR_BRIGADE_INSERT_TAIL(pbbOut,pbktEOS);
            continue;
            }

If the bucket is an EOS, then pass it on down.

        apr_bucket_read(pbktIn,&data,&len,APR_BLOCK_READ);

Read all the data in the bucket, blocking to ensure there actually is some!

        buf=malloc(len);

Allocate a new buffer for the output data. (We need to do this because we may add another to the bucket brigade, so using a transient wouldn't do — it would get overwritten on the next loop.) However, we use a buffer on the heap rather than the pool so it can be released as soon as we're finished with it.

        for(n=0 ; n < len ; ++n)
            buf[n]=toupper(data[n]);

Convert whatever data we read into uppercase and store it in the new buffer.

        pbktOut=apr_bucket_heap_create(buf,len,0);

Create the new bucket, and add our data to it. The final 0 means "don't copy this, we've already allocated memory for it."

        APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);

And add it to the tail of the output brigade.

        }

    return ap_pass_brigade(f->next,pbbOut);
    }

Once we've finished, pass the brigade down the filter chain.

static const char *CaseFilterEnable(cmd_parms *cmd, void *dummy, int arg)
    {
    CaseFilterConfig *pConfig=ap_get_module_config(cmd->server->module_config,
                                                   &case_filter_module);
    pConfig->bEnabled=arg;

    return NULL;
    }

This just sets the configuration option to enable or disable the filter.

static const command_rec CaseFilterCmds[] = 
    {
    AP_INIT_FLAG("CaseFilter", CaseFilterEnable, NULL, RSRC_CONF,
                 "Run a case filter on this host"),
    { NULL }
    };

And this creates the command to set it.

static void CaseFilterRegisterHooks(void)
    {
    ap_hook_insert_filter(CaseFilterInsertFilter,NULL,NULL,APR_HOOK_MIDDLE);

Every module must register its hooks, so this module registers the filter inserter hook.

    ap_register_output_filter(s_szCaseFilterName,CaseFilterOutFilter,
                              AP_FTYPE_CONTENT);

It is also a convenient (and correct) place to register the filter itself, so we do.

    }

module case_filter_module =
    {
    STANDARD20_MODULE_STUFF,
    NULL,
    NULL,
    CaseFilterCreateServerConfig,
    NULL,
    CaseFilterCmds,
    NULL,
    CaseFilterRegisterHooks
    };

Finally, we have to register the various functions in the module structure. And there we are: a simple output filter. There are two ways to invoke this filter, either add:

CaseFilter on

in a Directory or Location section, invoking it through its own directives, or (for example):

AddOutputFilter CaseFilter html

which associates it with all .html files using the standard filter directives.

20.9.3 Input Filters

An input filter is called when input is required. It is handed a brigade to fill, a mode parameter (the mode can either be blocking, nonblocking, or peek), and a number of bytes to read — 0 means "read a line." Most input filters will, of course, call the filter below them to get data, process it in some way, then fill the brigade with the resulting data.

As with output filters, the filter must be registered:

ap_register_input_filter("filter name", filter_function, AP_FTYPE_CONTENT);

where the parameters are as described earlier for output filters. Note that there is currently no attempt to avoid collisions in filter names, which is probably a mistake. As with output filters, you have to insert the filter at the right moment — all is the same as earlier, except the functions say "input" instead of "output," of course.

Naturally, input filters are similar to but not the same as output filters. It is probably simplest to illustrate the differences with an example. The following filter converts the case of request data (note, just the data, not the headers — so to see anything happen, you need to do a POST request). It should be available in modules/experimental/mod_case_filter_in.c. (Note the comments follow the line(s) of code to which they refer.)

#include "httpd.h"
#include "http_config.h"
#include "apr_general.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "http_request.h"

#include <ctype.h>

As always, we start with the headers we need.

static const char s_szCaseFilterName[]="CaseFilter";

And then we see the name of the filter. Note that this is the same as the example output filter — this is fine, because there's never an ambiguity between input and output filters.

module case_filter_in_module;

This is just the usual required forward declaration.

typedef struct
{
    int bEnabled;
} CaseFilterInConfig;

This is a structure to hold on to whether this filter is enabled or not.

typedef struct
{
    apr_bucket_brigade *pbbTmp;
} CaseFilterInContext;

Unlike the output filter, we need a context — this is to hold a temporary bucket brigade. We keep it in the context to avoid recreating it each time we are called, which would be inefficient.

static void *CaseFilterInCreateServerConfig(apr_pool_t *p,server_rec *s)
{
    CaseFilterInConfig *pConfig=apr_pcalloc(p,sizeof *pConfig);

    pConfig->bEnabled=0;

    return pConfig;
}

Here is just standard stuff creating the server config structure (note that ap_pcalloc( ) actually sets the whole structure to zeros anyway, so the explicit initialization of bEnabled is redundant, but useful for documentation purposes).

static void CaseFilterInInsertFilter(request_rec *r)
{
    CaseFilterInConfig *pConfig=ap_get_module_config(r->server->module_config,
                                                     &case_filter_in_module);
    CaseFilterInContext *pCtx;

    if(!pConfig->bEnabled)
        return;

If the filter is enabled (by the CaseFilterIn directive), then...

    pCtx=apr_palloc(r->pool,sizeof *pCtx);
    pCtx->pbbTmp=apr_brigade_create(r->pool);

Create the filter context discussed previously, and...

    ap_add_input_filter(s_szCaseFilterName,pCtx,r,NULL);

insert the filter. Note that because of where we're hooked, this happens after the request headers have been read.

}

Now we move on to the actual filter function.

static apr_status_t CaseFilterInFilter(ap_filter_t *f,
                                       apr_bucket_brigade *pbbOut,
                                       ap_input_mode_t eMode,
                                       apr_size_t *pnBytes)
{
    CaseFilterInContext *pCtx=f->ctx;

First we get the context we created earlier.

    apr_status_t ret;

    ap_assert(APR_BRIGADE_EMPTY(pCtx->pbbTmp));

Because we're reusing the temporary bucket brigade each time we are called, it's a good idea to ensure that it's empty — it should be impossible for it not to be, hence the use of an assertion instead of emptying it.

    ret=ap_get_brigade(f->next,pCtx->pbbTmp,eMode,pnBytes);

Get the next filter down to read some input, using the same parameters as we got, except it fills the temporary brigade instead of ours.

    if(eMode == AP_MODE_PEEK || ret != APR_SUCCESS)
        return ret;

If we are in peek mode, all we have to do is return success if there is data available. Since the next filter down has to do the same, and we only have data if it has, then we can simply return at this point. This may not be true for more complex filters, of course! Also, if there was an error in the next filter, we should return now regardless of mode.

    while(!APR_BRIGADE_EMPTY(pCtx->pbbTmp)) {

Now we loop over all the buckets read by the filter below.

        apr_bucket *pbktIn=APR_BRIGADE_FIRST(pCtx->pbbTmp);
        apr_bucket *pbktOut;
        const char *data;
        apr_size_t len;
        char *buf;
        int n;

        // It is tempting to do this...
        //APR_BUCKET_REMOVE(pB);
        //APR_BRIGADE_INSERT_TAIL(pbbOut,pB);
        // and change the case of the bucket data, but that would be wrong
        // for a file or socket buffer, for example...

As the comment says, the previous would be tempting. We could do a hybrid — move buckets that are allocated in memory and copy buckets that are external resources, for example. This would make the code considerably more complex, though it might be more efficient as a result.

        if(APR_BUCKET_IS_EOS(pbktIn)) {
            APR_BUCKET_REMOVE(pbktIn);
            APR_BRIGADE_INSERT_TAIL(pbbOut,pbktIn);
            continue;
        }

Once we've read an EOS, we should pass it on.

        ret=apr_bucket_read(pbktIn,&data,&len,eMode);
        if(ret != APR_SUCCESS)
            return ret;

Again, we read the bucket in the same mode in which we were called (which, at this point, is either blocking or nonblocking, but definitely not peek) to ensure that we don't block if we shouldn't, and do if we should.

        buf=malloc(len);
        for(n=0 ; n < len ; ++n)
            buf[n]=toupper(data[n]);

We allocate the new buffer on the heap, because it will be consumed and destroyed by the layers above us — if we used a pool buffer, it would last as long as the request does, which is likely to be wasteful of memory.

        pbktOut=apr_bucket_heap_create(buf,len,0,NULL);

As always, the bucket for the buffer needs to have a matching type (note that we could ask the bucket to copy the data onto the heap, but we don't).

        APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut);

Add the new bucket to the output brigade.

        apr_bucket_delete(pbktIn);

And delete the one we got from below.

    }

    return APR_SUCCESS;

If we get here, everything must have gone fine, so return success.

}

static const char *CaseFilterInEnable(cmd_parms *cmd, void *dummy, int arg)
{
    CaseFilterInConfig *pConfig
      =ap_get_module_config(cmd->server->module_config,&case_filter_in_module);
    pConfig->bEnabled=arg;

    return NULL;
}

This simply sets the Boolean enable flag in the configuration for this module. Note that we've used per-server configuration, but we could equally well use per-request, since the filter is added after the request is processed.

static const command_rec CaseFilterInCmds[] = 
{
    AP_INIT_FLAG("CaseFilterIn", CaseFilterInEnable, NULL, RSRC_CONF,
                 "Run an input case filter on this host"),

Associate the configuration command with the function that sets it.

    { NULL }
};


static void CaseFilterInRegisterHooks(apr_pool_t *p)
{
    ap_hook_insert_filter(CaseFilterInInsertFilter,NULL,NULL,APR_HOOK_MIDDLE);

Hook the filter insertion hook — this gets called after the request header has been processed, but before any response is written or request body is read.

    ap_register_input_filter(s_szCaseFilterName,CaseFilterInFilter,
                             AP_FTYPE_RESOURCE);

This is a convenient point to register the filter.

}

module case_filter_in_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,
    NULL,
    CaseFilterInCreateServerConfig,
    NULL,
    CaseFilterInCmds,
    CaseFilterInRegisterHooks
};

Finally, we associate the various functions with the correct slots in the module structure. Incidentally, some people prefer to put the module structure at the beginning of the source — I prefer the end because it avoids having to predeclare all the functions used in it.

20.10 Modules

Almost everything in this chapter has been illustrated by a module implementing some kind of functionality. But how do modules fit into Apache? In fact, almost all of the work is done in the module itself, but a little extra is required outside. All that is required beyond that is to add it to the config.m4 file in its directory, which gets incorporated into the configure script. The lines for the two of the modules illustrated earlier are:

APACHE_MODULE(optional_fn_import, example optional function importer, , , no)
APACHE_MODULE(optional_fn_export, example optional function exporter, , , no)

The two modules can be enabled with the --enable-optional-fn-export and --enable-optional-fn-import flags to configure. Of course, the whole point is that you can enable either, both, or neither, and they will always work correctly.

The complete list of arguments for APACHE_MODULE( ) are:

APACHE_MODULE(name, helptext[, objects[, structname[, default[, config]]]])

where:

name

This is the name of the module, which normally matches the source filename (i.e., it is mod_name.c).

helptext

This is the text displayed when configure is run with --help as an argument.

objects

If this is present, it overrides the default object file of mod_name.o.

structname

The module structure is called name_module by default, but if this is present, it overrides it.

default

If present, this determines when the module is included. If set to yes, the module is always included unless explicitly disabled. If no, the module is never included unless explicitly enabled. If most, then it is not enabled unless --enable-most is specified. If absent or all, then it is only enabled when --enable-all is specified.

[1]  Fixing one tends to cause the other, naturally.

[2]  We'll admit to bias here — Ben designed and implemented the hooking mechanisms in Apache 2.0.

[3]  Note that the order is determined at runtime in Apache 2.0.

[4]  Dynamic Shared Objects — i.e., shared libraries, or DLLs in Windows parlance.

[5]  There is an argument that says it should be called before then, so it can be retrieved during hook registration, but the problem is that there is no "earlier" — that would require a hook!

[6]  So called because, instead of simply reading input and writing output, one must be prepared to receive some input, then return before a complete chunk is available, and then get called again with the next bit, possibly several times before anything completes. This requires saving state between each invocation and is considerably painful in comparison.

CONTENTS