Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 10 - C API Reference Guide, Part I / Major Data Structures
The request_rec Structure

The request_rec request record is the heart and soul of the Apache API. It contains everything you could ever want to know about the current request and then some. You should already be intimately familiar with the request record from the Perl API. This section will show you what the request_rec looks like from within C.

The full definition of the request_rec is long and gnarly, combining public information that modules routinely use with private information that is only of interest to the core server (this includes such things as whether the request is using the "chunked" transfer mode implemented by HTTP/1.1). Example 10-1 gives the full definition of the request_rec, copied right out of include/httpd.h. We give detailed explanations for those fields that module writers need to worry about and silently ignore the rest.

ap_pool *pool

This is a resource pool that is valid for the lifetime of the request (ap_pool is merely a typedef alias for pool). Your request-time handlers should allocate memory from this pool.

conn_rec *connection

This is a pointer to the connection record for the current request, from which you can derive information about the local and remote host addresses, as well as the username used during authentication. For details, see "The conn_rec Structure" later in this chapter.

server_rec *server

This is a pointer to a server_rec structure (the server record), from which you can gather information about the current server. This is described in more detail in the next section, "The server_rec Structure."

request_rec *next
request_rec *prev
request_rec *main

Under various circumstances, including subrequests and internal redirects, Apache will generate one or more additional requests that are identical in all respects to an ordinary request. When this happens, these fields are used to chain the requests into a linked list. The next field points to the more recent request (or NULL, if there is none), and the prev field points to the immediate ancestor of the request. main points back to the top-level request. See Chapter 3 and Chapter 8, Customizing the Apache Configuration Process, for a more detailed discussion of the subrequest mechanism.
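
As a rough sketch of how these links can be followed, the fragment below uses a cut-down stand-in for the request record that holds only the three link fields and a URI. It is not the real request_rec from httpd.h. The names sketch_request, original_request, and is_original_request are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for request_rec: only the link fields discussed above. */
typedef struct sketch_request sketch_request;
struct sketch_request {
    const char *uri;
    sketch_request *next;   /* request we were redirected to, or NULL */
    sketch_request *prev;   /* request we were redirected from, or NULL */
    sketch_request *main;   /* top-level request if this is a subrequest */
};

/* Hypothetical helper: follow main and prev until neither is set. */
sketch_request *original_request(sketch_request *r)
{
    assert(r != NULL);
    for (;;) {
        if (r->main != NULL)
            r = r->main;        /* climb out of a subrequest */
        else if (r->prev != NULL)
            r = r->prev;        /* step back across an internal redirect */
        else
            return r;           /* nothing above us: the original request */
    }
}

/* Hypothetical helper: true for a request that is neither a subrequest
 * nor the target of an internal redirect. */
int is_original_request(const sketch_request *r)
{
    assert(r != NULL);
    return r->main == NULL && r->prev == NULL;
}
```

A logging or authentication handler that should act only once per client request could use a test like is_original_request() to skip subrequests and redirected requests.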

char *the_request

This contains the first line of the request, for logging purposes.

int proxyreq

If the current request is a proxy request, then this field will be set to a true (nonzero) value. Note that mod_proxy or mod_perl must be configured with the server for automatic proxy request detection. You can also set it yourself in order to activate Apache's proxy mechanism in the manner described in Chapter 7, Other Request Phases.

int header_only

This field will be true if the remote client made a head-only request (i.e., HEAD). You should not change the value of this field.

char *protocol

This field contains the name and version number of the protocol requested by the browser, for example HTTP/1.0.
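
The full structure in Example 10-1 also carries a proto_num field whose comment gives the numbering "1.1 = 1001". The sketch below shows how a protocol string could be mapped onto that numbering, assuming the encoding is 1000 * major + minor. That formula is consistent with the comment but is an assumption here, and protocol_number is a hypothetical helper rather than an Apache API call.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: turn "HTTP/1.1" into 1001.
 * Returns -1 if the string does not look like HTTP/major.minor.
 */
int protocol_number(const char *protocol)
{
    int major, minor;

    assert(protocol != NULL);
    if (sscanf(protocol, "HTTP/%d.%d", &major, &minor) != 2)
        return -1;
    return 1000 * major + minor;   /* assumed encoding, see lead-in */
}
```

Comparing numbers in this way (for example, protocol_number(r->protocol) >= 1001) is usually less error-prone than comparing protocol strings.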

time_t request_time

This is the time that the request started, as a C time_t value. See the manual page for gmtime for details on the time_t type.
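
As a small illustration of working with a time_t such as request_time, the sketch below formats one as an HTTP-style date using only the standard C library. The helper name format_http_date is hypothetical. Inside a module you would more likely use Apache's own ap_ht_time() function, which appears later in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: write t into buf as an HTTP date in GMT.
 * Returns the number of characters written, or 0 on failure.
 */
size_t format_http_date(time_t t, char *buf, size_t len)
{
    struct tm *tm;

    assert(buf != NULL);
    tm = gmtime(&t);               /* broken-down time in GMT */
    if (tm == NULL)
        return 0;
    return strftime(buf, len, "%a, %d %b %Y %H:%M:%S GMT", tm);
}
```

Note that gmtime() returns a pointer to static storage, so this sketch is not safe to call from several threads at once.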

const char *hostname

This contains the name of the host requested by the client, either within the URI (during proxy requests) or in the Host header. The value of this field may not correspond to the canonical name of your server or the current virtual host but can be any of its DNS aliases. For this reason, it is better to use the ap_get_server_name() API function call described under "Processing Requests."

char *status_line

This field holds the full text of the status line returned from Apache to the remote browser, for example 200 OK. Ordinarily you will not want to change this directly but will allow Apache to set it based on the return value from your handler. However, you can change it directly in the rare instance that you want your handler to lie to Apache about its intentions (e.g., tell Apache that the handler processed the transaction OK, but send an error message to the browser).

int status

This field holds the numeric value of the transaction status code. Again you will usually not want to set this directly but allow Apache to do it for you.

char *method

This field contains the request method as a string, e.g., GET.

int method_number

This field contains the request method as an integer, e.g., M_GET. The appropriate symbolic constants are defined in include/httpd.h.

int allowed

This is a bit vector of request methods that your handler can accommodate. Ordinarily a content handler can just look at the value of method and return DECLINED if it doesn't want to handle it. However, to be fully friendly with the HTTP/1.1 protocol, handlers may also set allowed to the list of methods they accept. Apache will then generate an Allow: header which it transmits to any browser that's interested.
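Here's a code fragment from a handler that accepts the GET and HEAD methods but not POST (or any of the more esoteric ones). In the Apache 1.3 headers the M_ constants are bit positions rather than masks, and HEAD requests share the M_GET method number (they are distinguished by header_only), so a single shifted bit covers both:

r->allowed = (1 << M_GET);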

long bytes_sent

This field contains the number of bytes that have been sent during the response phase and is used for logging. This count includes the document body but not the HTTP header fields.

time_t mtime

This field contains the modification time of the requested file, if any. The value may or may not be the same as the last modified time in the finfo stat buffer. The server core does not set this field; the task is left for modules to take care of. In general, this field should only be updated using the ap_update_mtime() function, described later in the section "Sending Files to the Client."

long length

This field holds the value of the outgoing Content-length header. You can read this value but should only change it using the ap_set_content_length() function, described later in the section "Sending Files to the Client."

long remaining

This field holds the value of the incoming Content-length header, if any. It is only set after a call to the ap_setup_client_block() function. After each call to ap_get_client_block(), the number of bytes read is subtracted from the remaining field.

table *headers_in
table *headers_out
table *err_headers_out
table *subprocess_env
table *notes

These are pointers to Apache table records which maintain information between the phases of a request and are disposed of once the request is finished. The tables are dynamic lists of name/value pairs whose contents can be accessed with the routines described later under "The Table API."
These five tables correspond to the like-named methods in the Perl API. headers_in and headers_out contain the incoming and outgoing HTTP headers. err_headers_out contains outgoing headers to be used in case of an error or a subrequest. subprocess_env contains name=value pairs to be copied into the environment prior to invoking subprocesses (such as CGI scripts). notes is a general table that can be used by modules to send "notes" from one phase to another.

const char *content_type
const char *content_encoding

These fields contain the MIME content type and content encoding of the outgoing document. You can read these fields to discover the MIME checking phase's best guess as to the document's type and encoding, or set them yourself within a content or MIME checking handler in order to change them. The two fields frequently point to inline strings, so don't try to use strcpy() to modify them in place.

const char *handler

This is the symbolic name of the content handler that will service the request during the response phase. Handlers for earlier phases are free to modify this field in order to change the default behavior.

array_header *content_languages

This field holds an array_header pointer to the list of the language codes associated with this document. You can read and manipulate this list using the Apache array API (see "The Array API"). This array is usually set up during the MIME checking phase; however, the content handler is free to modify it.
The request_rec also contains a char * field named content_language. The header file indicates that this is for backward compatibility only and should not be used.

int no_cache

If set to a true value, this field causes Apache to add an Expires field to the outgoing HTTP header with the same date and time as the incoming request. Browsers that honor this instruction will not cache the document locally.

char *unparsed_uri
char *uri
char *filename
char *path_info
char *args

These five fields all hold the requested URI after various processing steps have been performed. unparsed_uri is a character string holding the raw URI before any parsing has been performed. uri holds the path part of the URI, and is the one you will usually work with. filename contains the translated physical pathname of the requested document, as determined during the URI translation phase. path_info holds the additional path information that remains after the URI has been translated into a file path. Finally, args contains the query string for CGI GET requests, and corresponds to the portion of the URI following the ?. Unlike the Perl API, you will have to parse out the components of the query string yourself.
You can turn path_info into a physical path akin to the CGI scripts' PATH_TRANSLATED environment variable by passing path_info to a subrequest and examining the filename field of the returned request record. See "The Subrequest API and Internal Redirects" later in this chapter.
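
Since the query string in args has to be picked apart by hand, here is one minimal way it might be done in plain C. The query_pair type and split_query function are hypothetical names for this sketch. The function modifies its argument in place, so inside a module you would first make a writable copy of r->args (for example from the request pool) and check it for NULL. Escapes such as %20 and + are deliberately left undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One name/value pair; both pointers refer into the caller's buffer. */
typedef struct {
    char *name;
    char *value;
} query_pair;

/*
 * Hypothetical helper: split a writable query string such as
 * "a=1&b=2" into pairs, in place.  Returns the number of pairs
 * stored, which is at most max.  A NULL string yields zero pairs.
 */
size_t split_query(char *args, query_pair *pairs, size_t max)
{
    size_t n = 0;
    char *p = args;

    assert(pairs != NULL);
    while (p != NULL && *p != '\0' && n < max) {
        char *amp = strchr(p, '&');
        char *eq;

        if (amp != NULL)
            *amp = '\0';               /* terminate this pair */
        eq = strchr(p, '=');
        if (eq != NULL)
            *eq++ = '\0';              /* terminate the name */
        if (*p != '\0') {              /* skip segments with no name */
            pairs[n].name = p;
            pairs[n].value = (eq != NULL) ? eq : p + strlen(p);
            n++;
        }
        p = (amp != NULL) ? amp + 1 : NULL;
    }
    return n;
}
```

A name that appears without an equals sign, such as debug in "name=fred&debug", is stored with an empty string as its value.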

uri_components parsed_uri

For finer access to the requested URI, Apache provides a uri_components data structure that contains the preparsed URI. This structure can be examined and manipulated with a special API. See "URI Parsing and Manipulation" in Chapter 11 for details.

struct stat finfo

This field is a stat struct containing the result of Apache's most recent stat() on the currently requested file (whose path you will find in filename). You can avoid an unnecessary system call by using the contents of this field directly rather than calling stat() again. If the requested file does not exist, finfo.st_mode will be set to zero.
In this example, we use the S_ISDIR macro defined in stat.h to detect whether the requested URI corresponds to a directory. Otherwise, we print out the file's modification time, using the ap_ht_time() function (described later) to format the time in standard HTTP format.

/* timefmt is assumed to be a strftime()-style format string defined elsewhere */
if (S_ISDIR(r->finfo.st_mode)) {
    ap_rprintf(r, "%s is a directory\n", r->filename);
}
else {
    ap_rprintf(r, "Last Modified: %s\n",
               ap_ht_time(r->pool, r->finfo.st_mtime, timefmt, 0));
}
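
The fragment above depends on a live request record. To try the same checks outside the server, the sketch below fills a stat buffer with the POSIX stat() call and mimics the convention described in the text, where st_mode is zero if the file does not exist. The names path_kind, fill_finfo, and classify are hypothetical, and this is not how Apache itself populates finfo.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical classification codes, for this sketch only. */
enum path_kind { PATH_MISSING, PATH_DIRECTORY, PATH_OTHER };

/* Fill *finfo for filename; zero it (so st_mode == 0) if stat() fails. */
void fill_finfo(const char *filename, struct stat *finfo)
{
    assert(filename != NULL && finfo != NULL);
    if (stat(filename, finfo) != 0)
        memset(finfo, 0, sizeof *finfo);
}

/* Classify a previously filled stat buffer without another system call. */
enum path_kind classify(const struct stat *finfo)
{
    assert(finfo != NULL);
    if (finfo->st_mode == 0)
        return PATH_MISSING;       /* the "no such file" convention */
    if (S_ISDIR(finfo->st_mode))
        return PATH_DIRECTORY;
    return PATH_OTHER;             /* regular file, device, and so on */
}
```

The point of the split is the same as in the handler: stat() is called once, and later code inspects the saved buffer instead of calling it again.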

void *per_dir_config
void *request_config

These fields are the entry points to lists of per-directory and per-request configuration data set up by your module's configuration routines. You should not try to manipulate these fields directly, but instead pass them to the configuration API routine ap_get_module_config() described in the section "Accessing Module Configuration Data" in Chapter 11. Of the two, per_dir_config is the one you will use most often. request_config is used only rarely for passing custom configuration information to subrequests.

Example 10-1. The request_rec Structure (from include/httpd.h)

struct request_rec {
    ap_pool *pool;
    conn_rec *connection;
    server_rec *server;

    request_rec *next;         /* If we wind up getting redirected,
                                * pointer to the request we redirected to.
                                */
    request_rec *prev;         /* If this is an internal redirect,
                                * pointer to where we redirected *from*.
                                */

    request_rec *main;         /* If this is a sub_request (see request.h)
                                * pointer back to the main request.
                                */

    /* Info about the request itself... we begin with stuff that only
     * protocol.c should ever touch...
     */

    char *the_request;         /* First line of request, so we can log it */
    int assbackwards;          /* HTTP/0.9, "simple" request */
    int proxyreq;              /* A proxy request (calculated during
                                * post_read_request or translate_name) */
    int header_only;           /* HEAD request, as opposed to GET */
    char *protocol;            /* Protocol, as given to us, or HTTP/0.9 */
    int proto_num;             /* Number version of protocol; 1.1 = 1001 */
    const char *hostname;      /* Host, as set by full URI or Host: */

    time_t request_time;       /* When the request started */

    char *status_line;         /* Status line, if set by script */
    int status;                /* In any case */

    /* Request method, two ways; also, protocol, etc.  Outside of protocol.c,
     * look, but don't touch.
     */

    char *method;              /* GET, HEAD, POST, etc. */
    int method_number;         /* M_GET, M_POST, etc. */

    /*
      allowed is a bitvector of the allowed methods.

      A handler must ensure that the request method is one that
      it is capable of handling.  Generally modules should DECLINE
      any request methods they do not handle.  Prior to aborting the
      handler like this the handler should set r->allowed to the list
      of methods that it is willing to handle.  This bitvector is used
      to construct the "Allow:" header required for OPTIONS requests,
      and METHOD_NOT_ALLOWED and NOT_IMPLEMENTED status codes.

      Since the default_handler deals with OPTIONS, all modules can
      usually decline to deal with OPTIONS.  TRACE is always allowed,
      modules don't need to set it explicitly.

      Since the default_handler will always handle a GET, a
      module which does *not* implement GET should probably return
      METHOD_NOT_ALLOWED.  Unfortunately this means that a script GET
      handler can't be installed by mod_actions.
    */
    int allowed;               /* Allowed methods - for 405, OPTIONS, etc */

    int sent_bodyct;           /* byte count in stream is for body */
    long bytes_sent;           /* body byte count, for easy access */
    time_t mtime;              /* Time the resource was last modified */

    /* HTTP/1.1 connection-level features */

    int chunked;               /* sending chunked transfer-coding */
    int byterange;             /* number of byte ranges */
    char *boundary;            /* multipart/byteranges boundary */
    const char *range;         /* The Range: header */
    long clength;              /* The "real" content length */

    long remaining;            /* bytes left to read */
    long read_length;          /* bytes that have been read */
    int read_body;             /* how the request body should be read */
    int read_chunked;          /* reading chunked transfer-coding */

    /* MIME header environments, in and out.  Also, an array containing
     * environment variables to be passed to subprocesses, so people can
     * write modules to add to that environment.
     *
     * The difference between headers_out and err_headers_out is that the
     * latter are printed even on error and persist across internal redirects
     * (so the headers printed for ErrorDocument handlers will have them).
     *
     * The 'notes' table is for notes from one module to another, with no
     * other set purpose in mind...
     */

    table *headers_in;
    table *headers_out;
    table *err_headers_out;
    table *subprocess_env;
    table *notes;

    /* content_type, handler, content_encoding, content_language, and all
     * content_languages MUST be lowercased strings.  They may be pointers
     * to static strings; they should not be modified in place.
     */
    const char *content_type;  /* Break these out --- we dispatch on 'em */
    const char *handler;       /* What we *really* dispatch on           */
    const char *content_encoding;
    const char *content_language;      /* for back-compat. only -- do not use */
    array_header *content_languages;   /* array of (char*) */

    int no_cache;
    int no_local_copy;

    /* What object is being requested (either directly, or via include
     * or content-negotiation mapping).
     */

    char *unparsed_uri;        /* the uri without any parsing performed */
    char *uri;                 /* the path portion of the URI */
    char *filename;
    char *path_info;
    char *args;                /* QUERY_ARGS, if any */
    struct stat finfo;         /* ST_MODE set to zero if no such file */
    uri_components parsed_uri; /* components of uri, dismantled */

    /* Various other config info which may change with .htaccess files
     * These are config vectors, with one void* pointer for each module
     * (the thing pointed to being the module's business).
     */

    void *per_dir_config;      /* Options set in config files, etc. */
    void *request_config;      /* Notes on *this* request */

/*
 * a linked list of the configuration directives in the .htaccess files
 * accessed by this request.
 * N.B. always add to the head of the list, _never_ to the end.
 * that way, a sub request's list can (temporarily) point to a parent's list
 */
    const struct htaccess_result *htaccess;

/* Things placed at the end of the record to avoid breaking binary
 * compatibility.  It would be nice to remember to reorder the entire
 * record to improve 64-bit alignment the next time we need to break
 * binary compatibility for some other reason.
 */
    unsigned expecting_100;     /* is client waiting for a 100 response? */
};
