Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 9 - Perl API Reference Guide / The Apache Request Object
Server Core Functions

This section covers the API methods that are available for your use during the processing of a request but are not directly related to incoming or outgoing data.

chdir_file()

Given a filename as its argument, this method changes the current directory to the directory that contains the file. This is a convenience routine for modules that implement scripting engines, since it is common to run a script from the directory in which it lives. The process stays in the new directory unless your module changes back to the previous one. Because determining the current directory carries significant overhead, we suggest using the $Apache::Server::CWD variable or the server_root_relative() method if you wish to return to the previous directory afterward.

$r->chdir_file($r->filename);
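
The change-and-return pattern can be tried outside Apache. The following is a minimal sketch in plain Perl, not mod_perl code: it imitates the effect of chdir_file() with chdir() and dirname(), and the temporary directory and hello.pl file are hypothetical stand-ins for a real script. The starting directory is looked up once, up front, which is the idea behind relying on a cached value such as $Apache::Server::CWD.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Basename qw(dirname basename);
use File::Temp qw(tempdir);

# Hypothetical stand-in for a script file that a scripting engine would run.
my $dir    = tempdir(CLEANUP => 1);
my $script = "$dir/hello.pl";
open(my $fh, '>', $script) or die "cannot create $script: $!";
print $fh "1;\n";
close($fh);

# Look up the starting directory once, rather than on every request.
my $start = getcwd();

# Roughly the effect of chdir_file(): move to the directory holding the file.
chdir(dirname($script)) or die "cannot chdir: $!";
my $found = -e basename($script) ? 1 : 0;   # relative paths now resolve here

# Change back explicitly; otherwise the process stays in the script's directory.
chdir($start) or die "cannot return to $start: $!";
my $restored = getcwd() eq $start ? 1 : 0;

print "found=$found restored=$restored\n";
```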

child_terminate()

Calling this method will cause the current child process to shut down gracefully after the current transaction is completed and the logging and cleanup phases are done. This method is not available on Win32 systems.

$r->child_terminate;

hard_timeout()
kill_timeout()
reset_timeout()
soft_timeout()

The timeout API governs the interaction of Apache with the client. At various points during the request/response cycle, a browser that is no longer responding can be timed out so that it doesn't continue to hold the connection open. Timeouts are primarily of concern to C API programmers, as mod_perl handles the details of timeouts internally for read and write methods. However, these calls are included in the Perl API for completeness.
The hard_timeout() method initiates a "hard" timeout. If the client read or write operation takes longer than the time specified by Apache's Timeout directive, then the current handler will be aborted immediately and Apache will immediately enter the logging phase. hard_timeout() takes a single string argument which should contain the name of your module or some other identification. This identification will be incorporated into the error message that is written to the server error log when the timeout occurs.
soft_timeout(), in contrast, does not immediately abort the current handler. Instead, when a timeout occurs control returns to the handler, but all read and write operations are replaced with no-ops so that no further data can be sent to or received from the client. In addition, the Apache::Connection object's aborted() method will return true. As with hard_timeout(), you should pass this method the name of your module in order to be able to identify the source of the timeout in the error log.
The reset_timeout() method can be called to set a previously initiated timer back to zero. It is usually used between a series of read or write operations in order to restart the timer.
Finally, the kill_timeout() method is called to cancel a previously initiated timeout. It is generally called when a series of I/O operations is complete.
The following examples will give you the general idea of how these four methods are used. Remember, however, that in the Perl API these methods are not really necessary because they are called internally by the read() and print() methods.

# typical hard_timeout() usage
$r->hard_timeout("Apache::Example while reading data");
while (... read data loop ...) {
  ...
  $r->reset_timeout;
}
$r->kill_timeout;

# typical soft_timeout() usage
$r->soft_timeout("Apache::Example while reading data");
while (... read data loop ...) {
  ...
  $r->reset_timeout;
}
$r->kill_timeout;

internal_redirect()

Unlike a full HTTP redirect, in which the server tells the browser to look somewhere else for the requested document, the internal_redirect() method tells Apache to return the document at a different URI without telling the client. This is a lot faster than a full redirect because it avoids an extra round trip to the browser.
The required argument is an absolute URI path on the current server. The server will process the URI as if it were a whole new request, running the URI translation, MIME type checking, and other phases before invoking the appropriate content handler for the new URI. The content handler that eventually runs is not necessarily the same as the one that invoked internal_redirect(). This method should only be called within a content handler.
Do not use internal_redirect() to redirect to a different server. You'll need to do a full redirect for that. Both redirection techniques are described in more detail in Chapter 4.

$r->internal_redirect("/new/place");

Apache implements its ErrorDocument feature as an internal redirect, so many of the techniques that apply to internal redirects also apply to custom error handling.

internal_redirect_handler()

This method does the same thing as internal_redirect() but arranges for the content handler used to process the redirected URI to be the same as the current content handler.

$r->internal_redirect_handler("/new/place");

is_initial_req()

There are several instances in which an incoming URI request can trigger one or more secondary internal requests. An internal request is triggered when internal_redirect() is called explicitly, and it also happens behind the scenes when lookup_file() and lookup_uri() are called.
With the exception of the logging phase, which is run just once for the primary request, secondary requests are run through each of the transaction processing phases, and the appropriate handlers are called each time. There may be times when you don't want a particular handler running on a subrequest or internal redirect, either to avoid performance overhead or to avoid infinite recursion. The is_initial_req() method will return a true value if the current request is the primary one and false if the request is the result of a subrequest or an internal redirect.

return DECLINED unless $r->is_initial_req;

is_main()

This method can be used to distinguish between subrequests triggered by handlers and the "main" request triggered by a browser's request for a URI or an internal redirect. is_main() returns a true value for the primary request and for internal redirects and false for subrequests. Notice that this is slightly different from is_initial_req(), which returns false for internal redirects as well as subrequests.
is_main() is commonly used to prevent infinite recursion when a handler gets reinvoked after it has made a subrequest.

return DECLINED unless $r->is_main;

Like is_initial_req(), this is a read-only method.
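
The difference between the two tests is easiest to see side by side. The sketch below runs in plain Perl without Apache. MockRequest is a hypothetical class, not part of mod_perl, and its main and prev links are only one way to model the behavior described above: a subrequest has a parent request, while an internal redirect has a predecessor.

```perl
use strict;
use warnings;

# MockRequest is a hypothetical stand-in for the Apache request object; it is
# not part of mod_perl and models only the links needed for this comparison.
package MockRequest;

sub new {
    my ($class, %args) = @_;
    return bless { main => $args{main}, prev => $args{prev} }, $class;
}

# A subrequest has a parent ("main") request; anything else counts as main.
sub is_main { my $self = shift; return $self->{main} ? 0 : 1 }

# The initial request has neither a parent nor a predecessor in a redirect chain.
sub is_initial_req {
    my $self = shift;
    return ($self->{main} || $self->{prev}) ? 0 : 1;
}

package main;

my $primary  = MockRequest->new;
my $redirect = MockRequest->new(prev => $primary);   # internal redirect
my $subreq   = MockRequest->new(main => $primary);   # lookup_uri()/lookup_file()

my %summary;
for my $case ([primary => $primary], [redirect => $redirect], [subrequest => $subreq]) {
    my ($name, $req) = @$case;
    $summary{$name} = sprintf "is_initial_req=%d is_main=%d",
        $req->is_initial_req, $req->is_main;
    print "$name: $summary{$name}\n";
}
```

Only the internal redirect gives different answers to the two methods, which is why a handler that must skip redirects as well as subrequests should test is_initial_req() rather than is_main().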

last()
main()
next()
prev()

When a handler is called in response to a series of internal redirects, ErrorDocuments, or subrequests, it is passed an ordinary-looking request object and can usually proceed as if it were processing a normal request. However, if a module has special needs, it can use these methods to walk the chain to examine the request objects passed to other requests in the series.
main() will return the request object of the parent request, the top of the chain. last() will return the last request in the chain. prev() and next() will return the previous and next requests in the chain, respectively. Each of these methods will return a reference to an object belonging to the Apache class or undef if the request doesn't exist.
The prev() method is handy inside an ErrorDocument handler to get at the information from the request that triggered the error. For example, this code fragment will find the URI of the failed request:

my $failed_uri = $r->prev->uri;

The last() method is mainly used by logging modules. Since Apache may have performed several subrequests while attempting to resolve the request, the last object will always point to the final result.

my $bytes_sent = $r->last->bytes_sent;

Should your module wish to log all internal requests, the next() method will come in handy.

sub My::logger {
   my $r = shift;
   my $first = $r->uri;
   my $last = $r->last->uri;
   warn "first: $first, last: $last\n";
   for (my $rr = $r; $rr; $rr = $rr->next) {
       my $uri = $rr->uri;
       my $status = $rr->status;
       warn "request: $uri, status: $status\n";
   }
   return OK;
}

Assuming the requested URI was /, which was mapped to /index.html by the DirectoryIndex configuration, the example above would output these messages to the ErrorLog:

first: /, last: /index.html
request: /, status: 200
request: /index.html, status: 200

The next() and main() methods are rarely used, but they are included for completeness. Handlers that need to determine whether they are in the main request should call $r->is_main() rather than !$r->main(), as the former is marginally more efficient.

location()

If the current handler was triggered by a Perl*Handler directive within a <Location> section, this method will return the path indicated by the <Location> directive.
For example, given this <Location> section:

<Location /images/dynamic_icons>
  SetHandler  perl-script
  PerlHandler Apache::Icon
</Location>

location() will return /images/dynamic_icons.
This method is handy for converting the current document's URI into a relative path.

my $base = $r->location;
# \Q...\E quotes any regex metacharacters that appear in the location path
(my $relative = $r->uri) =~ s/^\Q$base\E//;

lookup_file()
lookup_uri()

lookup_file() and lookup_uri() invoke Apache subrequests. A subrequest is treated exactly like an ordinary request, except that the post read request, header parser, response generation, and logging phases are not run. This allows modules to pose "what-if" questions to the server. Subrequests can be used to learn the MIME type mapping of an arbitrary file, map a URI to a filename, or find out whether a file is under access control. After a successful lookup, the response phase of the request can optionally be invoked.
Both methods take a single argument corresponding to an absolute filename or a URI path, respectively. lookup_uri() performs the URI translation on the provided URI, passing the request to the access control and authorization handlers, if any, and then proceeds to the MIME type checking phase. lookup_file() behaves similarly but bypasses the initial URI translation phase and treats its argument as a physical file path.
Both methods return an Apache::SubRequest object, which is identical for all intents and purposes to a plain old Apache request object, as it inherits all methods from the Apache class. You can call the returned object's content_type(), filename(), and other methods to retrieve the information left there during subrequest processing.
The subrequest mechanism is extremely useful, and there are many practical examples of using it in Chapters 4, 5, and 6. The following code snippets show how to use subrequests to look up first the MIME type of a file and then a URI:

my $subr = $r->lookup_file('/home/http/htdocs/images/logo.tif');
my $ct = $subr->content_type;

my $ct = $r->lookup_uri('/images/logo.tif')->content_type;

In the lookup_uri() example, /images/logo.tif will be passed through the same series of Alias, ServerRoot, and URI rewriting translations that the URI would be subjected to if it were requested by a browser.
If you need to pass certain HTTP header fields to the subrequest, such as a particular value of Accept, you can do so by calling headers_in() before invoking lookup_uri() or lookup_file().
It is often a good idea to check the status of a subrequest in case something went wrong. If the subrequest was successful, the status value will be that of HTTP_OK.

use Apache::Constants qw(:common HTTP_OK);
my $subr = $r->lookup_uri("/path/file.html");
my $status = $subr->status;

unless ($status == HTTP_OK) {
   die "subrequest failed with status: $status";
}

notes()

There are times when handlers need to communicate among themselves in a way that goes beyond setting the values of HTTP header fields. To accommodate this, Apache maintains a "notes" table in the request record. This table is simply a list of key/value pairs. One handler can add its own key/value entry to the notes table, and later the handler for a subsequent phase can retrieve the note. Notes are maintained for the life of the current request and are deleted when the transaction is finished.
When called with two arguments, this method sets a note. When called with a single argument, it retrieves the value of that note. Both the keys and the values must be simple strings.

$r->notes('CALENDAR' => 'Julian');
my $cal = $r->notes('CALENDAR');

When called in a scalar context with no arguments, a hash reference tied to the Apache::Table class will be returned.

my $notes = $r->notes;
my $cal = $notes->{CALENDAR};

This method comes in handy for communication between a module written in Perl and one written in C. For example, the logging API saves error messages under a key named error-notes, which could be used by ErrorDocuments to provide a more informative error message.

The LogFormat directive, part of the standard mod_log_config module, can incorporate notes into log messages using the formatting character %n. See the Apache documentation for details.
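
As a rough sketch of the handler-to-handler hand-off that notes() makes possible, the following plain Perl program passes a note from an early phase to a later one. Everything outside the notes() calls is an assumption made so that the example runs without Apache: MockRequest is a hypothetical stand-in that stores notes in an ordinary hash, and fixup_phase() and response_phase() are made-up names for two handlers that would normally be installed for separate phases.

```perl
use strict;
use warnings;

# MockRequest is a hypothetical stand-in for the request object. It keeps the
# notes in a plain hash, whereas mod_perl ties them to Apache::Table.
package MockRequest;

sub new { return bless { notes => {} }, shift }

sub notes {
    my $self = shift;
    return $self->{notes} unless @_;          # no arguments: the whole table
    my $key = shift;
    $self->{notes}{$key} = shift if @_;       # two arguments: set the note
    return $self->{notes}{$key};              # one argument: fetch the note
}

package main;

# Hypothetical early-phase handler: leaves a note for whoever runs later.
sub fixup_phase {
    my $r = shift;
    $r->notes('CALENDAR' => 'Julian');
    return;
}

# Hypothetical later-phase handler: picks the note up again.
sub response_phase {
    my $r = shift;
    my $cal = $r->notes('CALENDAR');
    return defined $cal ? "calendar is $cal" : "no calendar set";
}

my $r = MockRequest->new;
fixup_phase($r);
my $message = response_phase($r);
print "$message\n";
```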

subprocess_env()

The subprocess_env() method is used to examine and change the Apache environment table. Like other table-manipulation functions, this method has a variety of behaviors depending on the number of arguments it is called with and the context in which it is called. Call the method with no arguments in a scalar context to return a hash reference tied to the Apache::Table class:

my $env = $r->subprocess_env;
my $docroot = $env->{'DOCUMENT_ROOT'};

Call the method with a single argument to retrieve the current value of the corresponding entry in the environment table, or undef if no entry by that name exists:

my $doc_root = $r->subprocess_env("DOCUMENT_ROOT");

You may also call the method with a key/value pair to set the value of an entry in the table:

$r->subprocess_env(DOOR => "open");

Finally, if you call subprocess_env() in a void context with no arguments, it will reinitialize the table to contain the standard variables that Apache adds to the environment before invoking CGI scripts and server-side include files:

$r->subprocess_env;

Changes made to the environment table only persist for the length of the request. The table is cleared out and reinitialized at the beginning of every new transaction.
In the Perl API, the primary use for this method is to set environment variables for other modules to see and use. For example, a fixup handler could use this call to set up environment variables that are later recognized by mod_include and incorporated into server-side include pages. You do not ordinarily need to call subprocess_env() to read environment variables because mod_perl automatically copies the environment table into the Perl %ENV hash before entering the response handler phase.
A potential confusion arises when a Perl API handler needs to launch a subprocess itself using system(), backticks, or a piped open. If you need to pass environment variables to the subprocess, set the appropriate keys in %ENV just as you would in an ordinary Perl script. subprocess_env() is only required if you need to change the environment in a subprocess launched by a different handler or module.
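
The %ENV route for a subprocess launched by your own code can be checked with ordinary Perl. This is a minimal sketch that needs no Apache at all; DOOR is just an illustrative variable name, and the child here is a second perl interpreter started through a piped open.

```perl
use strict;
use warnings;

# DOOR is a made-up variable name used only for this illustration.
local $ENV{DOOR} = 'open';

# List-form piped open runs the child without a shell; $^X is the running perl.
open(my $child, '-|', $^X, '-e', 'print $ENV{DOOR}')
    or die "cannot start child: $!";
my $seen = do { local $/; <$child> };
close($child);

print "child saw DOOR=$seen\n";
```

The child inherits whatever is in %ENV at the moment it is launched, so no call to subprocess_env() is involved in this case.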

register_cleanup()

The register_cleanup() method registers a subroutine that will be called after the logging stage of a request. This is much the same as installing a cleanup handler with the PerlCleanupHandler directive. See Chapter 7 for some practical examples of using register_cleanup().
The method expects a code reference argument:

sub callback {
   my $r = shift;
   my $uri = $r->uri;
   warn "process $$ all done with $uri\n";
}
$r->register_cleanup(\&callback);
