Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 9 - Perl API Reference Guide / The Apache Request Object
Client Request Methods

This section covers the request object methods that are used to query or modify the incoming client request. These methods allow you to retrieve such information as the URI the client has requested, the request method in use, the content of any submitted HTML forms, and various items of information about the remote host.

args()

The args() method returns the contents of the URI query string (that part of the request URI that follows the ? character, if any). When called in a scalar context, args() returns the entire string. When called in a list context, the method returns a list of parsed key/value pairs:

my $query = $r->args;
my %in    = $r->args;

One trap to be wary of: if the same argument name is present several times (as can happen with a selection list in a fill-out form), assignment of args() to a hash will discard all but the last argument. To avoid this, you'll need to use the more complex argument processing scheme described in Chapter 4, Content Handlers.
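
As a minimal sketch of one way around this trap, the helper below collects every value for a repeated key into an array instead of letting a hash assignment overwrite it. The name collect_pairs is hypothetical, and the literal list stands in for what $r->args returns in a list context; the sketch assumes that list is already split into key/value pairs as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: fold a flat key/value list into a hash of arrays
# so that repeated keys keep all of their values.
sub collect_pairs {
    my @pairs = @_;
    my %in;
    while (@pairs) {
        my ($key, $value) = splice(@pairs, 0, 2);
        push @{ $in{$key} }, $value;
    }
    return \%in;
}

# Stand-in for the list that $r->args returns in a list context.
my $in = collect_pairs(color => 'red', color => 'blue', size => 'L');
print join(',', @{ $in->{color} }), "\n";   # prints "red,blue"
```

Inside a handler the call would look like collect_pairs($r->args). Every value is stored as an array reference, even for keys that appear once, so callers always dereference the same way.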

connection()

This method returns an object blessed into the Apache::Connection class. See "The Apache::Connection Class" later in this chapter for information on what you can do with this object once you get it.

my $c = $r->connection;

content()

When the client request method is POST, which generally occurs when the remote client is submitting the contents of a fill-out form, the $r->content method returns the submitted information, but only if the request content type is application/x-www-form-urlencoded. When called in a scalar context, the entire string is returned. When called in a list context, a list of parsed name=value pairs is returned.
To handle other types of PUT or POST content, you'll need to use a module such as CGI.pm or Apache::Request or use the read() method and parse the data yourself.

Note that you can only call content() once. If you call the method more than once, it will return undef (or an empty list) after the first try.

filename()

The filename() method sets or returns the result of the URI translation phase. During the URI translation phase, your handler will call this method with the physical path to a file in order to set the filename. During later phases of the transaction, calling this method with no arguments returns its current value.
Examples:

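# in a later phase: fetch the translated filename and open it
my $fname = $r->filename;
unless (open(FH, $fname)) {
   die "can't open $fname: $!";
}

# in the URI translation phase: set the filename
my $fname = do_translation($r->uri);
$r->filename($fname);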

finfo()

Immediately following the translation phase, Apache walks along the components of the requested URI trying to determine where the physical file path ends and the additional path information begins (this is described at greater length at the beginning of Chapter 4). In the course of this walk, Apache makes the system stat() call one or more times to read the directory information along the path. When the walk is finished, the stat() information for the translated filename is cached in the request record, where it can be recovered using the finfo() method. If you need to stat() the file, you can take advantage of this cached stat structure rather than repeating the system call.
When finfo() is called, it moves the cached stat information into the special filehandle _ that Perl uses to cache its own stat operations. You can then perform file test operations directly on this filehandle rather than on the file itself, which would incur the penalty of another stat() system call. For convenience, finfo() returns a reference to the _ filehandle, so file tests can be done directly on the return value of finfo(). The following three examples all result in the same value for $size. However, the first two avoid the overhead of the implicit stat() performed by the last.

my $size = -s $r->finfo;

$r->finfo;
my $size = -s _;

my $size = -s $r->filename; # slower

It is possible for a module to be called upon to process a URL that does not correspond to a physical file. In this case, the stat() structure will contain the result of testing for a nonexistent file, and Perl's various file test operations will all return false.
The Apache::Util package contains a number of routines that are useful for manipulating the contents of the stat structure. For example, the ht_time() routine turns Unix timestamps into HTTP-compatible human-readable strings. See the Apache::Util manpage and the section "The Apache::Util Class" later in this chapter for more details.

use Apache::Util qw(ht_time);

if(-d $r->finfo) {
  printf "%s is a directory\n", $r->filename;
}
else {
  printf "Last Modified: %s\n", ht_time((stat _)[9]);
}
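
The _ filehandle that finfo() relies on is ordinary Perl and can be tried outside Apache. The following is a minimal sketch, not mod_perl code: the helper name file_facts is hypothetical, and a temporary file stands in for the translated filename. It performs one explicit stat() and then answers further questions from the cached result.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: stat the path once, then answer several questions
# from Perl's cached stat buffer (the special filehandle _).
sub file_facts {
    my $path = shift;
    my @st = stat($path) or return;      # the only stat() system call
    return {
        size   => (-s _),                # file tests on _ reuse the cache
        is_dir => ((-d _) ? 1 : 0),
        mtime  => $st[9],                # same field the text reads with (stat _)[9]
    };
}

# A five-byte temporary file stands in for the requested document.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello";
close $fh;

my $facts = file_facts($path);
print "$facts->{size} bytes\n";          # prints "5 bytes"
```

Under mod_perl the explicit stat($path) would be replaced by a call to $r->finfo, which fills the same cache without a new system call.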

get_client_block()
setup_client_block()
should_client_block()

The get_, setup_, and should_client_block methods are lower-level ways to read the data sent by the client in POST and PUT requests. This protocol exactly mirrors the C-language API described in Chapter 10, C API Reference Guide, Part I, and provides for timeouts and other niceties. Although the Perl API supports them, Perl programmers should generally use the simpler read() method instead.

get_remote_host()

This method can be used to look up the remote client's DNS hostname or simply return its IP address. When a DNS lookup is successful, its result is cached and returned on subsequent calls to get_remote_host() to avoid costly multiple lookups. This cached value can also be retrieved with the Apache::Connection object's remote_host() method.
This method takes an optional argument. The type of lookup performed by this method is affected by this argument, as well as the value of the HostNameLookups directive. Possible arguments to this method, whose symbolic names can be imported from the Apache::Constants module using the :remotehost import tag, are the following:

REMOTE_HOST

If this argument is specified, Apache will try to look up the DNS name of the remote host. This lookup will fail if the Apache configuration directive HostNameLookups is set to Off or if the hostname cannot be determined by a DNS lookup, in which case the function will return undef.

REMOTE_NAME

When called with this argument, the method will return the DNS name of the remote host if possible, or the dotted decimal representation of the client's IP address otherwise. This is the default lookup type when no argument is specified.

REMOTE_NOLOOKUP

When this argument is specified, get_remote_host() will not perform a new DNS lookup (even if the HostNameLookups directive says so). If a successful lookup was done earlier in the request, the cached hostname will be returned. Otherwise, the method returns the dotted decimal representation of the client's IP address.

REMOTE_DOUBLE_REV

This argument will trigger a double-reverse DNS lookup regardless of the setting of the HostNameLookups directive. Apache will first call the DNS to return the hostname that maps to the IP address of the remote host. It will then make another call to map the returned hostname back to an IP address. If the returned IP address matches the original one, then the method returns the hostname. Otherwise, it returns undef. The reason for this baroque procedure is that standard DNS lookups are susceptible to DNS spoofing, in which a remote machine temporarily assumes the apparent identity of a trusted host. Double-reverse DNS lookups make spoofing much harder and are recommended if you are using the hostname to distinguish between trusted clients and untrusted ones. However, double-reverse DNS lookups are also twice as expensive.
In recent versions of Apache, double-reverse name lookups are always performed for the name-based access checking implemented by mod_access.
Here are some examples:

my $remote_host = $r->get_remote_host;
# same as above
use Apache::Constants qw(:remotehost);
my $remote_host = $r->get_remote_host(REMOTE_NAME);

# double-reverse DNS lookup
use Apache::Constants qw(:remotehost);
my $remote_host = $r->get_remote_host(REMOTE_DOUBLE_REV) || "nohost";
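
To make the double-reverse procedure concrete, here is a rough standalone sketch of the same idea in plain Perl. It is not how Apache implements REMOTE_DOUBLE_REV; the function names are hypothetical, and the resolvers are passed in as code references so that the logic can be exercised with fake data (the addresses and hostname below are documentation examples, not real hosts). When no resolvers are given, the sketch falls back on Perl's gethostbyaddr() and gethostbyname().

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa AF_INET);

# Default reverse lookup: IP address -> hostname (undef on failure).
sub ptr_lookup {
    my $ip = shift;
    my $packed = inet_aton($ip) or return;
    return scalar gethostbyaddr($packed, AF_INET);
}

# Default forward lookup: hostname -> list of dotted-decimal addresses.
sub addr_lookup {
    my $name = shift;
    my @info = gethostbyname($name) or return;
    return map { inet_ntoa($_) } @info[4 .. $#info];
}

# Hypothetical helper: return the hostname only if it maps back to $ip.
sub double_reverse {
    my ($ip, $reverse, $forward) = @_;
    $reverse ||= \&ptr_lookup;
    $forward ||= \&addr_lookup;
    my $host = $reverse->($ip);
    return unless defined $host;
    for my $addr ($forward->($host)) {
        return $host if $addr eq $ip;
    }
    return;
}

# Fake DNS data: the second reverse entry is forged by a spoofer.
my %ptr  = ('192.0.2.10' => 'trusted.example.com',
            '192.0.2.66' => 'trusted.example.com');
my %addr = ('trusted.example.com' => ['192.0.2.10']);
my $rev = sub { $ptr{ $_[0] } };
my $fwd = sub { @{ $addr{ $_[0] } || [] } };

for my $ip ('192.0.2.10', '192.0.2.66') {
    my $host = double_reverse($ip, $rev, $fwd);
    print "$ip: ", (defined $host ? $host : 'nohost'), "\n";
}
```

The forged address fails the second step because the forward lookup of the claimed hostname does not include it, so the loop reports "nohost" for it.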

get_remote_logname()

This method returns the login name of the remote user or undef if the user's login could not be determined. Generally, this only works if the remote user is logged into a Unix or VMS host and that machine is running the identd daemon (which implements the identification protocol defined in RFC 1413).
The success of the call also depends on the IdentityCheck configuration directive being turned on. Since identity checks can adversely impact Apache's performance, this directive is off by default.

my $remote_logname = $r->get_remote_logname;

headers_in()

When called in a list context, the headers_in() method returns a list of key/value pairs corresponding to the client request headers. When called in a scalar context, it returns a hash reference tied to the Apache::Table class. This class provides methods for manipulating several of Apache's internal key/value table structures and, for all intents and purposes, acts just like an ordinary hash table. However, it also provides object methods for dealing correctly with multivalued entries. See "The Apache::Table Class" later in this chapter for details.

my %headers_in = $r->headers_in;
my $headers_in = $r->headers_in;

Once you have copied the headers to a hash, you can refer to them by name. See Table 9-1 for a list of incoming headers that you may need to use. For example, you can view the length of the data that the client is sending by retrieving the key Content-length:

%headers_in = $r->headers_in;
my $cl = $headers_in{'Content-length'};

You'll need to be aware that browsers are not required to be consistent in their capitalization of header field names. For example, some may refer to Content-Type and others to Content-type. The Perl API copies the field names into the hash as is, and like any other Perl hash, the keys are case-sensitive. This is a potential trap.
For these reasons it's better to call headers_in() in a scalar context and use the returned tied hash. Since Apache::Table sits on top of the C table API, lookup comparisons are performed in a case-insensitive manner. The tied interface also allows you to add or change the value of a header field, in case you want to modify the request headers seen by handlers downstream. This code fragment shows the tied hash being used to get and set fields:

my $headers_in = $r->headers_in;
my $cl = $headers_in->{'Content-Length'};
$headers_in->{'User-Agent'} = 'Block this robot';

It is often convenient to refer to header fields without creating an intermediate hash or assigning a variable to the Apache::Table reference. This is the usual idiom:

my $cl = $r->headers_in->{'Content-Length'};
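
If you do need a plain hash copy of the headers, one way to sidestep the capitalization trap described above is to fold the field names to lowercase as you copy them. This is a minimal sketch; the helper name lowercase_headers is hypothetical, and the literal list stands in for what $r->headers_in returns in a list context.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a flat name/value list into a hash whose
# keys are all lowercase, so lookups no longer depend on the client's
# capitalization.
sub lowercase_headers {
    my @pairs = @_;
    my %headers;
    while (@pairs) {
        my ($name, $value) = splice(@pairs, 0, 2);
        $headers{ lc $name } = $value;   # a repeated field overwrites the earlier one
    }
    return \%headers;
}

my $h = lowercase_headers('Content-type' => 'text/html',
                          'USER-AGENT'   => 'Mozilla/4.0');
print $h->{'content-type'}, "\n";        # prints "text/html"
```

Unlike the tied Apache::Table hash, this copy is a snapshot: changing it does not alter the request headers seen by later handlers.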

Certain request header fields, such as Accept and Cookie, are multivalued. When you retrieve their values, they will be packed together into one long string separated by commas, and you will need to parse the individual values out yourself. Individual values can include parameters, which are separated by semicolons. Cookies are a common example of this parameter syntax (the line shown here is a Set-Cookie header as a server would send it):

Set-Cookie: SESSION=1A91933A; domain=acme.com; expires=Wed, 21-Oct-1998 20:46:07 GMT

A few clients send headers with the same key on multiple lines. In this case, you can use the Apache::Table::get() method to retrieve all of the values at once.
For full details on the various incoming headers, see the documents at http://www.w3.org/Protocols. Nonstandard headers, such as those transmitted by experimental browsers, can also be retrieved with this method call.
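
As a rough illustration of parsing such a value yourself, the sketch below splits a header string on commas and then splits each item's parameters on semicolons. The function name parse_header_values is hypothetical, and the approach is deliberately naive: it does not handle quoted strings, and it would mis-split values that contain commas themselves, such as the date in a cookie's expires parameter.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "a, b;k=v" into a list of
# { value => ..., params => { k => v } } records.
sub parse_header_values {
    my $raw = shift;
    my @items;
    for my $part (split /\s*,\s*/, $raw) {
        my ($value, @params) = split /\s*;\s*/, $part;
        next unless defined $value && length $value;
        my %param;
        for my $p (@params) {
            my ($k, $v) = split /\s*=\s*/, $p, 2;
            next unless defined $k;
            $param{$k} = $v;
        }
        push @items, { value => $value, params => \%param };
    }
    return @items;
}

my @accept = parse_header_values('text/html, image/gif;q=0.8, */*;q=0.1');
print scalar(@accept), " types; gif q=$accept[1]{params}{q}\n";
# prints "3 types; gif q=0.8"
```

In a handler the input would typically come from something like $r->header_in('Accept').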

Table 9-1. Incoming HTTP Request Headers

Field	Description
Accept	MIME types that the client accepts
Accept-encoding	Compression methods that the client accepts
Accept-language	Languages that the client accepts
Authorization	Used by various authorization/authentication schemes
Connection	Connection options, such as Keep-alive
Content-length	Length, in bytes, of data to follow
Content-type	MIME type of data to follow
Cookie	Client-side data
From	Email address of the requesting user (deprecated)
Host	Virtual host to retrieve data from
If-modified-since	Return document only if modified since the date specified
If-none-match	Return document if it has changed
Referer	URL of document that linked to the requested one
User-agent	Name and version of the client software

header_in()

The header_in() method (singular, not plural) is used to get or set the value of a client incoming request field. If the given value is undef, the header will be removed from the list of header fields:

my $cl = $r->header_in('Content-length');
$r->header_in($key, $val); #set the value of header '$key'
$r->header_in('Content-length' => undef); #remove the header

The key lookup is done in a case-insensitive manner. The header_in() method predates the Apache::Table class but remains for backward compatibility and as a bit of a shortcut to using the headers_in() method.

header_only()

If the client issues a HEAD request, it wants to receive the HTTP response headers only. Content handlers should check for this by calling header_only() before generating the document body. The method will return true in the case of a HEAD request and false in the case of other requests. Alternatively, you could examine the string value returned by method() directly, although this would be less portable if the HTTP protocol were some day expanded to support more than one header-only request method.

# generate the header & send it
$r->send_http_header;
return OK if $r->header_only;
# now generate the document...

Do not try to check the numeric value returned by method_number() to identify a header request. Internally, Apache uses the M_GET number for both HEAD and GET methods.

method()

This method will return the string version of the request method, such as GET, HEAD, or POST. Passing an argument will change the method, which is occasionally useful for internal redirects (Chapter 4) and for testing authorization restriction masks (Chapter 6, Authentication and Authorization).

my $method = $r->method;
$r->method('GET');

If you update the method, you probably want to update the method number accordingly as well.

method_number()

This method will return the request method number, which refers to internal constants defined by the Apache API. The method numbers are available to Perl programmers from the Apache::Constants module by importing the :methods set. The relevant constants include M_GET, M_POST, M_PUT, and M_DELETE. Passing an argument will set this value, mainly used for internal redirects and for testing authorization restriction masks. If you update the method number, you probably want to update the method accordingly as well.
Note that there isn't an M_HEAD constant. This is because when Apache receives a HEAD request, it sets the method number to M_GET and sets header_only() to return true.

use Apache::Constants qw(:methods);
if ($r->method_number == M_POST) {
  # change the request method
  $r->method_number(M_GET);
  $r->method("GET");
  $r->internal_redirect('/new/place');
}

For Perl programmers there is no particular advantage to using method_number() over method(), other than its being slightly more efficient.

parsed_uri()

When Apache parses the incoming request, it will turn the request URI into a predigested uri_components structure. The parsed_uri() method will return an object blessed into the Apache::URI class, which provides methods for fetching and setting various parts of the URI. See "The Apache::URI Class" later in this chapter for details.

use Apache::URI ();
my $uri = $r->parsed_uri;
my $host = $uri->hostname;

path_info()

The path_info() method will return what is left in the path after the URI translation phase. Apache's default translation method, described at the beginning of Chapter 4, uses a simple directory-walking algorithm to decide what part of the URI is the file and what part is the additional path information.
You can provide an argument to path_info() in order to change its value:

my $path_info = $r->path_info;
$r->path_info("/some/additional/information");

Note that in most cases, changing the path_info() requires you to sync the uri() with the update. In the following example, we calculate the original URI minus any path info, change the existing path info, then properly update the URI:

my $path_info = $r->path_info;
my $uri = $r->uri;
my $orig_uri = substr $uri, 0, length($uri) - length($path_info);
$r->path_info($new_path_info);
$r->uri($orig_uri . $r->path_info);
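
The string arithmetic in that example can be checked outside Apache. The following sketch wraps it in a hypothetical function, replace_path_info, and assumes (as the example above does) that the URI ends with the current path info.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the old path info from the end of the URI
# and append the new one. Assumes $uri ends with $old_path_info.
sub replace_path_info {
    my ($uri, $old_path_info, $new_path_info) = @_;
    my $base = substr($uri, 0, length($uri) - length($old_path_info));
    return $base . $new_path_info;
}

print replace_path_info('/cgi-bin/search/a/b', '/a/b', '/x'), "\n";
# prints "/cgi-bin/search/x"
```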

protocol()

The $r->protocol method will return a string identifying the protocol that the client speaks. Typical values will be HTTP/1.0 or HTTP/1.1.

my $protocol = $r->protocol;

This method is read-only.

proxyreq()

The proxyreq() method returns true if the current HTTP request is for a proxy URI, that is, if the actual document resides on a foreign server somewhere and the client wishes Apache to fetch the document on its behalf. This method is mainly intended for use during the filename translation phase of the request.

sub handler {
  my $r = shift;
  return DECLINED unless $r->proxyreq;
  # do something interesting...
}

See Chapter 7 for examples.

read()

The read() method provides Perl API programmers with a simple way to get at the data submitted by the browser in POST and PUT requests. It should be used when the information submitted by the browser is not in the application/x-www-form-urlencoded format that the content() method knows how to handle.
Call read() with a scalar variable to hold the read data and the length of the data to read. Generally, you will want to ask for the entire data sent by the client, which can be recovered from the incoming Content-length field:

my $buff;
$r->read($buff, $r->header_in('Content-length'));

Internally, the Perl API sets up a timeout in case the client breaks the connection prematurely. The exact value of the timeout is set by the Timeout directive in the server configuration file. If a timeout does occur, the script will be aborted.

Within a handler you may also recover client data by simply reading from STDIN using Perl's read(), getc(), and readline (<>) functions. This works because the Perl API ties STDIN to Apache::read() before entering handlers.
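
Because the data can be read through an ordinary filehandle, the reading logic can be sketched and tried without a server. The helper below, read_body, is hypothetical: it reads up to a stated number of bytes in chunks using Perl's built-in read() and stops early if the input runs out. An in-memory filehandle stands in for STDIN, and the string length stands in for the Content-length value; the source does not say that chunked reading is required, so treat this as one possible approach.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $length bytes from $fh in chunks.
sub read_body {
    my ($fh, $length, $chunk_size) = @_;
    $chunk_size ||= 8192;
    my ($body, $got) = ('', 0);
    while ($got < $length) {
        my $want = $length - $got;
        $want = $chunk_size if $want > $chunk_size;
        my $chunk;
        my $n = read($fh, $chunk, $want);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                 # input ended before $length bytes
        $body .= $chunk;
        $got  += $n;
    }
    return $body;
}

# Stand-ins for STDIN and the Content-length header.
my $posted = 'name=Fred&color=red';
open(my $in, '<', \$posted) or die "can't open in-memory handle: $!";
my $body = read_body($in, length $posted);
print "$body\n";                         # prints "name=Fred&color=red"
```

Within a handler the equivalent call would pass \*STDIN and the value of $r->header_in('Content-length').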

server()

This method returns a reference to an Apache::Server object, from which you can retrieve all sorts of information about low-level aspects of the server's configuration. See "The Apache::Server Class" for details.

my $s = $r->server;

the_request()

This method returns the unparsed request line sent by the client. the_request() is primarily used by log handlers, since other handlers will find it more convenient to use methods that return the information in preparsed form. This method is read-only.

my $request_line = $r->the_request;
print LOGFILE $request_line;

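Note that, for a request without a query string, the_request() is roughly equivalent to this code fragment (uri() returns only the path, whereas the request line also carries any query string):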

my $request_line = join ' ', $r->method, $r->uri, $r->protocol;
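
Going the other way, a log handler that wants the individual pieces of a request line can split it apart. This is a minimal sketch with a hypothetical function name, split_request_line; it assumes a well-formed line of the form "METHOD target PROTOCOL" and separates any query string from the path.

```perl
use strict;
use warnings;

# Hypothetical helper: break a request line into method, path,
# query string (undef if absent), and protocol.
sub split_request_line {
    my $line = shift;
    my ($method, $target, $protocol) = split ' ', $line, 3;
    my ($path, $query) = split /\?/, $target, 2;
    return ($method, $path, $query, $protocol);
}

my ($m, $p, $q, $proto) = split_request_line('GET /search?q=apache HTTP/1.0');
print "$m $p $proto\n";                  # prints "GET /search HTTP/1.0"
print "query: $q\n";                     # prints "query: q=apache"
```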

uri()

The uri() method returns the URI requested by the browser. You may also pass this method a string argument in order to set the URI seen by handlers further down the line, which is something that a translation handler might want to do.

my $uri = $r->uri;
$r->uri("/something/else");
