Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 9 - Perl API Reference Guide / Other Core Perl API Classes
The Apache::URI Class

Apache Version 1.3 introduced a utility module for parsing URIs, manipulating their contents, and unparsing them back into string form. Since this functionality is part of the server C API, Apache::URI offers a lightweight alternative to the URI::URL module that ships with the libwww-perl package.⁵

An Apache::URI object is returned when you call the request object's parsed_uri() method. You may also call the Apache::URI parse() constructor to parse an arbitrary string and return a new Apache::URI object, for example:

use Apache::URI ();
my $parsed_uri = $r->parsed_uri;
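To make it easier to see which piece of a URI each of the accessor methods below refers to, here is a minimal pure-Perl sketch that splits a URI string using the generic regular expression from Appendix B of RFC 2396. It does not use Apache::URI and runs outside the server. The helper name split_uri() and the sample URI are hypothetical and only serve as illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Apache::URI): split a URI into its
# generic components with the regular expression from RFC 2396, Appendix B.
sub split_uri {
    my ($uri) = @_;
    my ($scheme, $hostinfo, $path, $query, $fragment) = $uri =~
        m{^(?:([^:/?\#]+):)?(?://([^/?\#]*))?([^?\#]*)(?:\?([^\#]*))?(?:\#(.*))?$}s;
    return (
        scheme   => $scheme,
        hostinfo => $hostinfo,
        path     => $path,
        query    => $query,
        fragment => $fragment,
    );
}

# Sample URI assembled from the values used in the examples in this section.
my %part = split_uri('http://www.modperl.com:8000/perl/hangman.pl?one+two#section_1');

# Print each component, or "(none)" when the URI does not contain it.
printf "%-8s %s\n", $_, $part{$_} // '(none)'
    for qw(scheme hostinfo path query fragment);
```

The keys of the returned hash deliberately mirror the method names scheme(), hostinfo(), path(), query(), and fragment() described in this section.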

fragment()

This method returns or sets the fragment component of the URI. You know this as the part that follows the hash mark (#) in links. The fragment component is generally used only by clients and some web proxies.

my $fragment = $uri->fragment;
$uri->fragment('section_1');

hostinfo()

This method gets or sets the remote host information, which usually consists of a hostname and port number in the format hostname:port. Some rare URIs, such as those used for nonanonymous FTP, attach a username and password to this information, for use in accessing private resources. In this case, the information returned is in the format username:password@hostname:port.

This method returns the host information when called without arguments, or sets the information when called with a single string argument.

my $hostinfo = $uri->hostinfo;
$uri->hostinfo('www.modperl.com:8000');
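The relationship between hostinfo() and the user(), password(), hostname(), and port() methods can be sketched in plain Perl. The following is only an illustration of the two formats described above, not the code Apache uses internally. The helper split_hostinfo() and the FTP host name are hypothetical, and the pattern makes no attempt to handle IPv6 literals.

```perl
use strict;
use warnings;

# Hypothetical helper: break a hostinfo string of the form
# [username[:password]@]hostname[:port] into its four parts.
# Parts that are absent come back as undef.
sub split_hostinfo {
    my ($hostinfo) = @_;
    my ($user, $password, $hostname, $port) = $hostinfo =~
        m{^(?:([^:@]*)(?::([^@]*))?@)?([^:]*)(?::(\d+))?$};
    return ($user, $password, $hostname, $port);
}

my @plain = split_hostinfo('www.modperl.com:8000');
my @ftp   = split_hostinfo('barney:rubble@ftp.modperl.com:21');

# Show both results, marking missing parts explicitly.
for my $parts (\@plain, \@ftp) {
    print join(' | ', map { defined $_ ? $_ : '(none)' } @$parts), "\n";
}
```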

hostname()

This method returns or sets the hostname component of the URI object.

my $hostname = $uri->hostname;
$uri->hostname('www.modperl.com');

parse()

The parse() method is a constructor used to create a new Apache::URI object from a URI string. Its first argument is an Apache request object, and the second is a string containing an absolute or relative URI. In the case of a relative URI, the parse() method uses the request object to determine the location of the current request and resolve the relative URI.

my $uri = Apache::URI->parse($r, 'http://www.modperl.com/');

If the URI argument is omitted, the parse() method will construct a fully qualified URI from $r, including the scheme, hostname, port, path, and query string.

my $self_uri = Apache::URI->parse($r);

password()

This method gets or sets the password part of the hostinfo component.

my $password = $uri->password;
$uri->password('rubble');

path()

This method returns or sets the path component of the URI object.

my $path = $uri->path;
$uri->path('/perl/hangman.pl');

path_info()

After the "real path" part of the URI comes the "additional path information." This component is not defined by the official URI RFC, because it is a concept internal to web servers, which need to do something with the part of the path that is left over after translating the path into a valid filename.

path_info() gets or sets the additional path information portion of the URI, using the current request object to determine what part of the path is real and what part is additional.

$uri->path_info('/foo/bar');

port()

This method returns or sets the port component of the URI object.

my $port = $uri->port;
$uri->port(80);

query()

This method gets or sets the query string component of the URI; in other words, the part after the ?.

my $query = $uri->query;
$uri->query('one+two+three');

rpath()

This method returns the "real path"; that is, the path() minus the path_info().

my $path = $uri->rpath();
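The "path() minus path_info()" relationship can be illustrated with a few lines of plain Perl. This is a sketch only: the helper real_path() is hypothetical, and the sample values are taken from the path() and path_info() examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the additional path information from the
# end of the full path, leaving the "real path". If the path does not
# end with the given path info, the path is returned unchanged.
sub real_path {
    my ($path, $path_info) = @_;
    return $path unless defined $path_info && length $path_info;
    my $cut = length($path) - length($path_info);
    return $path unless $cut >= 0 && substr($path, $cut) eq $path_info;
    return substr($path, 0, $cut);
}

my $rpath = real_path('/perl/hangman.pl/foo/bar', '/foo/bar');
print "$rpath\n";   # prints /perl/hangman.pl
```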

scheme()

This method returns or sets the scheme component of the URI. This is the part that identifies the URI's protocol, such as http or ftp. Called without arguments, the current scheme is retrieved. Called with a single string argument, the current scheme is set.

my $scheme = $uri->scheme;
$uri->scheme('http');

unparse()

This method returns the string representation of the URI. Relative URIs are resolved into absolute ones.

my $string = $uri->unparse;

Beware that the unparse() method does not take the additional path information into account. It returns the URI minus the additional information.

user()

This method gets or sets the username part of the hostinfo component.

my $user = $uri->user;
$uri->user('barney');

The Apache::Util Class

The Apache API provides several utility functions that are used by various standard modules. The Perl API makes these available as function calls in the Apache::Util package.

Although everything here is already available in some Perl module, these C versions are considerably faster than their corresponding Perl functions and avoid the memory bloat of pulling in yet another Perl package.

To make these functions available to your handlers, import the Apache::Util module with an import tag of :all:

use Apache::Util qw(:all);

escape_uri()

This function encodes all unsafe characters in a URI into %XX hex escape sequences. This is equivalent to the URI::Escape::uri_escape() function from the LWP package.

use Apache::Util qw(escape_uri);
my $escaped = escape_uri($url);

escape_html()

This function replaces unsafe HTML character sequences (<, >, and &) with their entity representations. This is equivalent to the HTML::Entities::encode() function.

use Apache::Util qw(escape_html);
my $display_html = escape_html("<h1>Header Level 1 Example</h1>");
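For readers who want to see the substitution spelled out, here is a pure-Perl sketch of the replacement described above. It covers only the three characters named in the text and is not the C implementation. The name escape_html_sketch() is hypothetical.

```perl
use strict;
use warnings;

# The three characters named in the text and their entity forms.
my %entity = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;');

# Hypothetical pure-Perl stand-in for escape_html().
sub escape_html_sketch {
    my ($text) = @_;
    $text =~ s/([<>&])/$entity{$1}/g;
    return $text;
}

my $display_html = escape_html_sketch('<h1>Header Level 1 Example</h1>');
print "$display_html\n";   # prints &lt;h1&gt;Header Level 1 Example&lt;/h1&gt;
```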

ht_time()

This function produces dates in the format required by the HTTP protocol. You will usually call it with a single argument, the number of seconds since the epoch. The current time expressed in these units is returned by the Perl built-in time() function.

You may also call ht_time() with optional second and third arguments. The second argument, if present, is a format string that follows the same conventions as the strftime() function in the POSIX library. The default format is %a, %d %b %Y %H:%M:%S %Z, where %Z is an Apache extension that always expands to GMT. The optional third argument is a flag that selects whether to express the returned time in GMT (Greenwich Mean Time) or the local time zone. A true value (the default) selects GMT, which is what you will want in nearly all cases.

Unless you have a good reason to use a nonstandard time format, you should content yourself with the one-argument form of this function. The function is equivalent to the LWP package's HTTP::Date::time2str() function when passed a single argument.

use Apache::Util qw(ht_time);
my $str = ht_time(time);
$str = ht_time(time, "%d %b %Y %H:%M %Z");     # 06 Nov 1994 08:49 GMT
$str = ht_time(time, "%d %b %Y %H:%M %Z", 0);  # 06 Nov 1994 03:49 EST
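The default HTTP date format can be reproduced outside the server with Perl's built-in gmtime(). The sketch below is an illustration of the format only, not a replacement for ht_time(). The name http_date() is hypothetical, and the day and month names are hardcoded so that the output does not depend on the current locale.

```perl
use strict;
use warnings;

# English day and month abbreviations, as required in HTTP dates.
my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format epoch seconds in the default layout,
# %a, %d %b %Y %H:%M:%S GMT.
sub http_date {
    my ($secs) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime $secs;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec;
}

# 784111777 is 06 Nov 1994 08:49:37 GMT expressed in epoch seconds.
my $stamp = http_date(784111777);
print "$stamp\n";   # prints Sun, 06 Nov 1994 08:49:37 GMT
```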

parsedate()

This function is the inverse of ht_time(), parsing HTTP dates and returning the number of seconds since the epoch. You can then pass this value to Time::localtime (or another of Perl's date-handling modules) and extract the date fields that you want.

The parsedate() function recognizes and handles date strings in any of three standard formats:

Sun, 06 Nov 1994 08:49:37 GMT   ; RFC 822, the modern HTTP format
Sunday, 06-Nov-94 08:49:37 GMT  ; RFC 850, the old obsolete HTTP format
Sun Nov  6 08:49:37 1994        ; ANSI C's asctime() format

Here is an example:

use Apache::Util qw(parsedate);
my $secs;
if (my $if_modified = $r->headers_in->{'If-modified-since'}) {
  $secs = parsedate $if_modified;
}
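To show what the conversion involves, here is a pure-Perl sketch that handles only the first of the three formats (the modern HTTP format) using the core Time::Local module. It is an illustration under that restriction, not an equivalent of parsedate(). The name parse_http_date() is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Map month abbreviations to the 0-based month numbers timegm() expects.
my %MON;
@MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Hypothetical helper: parse "Sun, 06 Nov 1994 08:49:37 GMT" into epoch
# seconds. Returns undef for anything it does not recognize.
sub parse_http_date {
    my ($str) = @_;
    my ($mday, $mon, $year, $hour, $min, $sec) = $str =~
        /^[A-Z][a-z]{2}, (\d{2}) ([A-Z][a-z]{2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return undef;
    return undef unless exists $MON{$mon};
    # timegm() dies on out-of-range fields, so trap that and return undef.
    my $secs = eval { timegm($sec, $min, $hour, $mday, $MON{$mon}, $year) };
    return $secs;
}

my $secs = parse_http_date('Sun, 06 Nov 1994 08:49:37 GMT');
print "$secs\n";   # prints 784111777
```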

size_string()

This function converts the given file size into a formatted string. The size given in the string will be in units of bytes, kilobytes, or megabytes, depending on the size of the file. This function formats the string just as the C ap_send_size() API function does but returns the string rather than sending it directly to the client. The ap_send_size() function is used in mod_autoindex to display the size of files in automatic directory listings and by mod_include to implement the fsize directive.

This example uses size_string() to get the formatted size of the currently requested file:

use Apache::Util qw(size_string);
my $size = size_string -s $r->finfo;

unescape_uri()

This function decodes all %XX hex escape sequences in the given URI. It is equivalent to the URI::Escape::uri_unescape() function from the LWP package.

use Apache::Util qw(unescape_uri);
my $unescaped = unescape_uri($safe_url);
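The %XX encoding and decoding can be sketched in a few lines of plain Perl. This is only an illustration: the set of characters treated as safe below is an assumption, not the set that Apache's own routine uses, and the code assumes byte strings. The names escape_sketch() and unescape_sketch() and the sample path are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for escape_uri(). ASSUMPTION: letters, digits,
# and - _ . ~ / are left alone; every other byte becomes %XX.
sub escape_sketch {
    my ($str) = @_;
    $str =~ s/([^A-Za-z0-9\-_.~\/])/sprintf '%%%02X', ord $1/ge;
    return $str;
}

# Hypothetical stand-in for unescape_uri(): decode every %XX sequence.
sub unescape_sketch {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $str;
}

my $escaped   = escape_sketch('/docs/My File #1.html');
my $unescaped = unescape_sketch($escaped);
print "$escaped\n";     # prints /docs/My%20File%20%231.html
print "$unescaped\n";   # prints /docs/My File #1.html
```

Decoding is the exact inverse of encoding here, so a string survives the round trip unchanged.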

unescape_uri_info()

This function is similar to unescape_uri() but is specialized to remove escape sequences from the query string portion of the URI. The main difference is that it translates the + character into spaces as well as recognizes and translates the hex escapes.

use Apache::Util qw(unescape_uri_info);
my $string = $r->parsed_uri->query;
my %data = map { unescape_uri_info($_) } split /[=&]/, $string, -1;

This would correctly translate the query string name=Fred+Flintstone&town=Bedrock into the following hash:

name => 'Fred Flintstone',
town => 'Bedrock'
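The same query-string parsing idiom can be tried outside the server with a pure-Perl stand-in for the C function. The sketch below is an illustration only. The name unescape_info_sketch() is hypothetical, and the third key/value pair in the sample string is added here to show that an encoded plus sign (%2B) survives as a literal +.

```perl
use strict;
use warnings;

# Hypothetical stand-in for unescape_uri_info(): turn + into a space
# first, then decode the %XX hex escapes.
sub unescape_info_sketch {
    my ($str) = @_;
    $str =~ tr/+/ /;
    $str =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $str;
}

my $string = 'name=Fred+Flintstone&town=Bedrock&motto=Yabba%2BDabba';

# Split on = and & to get alternating keys and values, decoding each.
my %data = map { unescape_info_sketch($_) } split /[=&]/, $string, -1;

print "$_ => $data{$_}\n" for sort keys %data;
```

Splitting on both = and & at once is compact, but it relies on every key having a value. A query string with a bare key or an extra = would shift the pairs.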
