

Practical mod_perl

16.4. HTTP Requests

Section 13.11 of the specification states that the only two cacheable methods are GET and HEAD. Responses to POST requests are not cacheable, as you'll see in a moment.

16.4.1. GET Requests

Most mod_perl programs are written to service GET requests. The server passes the request to the mod_perl code, which composes and sends back the headers and the content body.

There is, however, one situation that needs a workaround to achieve better cacheability: a "?" in the relative path part of the requested URI. Section 13.9 specifies that:

... caches MUST NOT treat responses to such URIs as fresh unless the server provides 
an explicit expiration time.  This specifically means that responses from HTTP/1.0 
servers for such URIs SHOULD NOT be taken from a cache.

Although it is tempting to imagine that if we are using HTTP/1.1 and send an explicit expiration time we are safe, the reality is unfortunately somewhat different. It has been common for quite a long time to misconfigure cache servers so that they treat all GET requests containing a question mark as uncacheable. People even used to mark anything that contained the string "cgi-bin" as uncacheable.
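To make the problem concrete, here is a minimal sketch of the kind of blanket rule such a misconfigured cache applies. The function name naive_cache_would_store is hypothetical, and the rule (refuse any URI containing "?" or "cgi-bin") is modeled on the behavior described above, not taken from any particular cache server's source code.

```perl
use strict;
use warnings;

# Hypothetical model of a misconfigured cache: any URI containing "?" or
# "cgi-bin" is refused, whatever Expires or Cache-Control headers the
# origin server sends.
sub naive_cache_would_store {
    my ($uri) = @_;
    return 0 if index($uri, '?') >= 0;
    return 0 if index($uri, 'cgi-bin') >= 0;
    return 1;
}

for my $uri ('http://example.com/query',
             'http://example.com/query?BGCOLOR=blue;FGCOLOR=red',
             'http://example.com/cgi-bin/query') {
    printf "%-50s %s\n", $uri,
        naive_cache_would_store($uri) ? 'may be cached' : 'never cached';
}
```

Note that the explicit expiration time the server sends never enters into the decision; that is exactly why sending one is not enough.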

To work around this misconfiguration, we have stopped naming our CGI directories cgi-bin, and we have written the following handler, which lets us work with CGI-like query strings without rewriting the software (e.g., Apache::Request and CGI.pm) that deals with them:

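package My::SemicolonArgs;   # hypothetical package name; choose your own
use strict;
use Apache::Constants qw(DECLINED);

sub handler {
    my $r = shift;
    my $uri = $r->uri;
    my ($path, $args);
    if ( ($path, $args) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
        # the first ";" starts the query string
        $r->uri($path);
        $r->args($args);
    }
    elsif ( ($path, $args) = $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
        # protect against old proxies that escape ";" indiscriminately
        # (see HTTP standard section 5.1.2)
        $r->uri($path);
        $args =~ s/%3[Bb]/;/g;   # escaped ";"
        $args =~ s/%26/;/g;      # escaped "&" becomes ";"
        $args =~ s/%3[Dd]/=/g;   # escaped "="
        $r->args($args);
    }
    return DECLINED;   # let the rest of the request cycle proceed as usual
}
1;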

This handler must be installed as a PerlPostReadRequestHandler, so that it rewrites the URI before Apache translates it into a filename.

The handler takes any request that contains one or more semicolons but no question mark and changes it so that the first semicolon is interpreted as a question mark and everything after that as the query string. So now we can replace the request:

http://example.com/query?BGCOLOR=blue;FGCOLOR=red

with:

http://example.com/query;BGCOLOR=blue;FGCOLOR=red
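Because the handler only manipulates strings, its logic can be tried outside Apache. The following is a minimal sketch, assuming a hypothetical helper split_semicolon_uri that takes the raw URI path and returns the new path and query string, or just the unchanged path when no rewriting applies. It uses the same two regular expressions as the handler above.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the PostReadRequest handler's logic.
# Returns ($path, $args) if the URI was rewritten, or ($uri) otherwise.
sub split_semicolon_uri {
    my ($uri) = @_;
    if ( my ($path, $args) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
        # The first literal ";" plays the role of "?".
        return ($path, $args);
    }
    if ( my ($path, $args) = $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
        # A proxy escaped the separators; undo that for ";", "&" and "=".
        $args =~ s/%3[Bb]/;/g;
        $args =~ s/%26/;/g;
        $args =~ s/%3[Dd]/=/g;
        return ($path, $args);
    }
    return ($uri);
}

my ($path, $args) = split_semicolon_uri('/query%3BBGCOLOR%3Dblue%3BFGCOLOR%3Dred');
print "path=$path args=$args\n";
# prints "path=/query args=BGCOLOR=blue;FGCOLOR=red"
```

Both the literal and the proxy-escaped forms end up as the same path and query string, which is what lets the downstream parameter-parsing code stay unchanged.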

This allows queries from ordinary forms, which the browser submits with a "?", to coexist with predefined requests for the same resource. It has one minor drawback: because the query string is now part of the URI path, it cannot contain percent-escaped slashes, which Apache rejects in paths. So instead of:

http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=%2Ffont%2Fpath

we must use:

http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=/font/path
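When generating such links, each name and value still needs percent-escaping, except for the slash. A minimal sketch, assuming hypothetical helpers build_semicolon_uri and escape_value; leaving unreserved characters plus "/" unescaped is our choice for this example, and the code assumes byte strings (no characters above 255).

```perl
use strict;
use warnings;

# Percent-escape everything except unreserved characters and "/",
# because Apache rejects %2F in the path part of a URI.
# Assumes byte strings: each character must fit in two hex digits.
sub escape_value {
    my ($s) = @_;
    $s =~ s{([^A-Za-z0-9\-._~/])}{sprintf '%%%02X', ord $1}ge;
    return $s;
}

# Hypothetical helper: append name=value pairs to the base URL, using ";"
# both as the separator after the path and between pairs.
sub build_semicolon_uri {
    my ($base, @pairs) = @_;
    my @parts;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @parts, escape_value($name) . '=' . escape_value($value);
    }
    return join ';', $base, @parts;
}

print build_semicolon_uri('http://example.com/query',
    BGCOLOR => 'blue', FGCOLOR => 'red', FONT => '/font/path'), "\n";
# prints http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=/font/path
```

The pairs are passed as a list rather than a hash so that the parameter order in the generated URI is stable, which keeps the URI identical (and therefore cacheable as one entry) each time it is generated.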

To unescape any remaining percent-escaped characters in a value, apply the following substitution to it (as written, it operates on $_):

s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
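Applied to a whole query string, that substitution is typically combined with splitting on ";" and "=". A minimal sketch, assuming a hypothetical parse_semicolon_args helper; in a real handler, modules such as Apache::Request or CGI.pm would normally do this parsing for you.

```perl
use strict;
use warnings;

# Hypothetical helper: split "name=value;name=value" into pairs and
# unescape each part with the same substitution shown above.
sub parse_semicolon_args {
    my ($args) = @_;
    my @pairs;
    for my $item (split /;/, $args) {
        next unless length $item;               # skip empty items, as in "a=1;;b=2"
        my ($name, $value) = split /=/, $item, 2;
        $value = '' unless defined $value;      # a bare name gets an empty value
        for ($name, $value) {                   # $_ aliases each variable in turn
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        push @pairs, [$name, $value];
    }
    return @pairs;
}

for my $pair (parse_semicolon_args('BGCOLOR=blue;FONT=Times%20New%20Roman')) {
    print "$pair->[0] => $pair->[1]\n";
}
# prints:
# BGCOLOR => blue
# FONT => Times New Roman
```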

16.4.2. Conditional GET Requests

A rather challenging request that may be received is the conditional GET, which typically means a request with an If-Modified-Since header. The HTTP specification has this to say:

The semantics of the GET method change to a "conditional GET" if the request message 
includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or
If-Range header field.  A conditional GET method requests that the entity be transferred 
only under the circumstances described by the conditional header field(s). The 
conditional GET method is intended to reduce unnecessary network usage by allowing
cached entities to be refreshed without requiring multiple requests or transferring 
data already held by the client.

So how can we reduce the unnecessary network usage in such a case? mod_perl makes it easy by providing access to Apache's meets_conditions() function (which lives in Apache::File). The Last-Modified (and possibly ETag) headers must be set up before calling this method. If it returns anything other than OK (for example, HTTP_NOT_MODIFIED for a 304 response), the handler should return that value immediately instead of sending the body; Apache handles the rest for us. For example:

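use Apache::File ();                  # provides meets_conditions()
use Apache::Constants qw(OK);

# Last-Modified (and possibly ETag) must be set first, e.g.:
# $r->set_last_modified($mtime);
my $result = $r->meets_conditions;
return $result unless $result == OK;  # e.g., 304 Not Modified
# else ... go and send the response body ...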
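For a plain If-Modified-Since request, the check comes down to comparing two timestamps. The sketch below is a simplified, hypothetical model (ims_status and parse_http_date are our names, not Apache APIs) that understands only the RFC 1123 date format. The real meets_conditions() also evaluates the other conditional headers, such as If-Match and If-None-Match, so prefer it in actual handlers.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Parse an RFC 1123 date such as "Sun, 06 Nov 1994 08:49:37 GMT" into
# epoch seconds; returns undef for anything it does not understand.
sub parse_http_date {
    my ($date) = @_;
    my ($d, $mon, $y, $h, $m, $s) =
        $date =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return;
    return unless exists $MONTH{$mon};
    return eval { timegm($s, $m, $h, $d, $MONTH{$mon}, $y) };  # undef if out of range
}

# Hypothetical model of the If-Modified-Since check: 304 if the resource
# has not changed since the client's copy, 200 otherwise.
sub ims_status {
    my ($last_modified, $ims_header) = @_;
    my $since = defined $ims_header ? parse_http_date($ims_header) : undef;
    return 200 unless defined $since;           # missing or unparsable header
    return $last_modified <= $since ? 304 : 200;
}

my $mtime = parse_http_date('Sun, 06 Nov 1994 08:49:37 GMT');
print ims_status($mtime, 'Mon, 07 Nov 1994 00:00:00 GMT'), "\n";         # prints 304
print ims_status($mtime + 3600, 'Sun, 06 Nov 1994 08:49:37 GMT'), "\n";  # prints 200
```

An unparsable header falls back to 200, meaning the full body is sent. That is the safe choice: the client gets a fresh copy rather than a 304 it cannot validate.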

If we have a Squid accelerator running, it will often handle the conditionals for us, and its responses to such requests are extremely fast. To see how often this happens, read Squid's access.log file: just grep for TCP_IMS_HIT/304. However, there are circumstances under which Squid may not be allowed to use its cache. That is why the origin server (the server we are programming) must handle conditional GETs as well, even when a Squid accelerator is running.




Copyright © 2003 O'Reilly & Associates. All rights reserved.