HTTP Requests (Practical mod

16.4. HTTP Requests

Section 13.11 of the specification states that the only two cacheable methods are GET and HEAD. Responses to POST requests are not cacheable, as you'll see in a moment.

16.4.1. GET Requests

Most mod_perl programs are written to service GET requests. The server passes the request to the mod_perl code, which composes and sends back the headers and the content body.

But there is a certain situation that needs a workaround to achieve better cacheability. We need to deal with the "?" in the relative path part of the requested URI. Section 13.9 specifies that:

... caches MUST NOT treat responses to such URIs as fresh unless the server provides 
an explicit expiration time.  This specifically means that responses from HTTP/1.0 
servers for such URIs SHOULD NOT be taken from a cache.

Although it is tempting to imagine that if we are using HTTP/1.1 and send an explicit expiration time we are safe, the reality is unfortunately somewhat different. It has been common for quite a long time to misconfigure cache servers so that they treat all GET requests containing a question mark as uncacheable. People even used to mark anything that contained the string "cgi-bin" as uncacheable.

To work around this bug in HEAD requests, we have stopped calling CGI directories cgi-bin and we have written the following handler, which lets us work with CGI-like query strings without rewriting the software (e.g., Apache::Request and CGI.pm) that deals with them:

sub handler {
    my $r = shift;
    my $uri = $r->uri;
    if ( my($u1,$u2) = $uri =~ / ^ ([^?]+?) ; ([^?]*) $ /x ) {
        $r->uri($u1);
        $r->args($u2);
    }
    elsif ( my ($u1,$u2) = $uri =~ m/^(.*?)%3[Bb](.*)$/ ) {
        # protect against old proxies that escape volens nolens
        # (see HTTP standard section 5.1.2)
        $r->uri($u1);
        $u2 =~ s/%3[Bb]/;/g;
        $u2 =~ s/%26/;/g; # &
        $u2 =~ s/%3[Dd]/=/g;
        $r->args($u2);
    }
    DECLINED;
}

This handler must be installed as a PerlPostReadRequestHandler.

The handler takes any request that contains one or more semicolons but no question mark and changes it so that the first semicolon is interpreted as a question mark and everything after that as the query string. So now we can replace the request:

http://example.com/query?BGCOLOR=blue;FGCOLOR=red

with:

http://example.com/query;BGCOLOR=blue;FGCOLOR=red

This allows the coexistence of queries from ordinary forms that are being processed by a browser alongside predefined requests for the same resource. It has one minor bug: Apache doesn't allow percent-escaped slashes in such a query string. So instead of:

http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=%2Ffont%2Fpath

we must use:

http://example.com/query;BGCOLOR=blue;FGCOLOR=red;FONT=/font/path

To unescape the escaped characters, use the following code:

s/%([0-9A-Fa-f]{2})/chr hex $1/ge;

16.4.2. Conditional GET Requests

A rather challenging request that may be received is the conditional GET, which typically means a request with an If-Modified-Since header. The HTTP specification has this to say:

The semantics of the GET method change to a "conditional GET" if the request message 
includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-
Range header field.  A conditional GET method requests that the entity be transferred 
only under the circumstances described by the conditional header field(s). The 
conditional GET method is intended to reduce unnecessary network usage by allowing
cached entities to be refreshed without requiring multiple requests or transferring 
data already held by the client.

So how can we reduce the unnecessary network usage in such a case? mod_perl makes it easy by providing access to Apache's meets_conditions( ) function (which lives in Apache::File). The Last-Modified (and possibly ETag) headers must be set up before calling this method. If the return value of this method is anything other than OK, then this value is the one that should be returned from the handler when we have finished. Apache handles the rest for us. For example:

if ((my $result = $r->meets_conditions) != OK) {
    return $result;
}
#else ... go and send the response body ...

If we have a Squid accelerator running, it will often handle the conditionals for us, and we can enjoy its extremely fast responses for such requests by reading the access.log file. Just grep for TCP_IMS_HIT/304. However, there are circumstances under which Squid may not be allowed to use its cache. That is why the origin server (which is the server we are programming) needs to handle conditional GETs as well, even if a Squid accelerator is running.

16.4.3. HEAD Requests

Among the headers described thus far, the date-related ones (Date, Last-Modified, and Expires/Cache-Control) are usually easy to produce and thus should be computed for HEAD requests just the same as for GET requests.

The Content-Type and Content-Length headers should be exactly the same as would be supplied to the corresponding GET request. But since it may be expensive to compute them, they can easily be omitted, since there is nothing in the specification that requires them to be sent.

What is important is that the response to a HEAD request must not contain a message-body. The code in a mod_perl handler might look like this:

# compute the headers that are easy to compute
# currently equivalent to $r->method eq "HEAD"
if ( $r->header_only ) { 
    $r->send_http_header;
    return OK;
}

If a Squid accelerator is being used, it will be able to handle the whole HEAD request by itself, but under some circumstances it may not be allowed to do so.

16.4.4. POST Requests

The response to a POST request is not cacheable, due to an underspecification in the HTTP standards. Section 13.4 does not forbid caching of responses to POST requests, but no other part of the HTTP standard explains how the caching of POST requests could be implemented, so we are in a vacuum. No existing caching servers implement the caching of POST requests (although some browsers with more aggressive caching implement their own caching of POST requests). However, this may change if someone does the groundwork of defining the semantics for cache operations on POST requests.

Note that if a Squid accelerator is being used, you should be aware that it accelerates outgoing traffic but does not bundle incoming traffic. Squid is of no benefit at all on POST requests, which could be a problem if the site receives a lot of long POST requests. Using GET instead of POST means that requests can be cached, so the possibility of using GETs should always be considered. However, unlike with POSTs, there are size limits and visibility issues that apply to GETs, so they may not be suitable in every case.