home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book HomeApache: The Definitive GuideSearch this book

8.8. Rewrite

The preceding section described the alias module and its allies. Everything these directives can do, and more, can be done instead by mod_rewrite.c, an extremely compendious module that is almost a complete software product in its own right.[53] The documentation is thorough, and the reader is referred to http://www. engelschall.com/pw/apache/rewriteguide/ for any serious work. This section is intended for orientation only.

[53]But for simple tasks Alias and friends are much easier to use.

Rewrite takes a rewriting pattern and applies it to the URL. If it matches, a rewriting substitution is applied to the URL. The patterns are regular expressions familiar to us all in their simplest form; for example, mod.*\.c, which matches any module filename. The complete science of regular expressions is somewhat extensive, and the reader is referred to ... /src/regex/regex.7, a manpage that can be read with nroff -man regex.7 (on FreeBSD, at least). Regular expressions are also described in the POSIX specification and in Jeffrey Friedl's Mastering Regular Expressions (O'Reilly & Associates). The essence of regular expressions is that a number of special characters can be used to match parts of incoming URLs.

The substitutions can include mapping functions that take bits of the incoming URL and look them up in databases or even apply programs to them. The rules can be applied repetitively and recursively to the evolving URL. It is possible (as the documentation says) to create "rewriting loops, rewriting breaks, chained rules, pseudo if-then-else constructs, forced redirects, forced MIME-types, forced proxy module throughout." The functionality is so extensive that it is probably impossible to master it in the abstract. When and if you have a problem of this sort, it looks as if mod_rewrite can solve it, given enough intellectual horsepower on your part!

The module can be used in four situations:

  • By the administrator inside the server Config file to apply in all contexts. The rules are applied to all URLs of the main server and all URLs of the virtual servers.

  • By the administrator inside <VirtualHost> blocks. The rules are applied only to the URLs of the virtual server.

  • By the administrator inside <Directory> blocks. The rules are applied only to the specified directory.

  • By users in their .htaccess files. The rules are applied only to the specified directory.

The directives look simple enough.

8.8.4. RewriteMap

RewriteMap mapname {txt,dbm,prg,rnd,int}: filename
Server config, virtual host

Defines an external mapname file that inserts substitution strings through key lookup. The module passes mapname a query in the form:

$(mapname : Lookupkey | DefaultValue)

If the Lookupkey value is not found, DefaultValue is returned.

The type of mapname must be specified by the next argument:

txt

Indicates plain-text format, that is, an ASCII file with blank lines, comments that begin with "#", or useful lines, in the format:

MatchingKey SubstituteValue
dbm

Indicates DBM hashfile format, that is, a binary NDBM (the "new" dbm interface, now about 15 years old, also used for dbm auth) file containing the same material as the plain-text format file. You create it with any ndbm tool or by using the Perl script dbmmanage from the support directory of the Apache distribution.

prg

Indicates program format, that is, an executable (a compiled program or a CGI script) that is started by Apache. At each lookup, it is passed the key as a string terminated by newline on stdin and returns the substitution value, or the word NULL if lookup fails, in the same way on stdout. The manual gives two warnings:

  • Keep the program or script simple because if it hangs, it hangs the Apache server.

  • Don't use buffered I/O on stdout because it causes a deadlock. In C, use:

    setbuf(stdout,NULL)

    In Perl, use:

    select(STDOUT); $|=1;]
rnd

Indicates randomized plain text, which is similar to the standard plain-text variant but has a special postprocessing feature: after looking up a value, it is parsed according to contained "|" characters that have the meaning of "or". In other words, they indicate a set of alternatives from which the actual returned value is chosen randomly. Although this sounds crazy and useless, it was actually designed for load balancing in a reverse proxy situation, in which the looked-up values are server names -- each request to a reverse proxy is routed to a randomly selected server behind it.

int

Indicates an internal Apache function. Two functions exist: toupper() and tolower(), which convert the looked-up key to all uppercase or all lowercase.

8.8.8. A Rewrite Example

The Butterthlies salespeople seem to be taking their jobs more seriously. Our range has increased so much that the old catalog based around a single HTML script is no longer workable because there are too many cards. We have built a database of cards and a utility called cardinfo that accesses it using the arguments:

cardinfo cardid query

where cardid is the number of the card, and query is one of the following words: "price," "artist," or "size." The problem is that the salespeople are too busy to remember the syntax, so we want to let them log onto the card database as if it were a web site. For instance, going to http://sales.butterthlies.com/info/2949/price would return the price of card number 2949. The Config file is in ... /site.rewrite :

User webuser
Group webgroup
# Apache requires this server name, although in this case it will 
# never be used.
# This is used as the default for any server that does not match a
# VirtualHost section.
ServerName www.butterthlies.com

NameVirtualHost 192.168.123.2

<VirtualHost www.butterthlies.com>
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/site.rewrite/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/site.rewrite/logs/customers/error_log
TransferLog /usr/www/site.rewrite/logs/customers/access_log
</VirtualHost>

<VirtualHost sales.butterthlies.com>
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/site.rewrite/htdocs/salesmen
Options ExecCGI indexes
ServerName sales.butterthlies.com
ErrorLog /usr/www/site.rewrite/logs/salesmen/error_log
TransferLog /usr/www/site.rewrite/logs/salesmen/access_log
RewriteEngine on
RewriteLog logs/rewrite
RewriteLogLevel 9
RewriteRule ^/info/([^/]+)/([^/]+)$   /cgi-bin/cardinfo?$2+$1 [PT]
ScriptAlias /cgi-bin /usr/www/cgi-bin
</VirtualHost>

In real life cardinfo would be an elaborate program. However, here we just have to show that it could work, so it is extremely simple:

#!/bin/sh
#
echo "content-type: text/html"
echo sales.butterthlies.com
echo "You made the query $1 on the card $2"

To make sure everything is in order before we do it for real, we turn RewriteEngine off and access http://sales.butterthlies.com/cgi-bin/cardinfo. We get back the following message:

The requested URL /info/2949/price was not found on this server.

This is not surprising. We now turn RewriteEngine on and look at the crucial line in the Config file, which is:

RewriteRule ^/info/([^/]+)/([^/]+)$ /cgi-bin/cardinfo?$2+$1 [PT]

Translated into English this means the following: at the start of the string, match /info/, followed by one or more characters that aren't "/", and put those characters into the variable $1 (the parentheses do this; $1 because they are the first set). Then match a "/", then one or more characters aren't "/", and put those characters into $2. Then match the end of the string and pass the result through [PT] to the next rule, which is ScriptAlias. We end up as if we had accessed http://sales.butterthlies.com/cgi-bin/cardinfo?<card ID>+<query>.

If the CGI script is on a different web server for some reason, we could write:

RewriteRule ^/info/([^/]+)/([^/]+)$ http://somewhere.else.com/cgi-bin/
    cardinfo/$2+$1[PT]

Note that this pattern won't match /info/123/price/fred, because it has too many slashes in it.

If we run all this with ./go, and access http://sales.butterthlies.com/info/2949/price from the client, we see the following message:

You made the query price on card 2949







Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.