Chapter 6. Content Description and Modification

Apache has the ability to tune the information it returns to the abilities of the client — and even to improve the client's efforts. Currently, this affects:

  • The choice of MIME type returned. An image might be the very old-fashioned bitmap, the old-fashioned .gif, the more modern and smaller .jpg, or the extremely up-to-date .png. Once the type is indicated, Apache's reactions can be extended and controlled with a number of directives.

  • The language of the returned file.

  • Updates to the returned file.

  • The spelling of the client's requests.

Apache v2 also offers a new mechanism, filters, which is described in Section 6.6 at the end of this chapter.

6.1 MIME Types

MIME stands for Multipurpose Internet Mail Extensions, a standard developed by the Internet Engineering Task Force for email but then repurposed for the Web. Apache uses mod_mime.c, compiled in by default, to determine the type of a file from its extension. MIME types are more sophisticated than file extensions, providing a category (like "text," "image," or "application"), as well as a more specific identifier within that category. In addition to specifying the type of the file, MIME permits the specification of additional information, like the encoding used to represent characters.

The "type" of a file that is sent is indicated by a header near the beginning of the data. For instance:

content-type: text/html

indicates that what follows is to be treated as HTML, though it may also be treated as text. If the type were "image/jpeg", the browser would need to use a completely different bit of code to render the data.

This header is inserted automatically by Apache[1] based on the MIME type and is absorbed by the browser so you do not see it if you right-click in a browser window and select "View Source" (MSIE) or similar. Notwithstanding, it is an essential element of a web page.

The list of MIME types that Apache already knows about is distributed in the file .../conf/mime.types or can be found at http://www.isi.edu/in-notes/iana/assignments/media-types/media-types. You can edit it to include extra types, or you can use the directives discussed in this chapter. The default location for the file is .../<site>/conf, but it may be more convenient to keep it elsewhere, in which case you would use the directive TypesConfig.

Changing the encoding of a file with one of these directives does not change the value of the Last-Modified header, so cached copies with the old label may linger after you make such changes. (Servers often send a Last-Modified header containing the date and time the content was last changed, so that the browser can use cached material at the other end if it is still fresh.)

Files can have more than one extension, and their order normally doesn't matter. If the extension .itl maps onto Italian and .html maps onto HTML, then the files text.itl.html and text.html.itl will be treated alike. However, any unrecognized extension, say .xyz, wipes out all extensions to its left. Hence text.itl.xyz.html will be treated as HTML but not as Italian.
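The extension rules just described can be sketched in a few lines of Python. This is only an illustration of the behavior in the text, not Apache's own code; the TYPES and LANGUAGES tables and the describe function are names made up for the example.

```python
# Hypothetical mapping tables; a real server would build these from
# mime.types and from AddType / AddLanguage directives.
TYPES = {"html": "text/html", "txt": "text/plain"}
LANGUAGES = {"itl": "it", "en": "en"}


def describe(filename):
    """Return (content_type, language) inferred from a filename's extensions."""
    content_type = None
    language = None
    # Everything after the first dot is treated as a list of extensions.
    for ext in filename.lower().split(".")[1:]:
        if ext in TYPES:
            content_type = TYPES[ext]
        elif ext in LANGUAGES:
            language = LANGUAGES[ext]
        else:
            # An unrecognized extension discards what was learned to its left.
            content_type = None
            language = None
    return content_type, language


print(describe("text.itl.html"))      # ('text/html', 'it')
print(describe("text.html.itl"))      # ('text/html', 'it')
print(describe("text.itl.xyz.html"))  # ('text/html', None)
```

The first two calls give the same answer because order does not matter; the third loses the Italian label because .xyz sits to the right of .itl.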

TypesConfig  

TypesConfig filename
Default: conf/mime.types
 

The TypesConfig directive sets the location of the MIME types configuration file. filename is relative to the ServerRoot. This file sets the default list of mappings from filename extensions to content types; changing this file is not recommended unless you know what you are doing. Use the AddType directive instead. The file contains lines in the format of the arguments to an AddType command:

MIME-type extension extension ... 

The extensions are lowercased. Blank lines and lines beginning with a hash character (#) are ignored.
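To make the file format concrete, here is a minimal Python sketch of a reader for it, following only the rules stated above (one MIME type followed by its extensions, extensions lowercased, blank lines and # lines skipped). The function name parse_mime_types and the SAMPLE text are invented for the example.

```python
def parse_mime_types(text):
    """Map each lowercased extension to its MIME type."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


# Made-up sample in the "MIME-type extension extension ..." format.
SAMPLE = """\
# type          extensions
text/html       html htm
image/jpeg      jpeg JPG jpe

application/octet-stream
"""

print(parse_mime_types(SAMPLE)["jpg"])  # image/jpeg
```

A line that names a type but no extensions, like the last one in SAMPLE, simply adds nothing to the mapping.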

AddType  

Syntax: AddType MIME-type extension [extension] ...
Context: Server config, virtual host, directory, .htaccess
Override: FileInfo
Status: Base
Module: mod_mime 
 

The AddType directive maps the given filename extensions onto the specified content type. MIME-type is the MIME type to use for filenames containing extension. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. This directive can be used to add mappings not listed in the MIME types file (see the TypesConfig directive). For example:

AddType image/gif .gif 

It is recommended that new MIME types be added using the AddType directive rather than changing the TypesConfig file.

Note that, unlike the NCSA httpd, this directive cannot be used to set the type of particular files.

The extension argument is case insensitive and can be specified with or without a leading dot.

DefaultType  

DefaultType mime-type
Anywhere
 

The server must inform the client of the content type of the document, so in the event of an unknown type, it uses whatever is specified by the DefaultType directive. For example:

DefaultType image/gif

would be appropriate for a directory that contained many GIF images with filenames missing the .gif extension. Note that this is only used for files that would otherwise not have a type.

ForceType  

ForceType media-type
directory, .htaccess 
 

Given a directory full of files of a particular type, ForceType will cause them to be sent as media-type. For instance, you might have a collection of .gif files in the directory .../gifdir, but you have given them the extension .gf2 for reasons of your own. You could include something like this in your Config file:

<Directory <path>/gifdir>
ForceType image/gif
</Directory>

You should be cautious in using this directive, as it may have unexpected results. This directive always overrides any MIME type that the file might usually have because of its extension — so even .html files in this directory, for example, would be served as image/gif.

RemoveType  

RemoveType extension [extension] ...
directory, .htaccess
RemoveType is only available in Apache 1.3.13 and later.
 

The RemoveType directive removes any MIME type associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. An example of its use is to have the following in /foo/.htaccess:

RemoveType .cgi

This will remove any special handling of .cgi files in the /foo/ directory and any beneath it, causing the files to be treated as the default type.

RemoveType directives are processed after any AddType directives, so it is possible that they may undo the effects of the latter if both occur within the same directory configuration.

The extension argument is case insensitive and can be specified with or without a leading dot.

AddEncoding  

AddEncoding mime-enc extension [extension] ...
Anywhere 
 

The AddEncoding directive maps the given filename extensions to the specified encoding type. mime-enc is the MIME encoding to use for documents containing the extension. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. For example:

AddEncoding x-gzip .gz
AddEncoding x-compress .Z 

This will cause filenames containing the .gz extension to be marked as encoded using the x-gzip encoding and filenames containing the .Z extension to be marked as encoded with x-compress.

Older clients expect x-gzip and x-compress; however, the standard dictates that they're equivalent to gzip and compress, respectively. Apache does content-encoding comparisons by ignoring any leading x-. When responding with an encoding, Apache will use whatever form (i.e., x-foo or foo) the client requested. If the client didn't specifically request a particular form, Apache will use the form given by the AddEncoding directive. To make this long story short, you should always use x-gzip and x-compress for these two specific encodings. More recent encodings, such as deflate, should be specified without the x-.
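The comparison rule can be sketched as follows. This Python fragment is an illustration of the paragraph above under our own assumptions, not a copy of Apache's logic; strip_x and response_encoding are hypothetical helper names.

```python
def strip_x(token):
    """Drop one leading 'x-' so that x-gzip and gzip compare equal."""
    token = token.lower()
    return token[2:] if token.startswith("x-") else token


def response_encoding(configured, accepted):
    """Pick the Content-Encoding token to send for a file.

    configured: the token given to AddEncoding, e.g. 'x-gzip'
    accepted:   tokens from the client's Accept-Encoding header (may be empty)
    """
    for token in accepted:
        if strip_x(token) == strip_x(configured):
            return token  # echo the form the client asked for
    return configured  # no matching request: use the configured form


print(response_encoding("x-gzip", ["gzip", "deflate"]))  # gzip
print(response_encoding("x-gzip", []))                   # x-gzip
```

The second call shows why the choice of form in AddEncoding matters: it is what a client that stated no preference will receive.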

The extension argument is case insensitive and can be specified with or without a leading dot.

RemoveEncoding  

RemoveEncoding extension [extension] ...
directory, .htaccess
RemoveEncoding is only available in Apache 1.3.13 and later.
 

The RemoveEncoding directive removes any encoding associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. An example of its use might be:

/foo/.htaccess: 
AddEncoding x-gzip .gz
AddType text/plain .asc
<Files *.gz.asc>
    RemoveEncoding .gz
</Files> 

This will cause foo.gz to be marked as being encoded with the gzip method, but foo.gz.asc as an unencoded plain-text file. This might, for example, be a hash of the binary file to prevent illicit alteration.

Note that RemoveEncoding directives are processed after any AddEncoding directives, so it is possible they may undo the effects of the latter if both occur within the same directory configuration.

The extension argument is case insensitive and can be specified with or without a leading dot.

AddDefaultCharset  

AddDefaultCharset On|Off|charset
AddDefaultCharset is only available in Apache 1.3.12 and later.
 

This directive specifies the name of the character set that will be added to any response that does not have any parameter on the content type in the HTTP headers. This will override any character set specified in the body of the document via a META tag. A setting of AddDefaultCharset Off disables this functionality. AddDefaultCharset On enables Apache's internal default charset of iso-8859-1 as required by the directive. You can also specify an alternate charset to be used; e.g. AddDefaultCharset utf-8.
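The three settings can be pictured with a small Python sketch. It follows only the description above (a charset is added when the content type carries no parameter) and is not taken from Apache's source; with_default_charset and INTERNAL_DEFAULT are names invented for the example.

```python
INTERNAL_DEFAULT = "iso-8859-1"  # the internal default named in the text


def with_default_charset(content_type, setting):
    """Apply an AddDefaultCharset-style setting to a Content-Type value.

    setting is 'Off', 'On', or a charset name such as 'utf-8'.
    """
    if setting.lower() == "off":
        return content_type
    if ";" in content_type:
        return content_type  # already carries a parameter; leave it alone
    charset = INTERNAL_DEFAULT if setting.lower() == "on" else setting
    return f"{content_type}; charset={charset}"


print(with_default_charset("text/html", "On"))     # text/html; charset=iso-8859-1
print(with_default_charset("text/html", "utf-8"))  # text/html; charset=utf-8
print(with_default_charset("text/html", "Off"))    # text/html
```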

The use of AddDefaultCharset is an important part of the prevention of Cross-Site Scripting (XSS) attacks. For more on XSS, refer to http://www.idefense.com/XSS.html.

AddCharset  

AddCharset charset extension [extension] ...
Server config, virtual host, directory, .htaccess
AddCharset is only available in Apache 1.3.10 and later.
 

The AddCharset directive maps the given filename extensions to the specified content charset. charset is the MIME charset parameter of filenames containing the extension. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. For example:

    AddLanguage ja .ja
    AddCharset EUC-JP .euc
    AddCharset ISO-2022-JP .jis
    AddCharset SHIFT_JIS .sjis

Then the document xxxx.ja.jis will be treated as being a Japanese document whose charset is ISO-2022-JP (as will the document xxxx.jis.ja). The AddCharset directive is useful both to inform the client about the character encoding of the document so that the document can be interpreted and displayed appropriately, and for content negotiation, where the server returns one from several documents based on the client's charset preference.

The extension argument is case insensitive and can be specified with or without a leading dot.

RemoveCharset  

RemoveCharset extension [extension] ...
directory, .htaccess
RemoveCharset is only available in Apache 2.0.24 and later. 
 

The RemoveCharset directive removes any character-set associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files.

The extension argument is case insensitive and can be specified with or without a leading dot.

The directives that associate handlers with files follow:

AddHandler  

AddHandler handler-name extension1 extension2 ...
Server config, virtual host, directory, .htaccess
 

The AddHandler directive wakes up an existing handler and maps the filename extensions extension1, extension2, and so on to handler-name. You might specify the following in your Config file:

AddHandler cgi-script cgi bzq

From then on, any file with the extension .cgi or .bzq would be treated as an executable CGI script.

SetHandler  

SetHandler handler-name
directory, .htaccess, location
 

This does the same thing as AddHandler, but applies the transformation specified by handler-name to all files in the <Directory>, <Location>, or <Files> section in which it is placed, or in the directory that holds the .htaccess file. For instance, in Chapter 10, we write:

<Location /status>
<Limit GET>
order deny,allow
allow from 192.168.123.1
deny from all
</Limit>
SetHandler server-status
</Location>

RemoveHandler  

RemoveHandler extension [extension] ...
directory, .htaccess
RemoveHandler is only available in Apache 1.3.4 and later. 
 

The RemoveHandler directive removes any handler associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files. An example of its use might be:

/foo/.htaccess: 
    AddHandler server-parsed .html 
/foo/bar/.htaccess: 
    RemoveHandler .html 

This has the effect of returning .html files in the /foo/bar directory to being treated as normal files, rather than as candidates for parsing (see the mod_include module).

The extension argument is case insensitive and can be specified with or without a leading dot.

AcceptFilter  

AcceptFilter on|off
Default: AcceptFilter on
server config
Compatibility: AcceptFilter is available in Apache 1.3.22 and later 
 

AcceptFilter controls a BSD-specific filter optimization. It is compiled in by default, and switched on by default if your system supports it (setsockopt( ) option SO_ACCEPTFILTER). Currently, only FreeBSD supports this.

See http://httpd.apache.org/docs/misc/perf-bsd44.html for more information.

The compile time flag AP_ACCEPTFILTER_OFF can be used to change the default to off. httpd -V and httpd -L will show compile-time defaults and whether or not SO_ACCEPTFILTER was defined during the compile.

6.2 Content Negotiation

There may be different ways to handle the data that Apache returns, and there are two equivalent ways of implementing this functionality. The multiviews method is simpler (and more limited) than the *.var method, so we shall start with it. The Config file (from ... /site.multiview) looks like this:

User webuser
Group webgroup
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/site.multiview/htdocs
ScriptAlias /cgi-bin /usr/www/APACHE3/cgi-bin
AddLanguage it .it
AddLanguage en .en
AddLanguage ko .ko
LanguagePriority it en ko

<Directory /usr/www/APACHE3/site.multiview/htdocs>
Options +MultiViews
</Directory>

For historical reasons, you have to say:

Options +MultiViews

even though you might reasonably think that Options All would cover the case. The general idea is that whenever you want to offer variations of a file (e.g., JPG, GIF, or bitmap for images, or different languages for text), multiviews will handle it. Apache v2 offers a relevant directive.

6.2.1 MultiviewsMatch

MultiviewsMatch permits three different behaviors for mod_negotiation's Multiviews feature.

MultiviewsMatch [NegotiatedOnly] [Handlers] [Filters] [Any]
server config, virtual host, directory, .htaccess
Compatibility: only available in Apache 2.0.26 and later. 

Multiviews allows a request for a file, e.g., index.html, to match any negotiated extensions following the base request, e.g., index.html.en, index.html.fr, or index.html.gz.

The NegotiatedOnly option provides that every extension following the base name must correlate to a recognized mod_mime extension for content negotiation, e.g., Charset, Content-Type, Language, or Encoding. This is the strictest implementation with the fewest unexpected side effects, and it's the default behavior.

To include extensions associated with Handlers and/or Filters, set the MultiviewsMatch directive to either Handlers, Filters, or both option keywords. If all other factors are equal, the smallest file will be served; e.g., in deciding between index.html.cgi of 500 bytes and index.html.pl of 1,000 bytes, the .cgi file would win. Users of .asis files might prefer to use the Handlers option, if .asis files are associated with the asis-handler.

You may finally allow Any extensions to match, even if mod_mime doesn't recognize the extension. This was the behavior in Apache 1.3 and can cause unpredictable results, such as serving .old or .bak files that the webmaster never expected to be served.
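The difference between the options is easiest to see side by side. The Python sketch below is our own simplification of the rules just described; the three extension sets and the matches function are hypothetical, standing in for whatever mod_mime has been configured to recognize.

```python
# Hypothetical classification of extensions, as mod_mime might know them.
NEGOTIATED = {"en", "fr", "gz", "html"}  # language, encoding, type, charset
HANDLERS = {"cgi", "pl", "asis"}
FILTERS = {"shtml"}


def matches(request, candidate, options=("NegotiatedOnly",)):
    """Return True if candidate may satisfy a Multiviews request."""
    if not candidate.startswith(request + "."):
        return False
    extras = candidate[len(request) + 1:].lower().split(".")
    if "Any" in options:
        return True  # unknown extensions such as .bak are let through
    allowed = set(NEGOTIATED)
    if "Handlers" in options:
        allowed |= HANDLERS
    if "Filters" in options:
        allowed |= FILTERS
    # Every extension after the base name must be a recognized one.
    return all(ext in allowed for ext in extras)


print(matches("index.html", "index.html.en"))             # True
print(matches("index.html", "index.html.bak"))            # False
print(matches("index.html", "index.html.bak", ("Any",)))  # True
```

The last call is the 1.3-style behavior that can expose backup files by accident.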

6.2.2 Image Negotiation

Image negotiation is a special corner of general content negotiation because the Web has a variety of image files with different levels of support: for instance, some browsers can cope with PNG files and some can't, and the latter have to be sent the simpler, more old-fashioned, and bulkier GIF files. The client's browser sends a message to the server telling it which image files it accepts:

HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*

Browsers almost always lie about the content types they accept or prefer, so this may not be all that reliable. In theory, however, the server uses this information to guide its search for an appropriate file, and then it returns it. We can demonstrate the effect by editing our ... /htdocs/catalog_summer.html file to remove the .jpg extensions on the image files. The appropriate lines now look like this:

...
<img src="bench" alt="Picture of a Bench">
...
<img src="hen" alt="Picture of a hencoop like a pagoda">
...

When Apache has the Multiviews option turned on and is asked for an image called bench, it looks for the smaller of bench.jpg and bench.gif — assuming the client's browser accepts both — and returns it.

Apache v2 introduces a new directive, which is related to the Filter mechanism (see later in this chapter, Section 6.6).

6.3 Language Negotiation

The same useful functionality also applies to language. To demonstrate this, we need to make up .html scripts in different languages. Well, we won't bother with actual different languages; we'll just edit the scripts to say, for example:

<h1>Italian Version</h1>

and edit the English version so that it includes a new line:

<h1>English Version</h1>

Then we give each file an appropriate extension:

  • index.html.en for English

  • index.html.it for Italian

  • index.html.ko for Korean

Apache recognizes language variants: en-US is seen as a version of general English, en, which seems reasonable. You can also offer documents that serve more than one language. If you had a "franglais" version, you could serve it to both English speakers and Francophones by naming it frangdoc.en.fr. Of course, in real life you would have to go to substantially more trouble, what with translators and special keyboards and all. Also, the Italian version of the index would need to point to Italian versions of the catalogs. But in the fantasy world of Butterthlies, Inc., it's all so simple.

The Italian version of our index would be index.html.it. By default, Apache looks for a file called index.html.<something>. If it has a language extension, like index.html.it, it will find the index file, happily add the language extension, and then serve up what the browser prefers. If, however, you call the index file index.it.html, Apache will still look for, and fail to find, index.html.<something>. If index.html.en is present, that will be served up instead. If only index.en.html is there, Apache gives up and serves up a list of all the files. The moral is, if you want to deal with index filenames in either order (index.it.html alongside index.html.en), you need the directive:

DirectoryIndex index

to make Apache look for a file called index.<something> rather than the default index.html.<something>.

To give Apache the idea, we need the corresponding lines in the httpd1.conf file:

AddLanguage it .it
AddLanguage en .en
AddLanguage ko .ko

Now our browser behaves in a rather civilized way. If you run ./go 1 on the server, go to the client machine, and go to Edit → Preferences → Languages (in Netscape 4) or Tools → Internet Options → Languages (MSIE) or wherever the language settings for your browser are kept, and set Italian to be first, you see the Italian version of the index. If you change to English and reload, you get the English version. If you then go to catalog_summer, you see the pictures even though we didn't strictly specify the filenames. In a small way...magic!

Apache controls language selection if the browser doesn't. If you turn language preference off in your browser, edit the Config file (httpd2.conf) to insert the line:

LanguagePriority it en ko

and then stop Apache and restart with ./go 2, the browser will get Italian.

LanguagePriority  

LanguagePriority MIME-lang MIME-lang...
Server config, virtual host, directory, .htaccess
 

The LanguagePriority directive sets the precedence of language variants for the case in which the client does not express a preference when handling a multiviews request. The MIME-lang list is in order of decreasing preference. For example:

LanguagePriority en fr de

For a request for foo.html, where foo.html.fr and foo.html.de both exist but the browser did not express a language preference, foo.html.fr would be returned.

Note that this directive only has an effect if a "best" language cannot be determined by any other means. It will not work if there is a DefaultLanguage defined. Correctly implemented HTTP 1.1 requests will mean that this directive has no effect.

How does this all work? You can look ahead to the environment variables in Chapter 16. Among them are the following:

...
HTTP_ACCEPT=image/gif,image/x-xbitmap,image/jpeg,image/pjpeg,*/*
...
HTTP_ACCEPT_LANGUAGE=it
...

Apache uses this information to work out what it can acceptably send back from the choices at its disposal.
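A rough Python sketch of that decision for languages follows. It is a simplification written for this section, not Apache's algorithm: it ignores variants such as en-US, and the function name choose_language and its arguments are our own.

```python
def choose_language(available, accept_language, language_priority):
    """Pick a language variant.

    available:         languages for which a file exists, e.g. {'en', 'it'}
    accept_language:   value of the Accept-Language header, or '' if absent
    language_priority: list from LanguagePriority, most preferred first
    """
    # Parse 'it, en;q=0.5' into [(q, language), ...]; q defaults to 1.0.
    wanted = []
    for item in accept_language.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0
        if params.strip().startswith("q="):
            q = float(params.strip()[2:])
        wanted.append((q, lang.strip().lower()))

    if wanted:
        # Highest q first; sorted() is stable, so header order breaks ties.
        for q, lang in sorted(wanted, key=lambda pair: -pair[0]):
            if q > 0 and lang in available:
                return lang
        return None  # the browser asked for something we do not have

    # No preference expressed: fall back to the server's LanguagePriority.
    for lang in language_priority:
        if lang in available:
            return lang
    return None


print(choose_language({"en", "it", "ko"}, "it", ["it", "en", "ko"]))  # it
print(choose_language({"fr", "de"}, "", ["en", "fr", "de"]))          # fr
```

The second call reproduces the LanguagePriority example given earlier: with no browser preference and only French and German files present, French is chosen.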

AddLanguage  

AddLanguage MIME-lang extension [extension] ...
Server config, virtual host, directory, .htaccess
 

The AddLanguage directive maps the given filename extension to the specified content language. MIME-lang is the MIME language of filenames containing extension. This mapping is added to any already in force, overriding any mappings that already exist for the same extension. For example:

AddEncoding x-compress .Z
AddLanguage en .en
AddLanguage fr .fr

Then the document xxxx.en.Z will be treated as a compressed English document (as will the document xxxx.Z.en). Although the content language is reported to the client, the browser is unlikely to use this information. The AddLanguage directive is more useful for content negotiation, where the server returns one from several documents based on the client's language preference.

If multiple language assignments are made for the same extension, the last one encountered is the one that is used. That is, for the case of:

AddLanguage en .en
AddLanguage en-uk .en
AddLanguage en-us .en

documents with the extension .en would be treated as being en-us.

The extension argument is case insensitive and can be specified with or without a leading dot.

DefaultLanguage  

DefaultLanguage MIME-lang
Server config, virtual host, directory, .htaccess
DefaultLanguage is only available in Apache 1.3.4 and later. 
 

The DefaultLanguage directive tells Apache that all files in the directive's scope (e.g., all files covered by the current <Directory> container) that don't have an explicit language extension (such as .fr or .de as configured by AddLanguage) should be considered to be in the specified MIME-lang language. This allows entire directories to be marked as containing Dutch content, for instance, without having to rename each file. Note that unlike using extensions to specify languages, DefaultLanguage can only specify a single language.

If no DefaultLanguage directive is in force and a file does not have any language extensions as configured by AddLanguage, then that file will be considered to have no language attribute.

RemoveLanguage  

RemoveLanguage extension [extension] ...
directory, .htaccess
RemoveLanguage is only available in Apache 2.0.24 and later. 
 

The RemoveLanguage directive removes any language associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files.

The extension argument is case insensitive and can be specified with or without a leading dot.

6.4 Type Maps

In the last section, we looked at multiviews as a way of providing language and image negotiation. The other way to achieve the same effects in the current release of Apache, as well as more lavish effects later (probably to negotiate browser plug-ins), is to use type maps, also known as *.var files. Multiviews works by scrambling together a plain vanilla type map; now you have the chance to set it up just as you want it. The Config file in .../site.typemap/conf/httpd1.conf is as follows:

User webuser
Group webgroup
ServerName www.butterthlies.com
DocumentRoot /usr/www/APACHE3/site.typemap/htdocs

AddHandler type-map var
DirectoryIndex index.var

One should write, as seen in this file:

AddHandler type-map var

Having set that, we can sensibly say:

DirectoryIndex index.var

to set up a set of language-specific indexes.

What this means, in plainer English, is that the DirectoryIndex line overrides the default index file index.html. If you also want index.html to be used as an alternative, you would have to specify it — but you probably don't, because you are trying to do something more elaborate here. In this case there are several versions of the index — index.en.html, index.it.html, and index.ko.html — so Apache looks for index.var for an explanation.

Look at ... /site.typemap/htdocs. We want to offer language-specific versions of the index.html file and alternatives to the generalized images bath, hen, tree, and bench, so we create two files, index.var and bench.var (we will only bother with one of the images, since the others are the same).

This is index.var :

# It seems that this URI _must_ be the filename minus the extension...
URI: index; vary="language"

URI: index.en.html
# Seems we _must_ have the Content-type or it doesn't work...
Content-type: text/html
Content-language: en

URI: index.it.html
Content-type: text/html
Content-language: it

This is bench.var :

URI: bench; vary="type"

URI: bench.jpg
Content-type: image/jpeg; qs=0.8 level=3

URI: bench.gif
Content-type: image/gif; qs=0.5 level=1

The first line tells Apache what file is in question, here index.* or bench.* ; vary tells Apache what sort of variation we have. These are the possibilities:

  • type

  • language

  • charset

  • encoding

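The name of the corresponding header, as defined in the HTTP specification, is obtained by prefixing these names with Content-. The exception is charset: HTTP has no separate charset header, and the character set travels as a charset parameter on content-type. These are the headers:

  • content-type (which also carries charset)

  • content-language

  • content-encoding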

The qs numbers are quality scores, from 0 to 1. You decide what they are and write them in. The qs values for each type of return are multiplied to give the overall qs for each variant. For instance, if a variant has a qs of .5 for Content-type and a qs of .7 for Content-language, its overall qs is .35. The higher the result, the better. The level values are also numbers, and you decide what they are. In order for Apache to decide rationally which possibility to return, it resolves ties in the following way:

  1. Find the best (highest) qs.

  2. If there's a tie, count the occurrences of "*" in the type and choose the one with the lowest value (i.e., the one with the least wildcarding).

  3. If there's still a tie, choose the type with the highest language priority.

  4. If there's still a tie, choose the type with the highest level number.

  5. If there's still a tie, choose the lowest content length (that is, the smallest file).
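For readers who would rather not do this in their heads, the scoring and the first four tie-breaking rules can be sketched in Python. This is an illustration of the list above, not Apache's implementation; the final length comparison is left out, and the variant dictionaries, overall_qs, and best_variant are all names invented here.

```python
from math import prod


def overall_qs(scores):
    """Multiply the per-dimension quality scores of one variant."""
    return prod(scores.values())


def best_variant(variants, language_priority):
    """Rank variants by rules 1 to 4 of the list above.

    Each variant is a dict with keys: name, qs (dict of scores),
    wildcards (count of '*' in the matched type), language, level.
    """
    def rank(v):
        try:
            lang_rank = language_priority.index(v["language"])
        except ValueError:
            lang_rank = len(language_priority)  # unlisted languages sort last
        # Tuples compare left to right, so later fields only break ties.
        return (-overall_qs(v["qs"]), v["wildcards"], lang_rank, -v["level"])

    return min(variants, key=rank)


# The two entries of bench.var, written out as hypothetical records.
variants = [
    {"name": "bench.jpg", "qs": {"type": 0.8}, "wildcards": 0,
     "language": None, "level": 3},
    {"name": "bench.gif", "qs": {"type": 0.5}, "wildcards": 0,
     "language": None, "level": 1},
]
print(best_variant(variants, [])["name"])          # bench.jpg
print(overall_qs({"type": 0.5, "language": 0.7}))  # 0.35
```

With the bench.var values, the JPEG wins on rule 1 alone, so the later rules never come into play.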

If you can predict the outcome of all this in your head, you must qualify for some pretty classy award! Following is the full list of possible directives, given in the Apache documentation:

URI: uri [; vary= variations]

URI of the file containing the variant (of the given media type, encoded with the given content encoding). These are interpreted as URLs relative to the map file; they must be on the same server (!), and they must refer to files to which the client would be granted access if the files were requested directly.

Content-type: media_type [; qs= quality [level= level]]

Often referred to as MIME types; typical media types are image/gif, text/plain, or text/html.

Content-language: language

The language of the variant, specified as an ISO 639 standard language code (e.g., en for English, ko for Korean).

Content-encoding: encoding

If the file is compressed or otherwise encoded, rather than containing the actual raw data, indicates how compression was done. For compressed files (the only case where this generally comes up), content encoding should be x-compress or gzip or deflate, as appropriate.

Content-length: length

The size of the file. The size of the file is used by Apache to decide which file to send; specifying a content length in the map allows the server to compare the length without checking the actual file.

To throw this into action, start Apache with ./go 1, set the language of your browser to Italian (in Netscape, choose Edit → Preferences → Netscape → Languages), and access http://www.butterthlies.com/. You should see the Italian version. MSIE seems to provide less support for some languages, including Italian; you just get the English version. When you look at catalog_summer.html, you see only the Bench image (and that labeled as "indirect") because we did not create var files for the other images.

6.5 Browsers and HTTP 1.1

Like any other human creation, the Web fills up with rubbish. The webmaster cannot assume that all clients will be using up-to-date browsers — all the old, useless versions are out there waiting to make a mess of your best-laid plans.

In 1996, the weekly Internet magazine devoted to Apache affairs, Apache Week (Issue 25), had this to say about the impact of the then-upcoming HTTP 1.1:

For negotiation to work, browsers must send the correct request information. For human languages, browsers should let the user pick what language or languages they are interested in. Recent beta versions of Netscape let the user select one or more languages (see the Netscape Options, General Preferences, Languages section).

For content-types, the browser should send a list of types it can accept. For example, "text/html, text/plain, image/jpeg, image/gif." Most browsers also add the catch-all type of "*/*" to indicate that they can accept any content type. The server treats this entry with lower priority than a direct match.

Unfortunately, the */* type is sometimes used instead of listing explicitly acceptable types. For example, if the Adobe Acrobat Reader plug-in is installed into Netscape, Netscape should add application/pdf to its acceptable content types. This would let the server transparently send the most appropriate content type (PDF files to suitable browsers, else HTML). Netscape does not send the content types it can accept, instead relying on the */* catch-all. This makes transparent content-negotiation impossible.

Although time has passed, the situation has probably not changed very much. In addition, most browsers do not indicate a preference for particular types. This should be done by adding a preference factor (q) to the content type. For example, a browser that accepts Acrobat files might prefer them to HTML, so it could send an accept-type list that includes:

Accept: text/html; q=0.7, application/pdf; q=0.8

When the server handles the request, it combines this information with its source quality information (if any) to pick the "best" content type to return.
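As a sketch of how the two figures might be combined, the Python below multiplies the browser's q for each type by the server's source quality and keeps the highest product. It assumes the standard ;q= parameter syntax of the HTTP Accept header and is not Apache's code; parse_accept, pick_type, and the quality figures are made up for the example.

```python
def parse_accept(header):
    """Parse 'text/html;q=0.7, application/pdf;q=0.8' into {type: q}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # a type listed without q counts as fully acceptable
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[parts[0].lower()] = q
    return prefs


def pick_type(accept_header, source_quality):
    """Return the media type with the highest q * qs product, or None."""
    prefs = parse_accept(accept_header)
    best, best_score = None, 0.0
    for media_type, qs in source_quality.items():
        # An exact match wins over the */* catch-all; unlisted types score 0.
        q = prefs.get(media_type, prefs.get("*/*", 0.0))
        if q * qs > best_score:
            best, best_score = media_type, q * qs
    return best


print(pick_type("text/html;q=0.7, application/pdf;q=0.8",
                {"text/html": 1.0, "application/pdf": 0.8}))  # text/html
```

Here the browser prefers PDF, but the server rates its PDF lower than its HTML (0.8 x 0.8 = 0.64 against 0.7 x 1.0 = 0.7), so HTML is sent.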

6.6 Filters

Apache v2 introduced a new mechanism called a "Filter", together with a reworking of Multiviews. The documentation says:

A filter is a process which is applied to data that is sent or received by the server. Data sent by clients to the server is processed by input filters while data sent by the server to the client is processed by output filters. Multiple filters can be applied to the data, and the order of the filters can be explicitly specified.

Filters are used internally by Apache to perform functions such as chunking and byte-range request handling. In addition, modules can provide filters which are selectable using run-time configuration directives. The set of filters which apply to data can be manipulated with the SetInputFilter and SetOutputFilter directives.

The only configurable filter currently included with the Apache distribution is the INCLUDES filter which is provided by mod_include to process output for Server Side Includes. There is also an experimental module called mod_ext_filter which allows for external programs to be defined as filters.
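To see why the order of filters matters, here is a toy model in Python. It is only an illustration of chaining: real Apache filters are C modules that work on streams of data, and both filter functions below are hypothetical stand-ins.

```python
from functools import reduce


def case_filter(text):
    """Stand-in for an output filter that uppercases the response."""
    return text.upper()


def greeting_filter(text):
    """Hypothetical filter that rewrites one word of the response."""
    return text.replace("Hullo", "Goodbye")


def run_chain(text, filters):
    """Pass text through each filter in turn, in the order given."""
    return reduce(lambda data, f: f(data), filters, text)
```

Running greeting_filter before case_filter turns "Hullo world" into "GOODBYE WORLD". In the opposite order the text is already uppercase by the time greeting_filter sees it, so the replacement never matches and the result is "HULLO WORLD".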

There is a demonstration filter that changes text to uppercase. In .../site.filter/htdocs we have two files, 1.txt and 1.html, which have the same contents:

HULLO WORLD FROM site.filter

The Config file is as follows:

User webuser
Group webgroup

Listen 80
ServerName my586

AddOutputFilter CaseFilter html
DocumentRoot /usr/www/APACHE3/site.filter/htdocs

If we visit the site, we are offered a directory listing. If we choose 1.txt, we see the contents as shown earlier. If we choose 1.html, we find it has been through the filter and is now all uppercase:

HULLO WORLD FROM SITE.FILTER

The Directives are as follows:

AddInputFilter  

AddInputFilter filter[;filter...] extension [extension ...]
Server config, virtual host, directory, .htaccess
AddInputFilter is only available in Apache 2.0.26 and later.
 

AddInputFilter maps the filename extensions extension to the filter or filters that will process client requests and POST input when they are received by the server. This is in addition to any filters defined elsewhere, including the SetInputFilter directive. This mapping is merged over any already in force, overriding any mappings that already exist for the same extension.

If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content. Both the filter and extension arguments are case insensitive, and the extension may be specified with or without a leading dot.

AddOutputFilter  

AddOutputFilter filter[;filter...] extension [extension ...]
Server config, virtual host, directory, .htaccess
AddOutputFilter is only available in Apache 2.0.26 and later.
 

The AddOutputFilter directive maps the filename extensions extension to the filters that will process responses from the server before they are sent to the client. This is in addition to any filters defined elsewhere, including the SetOutputFilter directive. This mapping is merged over any already in force, overriding any mappings that already exist for the same extension. For example, the following configuration will process all .shtml files for server-side includes:

  AddOutputFilter INCLUDES shtml

If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content. Both the filter and extension arguments are case insensitive, and the extension may be specified with or without a leading dot.
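The merging rules can be modeled as a small lookup table. The Python sketch below is an assumption-laden illustration rather than mod_mime's code: the function names are hypothetical, names are folded to lowercase purely as a modeling choice, and only the last extension of a filename is examined.

```python
def normalize_ext(ext):
    """Fold case and drop any leading dot, so '.SHTML' and 'shtml' match."""
    return ext.lower().lstrip(".")


def add_output_filter(mapping, filters, *extensions):
    """Model 'AddOutputFilter filters ext [ext ...]'; return the merged mapping."""
    # Filters are separated by semicolons and kept in processing order.
    chain = [name.strip().lower() for name in filters.split(";") if name.strip()]
    merged = dict(mapping)
    for ext in extensions:
        # A later mapping for the same extension overrides the earlier one.
        merged[normalize_ext(ext)] = chain
    return merged


def filters_for(mapping, filename):
    """Return the filter chain mapped to the filename's last extension."""
    _, dot, ext = filename.rpartition(".")
    return mapping.get(ext.lower(), []) if dot else []
```

Starting from an empty mapping, adding "INCLUDES" for shtml and then "includes;CaseFilter" for ".SHTML" leaves a single entry for shtml whose chain is includes followed by casefilter. The second directive replaces the first rather than adding to it.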

SetInputFilter  

SetInputFilter filter[;filter...] 
Server config, virtual host, directory, .htaccess
 

The SetInputFilter directive sets the filter or filters that will process client requests and POST input when they are received by the server. This is in addition to any filters defined elsewhere, including the AddInputFilter directive.

If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content.

SetOutputFilter  

SetOutputFilter filter[;filter...]
Server config, virtual host, directory, .htaccess
 

The SetOutputFilter directive sets the filters that will process responses from the server before they are sent to the client. This is in addition to any filters defined elsewhere, including the AddOutputFilter directive.

For example, the following configuration will process all files in the /www/data/ directory for server-side includes:

<Directory /www/data/>
SetOutputFilter INCLUDES
</Directory>

If more than one filter is specified, they must be separated by semicolons in the order in which they should process the content.

RemoveInputFilter  

RemoveInputFilter extension [extension] ...
directory, .htaccess
RemoveInputFilter is only available in Apache 2.0.26 and later. 
 

The RemoveInputFilter directive removes any input filter associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files.

The extension argument is case insensitive and can be specified with or without a leading dot.

RemoveOutputFilter  

RemoveOutputFilter extension [extension] ...
directory, .htaccess
RemoveOutputFilter is only available in Apache 2.0.26 and later. 
 

The RemoveOutputFilter directive removes any output filter associations for files with the given extensions. This allows .htaccess files in subdirectories to undo any associations inherited from parent directories or the server config files.

The extension argument is case insensitive and can be specified with or without a leading dot.
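The effect of the Remove directives on inherited mappings can be pictured with a short Python model. It is a sketch under assumptions, not Apache's implementation: the function name is hypothetical, and the inherited mapping is assumed to be keyed by lowercase extensions without a leading dot.

```python
def remove_output_filter(inherited, *extensions):
    """Return a copy of the inherited mapping without the given extensions."""
    # Normalize the arguments the same way the keys are assumed to be stored.
    dropped = {ext.lower().lstrip(".") for ext in extensions}
    return {ext: chain for ext, chain in inherited.items() if ext not in dropped}
```

If a parent directory maps shtml to includes and html to casefilter, calling remove_output_filter(parent, ".HTML") for a subdirectory leaves only the shtml entry there, while the parent's own mapping is untouched.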

[1]  If you are constructing HTML pages on the fly from CGI scripts, you have to insert it explicitly. See Chapter 14 for additional detail.
