

Apache: The Definitive Guide, 3rd Edition

10.2. Apache's Logging Facilities

Apache offers a wide range of options for controlling the format of the log files. In line with current thinking, the older methods (RefererLog, AgentLog, and CookieLog) have now been replaced by config_log_module (the mod_log_config module). To illustrate this, we have taken ... /site.authent and copied it to ... /site.logging so that we can play with the logs:

User webuser
Group webgroup
ServerName www.butterthlies.com

IdentityCheck	on
NameVirtualHost 192.168.123.2
<VirtualHost www.butterthlies.com>
LogFormat "customers: host %h, logname %l, user %u, time %t, request %r, \
status %s,bytes %b,"
CookieLog logs/cookies
ServerAdmin sales@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.logging/htdocs/customers
ServerName www.butterthlies.com
ErrorLog /usr/www/APACHE3/site.logging/logs/customers/error_log
TransferLog /usr/www/APACHE3/site.logging/logs/customers/access_log
ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
</VirtualHost>
<VirtualHost sales.butterthlies.com>
LogFormat "sales: agent %{User-agent}i, cookie: %{Cookie}i, \
referer: %{Referer}i, host %!200h, logname %!200l, user %u, time %t, \
request %r, status %s,bytes %b,"
CookieLog logs/cookies
ServerAdmin sales_mgr@butterthlies.com
DocumentRoot /usr/www/APACHE3/site.logging/htdocs/salesmen
ServerName sales.butterthlies.com
ErrorLog /usr/www/APACHE3/site.logging/logs/salesmen/error_log
TransferLog /usr/www/APACHE3/site.logging/logs/salesmen/access_log
ScriptAlias /cgi_bin /usr/www/APACHE3/cgi_bin
<Directory /usr/www/APACHE3/site.logging/htdocs/salesmen>
AuthType Basic
AuthName darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
require valid-user
</Directory>
<Directory /usr/www/APACHE3/cgi_bin>
AuthType Basic
AuthName darkness
AuthUserFile /usr/www/APACHE3/ok_users/sales
AuthGroupFile /usr/www/APACHE3/ok_users/groups
#AuthDBMUserFile /usr/www/APACHE3/ok_dbm/sales
#AuthDBMGroupFile /usr/www/APACHE3/ok_dbm/groups
require valid-user
</Directory>
</VirtualHost>

The module provides a number of directives, starting with LogFormat.

LogFormat

LogFormat format_string [nickname]
Default: "%h %l %u %t \"%r\" %s %b"
Server config, virtual host

LogFormat sets the information to be included in the log file and the way in which it is written. The default format is the Common Log Format (CLF), which is what off-the-shelf log analyzers such as wusage (http://www.boutell.com/) or ANALOG expect, so if you want to use one of them, leave this directive alone.[35] A CLF entry has the following fields:

[35]Actually, some log analyzers support some extra information in the log file, but you need to read the analyzer's documentation for details.

host ident authuser date request status bytes
host
Hostname of the client or its IP number.

ident
If IdentityCheck is enabled and the client machine runs identd, the identity information reported by the client. (This can cause performance issues as the server makes identd requests that may or may not be answered.)

authuser
The user ID supplied by the client, if the request was for a password-protected document.

date
The date and time of the request, in the following format:

[day/month/year:hour:minute:second tzoffset], for example [07/Nov/1996:14:28:46 +0000].
request
Request line from client, in double quotes.

status
Three-digit status code returned to the client.

bytes
The number of bytes returned, excluding headers.
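Because every CLF entry carries the same seven fields in the same order, an analyzer can split a line with a single regular expression. The following is a minimal Python sketch, not code from wusage, ANALOG, or Apache; the names CLF_PATTERN and parse_clf are hypothetical, and the sketch assumes the request line is quoted (as in the default format) and that month names are in English.

```python
import re
from datetime import datetime

# host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf(line):
    """Split one CLF line into a dict of its fields, or return None.

    The date becomes a timezone-aware datetime, the status an int, and
    a '-' in the bytes field (nothing sent) is turned into 0.
    """
    m = CLF_PATTERN.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # CLF dates look like 07/Nov/1996:14:28:49 +0000 (English month names assumed)
    fields["date"] = datetime.strptime(fields["date"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields
```

Applied to the line 192.168.123.1 unknown - [07/Nov/1996:14:28:49 +0000] "GET /hen.jpg HTTP/1.0" 200 12291 (assembled from the sample entries in this section), it returns a status of 200 and a byte count of 12291.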

The log format can be customized using a format_string. Each directive in it has the form %[condition]key_letter; the condition is optional. If a condition is present and is not met, the field is written as -. The key letters are as follows:

%...a: Remote IP-address
%...A: Local IP-address
%...B: Bytes sent, excluding HTTP headers.
%...b: Bytes sent, excluding HTTP headers, in CLF format, i.e., a '-' rather than a 0 when no bytes are sent.
%...{Foobar}C: The contents of cookie "Foobar" in the request sent to the server.
%...D: The time taken to serve the request, in microseconds.
%...{FOOBAR}e: The contents of the environment variable FOOBAR.
%...f: Filename
%...h: Remote host
%...H: The request protocol
%...{Foobar}i: The contents of Foobar: header line(s) in the request sent to the server.
%...l: Remote logname (from identd, if supplied)
%...m: The request method
%...{Foobar}n: The contents of note "Foobar" from another module.
%...{Foobar}o: The contents of Foobar: header line(s) in the reply.
%...p: The canonical port of the server serving the request
%...P: The process ID of the child that serviced the request.
%...q: The query string (prepended with a ? if a query string exists, otherwise an empty string)
%...r: First line of request
%...s: Status. For requests that got internally redirected, this is the status of the *original* request; use %...>s for the last.
%...t: Time, in common log format time format (standard English format)
%...{format}t: The time, in the form given by format, which should be in strftime(3) format (potentially localized).
%...T: The time taken to serve the request, in seconds.
%...u: Remote user (from auth; may be bogus if return status (%s) is 401)
%...U: The URL path requested, not including any query string.
%...v: The canonical ServerName of the server serving the request.
%...V: The server name according to the UseCanonicalName setting.
%...X: Connection status when response is completed: 'X' = connection aborted before the response completed; '+' = connection may be kept alive after the response is sent; '-' = connection will be closed after the response is sent. (This directive was %...c in late versions of Apache 1.3, but this conflicted with the historical ssl %...{var}c syntax.)

The format string can contain ordinary text of your choice in addition to the % directives.
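To make the expansion rule concrete, here is a hypothetical Python sketch of how a format string turns into one log line. It is not Apache's C implementation: it handles only %h, %l, %u, %t, %r, %s, %b, %{Foobar}i and %%, ignores conditions, and models the request as a plain dict whose keys (host, logname, user, time, request_line, status, bytes, headers) are invented for this example. Ordinary text is copied through unchanged.

```python
import re

# Matches %X or %{Name}X; conditions such as %!200h are not handled here.
DIRECTIVE = re.compile(r"%(?:\{([^}]*)\})?([A-Za-z%])")

# Hypothetical mapping from key letter to a key in the request dict.
SIMPLE_FIELDS = {"h": "host", "l": "logname", "u": "user",
                 "t": "time", "r": "request_line", "s": "status"}


def expand(fmt, req):
    """Expand a LogFormat-style string for one request dict."""
    def one(m):
        arg, letter = m.group(1), m.group(2)
        if letter == "%":
            return "%"
        if letter == "i":
            # Request header names are compared case-insensitively.
            wanted = (arg or "").lower()
            for name, value in req.get("headers", {}).items():
                if name.lower() == wanted:
                    return value
            return "-"
        if letter == "b":
            # CLF style: '-' instead of 0 when no body bytes were sent.
            return str(req["bytes"]) if req.get("bytes") else "-"
        if letter in SIMPLE_FIELDS:
            value = req.get(SIMPLE_FIELDS[letter])
            return "-" if value is None else str(value)
        return "-"  # key letters outside this small subset
    return DIRECTIVE.sub(one, fmt)
```

With the customers format from this section and a request dict for GET / from 192.168.123.1 with no authenticated user and no body, the sketch yields a line beginning "customers: host 192.168.123.1, logname unknown, user -," as in the log excerpt in section 10.2.1.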

10.2.1. site.authent — Another Example

site.authent is set up with two virtual hosts, one for customers and one for salespeople, and each has its own logs in ... /logs/customers and ... /logs/salesmen. We can follow that scheme and apply one LogFormat to both, or each can have its own logs with its own LogFormats inside the <VirtualHost> directives. They can also have common log files, set up by moving ErrorLog and TransferLog outside the <VirtualHost> sections, with different LogFormats within the sections to distinguish the entries. In this last case, the LogFormat directives could look like this:

<VirtualHost www.butterthlies.com>
LogFormat "Customer:..."
...
</VirtualHost>

<VirtualHost sales.butterthlies.com>
LogFormat "Sales:..."
...
</VirtualHost>

Let's experiment with a format for customers, leaving everything else the same:

<VirtualHost www.butterthlies.com>
LogFormat "customers: host %h, logname %l, user %u, time %t, request %r, \
status %s,bytes %b,"
...

We have inserted the words host, logname, and so on to make it clear in the file what is doing what. In real life you probably wouldn't want to clutter the file up in this way because you would look at it regularly and remember what was what or, more likely, process the logs with a program that would know the format. Logging on to www.butterthlies.com and going to the summer catalog produces this log file:

customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/
    1996:14:28:46 +0000], request GET / HTTP/1.0, status 200,bytes -
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/
    1996:14:28:49 +0000], request GET /hen.jpg HTTP/1.0, status 200,
    bytes 12291,
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov
    /1996:14:29:04 +0000], request GET /tree.jpg HTTP/1.0, status 200,
    bytes 11532,
customers: host 192.168.123.1, logname unknown, user -, time [07/Nov/
    1996:14:29:19 +0000], request GET /bath.jpg HTTP/1.0, status 200,
    bytes 5880,

This is not too difficult to follow. Notice that while we have logname unknown, the user is -, the usual report for an unknown value. This is because customers do not have to give an ID; the same log for salespeople, who do, would have a value here.
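As noted above, in practice you would let a program that knows the format read these entries. A minimal Python sketch for the customers format follows; it assumes each entry occupies one physical line in access_log (the listing above is wrapped only for printing), and the names CUSTOMER_LINE, parse_customer_line, and total_bytes are hypothetical.

```python
import re

# One "customers:" entry per physical line is assumed.
CUSTOMER_LINE = re.compile(
    r"customers: host (?P<host>[^,\s]+), logname (?P<logname>[^,\s]+), "
    r"user (?P<user>[^,\s]+), time \[(?P<time>[^\]]+)\], "
    r"request (?P<request>.*?), status (?P<status>\d{3}),\s*"
    r"bytes (?P<bytes>\d+|-),?$"
)


def parse_customer_line(line):
    """Return the fields of one customers entry as a dict, or None."""
    m = CUSTOMER_LINE.match(line.strip())
    return m.groupdict() if m else None


def total_bytes(lines):
    """Sum the bytes field over all parseable lines, counting '-' as 0."""
    total = 0
    for line in lines:
        entry = parse_customer_line(line)
        if entry and entry["bytes"] != "-":
            total += int(entry["bytes"])
    return total
```

For the four entries shown above, total_bytes adds 12291, 11532, and 5880 and skips the - of the first request, giving 29703.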

We can improve things by inserting lists of conditions based on the status codes after the % and before the key letter. The status codes are defined in the HTTP 1.0 specification:

200 OK
302 Found
304 Not Modified
400 Bad Request
401 Unauthorized
403 Forbidden
404 Not found
500 Server error
501 Not Implemented
502 Bad Gateway
503 Out of resources

The list from HTTP 1.1 is as follows:

100  Continue
101  Switching Protocols
200  OK
201  Created
202  Accepted
203  Non-Authoritative Information
204  No Content
205  Reset Content 
206  Partial Content
300  Multiple Choices
301  Moved Permanently
302  Moved Temporarily
303  See Other
304  Not Modified
305  Use Proxy
400  Bad Request
401  Unauthorized
402  Payment Required
403  Forbidden
404  Not Found
405  Method Not Allowed
406  Not Acceptable
407  Proxy Authentication Required
408  Request Time-out
409  Conflict
410  Gone
411  Length Required
412  Precondition Failed
413  Request Entity Too Large
414  Request-URI Too Large
415  Unsupported Media Type
500  Internal Server Error
501  Not Implemented
502  Bad Gateway
503  Service Unavailable
504  Gateway Time-out
505  HTTP Version not supported

You can use ! before a code to mean "if not." !200 means "log this if the response was not OK." Let's put this in the salesmen's virtual host:

<VirtualHost sales.butterthlies.com>
LogFormat "sales: host %!200h, logname %!200l, user %u, time %t, request %r, \
status %s,bytes %b,"
...

An attempt to log in as fred with the password "don't know" produces the following entry:

sales: host 192.168.123.1, logname unknown, user fred, time [19/Aug/
    1996:07:58:04 +0000], request GET HTTP/1.0, status 401, bytes -

However, if it had been the infamous bill with the password "theft", we would see:

host -, logname -, user bill, ...

because we asked for host and logname to be logged only if the request was not OK. A condition can list several status codes separated by commas, so if we only want to know about security problems on sales, we could log usernames only if they failed to authenticate:

LogFormat "sales: bad user: %400,401,403u"
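The condition rule can be written out as a small predicate. This Python sketch is illustrative only: condition_met and conditional_field are hypothetical names, and the sketch recognizes just a single leading ! applied to the whole comma-separated list.

```python
def condition_met(condition, status):
    """Decide whether a %[condition]X field should be logged.

    condition is the text between '%' and the key letter, e.g. "",
    "!200" or "400,401,403". An empty condition always logs. A single
    leading '!' inverts the test for the whole list.
    """
    if not condition:
        return True
    negate = condition.startswith("!")
    body = condition[1:] if negate else condition
    codes = {int(code) for code in body.split(",")}
    return (status not in codes) if negate else (status in codes)


def conditional_field(condition, status, value):
    # When the condition is not met, '-' is written instead of the value.
    return value if condition_met(condition, status) else "-"
```

With "400,401,403", fred's rejected login (status 401) logs his name, while a successful request (status 200) writes -; with "!200" the reverse holds, which is why bill's host and logname appear as -.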

We can also extract data from the HTTP headers in both directions: %{...}i reads a header from the request and %{...}o a header from the reply. For example:

%[condition]{user-agent}i

This prints the user agent (i.e., the software the client is running) if condition is met. The old way of doing this was AgentLog logfile; the Referer header was logged separately with RefererLog logfile.




Copyright © 2003 O'Reilly & Associates. All rights reserved.