home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Writing Apache Modules with Perl and C
By:   Lincoln Stein and Doug MacEachern
Published:   O'Reilly & Associates, Inc.  - March 1999

Copyright © 1999 by O'Reilly & Associates, Inc.


 


   Show Contents   Previous Page   Next Page

Chapter 7 - Other Request Phases
The Logging Phase

In this section...

Introduction
Sending Email When a Page Is Hit
A DBI Database Logger

Introduction

   Show Contents   Go to Top   Previous Page   Next Page

The very last phase of the transaction before the cleanup at the end is the logging phase. At this point, the request record contains everything there is to know about the transaction, including the content handler's final status code and the number of bytes transferred from the server to the client.

Apache's built-in logging module mod_log_config ordinarily handles this phase by writing a line of summary information to the transfer log. As its name implies, this module is highly configurable. You can give it printf()-like format strings to customize the appearance of the transfer log to your requirements, have it open multiple log files, or even have it pipe the log information to an external process for special processing.

By handling the logging phase yourself, you can perform special processing at the end of each transaction. For example, you can update a database of cumulative hits, bump up a set of hit count files, or notify the owner of a document that his page has been viewed. There are a number of log handlers on CPAN, including Apache::DBILogger, which sends log information to a relational database, and Apache::Traffic, which keeps summaries of bytes transferred on a per-user basis.

Sending Email When a Page Is Hit

   Show Contents   Go to Top   Previous Page   Next Page

The first example of a log handler that we'll show is Apache::LogMail. It sends email to a designated address whenever a particular page is hit and can be used in low-volume applications, such as the vanity home pages of ISP customers. A typical configuration directive would look like this:

<Location /~kryan>
   PerlLogHandler Apache::LogMail
   PerlSetVar     LogMailto  kryan@public.com
   PerlSetVar     LogPattern \.(html|txt)$
</Location>

With this configuration in place, hits on pages in the /~kryan directory will generate email messages. The LogMailto Perl configuration variable specifies kryan@public.com as the lucky recipient of these messages, and LogPattern specifies that only files ending with .html or .txt will generate messages (thus eliminating noise caused by hits on inline images).

Example 7-8 shows the code. After the usual preliminaries, we define the logging phase's handler() routine:

sub handler {
   my $r = shift;
    my $mailto = $r->dir_config('LogMailto');
   return DECLINED unless $mailto;
   my $filepattern = $r->dir_config('LogPattern');
   return DECLINED if $filepattern
      && $r->filename !~ /$filepattern/;

The subroutine begins by fetching the contents of the LogMailto configuration variable. If none is defined, it declines the transaction. Next it fetches the contents of LogPattern. If it finds a pattern, it compares it to the requested document's filename and again declines the transaction if no match is found.

   my $request = $r->the_request;
   my $uri     = $r->uri;
   my $agent   = $r->header_in("User-agent");
   my $bytes   = $r->bytes_sent;
   my $remote  = $r->get_remote_host;
   my $status  = $r->status_line;
   my $date    = localtime;

Now the subroutine gathers up various fields of interest from the request object, including the requested URI, the User-Agent header, the name of the remote host, and the number of bytes sent (method bytes_sent()).

    local $ENV{PATH}; #keep -T happy
   unless (open MAIL, "|/usr/lib/sendmail -oi -t") {
       $r->log_error("Couldn't open mail: $!");
       return DECLINED;
   }

We open a pipe to the sendmail program and use it to send a message to the designated user with the information we've gathered.7 The flags used to open up the sendmail pipe instruct it to take the recipient's address from the header rather than the command line and prevent it from terminating prematurely if it sees a line consisting of a dot.

    print MAIL <<END;
To: $mailto
From: mod_perl httpd <$from>
Subject: Somebody looked at $uri
At $date, a user at $remote looked at
$uri using the $agent browser.
The request was $request,
which resulted returned a code of $status.
$bytes bytes were transferred.
END
   close MAIL;
    return OK;
}

All text that we print to the MAIL pipe is transferred to sendmail's standard input. The only trick here is to start the message with a properly formatted mail header with the To:, From:, and Subject: fields followed by a blank line. When we close the pipe, the mail is bundled up and sent off for delivery.

The final email message will look something like this:

From: Mod Perl <webmaster@public.com>
To: kryan@public.com
Subject: Somebody looked at /~kryan/guestbook.txt
Date: Thu, 27 Aug 1998 08:14:23 -0400
At Thu Aug 27 08:14:23 1998, a user at 192.168.2.1 looked at 
/~kryan/guestbook.txt using the Mozilla/4.04 [en] (X11; I; Linux 2.0.33 i686) browser.
The request was GET /~kryan/guestbook.txt HTTP/1.0,
which resulted returned a code of 200 OK.
462 bytes were transferred. 

Example 7-8. A Logging Module to Notify of Hits via Email

package Apache::LogMail;
# File: Apache/LogMail.pm
use strict;
use Apache::Constants qw(:common);
sub handler {
   my $r = shift;
    my $mailto = $r->dir_config('LogMailto');
   return DECLINED unless $mailto;
    my $filepattern = $r->dir_config('LogPattern');
   return DECLINED if $filepattern
      && $r->filename !~ /$filepattern/;
    my $request = $r->the_request;
   my $uri     = $r->uri;
   my $agent   = $r->header_in("User-agent");
   my $bytes   = $r->bytes_sent;
   my $remote  = $r->get_remote_host;
   my $status  = $r->status_line;
   my $date    = localtime;
    my $from = $r->server->server_admin || "webmaster";
local $ENV{PATH}; #keep -T happy unless (open MAIL, "|/usr/lib/sendmail -oi -t") { $r->log_error("Couldn't open mail: $!"); return DECLINED; }
    print MAIL <<END;
To: $mailto
From: mod_perl httpd <$from>
Subject: Somebody looked at $uri
At $date, a user at $remote looked at
$uri using the $agent browser.
The request was $request,
which resulted returned a code of $status.
$bytes bytes were transferred.
END
    close MAIL;
   return OK;
}
1;
__END__

A DBI Database Logger

   Show Contents   Go to Top   Previous Page   Next Page

The second example of a log phase handler is a DBI database logger. The information from the transaction is sent to a relational database using the DBI interface. The record of each transaction is appended to the end of a relational table, which can be queried and summarized in a myriad of ways using SQL.

This is a skeletal version of the much more complete Apache::DBILog and Apache::DBILogConfig modules, which you should consult before rolling your own.

In preparation to use this module you'll need to set up a database with the appropriate table definition. A suitable MySQL table named access_log is shown here:

+---------+--------------+------+-----+---------------------+-------+
| Field   | Type         | Null | Key | Default             | Extra |
+---------+--------------+------+-----+---------------------+-------+
| when    | datetime     |      |     | 0000-00-00 00:00:00 |       |
| host    | char(255)    |      |     |                     |       |
| method  | char(4)      |      |     |                     |       |
| url     | char(255)    |      |     |                     |       |
| auth    | char(50)     | YES  |     | NULL                |       |
| browser | char(50)     | YES  |     | NULL                |       |
| referer | char(255)    | YES  |     | NULL                |       |
| status  | int(3)       |      |     | 0                   |       |
| bytes   | int(8)       | YES  |     | 0                   |       |
+---------+--------------+------+-----+---------------------+-------+

This table can be created with the following script:

#!/bin/sh
mysql -B test_www <<END
create table access_log (
              when    datetime not null,
              host    varchar(255) not null,
              method  varchar(4)   not null,
              url     varchar(255) not null,
              auth    varchar(50),
              browser varchar(50),
              referer varchar(255),
              status  smallint(3) default 0,
              bytes   int(8)
);
END

The database must be writable by the web server, which should be provided with the appropriate username and password to log in.

The code (Example 7-9) is short and very similar to the previous example, so we won't reproduce it inline.

We begin by bringing in modules that we need, including DBI and the ht_time() function from Apache::Util. Next we declare some constants defining the database, table, and database login information. Since this is just a skeleton of a module, we have hardcoded these values instead of taking them from PerlSetVar configuration directives. You can follow the model of Apache::MimeDBI if you wish to make this module more configurable.

The handler() subroutine recovers the request object and uses it to fetch all the information we're interested in recording, which we store in locals. We also call ht_time() to produce a nicely formatted representation of the request_time() in a format that SQL accepts. We connect to the database and create a statement handle containing a SQL INSERT statement. We invoke the statement handler's execute() statement to write the information into the database, and return with a status code of OK.

The only trick to this handler, which we left out of Apache::LogMail, is the use of the last() to recover the request object. last() returns the final request object in a chain of internal redirects and other subrequests. Usually there are no subrequests, and last() just returns the main (first) request object, in which case the $orig and $r objects in Apache::LogDBI would point to the same request record. In the event that a subrequest did occur, for example, if a request for / was resolved to /index.html, we want to log the request_time, uri, and status from the ultimate request.

Example 7-9. A DBI Database Log Handler

package Apache::LogDBI;
# file: Apache/LogDBI.pm
use Apache::Constants qw(:common);
use strict;
use DBI ();
use Apache::Util qw(ht_time);
use constant DSN       => 'dbi:mysql:test_www';
use constant DB_TABLE  => 'access_log';
use constant DB_AUTH   => ':';
sub handler {
   my $orig = shift;
   my $r = $orig->last;
   my $date    = ht_time($orig->request_time, '%Y-%m-%d %H:%M:%S', 0);
   my $host    = $r->get_remote_host;
   my $method  = $r->method;
   my $url     = $orig->uri;
   my $user    = $r->connection->user;
   my $referer = $r->header_in('Referer');
   my $browser = $r->header_in('User-agent');
   my $status  = $orig->status;
   my $bytes   = $r->bytes_sent;
   my $dbh = DBI->connect(DSN, split ':', DB_AUTH) || die $DBI::errstr;
   my $sth = $dbh->prepare("INSERT INTO ${\DB_TABLE} VALUES(?,?,?,?,?,?,?,?,?)")
                || die $dbh->errstr;
    $sth->execute($date,$host,$method,$url,$user,
                $browser,$referer,$status,$bytes) || die $dbh->errstr;
   $sth->finish;
   return OK;
}
1;
__END__ 

This handler can be installed with the following configuration file directive:

PerlLogHandler Apache::LogDBI

You can place this directive in the main part of the configuration file in order to log all accesses, or place it in a directory section if you're interested in logging a particular section of the site only. An alternative is to install Apache::LogDBI as a cleanup handler, as described in the next section.

Having web transactions logged to a relational database gives you the ability to pose questions of great complexity. Just to give you a taste of what's possible, here are a few useful SQL queries to try:

  • How many hits have I had to date, and how many total bytes transferred?
SELECT count(*),sum(bytes) FROM access_log;
  • How many hits did I have the day before yesterday?
 SELECT count(*) FROM access_log
   WHERE to_days(when)=to_days(now())-2;
  • How many hits have I had, grouped by hour of access?
 SELECT date_format(when,'H') as hour,count(*) FROM access_log
   GROUP BY hour;
  • What URLs may be broken, and who is pointing at them?
SELECT url,referer,count(url) FROM access_log
   WHERE status=404
   GROUP BY url;
  • What are the top 10 most popular URLs on my site?
SELECT url,count(*) as count FROM access_log
   GROUP BY url
   ORDER BY count desc
   LIMIT 10;
  • What is my site's transfer rate, sorted by the hour of day?
SELECT date_format(when,'H') as hour,
      sum(bytes)/(60*60) as bytes_per_min
   FROM access_log
   GROUP BY hour;

Footnotes

7 sendmail is only available on Unix systems. If you are using Windows or Windows NT, you would be best served by replacing the piped open with the appropriate calls to the Perl Net::SMTP module. You can find this module on CPAN.

   Show Contents   Go to Top   Previous Page   Next Page
Copyright © 1999 by O'Reilly & Associates, Inc.