home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Writing Apache Modules with Perl and C
By:   Lincoln Stein and Doug MacEachern
Published:   O'Reilly & Associates, Inc.  - March 1999

Copyright © 1999 by O'Reilly & Associates, Inc.


 


   Show Contents   Previous Page   Next Page

Chapter 5 - Maintaining State / Storing State Information in SQL Databases
A DBI Backend for Hangman

Like the persistent memory version of the hangman game, the DBI implementation has to have code to open the database, to set and fetch session records from the database, to generate unique session IDs for each incoming connection, and to expire old sessions that we're no longer interested in. Example 5-6 shows what's new and different on the server side. There are no visible changes in the user interface.

This script assumes a database has already been set up that contains a table named hangman with this structure:8

CREATE TABLE hangman (
     session_id      char(8) primary key,
     WORD            char(30),
     GUESSED         char(26),
     GAMENO          int,
     WON             int,
     GUESSES_LEFT    int,
     TOTAL           int,
     modified        timestamp,
     KEY(modified)
)

Before stepping through the script, let's first look at get_state() and save_state():

sub get_state {
   my $id = shift;
   my $sth = $DBH->prepare(<<END) || die "Prepare: ", $DBH->errstr;
SELECT * FROM $DB_TABLE WHERE session_id='$id'
END
   $sth->execute || die "Execute: ", $sth->errstr;
   my $state = $sth->fetchrow_hashref;
   $sth->finish;
   return $state;
}

get_state() is responsible for recovering the state information as a hash reference, given the ID of an existing session. At its core is this SQL statement:

SELECT * FROM hangman WHERE session_id='$id'

This selects all columns from the record named by the session ID. We then call DBI's fetchrow_hashref() to retrieve the record in the form as a hash reference in which the keys (WORD, GUESSED, GAMENO, and so on) correspond to the columns of the selected record. As it happens, this hashref is identical to the state variable that the higher levels of the script operate on, so all we have to do is to return it.

The save_state() subroutine is almost as simple:

sub save_state {
   my($state, $id) = @_;
   my $sth = $DBH->prepare(<<END) || die "prepare: ", $DBH->errstr;
UPDATE $DB_TABLE
  SET WORD=?,GUESSED=?,GAMENO=?,WON=?,TOTAL=?,GUESSES_LEFT=?
  WHERE session_id='$id'
END
   $sth->execute(@{$state}{qw(WORD GUESSED GAMENO WON TOTAL GUESSES_LEFT)})
      || die "execute: ", $DBH->errstr;
   $sth->finish;
}

This subroutine constructs a DBI statement handler containing placeholders for the six keys in $state. It then calls the statement handler's execute() statement to write the values from $state into the database.

The remainder of the code is concerned with the generation and maintenance of session IDs. Although most of the state information is stored on the server's side of the connection, there's more to the story. There will always have to be some information stored by the client because otherwise, there would be no way for the server to distinguish one client from another and, hence, no way to retrieve the correct session record. Some of the obvious ways of distinguishing one client from another, such as recording their IP addresses, do not work well in practice (a dial-in user may have several IP addresses, and conversely, all America Online users share the IP address of a few large proxy servers). The general technique for identifying clients is to generate a session ID for them when they first connect to your application and then arrange for them to return the session ID to you on subsequent requests. A session ID can be anything you like. In the hangman game we use an eight-digit hexadecimal number, which is sufficient for about four billion active sessions.

We've already seen two techniques that can be adapted to this purpose: HTTP cookies and fill-out forms. Because the session ID is a relatively small amount of information, there's also a third option available to us. We can store the session ID in the URI itself as additional path information. When a connection comes in from a new client we assign it a randomly generated ID, append it to our URI as additional path information, and send the client an HTTP redirect() directive to make it fetch this new URI. On subsequent requests, we recover the session ID from the additional path information. This technique has an advantage over cookies in that it is compatible with all browsers, including those for which the user has disabled cookies. It has the disadvantage that the session ID is visible to the user. The URI displayed by the browser will look something like this:

http://www.modperl.com/perl/hangman5.cgi/fd2c95dd

A side benefit of this technique is that the user can bookmark this URI, session ID and all, and come back to a game later.

Beginning our walkthrough of the script, we bring in the DBI library and define a few new constants:

use DBI ();
use strict;
use vars qw($DBH $DB_TABLE $ID_LENGTH);
use constant EXPIRE => 60*60*24*30;  # allow 30 days before expiration
use constant DB     => 'dbi:mysql:www';
use constant DBAUTH => 'nobody:';
use constant SECRET => 'modperl reigns';
use constant MAX_TRIES => 10;
$DB_TABLE           = "hangman6";
$ID_LENGTH          = 8;  # length of the session ID

EXPIRE is the length of time to keep sessions around before expiring them from the database. Unlike the shared-memory version of the script, the session data is stored on disk. This means that we can be less draconian in our expiration policy. An unused session is allowed 30 days before being recycled. DB is the DBI data source name for the database, and DBAUTH is the database authentication information, in the format :. SECRET and MAX_TRIES are used in the generation of new session keys. $DB_TABLE is the database table name to use and $ID_LENGTH is the length of the session key in characters.

$DBH = DBI->connect(DB, split(':', DBAUTH, 2), {PrintError => 0})
    || die "Couldn't open database: ", $DBI::errstr;
my($session_id, $note) = get_session_id();

The script begins by opening the database and saving its database handle in a global named $DBH. Next, we retrieve the session ID (or generate a new one) by calling a subroutine named get_session_id(). get_session_id() returns a two-element list: the session ID and a note that can be used to alert the user to exceptional conditions. In this script, the only exceptional condition that occurs is when the user tries to use a session ID that has expired.

my $state = get_state($session_id) unless param('clear');
$state    = initialize($state) if !$state or param('restart');
my($message, $status) = process_guess(param('guess') || '', $state);
save_state($state, $session_id);

With the session ID in hand, we retrieve the state by calling the get_state() subroutine that we looked at earlier. We then (re)initialize the state variable as before if need be, process the user's guess if any, and call save_state() to write the modified session back to the database. The remainder of the script is unchanged from previous versions, except that we display the note returned by get_session_id() at the top of the page if it's nonempty.

We'll look at the get_session_id() subroutine now, which is responsible for retrieving an existing session ID or generating a new one:

sub get_session_id {
   my(@result);
   expire_old_sessions();
   my($id) = path_info() =~ m:^/([a-h0-9]{$ID_LENGTH}):o;
   return @result if $id and @result = check_id($id);
    # If we get here, there's not already an ID in the path info.
   my $session_id = generate_id();
   die "Couldn't make a new session id" unless $session_id;
   print redirect(script_name() . "/$session_id");
   exit 0;
}

This subroutine first expires all out-of-date sessions by calling expire_old_sessions().9 Next, it calls CGI.pm's path_info() function to return the additional path information and attempt to match it against the expected session ID pattern. If a likely looking session ID is found, we call check_id() to ensure that the session ID actually corresponds to a database record. Otherwise, we call generate_id() to create a new session ID. We append the ID to our URI (using CGI.pm's script_name() function), incorporate it into a call to redirect(), and exit. In this case the subroutine never returns to the caller, but the redirected browser immediately generates a second call to the script, this time with the session ID appended to the URI.

The expire_old_sessions() subroutine is simple:

sub expire_old_sessions {
   $DBH->do(<<END);
DELETE FROM $DB_TABLE
   WHERE (unix_timestamp()-unix_timestamp(modified))>${\EXPIRE}
END
}

The subroutine consists of a single DBI call that sends a SQL DELETE statement to the database. The effect of the call is to delete all session records that are older than the time limit set by the EXPIRE constant.

generate_id(), which chooses new session IDs, is slightly more complex:

sub generate_id {
   # Create a new session id
   my $tries = 0;
   my $id = hash(SECRET . rand());
   while ($tries++ < MAX_TRIES) {
      last if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')");
      $id = hash(SECRET . $id);
   }
   return undef if $tries >= MAX_TRIES;  # we failed
   return $id;
}

The reason for this complexity is that it is important to generate a unique session ID in such a way that valid session IDs cannot be trivially guessed. Otherwise it would be possible for a malicious person to hijack another user's session by misappropriating that user's session ID. This is not important in the hangman game, but becomes an issue in transactions in which things of value (money, merchandise, confidential information) are changing hands. A simple sequence of session IDs, such as choosing one higher than the previous highest, is too obvious. IDs generated from the rand() call are unreliable as well because once you know where you are in the series, you can generate all the subsequent values.

Instead, we use a combination of rand() and the MD5 message digest algorithm. We begin by computing the MD5 hash of the value of rand() concatenated with a secret phrase. This extra concatenation step makes it impossible to derive the value of the next session ID from the previous one. Instead of calling MD5 directly, we call a small internal subroutine, hash(), to compute the MD5 hash and then truncate it to eight characters. This reduces the size of the session ID at the cost of making the ID somewhat easier to guess.10 We then enter a loop in which we repeatedly attempt to insert the current session ID into the database. If a record with that session ID does not already exist in the database, the insertion statement returns a true result code and we immediately return the ID. Otherwise we generate a new trial ID by hashing the current ID concatenated with the secret, and try again. We do this up to MAX_TRIES times, at which point we give up. This allows us to fill up the space of possible session IDs to approximately 90 percent, or around 3 billion.

The check_id() subroutine is called by get_session_id() when the browser provides a previous session ID. Its job is to check that the session ID still corresponds to a database record. If not, it attempts to insert a record with that session ID into the database and delivers a warning to the user that his game session may have expired.

sub check_id {
   my $id = shift;
   return ($id, '')
      if $DBH->do("SELECT 1 FROM $DB_TABLE WHERE session_id='$id'") > 0;
   return ($id, 'The record of your game may have expired.  Restarting.')
      if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')");
   return ();
}

The reason we try to reuse old session IDs is that the user may have bookmarked the URI of the game, session ID and all. We honor the bookmark so that the user doesn't have to discard it and enter a new one after his session has expired. check_id() consists of two DBI calls. In the first, it makes a SQL SELECT query looking for a record matching the provided session ID. Since we're only interested in whether the query succeeds or fails, we select a constant 1 instead of a named set of columns. If the query fails, then the database does not already contain the session ID. We call DBI again to insert the session ID into the database. If this fails (which it might in the unusual case of another instance of this script picking the same session ID from within generate_id()) we return an empty list. Otherwise we return the ID and the warning message. Although the user has lost the record of his old set of games, his bookmarked URI will still be valid and can now be used to return to the new set.

The last new routine defined in this version of the game is hash(), which simply computes the MD5 digest of the value passed to it, then truncates it to $ID_LENGTH characters:

sub hash {
   my $value = shift;
   return substr(MD5->hexhash($value), 0, $ID_LENGTH);
}

Example 5-6. The Hangman Game with a DBI Backend

#!/usr/local/bin/perl
# file: hangman6.pl
# hangman game using DBI
use IO::File ();
use CGI qw(:standard);
use DBI ();
use MD5 ();
use strict;
use vars qw($DBH $ID_LENGTH);
use constant WORDS => '/usr/games/lib/hangman-words';
use constant ICONS => '/icons/hangman';
use constant TRIES => 6;
# session settings
use constant EXPIRE => 60*60*24*30;  # allow 30 days before expiration
use constant DB     => 'dbi:mysql:www';
use constant DBAUTH => 'nobody:';
use constant SECRET => 'modperl reigns';
use constant MAX_TRIES => 10;
$ID_LENGTH          = 8;  # length of the session ID
# Open the database
$DBH = DBI->connect(DB,split(':',DBAUTH,2),{PrintError=>0})
    || die "Couldn't open database: ",$DBI::errstr;
# get the current session ID, or make one
my ($session_id,$note) = get_session_id();
# retrieve the state
my $state      = get_state($session_id) unless param('clear');
# reinitialize if we need to
$state    = initialize($state) if !$state or param('restart');
# process the current guess, if any
my ($message,$status) = process_guess(param('guess') || '',$state);
# save the modified state
save_state($state,$session_id);
# start the page
print header(),
   start_html(-Title   => 'Hangman 5',
             -bgcolor => 'white',
             -onLoad  => 'if (document.gf) document.gf.guess.focus()'),
   h1('Hangman 5: Database Sessions with URL rewriting');
print p(font({-color=>'red'},$note)) if $note;
 . . . everything in the middle is the same . . .

# Retrieve the session ID from the path info.  If it's not
# already there, add it to the path info with a redirect.
sub get_session_id {
   my(@result);
   expire_old_sessions();
   my($id) = path_info() =~ m:^/([a-h0-9]{$ID_LENGTH}):o;
   return @result if $id and @result = check_id($id);
    # If we get here, there's not already an ID in the path info.
   my $session_id = generate_id();
   die "Couldn't make a new session id" unless $session_id;
   print redirect(script_name() . "/$session_id");
   exit 0;
}
# Find a new unique ID and insert it into the database
sub generate_id {
   # Create a new session id
   my $tries = 0;
   my $id = hash(SECRET . rand());
   while ($tries++ < MAX_TRIES) {
      last if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')");
      $id = hash(SECRET . $id);
   }
   return undef if $tries >= MAX_TRIES;  # we failed
   return $id;
}
# check to see that an old ID is valid
sub check_id {
   my $id = shift;
   return ($id, '')
      if $DBH->do("SELECT 1 FROM $DB_TABLE WHERE session_id='$id'") > 0;
   return ($id, 'The record of your game may have expired.  Restarting.')
      if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')");
   return ();
}
# generate a hash value
sub hash {
   my $value = shift;
   return substr(MD5->hexhash($value), 0, $ID_LENGTH);
}
sub expire_old_sessions {
   $DBH->do(<<END);
DELETE FROM $DB_TABLE
   WHERE (unix_timestamp()-unix_timestamp(modified))>${\EXPIRE}
END
}
# get the state from the database
sub get_state {
   my $id = shift;
   my $sth = $DBH->prepare("SELECT * FROM $DB_TABLE WHERE session_id='$id'
   AND WORD<>NULL")
      || die "Prepare: ", $DBH->errstr;
   $sth->execute || die "Execute: ", $sth->errstr;
   my $state = $sth->fetchrow_hashref;
   $sth->finish;
   return $state;
}
# save the state in the database
sub save_state {
   my($state, $id) = @_;
   my $sth = $DBH->prepare(<<END) || die "prepare: ", $DBH->errstr;
UPDATE $DB_TABLE
  SET WORD=?,GUESSED=?,GAMENO=?,WON=?,TOTAL=?,GUESSES_LEFT=?
  WHERE session_id='$id'
END
   $sth->execute(@{$state}{qw(WORD GUESSED GAMENO WON TOTAL GUESSES_LEFT)}) 
|| die "execute: ", $DBH->errstr; $sth->finish; }
   Show Contents   Previous Page   Next Page
Copyright © 1999 by O'Reilly & Associates, Inc.