

Writing Apache Modules with Perl and C
By:   Lincoln Stein and Doug MacEachern
Published:   O'Reilly & Associates, Inc.  - March 1999

Copyright © 1999 by O'Reilly & Associates, Inc.


 



Chapter 5 - Maintaining State / Storing State at the Server Side
Storing State Information in Main Memory

Because Apache server processes are persistent across multiple accesses, you can store small amounts of state information in main memory. When the user first runs your application, it generates a random unique session identifier (session ID) and stores the state information in a data structure, for instance, a hash table keyed by the session ID. The application then sends the session ID back to the user in the form of a cookie, a hidden field, or a component of the URI. When the same user connects again, your application recovers the session ID and retrieves the state information from its data structure.
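As a minimal sketch of this scheme inside a single process, the fragment below keeps state in an ordinary hash keyed by session ID. The names %STORE, new_session(), and fetch_session() and the GUESSES_LEFT field are illustrative assumptions rather than part of the Hangman script, and the sketch leaves out how the ID travels to and from the browser.

```perl
use strict;
use warnings;

# Per-process session store: session ID => state hash reference.
my %STORE;

# Create a new session holding $state and return its ID.
sub new_session {
    my $state = shift;
    my $id;
    do {
        $id = sprintf("%08d", int(rand(1e8)));   # eight-digit random ID
    } while exists $STORE{$id};                  # retry on a collision
    $STORE{$id} = $state;
    return $id;
}

# Look up the state for an ID; returns undef for unknown sessions.
sub fetch_session {
    my $id = shift;
    return defined $id ? $STORE{$id} : undef;
}

my $id    = new_session({ WORD => 'tuesday', GUESSES_LEFT => 6 });
my $state = fetch_session($id);
print "session $id is guessing '$state->{WORD}'\n";
```

Because %STORE lives in one process's memory, a request that lands on a different process will not find the session.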

Sounds simple, but there are some catches. On Win32 systems this scheme works flawlessly because there is only one server process and one single-threaded Perl interpreter. However, on Unix systems there are multiple Apache processes running simultaneously, each with its own memory space. When a user fetches a page, there's no guarantee that he will connect to the same server process as before. What's more, server processes do die from time to time when they reach the limit specified by Apache's MaxRequestsPerChild directive.

If you are using mod_perl on a Unix system, you can work around these problems by using Benjamin Sugars' IPC::Shareable module. It ties Perl data structures (scalars and hashes, but not arrays) to shared memory segments, allowing multiple processes to access the same data structures. The tying process invokes shared memory calls whenever you store data to or fetch values from the tied variable, causing the information to be maintained in a shared memory segment.

As a bonus, the shared data structures persist even when the processes using them go away, so the state information will survive even a complete server shutdown and restart (but not a system reboot). The downside is that working with the shared data structures is not entirely transparent. You have to lock the tied variables prior to updating them and use them in a way that doesn't cause excessive consumption of system resources.

IPC::Shareable is available on CPAN. It requires Raphael Manfredi's Storable module as well.

Here's the idiom for placing a hash in shared memory:

tie %H, 'IPC::Shareable', 'Test', {create => 1, mode => 0666};

The first argument gives the name of the variable to tie, in this case %H. The second is the name of the IPC::Shareable module. The third argument is a "glue" ID that will be used to identify this variable to the processes that will be sharing it. It can be an integer or any string of up to four letters. In the example above we use a glue of Test. The last argument is a hash reference containing the options to pass to IPC::Shareable. There are a variety of options, but the ones you will be using most frequently are create, which if true causes the shared memory segment to spring into existence if it doesn't exist already, and mode, which specifies an octal access mode for the segment. The default mode of 0666 makes the memory segment world-readable and writable. This is useful during debugging so that you can spy on what your module is doing. For production, you will want to make the mode more restrictive, such as 0600 to restrict access to the Apache server only. [5]

If successful, tie() will tie %H to the shared memory segment and return a reference to the tied object (which you can ignore). Other processes can now attach to this segment by calling tie() with the same glue ID. When one process gets or sets a key in %H, all the other processes see the change. When a process is finished with the tied variable, it should untie() it. Scalar variables can be tied in a similar way.

Shared hashes work a lot like ordinary hashes. You can store scalar values or complex data structures under their keys. Any of these code fragments is legal:

$H{'fee'}           = 'I smell the blood';
$H{'fie'}           = ['of', 'an', 'englishman'];
$H{'foe'}           = {'and' => 'it', 'makes' => 'me', 'very' => 'hungry'};
$H{'fum'}{'later'}  = 'Do you have any after dinner mints?';

You can also store blessed objects in shared variables, but not filehandles or globs.

It's important to realize what is and what is not tied when you use IPC::Shareable. In the first example we copy a simple scalar value into shared memory space. Any changes that we make to the value, such as a string substitution, are immediately visible to all processes that share the variable.

In the second example, we construct an anonymous array and copy it into the shared variable. Internally IPC::Shareable uses the Storable freeze() function to serialize the structure into a binary representation and then place it in shared memory. As a consequence, changing an individual array element will not propagate correctly to other processes:

$H{'fie'}[2] = 'frenchman';  # this change will NOT propagate

Instead, you must copy the array into ordinary memory, make the changes, and copy it back:

my $temp = $H{'fie'};
$temp->[2] = 'frenchman';
$H{'fie'} = $temp;

For similar reasons we must also use this workaround to change elements in the third example, where the value is an anonymous hash.
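The copy semantics can be reproduced without shared memory. The sketch below uses a hypothetical FrozenHash class (an assumption for illustration, not IPC::Shareable itself) that serializes every value with Storable on STORE and hands back a fresh copy on FETCH, which is enough to show why the in-place change is lost and the copy-back idiom works.

```perl
use strict;
use warnings;

# FrozenHash is a hypothetical stand-in for a shared hash: every value is
# serialized on STORE and deserialized into a fresh copy on FETCH.
package FrozenHash;
use Storable qw(freeze thaw);
require Tie::Hash;
our @ISA = ('Tie::StdHash');

sub STORE {
    my ($self, $key, $value) = @_;
    $self->{$key} = freeze(\$value);
}

sub FETCH {
    my ($self, $key) = @_;
    return exists $self->{$key} ? ${ thaw($self->{$key}) } : undef;
}

package main;

tie my %H, 'FrozenHash';
$H{'fie'} = ['of', 'an', 'englishman'];

# This modifies a temporary copy returned by FETCH; STORE is never called.
$H{'fie'}[2] = 'frenchman';
print "in place:  $H{'fie'}[2]\n";

# Fetch a copy, change it, and store the whole structure back.
my $temp = $H{'fie'};
$temp->[2] = 'frenchman';
$H{'fie'} = $temp;
print "copy back: $H{'fie'}[2]\n";
```

The first print still reports the original element, and only the second reflects the change.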

Oddly enough, the fourth example behaves differently. In this case, we assign a value to an "automatic" anonymous hash. The hash is automatic because before the assignment, the key fum didn't even exist. After the assignment, not only does fum exist, but it points to an anonymous hash with the single key later. Behind the scenes, IPC::Shareable creates a new tied hash and stores it at $H{'fum'}. We can now read and write to this tied hash directly and the changes will be visible to all processes. The same thing will happen if you first assign an empty hash reference to a key and then start filling in the hash values one by one:

$H{'fum'} = {};
$H{'fum'}{'later'}  = 'Do you have any after dinner mints?';

Although this sounds like a neat feature, it can be a programming trap. Each tied hash that is created by this method occupies its own shared memory segment. If you use this feature too liberally, you'll end up exhausting your system's shared memory segments and subsequent attempts to tie variables will fail.

Another trap involves updating shared variables. Many update operations aren't atomic, even simple ones like $a++. If multiple processes try to update the same shared variable simultaneously, the results can be unpredictable. If you need to perform a nonatomic operation, or if you need a variable to be in a known state across several statements, you should lock before updating it and unlock it when you're through. The shlock() and shunlock() methods allow you to do this. You'll need to call tied() on the variable in order to obtain the underlying tied IPC::Shareable object and then invoke the object's shlock() or shunlock() method:

tied(%H)->shlock;
$H{'englishmen eaten'}++;
tied(%H)->shunlock;
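One risk with the lock and unlock pair is that an exception between the two calls leaves the variable locked. A possible guard is sketched below. with_lock() is a hypothetical helper, not part of IPC::Shareable, and FakeLock is a stand-in object that only records calls so that the sketch runs without shared memory. In real use the first argument would be tied(%H).

```perl
use strict;
use warnings;

# with_lock() is a hypothetical helper, not part of IPC::Shareable.
# $lockable is any object with shlock() and shunlock() methods,
# such as the object returned by tied(%H).
sub with_lock {
    my ($lockable, $code) = @_;
    $lockable->shlock;
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    $lockable->shunlock;          # always release, even after die()
    die $err unless $ok;
    return;
}

# FakeLock stands in for the tied object; it only records the calls.
package FakeLock;
sub new      { bless { calls => [] }, shift }
sub shlock   { push @{ $_[0]{calls} }, 'lock' }
sub shunlock { push @{ $_[0]{calls} }, 'unlock' }

package main;

my %H    = ('englishmen eaten' => 0);
my $lock = FakeLock->new;          # with IPC::Shareable: tied(%H)

with_lock($lock, sub { $H{'englishmen eaten'}++ });
eval { with_lock($lock, sub { die "oops\n" }) };   # still unlocks

print "count: $H{'englishmen eaten'}\n";
print "calls: @{ $lock->{calls} }\n";
```

The recorded calls stay paired even for the block that dies, which is the property the helper is meant to provide.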

Example 5-5 shows the code for Hangman5. The top of the file now loads the IPC::Shareable module and defines a shared global named %SESSIONS:

use IPC::Shareable ();
use constant SIGNATURE    => 'HANG';
use constant COOKIE_NAME  => 'SessionID5';
use constant MAX_SESSIONS => 100;
use vars qw(%SESSIONS);

%SESSIONS will be tied to shared memory, and it will contain multiple session keys, each one identified by a unique eight-digit numeric session ID. The value of each session will be the familiar $state anonymous hash reference.

# bind session structure to shared memory
bind_sessions() unless tied(%SESSIONS);
# fetch or generate the session id
my $session_id = get_session_id();

The first step in the revised script is to call a new subroutine named bind_sessions() to tie the %SESSIONS global to shared memory. It does this only if %SESSIONS hasn't previously been tied, which will be the case whenever this script is called for the first time in a new child process. After this we call another new subroutine named get_session_id() either to retrieve the old session ID for this user or to generate a new one if this is a new user.

# get rid of old sessions to avoid consuming resources
expire_old_sessions($session_id);

Next comes a call to expire_old_sessions() with the current session ID as the argument. Because we're keeping the session information in a limited resource, we must be careful to remove old sessions when they're no longer in use. We accomplish this by maintaining a rolling list of active sessions. The current session is moved to the top of the list while older sessions drift downward to the bottom. When the list exceeds a preset limit of simultaneous sessions (MAX_SESSIONS => 100 in this example), the oldest session is deleted.

The remainder of the body of the script should look very familiar. It's modified only very slightly from the examples we've seen before:

# retrieve the state
my $state;
$state = get_state($session_id) unless param('clear');
# reinitialize if we need to
$state    = initialize($state) if !$state or param('restart');
# process the current guess, if any
my($message, $status) = process_guess(param('guess') || '', $state);
# save the modified state
save_state($state, $session_id);

The get_state() function now takes the session ID as its argument. It retrieves the state from the %SESSIONS variable and copies it into $state, which we process as before. We then write the modified state information back into shared memory by calling save_state() with the state variable and the session ID.

# start the page
print header(-Cookie => cookie(-name => COOKIE_NAME,
                                 -value => $session_id,
                                 -expires => '+1h'));

The last task is to associate the session ID with the user. We do this by handing the remote browser a cookie containing the ID. Unlike the previous example, this cookie is set to expire after an hour of idle time. We expect the sessions to turn over rapidly, so it doesn't make sense to save the session ID for any longer than that. Although this might seem similar to the previous cookie examples, the big difference is that the cookie doesn't hold any state information itself. It's just a tag for the information stored at the server side.

Let's now turn to the new subroutines:

# Bind the session variables to shared memory using IPC::Shareable
sub bind_sessions {
   die "Couldn't bind shared memory"
      unless tie %SESSIONS, 'IPC::Shareable', SIGNATURE,
                 {create => 1, mode => 0644};
}

The bind_sessions() function calls tie() to bind %SESSIONS to shared memory. The signature is defined in a constant, and we call IPC::Shareable with options that cause the shared memory segment to be created with mode 0644 (world-readable) if it doesn't already exist. This will allow you to peek at (but not modify) the variable while the server is running.

The get_session_id() subroutine is responsible for choosing a unique ID for new sessions, or recovering the old ID for ongoing sessions:

sub get_session_id {
   my $id = cookie(COOKIE_NAME);
   return $id if defined($id) and exists $SESSIONS{$id};
   # Otherwise we have to generate an id.
   # Use the random number generator to find an unused key.
   tied(%SESSIONS)->shlock;
   do {
      $id = sprintf("%08d", 1E8*rand());
   } until !exists($SESSIONS{$id});
   # must avoid assigning an empty hash to IPC::Shareable
   $SESSIONS{$id} = {WORD => ''};
   tied(%SESSIONS)->shunlock;
   $id;
}

get_session_id() first attempts to recover a previously assigned session ID from the browser cookie. If the cookie does exist, and the session ID is still valid (it's a valid key for %SESSIONS), we return it. Otherwise we need to generate a new key that is not already in use. To do this we lock %SESSIONS so that it doesn't change underneath us, then enter a small loop that calls the random number generator repeatedly to generate eight-digit session IDs. [6] For each ID, we check whether it exists in %SESSIONS and exit the loop when we find one that doesn't. Having found a good ID, we reserve a slot for it by assigning a small anonymous hash to %SESSIONS. Notice that we do not use an empty hash for this purpose, as this would cause IPC::Shareable to create a new unwanted tied variable. We unlock the variable and return the ID.
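Where guessable IDs are a concern, one option is to hash several changing values with the MD5 digest function. The sketch below is an assumption for illustration only and is not the method presented in the DBI section. generate_id() and SECRET are hypothetical names, and because rand() is not a cryptographic source, the result is harder to guess than a bare random number but not proof against a determined attacker.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# SECRET is a placeholder; a real deployment would load a private value.
use constant SECRET => 'change me';

# Return a 32-character hexadecimal ID that is not already a key of
# the hash passed in by reference.
sub generate_id {
    my $sessions = shift;
    my $id;
    do {
        $id = md5_hex(join ':', SECRET, time, $$, rand());
    } while exists $sessions->{$id};
    return $id;
}

my %sessions;                       # plain hash standing in for %SESSIONS
my $id = generate_id(\%sessions);
$sessions{$id} = { WORD => '' };    # reserve the slot, as get_session_id() does
print "new session: $id\n";
```

With a shared hash, the call and the slot reservation would still need to sit between shlock() and shunlock(), as in get_session_id().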

The expire_old_sessions() subroutine is responsible for garbage-collecting old session information that is no longer in use:

sub expire_old_sessions {
   my $id = shift;
   tied(%SESSIONS)->shlock;
   my @sessions = grep($id ne $_, @{$SESSIONS{'QUEUE'}});
   unshift @sessions, $id;
   if (@sessions > MAX_SESSIONS) {
      my $to_delete = pop @sessions;
      delete $SESSIONS{$to_delete};
   }
   $SESSIONS{'QUEUE'} = \@sessions;
   tied(%SESSIONS)->shunlock;
}

This subroutine works by maintaining a list of sessions, ordered from most to least recently used, in an anonymous array located at the special key $SESSIONS{'QUEUE'}. The subroutine begins by locking %SESSIONS so that it doesn't change during the update process. It recovers the list, removes the current session from the list using the grep() operator, and unshift()s the current session ID onto the top of the list. It then looks at the size of the list, and if there are more sessions than allowed by MAX_SESSIONS, it pop()s a session ID from the bottom of the list and deletes that session from the %SESSIONS hash. The modified list is copied back into %SESSIONS, which is then unlocked.
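The queue logic can be tried out on an ordinary hash. In the sketch below, touch_session() is a hypothetical name, %sessions is a plain hash standing in for %SESSIONS, the locking calls are omitted, and MAX_SESSIONS is set to 3 so that the expiry is easy to see.

```perl
use strict;
use warnings;

use constant MAX_SESSIONS => 3;    # small limit to make the effect visible

my %sessions;                      # plain hash standing in for %SESSIONS

# Move $id to the front of the queue and drop the oldest session
# once the queue grows beyond MAX_SESSIONS.
sub touch_session {
    my $id    = shift;
    my @queue = grep { $id ne $_ } @{ $sessions{'QUEUE'} || [] };
    unshift @queue, $id;
    if (@queue > MAX_SESSIONS) {
        my $oldest = pop @queue;
        delete $sessions{$oldest};
    }
    $sessions{'QUEUE'} = \@queue;
}

# Five requests from four users; 'aaa' returns before 'ddd' arrives.
for my $id (qw(aaa bbb ccc aaa ddd)) {
    $sessions{$id} ||= { WORD => '' };
    touch_session($id);
}
print "queue: @{ $sessions{'QUEUE'} }\n";
print "bbb expired\n" unless exists $sessions{'bbb'};
```

Because aaa was used again before ddd arrived, bbb is the least recently used session and is the one that gets deleted.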

sub get_state {
   my $id = shift;
   return undef unless $SESSIONS{$id} and $SESSIONS{$id}{'WORD'};
   $SESSIONS{$id};
}
sub save_state {
   my($state, $id) = @_;
   $SESSIONS{$id} = $state;
}

get_state() and save_state() are trivial in this implementation. get_state() looks up the state information in %SESSIONS using the session ID as its key. save_state() saves the state into %SESSIONS at the indicated ID. Since the assignment is atomic, we don't need to lock the hash for either operation.

Example 5-5. The Hangman Game with Server-Side State in Shared Memory

# file: hangman5.cgi
# hangman game using IPC::Shareable and cookies
use IO::File ();
use CGI qw(:standard);
use CGI::Cookie ();
use IPC::Shareable ();
use strict;
use constant WORDS => '/usr/games/lib/hangman-words';
use constant ICONS => '/icons/hangman';
use constant TRIES => 6;
use constant SIGNATURE    => 'HANG';
use constant COOKIE_NAME  => 'SessionID5';
use constant MAX_SESSIONS => 100;
use vars qw(%SESSIONS);
# bind session structure to shared memory
bind_sessions() unless tied(%SESSIONS);
# fetch or generate the session id
my $session_id = get_session_id();
# get rid of old sessions to avoid consuming resources
expire_old_sessions($session_id);
# retrieve the state
my $state;
$state = get_state($session_id) unless param('clear');
# reinitialize if we need to
$state    = initialize($state) if !$state or param('restart');
# process the current guess, if any
my($message, $status) = process_guess(param('guess') || '', $state);
# save the modified state
save_state($state, $session_id);
# start the page
print header(-Cookie    => cookie(-name => COOKIE_NAME,
                                 -value => $session_id,
                                 -expires => '+1h')),
. . . everything in the middle remains the same . . .

# Bind the session variables to shared memory using IPC::Shareable
sub bind_sessions {
   die "Couldn't bind shared memory"
      unless tie %SESSIONS, 'IPC::Shareable', SIGNATURE,
                 {create => 1, mode => 0644};
}
# Fetch or generate the session ID.
# It's simply a key into the %SESSIONS variable
sub get_session_id {
   my $id = cookie(COOKIE_NAME);
   return $id if defined($id) and exists $SESSIONS{$id};
   # Otherwise we have to generate an id.
   # Use the random number generator to find an unused key.
   tied(%SESSIONS)->shlock;
   do {
      $id = sprintf("%08d", 1E8*rand());
   } until !exists($SESSIONS{$id});
   # must avoid assigning an empty hash to an IPC::Shareable tied hash
   $SESSIONS{$id} = {WORD => ''};
   tied(%SESSIONS)->shunlock;
   $id;
}
# bring the current session to the front and
# get rid of any that haven't been used recently
sub expire_old_sessions {
   my $id = shift;
   tied(%SESSIONS)->shlock;
   my @sessions = grep($id ne $_, @{$SESSIONS{'QUEUE'}});
   unshift @sessions, $id;
   if (@sessions > MAX_SESSIONS) {
      my $to_delete = pop @sessions;
      delete $SESSIONS{$to_delete};
   }
   $SESSIONS{'QUEUE'} = [@sessions];
   tied(%SESSIONS)->shunlock;
}
# Retrieve an existing state
sub get_state {
   my $id = shift;
   my $s = $SESSIONS{$id};
   return undef unless $s and $s->{WORD};
   return $SESSIONS{$id};
}
# Save the current state
sub save_state {
   my($state, $id) = @_;
   $SESSIONS{$id} = $state;
}

The main problem with this technique is that the amount of state information that you can store in shared memory is very limited, making it unsuitable for high-volume or high-reliability applications. A better server-side solution involves using database management systems, which we turn to in the next section.

Footnotes

5 The octal modes used in IPC::Shareable are similar to file modes and have the same effect on other processes' ability to access the data. Do not confuse them with umask, which has no effect on shared memory.

6 Using rand() is not the best way to create unique IDs, because it makes them easy to guess. However, it's simple and fast. The section on DBI databases presents a way to generate hard-to-guess IDs using the MD5 digest function.