Writing Apache Modules with Perl and C

By:	Lincoln Stein and Doug MacEachern
Published:	O'Reilly & Associates, Inc. - March 1999

Chapter 9 - Perl API Reference Guide
Special Global Variables, Subroutines, and Literals

Introduction

As you know, Perl has several magic global variables, subroutines, and literals that have the same meaning no matter what package they are called from. A handful of these variables have special meaning when running under mod_perl. Here we will describe these and other global variables maintained by mod_perl. Don't forget that Perl code has a much longer lifetime and lives among many more namespaces in the mod_perl environment than it does in a conventional CGI environment. When modifying a Perl global variable, we recommend that you always localize it with local so that your changes do not trip up other Perl code running in the server.

Global Variables

We begin with the list of magic global variables that have special significance to mod_perl.

$0

When running under Apache::Registry or Apache::PerlRun, this variable is set to the value of the request_rec's filename field, that is, the path of the script being run.
When running inside of a <Perl> section, the value of $0 is the path to the configuration file in which the Perl section is located, such as httpd.conf or srm.conf.
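Because $0 holds the script's own filename under Apache::Registry, a script can use it to locate companion files kept in the same directory. The sketch below assumes that layout; script_dir() and form.html are hypothetical names, and the optional argument exists only so the function can be tried outside the server.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Return the directory of the running script. Under Apache::Registry, $0 is
# the script's filename; the optional argument is only for testing.
sub script_dir {
    my ($file) = @_;
    $file = $0 unless defined $file;
    return dirname($file);
}

# 'form.html' is a hypothetical companion file stored next to the script.
my $template = script_dir() . '/form.html';
```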

$^X

Normally, this variable holds the path to the Perl program that was executed from the shell. Under mod_perl, there is no Perl program, just the Perl library linked with Apache. Thus, this variable is set to the path of the Apache binary in which Perl is currently running, such as /usr/local/apache/bin/httpd or C:\Apache\apache.exe.
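One consequence is that $^X cannot be used to launch a separate Perl process from inside the server, since it would start another httpd. One workaround, sketched here, is to take the interpreter path from the perlpath entry of the standard Config module, which records where perl was installed when it was built. This assumes the installation has not been moved since; perl_binary() is a hypothetical helper name.

```perl
use strict;
use warnings;
use Config;

# Under mod_perl, $^X names the httpd binary, so use the perl path
# recorded at build time when a child Perl process is needed.
sub perl_binary {
    return $Config{perlpath};
}

my @cmd = (perl_binary(), '-e', 'print "hello from a child perl\n"');
system(@cmd) == 0
    or warn "could not run @cmd: exit status $?\n";
```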

$|

As the perlvar(1) manpage explains, if this variable is set to nonzero, it forces a flush right away and after every write or print on the currently selected output channel. Under mod_perl, setting $| when the STDOUT filehandle is selected will cause the rflush() method to be invoked after each print(). Because of the overhead associated with rflush(), you should avoid making this a general practice.
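When unbuffered output really is needed, a sketch like the following confines $| to the one subroutine that needs it, so the flush is not triggered for every other print in the request. print_progress() is a hypothetical name.

```perl
use strict;
use warnings;

# Flush only this message immediately; $| reverts to its previous
# value for the selected filehandle when the subroutine returns.
sub print_progress {
    my ($msg) = @_;
    local $| = 1;
    print "$msg\n";
}

print_progress("working...");
```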

$/

The perlvar manpage describes this global variable as the input record separator, newline by default. The same is true under mod_perl; however, mod_perl ensures it is reset back to the newline default after each request.
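Although mod_perl resets $/ between requests, a change made partway through a request still affects any other code that runs later in the same request, so localizing it remains worthwhile. This sketch, with a hypothetical slurp_fh() helper, reads a whole filehandle by undefining $/ inside one subroutine only; an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Read everything remaining on $fh in one go. local $/ sets the record
# separator to undef for this subroutine only.
sub slurp_fh {
    my ($fh) = @_;
    local $/;
    return scalar <$fh>;
}

my $text = "line one\nline two\n";
open my $fh, '<', \$text or die "can't open in-memory file: $!";
my $all = slurp_fh($fh);
```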

%@

You are most likely familiar with Perl's $@ variable, which holds the Perl error message or exception value from the last eval() command, if any. There is also an undocumented %@ hash global, which is used internally for certain eval bookkeeping. This variable is put to good use by mod_perl. When an eval() error occurs, the contents of $@ are stored into the %@ hash using the current URI as the key. This allows an ErrorDocument to provide some more clues as to what went wrong.

my $previous_uri = $r->prev->uri;
my $errmsg = $@{$previous_uri};

This looks a bit weird, but it's just a hash key lookup on a hash named %@. Mentally substitute %SAVED_ERRORS for %@ and you'll see what's going on here.
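mod_perl fills in %@ for you. Purely to show the mechanics outside the server, this sketch does by hand what the text above describes: it files a failed eval()'s message in %@ under a URI key and reads it back. The URI and message are made up.

```perl
use strict;
use warnings;

# Imitate mod_perl: store the eval error under the request URI.
my $uri = '/perl/broken.pl';
eval { die "database unavailable\n" };
$@{$uri} = $@ if $@;

# Later, for example in an ErrorDocument handler, look it up by URI.
my $errmsg = $@{$uri};
print "Error for $uri: $errmsg";
```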

%ENV

As with the Perl binary, this global hash contains the current environment. When the Perl interpreter is first created by mod_perl, this hash is emptied, with the exception of those variables passed and set via PerlPassEnv and PerlSetEnv configuration directives.
The usual configuration scoping rules apply. A PerlSetEnv directive located in the main part of the configuration file will influence all Perl handlers, while those located in <Directory>, <Location>, and <Files> sections will only affect handlers in the areas to which they apply.
The Apache SetEnv and PassEnv directives also influence %ENV, but they don't take effect until the fixup phase. If you need to influence %ENV via server configuration for an earlier phase, such as authentication, be sure to use PerlSetEnv and PerlPassEnv instead because these directives take effect as soon as possible.
There are also a number of standard variables that Apache adds to the environment prior to invoking the content handler. These include DOCUMENT_ROOT and SERVER_SOFTWARE. By default, the complete %ENV hash is not set up until the content response phase. Only variables set by PerlPassEnv, PerlSetEnv, and by mod_perl itself will be visible. Should you need the complete set of variables sooner, your handler can set it up with the subprocess_env method:

my $r = shift;
my $env = $r->subprocess_env;
%ENV = %$env;

Unless you plan to spawn subprocesses, however, it will usually be more efficient to access the subprocess variables directly:

my $tmp = $r->subprocess_env->{'TMPDIR'};

If you need to get at the environment variables that are set automatically by Apache before spawning CGI scripts and you want to do this outside of a content handler, remember to call subprocess_env() once in a void context in order to initialize the environment table with the standard CGI and server-side include variables:

$r->subprocess_env;
my $software = $r->subprocess_env('SERVER_SOFTWARE');

There's rarely a legitimate reason to do this, however, because all the information you need can be fetched directly from the request object.
Filling in the %ENV hash before the response phase introduces a little overhead into each mod_perl content handler. If you don't want the %ENV hash to be filled at all by mod_perl, add this to your server configuration file:

PerlSetupEnv Off

Regardless of the setting of PerlSetupEnv, or whether subprocess_env() has been called, mod_perl always adds a few special keys of its own to %ENV.

MOD_PERL

This key is set to a true value, so code can test whether or not it is running in the mod_perl environment:

if (exists $ENV{MOD_PERL}) {
   # ... code for the mod_perl environment ...
}
else {
   # ... code for plain CGI or the command line ...
}

GATEWAY_INTERFACE

When running under the mod_cgi CGI environment, this value is CGI/1.1. However, when running under the mod_perl CGI environment, GATEWAY_INTERFACE will be set to CGI-Perl/1.1. This can also be used by code to test if it is running under mod_perl; however, testing for the presence of the MOD_PERL key is faster than using a regular expression or substr to test GATEWAY_INTERFACE.
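The two tests can be combined in one helper that checks the cheaper MOD_PERL key first and falls back to GATEWAY_INTERFACE only when that key is absent. under_mod_perl() is a hypothetical name, and the fallback pattern assumes the CGI-Perl/ prefix described above.

```perl
use strict;
use warnings;

# True under mod_perl. The exists() test is cheap; the pattern match on
# GATEWAY_INTERFACE is only a fallback.
sub under_mod_perl {
    return 1 if exists $ENV{MOD_PERL};
    my $gw = defined $ENV{GATEWAY_INTERFACE} ? $ENV{GATEWAY_INTERFACE} : '';
    return $gw =~ m{^CGI-Perl/} ? 1 : 0;
}

print under_mod_perl() ? "mod_perl\n" : "not mod_perl\n";
```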

PERL_SEND_HEADER

If the PerlSendHeader directive is set to On, this environment variable will also be set to On; otherwise, the variable will not exist. This is intended for scripts which do not use the CGI.pm header() method, which always sends proper HTTP headers no matter what the settings.

if($ENV{PERL_SEND_HEADER}) {
   print "Content-type: text/html\n\n";
}
else {
   my $r = Apache->request;
   $r->content_type('text/html');
   $r->send_http_header;
}

%SIG

The Perl %SIG global variable is used to set signal handlers for various signals.
There is always one handler set by mod_perl for catching the PIPE signal. This signal is sent by Apache when a timeout occurs, triggered when the client drops the connection prematurely (e.g., by hitting the stop button). The internal Apache::SIG class catches this signal to ensure the Perl interpreter state is properly reset after a timeout.
The Apache::SIG handler does have one side effect that you might want to take advantage of. If a transaction is aborted prematurely because of a PIPE signal, Apache::SIG will set the environment variable SIGPIPE to the number 1 before it exits. You can pick this variable up with a custom log format and record it if you are interested in compiling statistics on the number of remote users who abort their requests prematurely.
The following is a LogFormat directive that will capture the SIGPIPE environment variable. If the transaction was terminated prematurely, the last field in the log file line will be 1, otherwise -.

LogFormat "%h %l %u %t \"%r\" %s %b %{SIGPIPE}e"
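With a log format like this in place, compiling the statistics comes down to counting lines whose last field is 1. The sketch below assumes the log lines follow the format above; count_aborted() and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Count requests whose final field (%{SIGPIPE}e) is 1, i.e. aborted early.
# Returns (aborted, total).
sub count_aborted {
    my ($aborted, $total) = (0, 0);
    for my $line (@_) {
        chomp(my $copy = $line);
        next unless length $copy;
        $total++;
        my ($last) = $copy =~ /(\S+)\s*$/;
        $aborted++ if defined $last && $last eq '1';
    }
    return ($aborted, $total);
}

# Made-up sample lines in the format above; in practice pass <$log_fh>.
my @sample = (
    qq{192.168.2.1 - - [11/Sep/1998:11:03:06 -0400] "GET /perl/test.pl HTTP/1.0" 200 1024 -\n},
    qq{192.168.2.7 - - [11/Sep/1998:11:03:09 -0400] "GET /big.gif HTTP/1.0" 200 512 1\n},
);
my ($aborted, $total) = count_aborted(@sample);
printf "%d of %d requests aborted\n", $aborted, $total;
```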

As with other signals, be careful not to stomp on Apache's own signal handlers, such as the one for ALRM. It is best to localize your handler inside a block so that the original can be restored as soon as possible:

{
   local $SIG{ALRM} = sub { ... };   # in effect only inside this block
   ...
}
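A common reason to install an ALRM handler is to put a time limit on a slow operation. This sketch wraps the eval/alarm idiom from the perlfunc entry for alarm in a hypothetical with_timeout() helper and keeps the handler local to the eval block, so whatever handler was installed before comes back afterward. It restores only the handler; it does not save or re-arm any alarm timer that Apache itself may have pending.

```perl
use strict;
use warnings;

# Run $code with a time limit. Returns the code's result, or undef if it
# timed out. Errors other than the timeout are rethrown.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };  # restored on block exit
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    alarm 0;                              # in case $code died before alarm 0
    return $result if $ok;
    die $@ unless $@ eq "timeout\n";
    return undef;
}

my $answer = with_timeout(2, sub { 6 * 7 });
```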

At the end of each request, mod_perl will restore the %SIG hash to the same state it was in at server startup time.

@INC

As the perlvar manpage explains, the array @INC contains the list of places to look for Perl scripts to be evaluated by the do EXPR, require, or use constructs.
The same is true under mod_perl. However, two additional paths are automatically added to the end of the array. These are the value of the configured ServerRoot and $ServerRoot/lib/perl.
At the end of each request, mod_perl will restore @INC to the value it had at server startup. This includes any modifications made by code pulled in via PerlRequire and PerlModule. So, be warned: if a script compiled by Apache::Registry contains a use lib or other @INC modification statement, this modification will not "stick." That is, once the script is cached, the modification is undone until the script has changed on disk and is recompiled. If one script relies on another to modify the @INC path, that modification should be moved to a script or module pulled in at server startup time, such as the Perl startup script.
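Following that advice, @INC changes belong in the startup file. A minimal sketch, assuming the file is loaded with a PerlRequire directive; both paths in it are hypothetical.

```perl
# startup.pl: loaded once at server startup, for example with
#   PerlRequire /usr/local/apache/conf/startup.pl
# The library directory below is hypothetical; substitute your own.
use strict;
use warnings;
use lib '/usr/local/apache/lib/perl-site';

# Because this runs at startup, the directory is part of the @INC value
# that mod_perl restores after every request.
1;
```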

%INC

As the perlvar manpage explains, the %INC hash contains entries for each filename that has been included via do or require. The key is the filename you specified, and the value is the location of the file actually found. The require command uses this hash to determine whether a given file has already been included.
The same is true in the mod_perl environment. However, this Perl feature may seem like a mod_perl bug at times. One such case is when .pm modules that are modified are not automatically recompiled the way that Apache::Registry script files are. The reason this behavior hasn't been changed is that calling the stat function to test the last modified time for each file in %INC requires considerable overhead and would affect Perl API module performance noticeably. If you need it, the Apache::StatINC module provides the "recompile when modified" functionality, which the authors only recommend using during development. On a production server, it's best to set the PerlFreshRestart directive to On and to restart the server whenever you change a .pm file and want to see the changes take effect immediately.
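To make the cost concrete, here is a small sketch of the general stat-and-reload idea behind "recompile when modified." It is a hypothetical reload_changed() helper written for illustration, not Apache::StatINC's actual code: it records each %INC file's modification time, and when a file has changed it deletes the %INC entry so that require compiles the file again.

```perl
use strict;
use warnings;

# Last modification time seen for each %INC key.
my %seen_mtime;

# stat() every file in %INC; re-require any whose mtime has advanced.
# Returns the list of %INC keys that were reloaded.
sub reload_changed {
    my @reloaded;
    for my $key (keys %INC) {
        my $path = $INC{$key};
        next unless defined $path && !ref $path && -f $path;
        my $mtime = (stat _)[9];           # reuse the stat buffer from -f
        if (!exists $seen_mtime{$key}) {
            $seen_mtime{$key} = $mtime;    # first sighting: just remember it
            next;
        }
        next unless $mtime > $seen_mtime{$key};
        delete $INC{$key};                 # make require forget the file...
        require $key;                      # ...so it is compiled again
        $seen_mtime{$key} = $mtime;
        push @reloaded, $key;
    }
    return @reloaded;
}
```

Calling something like this at the start of every request costs one stat() per loaded file, which is the overhead that makes this approach suited to development rather than production.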
Another problem area is pulling in library files that do not declare a package namespace. As all Apache::Registry and Apache::PerlRun script files are compiled inside their own unique namespace, pulling in such a file via require causes it to be compiled within this unique namespace. Since the library file will only be pulled in once per server process, only the first script to require it will be able to see the subroutines it declares. Other scripts that try to call routines in the library will trigger a server error along these lines:

[Thu Sep 11 11:03:06 1998] Undefined subroutine
&Apache::ROOT::perl::test_2epl::some_function called at
/opt/www/apache/perl/test.pl line 79.

The mod_perl_traps manual page describes this problem in more detail, along with providing solutions.
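The problem is easy to reproduce in plain Perl, because a required file that lacks a package statement is compiled into whichever package called require. In this sketch, two packages stand in for two Apache::Registry scripts; the library name and contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A throwaway library with no package declaration.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/mylib.pl" or die "can't write mylib.pl: $!";
print $fh "sub some_function { return 'hello' }\n1;\n";
close $fh;
unshift @INC, $dir;

# Each package plays the role of one Registry script's namespace.
package Script::One;
require 'mylib.pl';    # compiled here: defines Script::One::some_function

package Script::Two;
require 'mylib.pl';    # does nothing: 'mylib.pl' is already in %INC

package main;
for my $pkg (qw(Script::One Script::Two)) {
    no strict 'refs';
    my $status = defined &{"${pkg}::some_function"} ? 'defined' : 'undefined';
    print "${pkg}::some_function is $status\n";
}
```

Giving the library its own package line, and then calling the routine by its fully qualified name or exporting it, lets every script see the same subroutine.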
