
15.2. Perl Coding Techniques

In this section, we'll discuss programming techniques that will help us develop stable, bug-free applications. These techniques are easy to apply, and they catch many problems before they turn into bugs:

  • Always use strict.

  • Check the status of system calls.

  • Verify that each file open is successful.

  • Trap die.

  • Lock files.

  • Unbuffer the output stream when necessary.

  • Use binmode when necessary.

Let's review each of these in detail.

15.2.1. Use strict

You should use the strict pragma for any Perl script more than a few lines long, and for all CGI scripts. Simply place the following line at the top of your script:

use strict;

If no import list is specified, strict generates errors if you use symbolic references, use barewords that are not declared subroutines, or use variables that are not declared with my, fully qualified, or pre-declared with the vars pragma.

Here are two snippets of code: one that will run successfully under strict, and one that will cause an error:

use strict;

$id = 2000;
$field = \$id;
print $$field;        ## Success, will print 2000

$field = "id";
print $$field;        ## Error!

A symbolic reference is a string containing the name of a variable, used to get at the underlying variable indirectly. In the second snippet above, we are trying to get at the value of $id through the string "id". As a result, Perl will generate an error like the following:

Can't use string ("id") as a SCALAR ref while "strict refs" in use ...

Now, let's look at bareword subroutines. Take the following example:

use strict "subs";
greeting;
...
sub greeting
{
    print "Hello Friend!";
}

When Perl looks at the second line, it doesn't know what it is. It could be a string in a void context or it could be a subroutine or function call. When we run this code, Perl will generate the following error:

Bareword "greeting" not allowed while "strict subs" in use at simple line 3.
Execution of simple aborted due to compilation errors.

We can solve this in one of several ways: create a prototype, declare greeting as a subroutine with the subs pragma, use the & prefix, or call it with an empty argument list, like so:

sub greeting;              ## prototype
use subs qw (greeting);    ## use subs

&greeting;                 ## & prefix
greeting();                ## empty argument list

This forces us to be clear about the use of subroutines in our applications.

The last restriction that strict imposes on us involves variable declaration. You have probably run across source code where you're not sure whether a certain variable is global or local to a function or subroutine. By using the vars restriction with strict, we can eliminate this guessing.

Here's a trivial example:

use strict "vars";
$soda = "Coke";

Since we haven't told Perl what $soda is, it will complain with the following error:

Global symbol "$soda" requires explicit package name at simple line 3.
Execution of simple aborted due to compilation errors.

We can solve this problem by using a fully qualified variable name, declaring the variable using the vars pragma, or declaring it as a lexical variable with my, like so:

$main::soda = "Coke";    ## Fully qualified
use vars qw ($soda);     ## Declare using vars module
my $soda;                ## Lexical (my) variable

As you can see, the strict pragma imposes a very rigid environment for developing applications. That rigidity is a powerful feature, because it helps us track down a variety of bugs. The pragma also allows for great flexibility. For example, if we know that a certain piece of code works fine but will not pass under strict, we can turn specific restrictions off, like so:

## code that passes strict
...
{
    no strict;    ## or no strict "vars";
    
    ## code that will not pass strict
}

All code within the block, delimited by braces, will have no restrictions.
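
For instance, calling a subroutine whose name is stored in a string requires a symbolic reference, which strict normally forbids. Here is a minimal sketch, with hypothetical handler names, that relaxes only the refs restriction, and only inside the block:

#!/usr/bin/perl -w

use strict;

## two possible handlers; the one to call is chosen at runtime
sub show_form  { print "Displaying the form\n"; }
sub save_entry { print "Saving the entry\n";    }

my $action = "show_form";    ## name of the subroutine to call

{
    no strict "refs";        ## relax only the refs restriction
    &{$action}();            ## symbolic reference to the subroutine
}                            ## full restrictions resume here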

With this much flexibility and control, there is no reason not to use strict to help you develop cleaner, bug-free applications.

15.2.2. Check Status of System Calls

Before we discuss anything in this section, here's a mantra to code by:

"Always check the return value of all the system commands, including open, eval, and system."

Since web servers are typically configured to run as nobody, or a user with minimal access privileges, we must be very careful when performing any file or system I/O. Take, for example, the following code:

#!/usr/bin/perl -wT

print "Content-type: text/html\n\n";
...
open FILE, "/usr/local/apache/data/recipes.txt";

while (<FILE>) {
    s/^\s*$/<P>/, next if (/^\s*$/);
    s/\n/<BR>/;
    ...
}

close FILE;

If the nobody user cannot read the file, for example because the /usr/local/apache/data directory or the file itself is not accessible to it, then the open command will fail, and the user will end up with an empty page. This isn't really desirable, since the user will have no idea what happened.

A solution to this problem is to check the status of open:

...
open FILE, "/usr/local/apache/data/recipes.txt"
    or error ( $q, "Sorry, I can't access the recipe data!" );

print "Content-type: text/html\n\n";
...

If the open fails, we call a custom error subroutine to return a nicely formatted HTML page explaining the problem and then exit.
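
The error subroutine itself is not shown here; a minimal sketch of one possible implementation, built around the CGI.pm object $q, might look like this:

sub error {
    my ( $q, $message ) = @_;

    ## Send a complete, nicely formatted HTML page describing the
    ## problem, then stop the script.
    print $q->header( "text/html" ),
          $q->start_html( "Error" ),
          $q->h1( "Error" ),
          $q->p( $q->escapeHTML( $message ) ),
          $q->end_html;

    exit;
}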

You need to follow the same process when creating or updating files, as well. In order for a CGI application to write to a file, the server user must have write permission on the file and, when creating a new file, on the directory that will contain it.
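
For example, a script that appends a new entry to a data file should verify the open for writing the same way. Here is a short sketch that reuses the hypothetical error subroutine above; the file path and the $name and $comment variables are assumptions:

## $name and $comment would be gathered from the form earlier on
open FILE, ">> /usr/local/apache/data/guestbook.txt"
    or error ( $q, "Sorry, I can't save your entry right now!" );

print FILE "$name|$comment\n";

close FILE
    or error ( $q, "Sorry, I couldn't finish saving your entry!" );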

Some of the more commonly used system functions include open, close, flock, eval, and system. You should make it a habit to check the return values of such functions, so you can handle failures gracefully.
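
For example, system returns the raw wait status of the command (zero on success), and a failed eval leaves its error message in $@. Here is a short sketch that checks both; the command path and the risky_operation subroutine are hypothetical:

## system returns zero on success; anything else is a failure
my $status = system "/usr/bin/lp", "/tmp/report.txt";
if ( $status != 0 ) {
    error ( $q, "Sorry, I couldn't print the report!" );
}

## a failed eval leaves its error message in $@
eval { risky_operation(); };
if ( $@ ) {
    error ( $q, "Sorry, something went wrong: $@" );
}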

15.2.3. Is It Open?

In various examples throughout the book, we've used the open function to create pipes to execute external applications and perform data redirection. Unfortunately, unlike the cases in the previous section, there is no easy way to determine whether the application on the other end of the pipe executed successfully.

Here's a simple example that sorts some numerical data.

open FILE, "| /usr/local/gnu/sort"
    or die "Could not create pipe: $!";

print "Content-type: text/plain\n\n";

## fill the @data array with some numerical data
...

print FILE join ("\n", @data);
close FILE;

If we cannot create the pipe, which rarely happens, we die with an error. But what if the path to the sort command is incorrect? Then the user will see neither an error nor any reasonable output.

So, how do we determine whether the sort command executed successfully? Unfortunately, because the command runs asynchronously in a child process, its exit status is available only after the filehandle is closed.

Here's an example:

open FILE, "| /usr/local/gnu/sort"
    or die "Could not create pipe: $!";

### code omitted for brevity
...

close FILE;

my $status = ($? >> 8);

if ( $status ) {
    print "Sorry! I cannot access the data at this time!";
}

Once the filehandle is closed, Perl stores the raw wait status in the $? variable. We determine the command's actual exit status (i.e., 0 on success, nonzero on failure) by right-shifting the raw status by eight bits.
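
Note that close itself also returns false when the command on the other end of the pipe exits with a nonzero status, so the check can be folded into the close call:

unless ( close FILE ) {
    ## the command failed, or the pipe could not be flushed;
    ## $? still holds the raw status if you need the exact exit code
    print "Sorry! I cannot access the data at this time!";
}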

There is also another, albeit less portable and reliable, method to determine the status of the pipe. It involves checking the PID of the child process spawned by the open function:

#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;

my $pid = open FILE, "| /usr/local/gnu/sort -n";
$pid or die "Cannot open pipe to sort: $!";

## the open itself succeeded, but is the child process still alive?
my $status = kill 0, $pid;
$status or die "The sort command is not running!";

## We're successful!
print $q->header( "text/plain" );
...

We use the kill function to send a signal of zero to the process created by the pipe. If the process is already dead, which typically means the application within the pipe could not be executed, kill returns a value of zero. As mentioned above, this technique is not 100% reliable and will not work on all Unix platforms, but it's something you might want to try.



