Common Mistakes in Sending Email (Perl for System Administration)

8.2. Common Mistakes in Sending Email

Now we can begin using email as a notification method. However, when we start to write code that performs this function, we quickly find that the how to send mail is not nearly as interesting as the when and what to send.

This section explores those questions by taking a contrary approach. If we look at what and how not to send mail we'll get a deeper insight into these issues. Let's talk about some of the most common mistakes made when writing system administration programs that send mail.

8.2.1. Overzealous Message Sending

By far, the most common mistake is sending too much mail. It is a great idea to have scripts send mail. If there's a service disruption, normal email or email sent to a pager are good ways to bring this problem to the attention of a human. But under most circumstances it is a very bad idea to have your program send mail about the problem every five minutes or so. Overzealous mail generators are quickly added to the mail filters of the very humans who should be reading the mail. The end result is that important mail is routinely ignored.

8.2.1.1. Controlling the frequency of mail

The easiest way to avoid what I call "mail beaconing" is to build safeguards into the programs to gate the delay between messages. If your script runs constantly, it is easy to stash the time of the last mail message sent in a variable like this:

$last_sent = time;

If your program is started up every N minutes or hours via Unix's cron or NT scheduler service mechanisms, this information can be written to a one-line file and read again the next time the program is run. Be sure in this case to pay attention to some of the security precautions listed in Chapter 1, "Introduction".
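As a minimal sketch of that one-line file, assuming a fixed minimum interval between messages, the helpers below save and reload the time of the last message. The subroutine names and the state file argument are hypothetical, and the file is assumed to live in a directory that only the script's own user can write to.

```perl
use strict;
use warnings;

# Hypothetical helpers: persist the time of the last message in a one-line
# file so that a script started from cron can gate its own mail.

# return the saved epoch time, or 0 if there is no usable state yet
sub read_last_sent {
    my ($statefile) = @_;
    open(my $fh, '<', $statefile) or return 0;
    my $line = <$fh>;
    close($fh);
    return 0 unless defined $line and $line =~ /^(\d+)$/;
    return $1;
}

# save the given epoch time; write a temporary file first and rename it
# so that a reader never sees a half-written state file
sub write_last_sent {
    my ($statefile, $when) = @_;
    my $tmp = "$statefile.$$";
    open(my $fh, '>', $tmp) or die "Unable to write $tmp: $!\n";
    print $fh "$when\n";
    close($fh) or die "Unable to close $tmp: $!\n";
    rename($tmp, $statefile) or die "Unable to rename $tmp: $!\n";
}

# true if at least $interval seconds have passed since the last message
sub ok_to_send {
    my ($statefile, $interval, $now) = @_;
    $now = time unless defined $now;
    return ($now - read_last_sent($statefile) >= $interval) ? 1 : 0;
}
```

A script would call ok_to_send( ) before mailing and write_last_sent( ) with the current time right after a message goes out.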

Depending on the situation, you can get fancy about your delay times. This code shows an exponential backoff:

$max  = 24*60*60; # maximum amount of delay in seconds (1 day)
$unit = 60;       # increase delay by measures of this unit (1 min)

# provide a closure with the time we last sent a message and 
# the last power of 2 we used to compute the delay interval. 
# The subroutine we create will return a reference to an 
# anonymous array with this information
sub time_closure {
    my($stored_sent,$stored_power)=(0,-1);
    return sub {
       (($stored_sent,$stored_power) = @_) if @_;
       [$stored_sent,$stored_power];
    }
};

$last_data=&time_closure; # create our closure

# return true first time called and then once after an 
# exponential delay
sub expbackoff {
    my($last_sent,$last_power) = @{&$last_data};

    # reply true if this is the first time we've been asked, or if the
    # current delay has elapsed since we last asked. If we return true, 
    # we stash away the time of our last affirmative reply and increase 
    # the power of 2 used to compute the delay.
    if (!$last_sent or
        ($last_sent + 
          (($unit * 2**$last_power >= $max) ? 
              $max : $unit * 2**$last_power) <= time(  ))){
        &$last_data(time(  ),++$last_power);
        return 1;
    }
    else {
        return 0;
    }
}

The subroutine expbackoff( ) returns true (1) if email should be sent and false (0) if not. It begins by returning true the first time it is called, then rapidly increases the delay time until eventually true is only returned once a day.
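To see how quickly the delay grows, the sketch below computes the same capped interval on its own. The helper name backoff_delay is hypothetical; $max and $unit carry the values used in the listing above.

```perl
use strict;
use warnings;

my $max  = 24*60*60;  # same cap as in the listing (1 day)
my $unit = 60;        # same unit as in the listing (1 min)

# hypothetical helper: the delay in seconds that applies while the stored
# power of 2 is $power (0 after the first message, 1 after the second, ...)
sub backoff_delay {
    my ($power) = @_;
    my $delay = $unit * 2**$power;
    return $delay >= $max ? $max : $delay;
}

# the delays that follow the first twelve messages
print join(" ", map { backoff_delay($_) } 0..11), "\n";
# prints: 60 120 240 480 960 1920 3840 7680 15360 30720 61440 86400
```

After about a dozen messages the delay reaches the one-day cap and stays there.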

To make this code more interesting, I've used a peculiar programming construct called a closure to stash away the last message-sent time and the last power of two used to compute the delay. We're using the closure as a way of hiding our important variables from the rest of the program. In this small program it is just a curiosity, but the usefulness of this technique becomes readily apparent in a larger program where it is more likely that other code might inadvertently stomp on our variables. In brief, here's how closures work.

The subroutine &time_closure( ) returns a reference to an anonymous subroutine, essentially a little piece of code without a name. Later on we'll use that reference to run this code using the standard syntax for dereferencing a code reference: &$last_data. The code in our anonymous subroutine returns a reference to an array, hence the punctuation parking lot in this line used to access the returned data:

my($last_sent,$last_power) = @{&$last_data};

Here's the magic that makes a closure: because the reference is created in the same enclosing block as the my( )ed variables $stored_sent and $stored_power, it traps those variables in a unique context. $stored_sent and $stored_power can be read and changed only while the code in this reference is executing. They also retain their values between invocations of the code reference. For instance:

# create our closure
$last_data=&time_closure;

# call the subroutine that sets our variables
&$last_data(1,1);         

# attempt to change them outside of the sub
$stored_sent = $stored_power = 2; 

# show their current value using the subroutine
print "@{&$last_data}\n";

will print "1 1" even though it appears we changed the values of $stored_sent and $stored_power in the third line of code. We certainly changed the value of the global variables with those names, but we couldn't touch the copies protected by the closure.

It may help you to think of a variable in a closure as a satellite in orbit around a wandering planet. The satellite is trapped by the gravity of the planet; where the planet goes, so too goes the satellite. The satellite's position can be described only in reference to the planet: to find the satellite, you first locate the planet. Each time you find this particular planet, the satellite should be there, just where you left it. Think of the variables in a closure as being in orbit around their anonymous subroutine code reference, separate from the rest of your program's galaxy.

Setting astrophysics aside, let's return to our discussion of mail sending. Sometimes it is more appropriate to have your program act like a two-year-old, complaining more often as time goes by. Here's some code similar to the previous example. This time we increase the number of messages sent over time. It starts off giving the go-ahead to send mail once a day and then rapidly decreases the delay time until it hits a minimum delay of five minutes:

$max  = 60*60*24; # maximum amount of delay in seconds (1 day)
$min  = 60*5;     # minimum amount of delay in seconds (5 minutes)
$unit = 60;       # decrease delay by measures of this unit (1 min)

$start_power = int log($max/$unit)/log(2); # find the closest power of 2 

sub time_closure {
    my($last_sent,$last_power)=(0,$start_power+1);
    return sub {
      (($last_sent,$last_power) = @_) if @_;
      # keep exponent positive
      $last_power = ($last_power > 0) ? $last_power : 0; 
      [$last_sent,$last_power];
    }
};

$last_data=&time_closure; # create our closure

# return true first time called and then once after an 
# exponential ramp up
sub exprampup {
    my($last_sent,$last_power) = @{&$last_data};

    # reply true if this is the first time we've been asked, or if the
    # current delay has elapsed since we last asked. If we return true,
    # we stash away the time of our last affirmative reply and decrease
    # the power of 2 used to compute the delay.
    if (!$last_sent or
        ($last_sent + 
          (($unit * 2**$last_power <= $min) ? 
              $min : $unit * 2**$last_power) <= time(  ))){
        &$last_data(time(  ),--$last_power);
        return 1;
    }
    else {
        return 0;
    }
}

In both examples we called an additional subroutine (&$last_data) to find when the last message was sent and how the delay was computed. Later, if we decide to change how the program is run, this compartmentalization will allow us to change how we store that state. For example, if we change our program to run periodically rather than running all the time, we could easily replace the closure with a normal subroutine that saves and retrieves the data to and from a plain text file.
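One way such a replacement might look is sketched below. The name time_file and the state file argument are hypothetical. To keep expbackoff( ) and exprampup( ) unchanged, this version still hands back a code reference with the same calling convention as the closure, but the two values now live in a one-line text file rather than in memory.

```perl
use strict;
use warnings;

# Hypothetical replacement for time_closure(): same calling convention,
# but the time and power are kept in a one-line file.
sub time_file {
    my ($statefile, @initial) = @_;   # @initial: values used before any save
    return sub {
        if (@_) {                     # called with arguments: save them
            open(my $out, '>', $statefile)
              or die "Unable to write $statefile: $!\n";
            print $out join(" ", @_[0,1]), "\n";
            close($out) or die "Unable to close $statefile: $!\n";
        }
        # always answer with what is on disk (or the initial values)
        open(my $in, '<', $statefile) or return [@initial];
        my $line = <$in>;
        close($in);
        return [@initial] unless defined $line;
        return [ split(' ', $line, 2) ];
    };
}
```

With this in place, a line such as $last_data = time_file($statefile, 0, -1); would stand in for $last_data = &time_closure; in the backoff example, where $statefile is whatever path you choose for the saved state.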

8.2.1.2. Controlling the amount of mail

Another subclass of the "overzealous message sending" syndrome is the "everybody on the network for themselves" problem. If all of the machines on your network decide to send you a piece of mail, you may miss something important in the subsequent message blizzard. A better approach is to have them all report to a central repository of some sort. The information can then be collated and mailed out later in a single mail message.

Let's consider a moderately contrived example. For this scenario, assume each machine in your network drops a one-line file into a shared directory.[1] Named for each machine, that file will contain each machine's summary of the results of last night's scientific computation. It would have a single line of this form:

hostname success-or-failure number-of-computations-completed

[1]Another good rendezvous spot for status information like this would be in a database.
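The machine's side of this arrangement is not shown in this section, so here is a minimal sketch of it under stated assumptions. The subroutine name write_status is hypothetical, and the temporary-file-then-rename step is a choice of this sketch rather than something the scenario requires.

```perl
use strict;
use warnings;

# Hypothetical machine-side helper: write "hostname result details" to a
# file named for the host inside the shared report directory.
sub write_status {
    my ($repodir, $hostname, $result, $details) = @_;
    my $final = "$repodir/$hostname";
    # dot-prefixed temporary name; a collating script that reads every
    # plain file in the directory would need to skip names like this
    my $tmp = "$repodir/.$hostname.$$";
    open(my $fh, '>', $tmp) or die "Unable to write $tmp: $!\n";
    print $fh "$hostname $result $details\n";
    close($fh) or die "Unable to close $tmp: $!\n";
    rename($tmp, $final) or die "Unable to rename $tmp to $final: $!\n";
    return $final;
}
```

Writing under a temporary name and renaming means a reader never sees a half-written report under the machine's own name.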

A program that collates the information and mails the results might look like this:

use Mail::Mailer;
use Text::Wrap;

# the list of machines reporting in
$repolist = "/project/machinelist"; 
# the directory where they write files
$repodir  = "/project/reportddir";  
# filesystem separator for portability, 
# could use File::Spec module instead 
$separator= "/";                    
# send mail "from" this address
$reportfromaddr  = "project\@example.com"; 
# send mail to this address
$reporttoaddr    = "project\@example.com"; 
# read the list of machines reporting in into a hash. 
# Later we de-populate this hash as each machine reports in, 
# leaving behind only the machines which are missing in action
open(LIST,$repolist) or die "Unable to open list $repolist:$!\n";
while(<LIST>){
    chomp;
    $missing{$_}=1;
    $machines++;
}
close(LIST);

# read all of the files in the central report directory
# note: this directory should be cleaned out automatically 
# by another script
opendir(REPO,$repodir) or die "Unable to open dir $repodir:$!\n";

while(defined($statfile=readdir(REPO))){
    next unless -f $repodir.$separator.$statfile;
    
    # open each status file and read in the one-line status report
    open(STAT,$repodir.$separator.$statfile) 
      or die "Unable to open $statfile:$!\n";

    chomp($report = <STAT>);

    ($hostname,$result,$details)=split(' ',$report,3);

    warn "$statfile said it was generated by $hostname!\n"
      if($hostname ne $statfile);

    # hostname is no longer considered missing
    delete $missing{$hostname}; 
    # populate these hashes based on success or failure reported
    if ($result eq "success"){
        $success{$hostname}=$details;
        $succeeded++;
    }
    else {
        $fail{$hostname}=$details;
        $failed++;
    }	
    close(STAT);
}		
closedir(REPO);

# construct a useful subject for our mail message
# counters that were never incremented should read as 0
$succeeded ||= 0;
$failed    ||= 0;
if ($succeeded == $machines){
    $subject = "[report] Success: $machines";
}
elsif ($failed == $machines or scalar(keys %missing) >= $machines) {
    $subject = "[report] Fail: $machines";
}
else {
    $subject = "[report] Partial: $succeeded ACK, $failed NACK".
      ((%missing) ? ", ".scalar(keys %missing)." MIA" : "");
}

# create the mailer object and populate the headers
$type="sendmail"; 
my $mailer = Mail::Mailer->new($type) or
  die "Unable to create new mailer object:$!\n";

$mailer->open({From=>$reportfromaddr, To=>$reporttoaddr, Subject=>$subject}) or 
  die "Unable to populate mailer object:$!\n";

# create the body of the message
print $mailer "Run report from $0 on " . scalar localtime(time) . "\n";

if (keys %success){
    print $mailer "\n==Succeeded==\n";
    foreach $hostname (sort keys %success){
      print $mailer "$hostname: $success{$hostname}\n";
    }
}

if (keys %fail){
    print $mailer "\n==Failed==\n";
    foreach $hostname (sort keys %fail){
      print $mailer "$hostname: $fail{$hostname}\n";
    }
}

if (keys %missing){
    print $mailer "\n==Missing==\n";
    print $mailer wrap("","",join(" ",sort keys %missing)),"\n";
}

# send the message
$mailer->close;

The code first reads a list of the machine names that will be participating in this scheme. Later on it will use a hash based on this list to check if there are any machines that have not placed a file in the central reporting directory. We open each file in this directory and extract the status information. Once we've collated the results, we construct a mail message and send it out.

Here's an example of the resulting mail:

Date: Wed, 14 Apr 1999 13:06:09 -0400 (EDT)
Message-Id: <199904141706.NAA08780@example.com>
Subject: [report] Partial: 3 ACK, 4 NACK, 1 MIA
To: project@example.com
From: project@example.com

Run report from reportscript on Wed Apr 14 13:06:08 1999

==Succeeded==
barney: computed 23123 oogatrons
betty: computed 6745634 oogatrons
fred: computed 56344 oogatrons

==Failed==
bambam: computed 0 oogatrons
dino: computed 0 oogatrons
pebbles: computed 0 oogatrons
wilma: computed 0 oogatrons

==Missing==
mrslate

Another way to collate results like this is to create a custom logging daemon and have each machine report in over a network socket. Let's look at code for the server first. This example reuses code from the previous example. We'll talk about the important new code right after you see the listing:

use IO::Socket;
use Text::Wrap; # used to make the output prettier

# the list of machines reporting in
$repolist = "/project/machinelist"; 
# the port number clients should connect to 
$serverport = "9967";               

&loadmachines; # load the machine list

# set up our side of the socket
$reserver = IO::Socket::INET->new(LocalPort => $serverport,
                                  Proto     => "tcp",
                                  Type      => SOCK_STREAM,
                                  Listen    => 5,
                                  Reuse     => 1)
  or die "Unable to build our socket half: $!\n";

# start listening on it for connects
while(($connectsock,$connectaddr) = $reserver->accept(  )){

    # the name of the client that has connected to us
    $connectname = gethostbyaddr((sockaddr_in($connectaddr))[1],AF_INET);

    chomp($report=$connectsock->getline);

    ($hostname,$result,$details)=split(' ',$report,3);

    # if we've been told to dump our info, print out a ready-to-go mail
    # message and reinitialize all of our hashes/counters
    if ($hostname eq "DUMPNOW"){
      &printmail($connectsock);
      close($connectsock);
      undef %success;
      undef %fail;
      $succeeded = $failed = 0;
      &loadmachines;
      next;
    }

    warn "$connectname said it was generated by $hostname!\n"
      if($hostname ne $connectname);
    delete $missing{$hostname};
    if ($result eq "success"){
      $success{$hostname}=$details;
      $succeeded++;
    }
    else {
      $fail{$hostname}=$details;
      $failed++;
    }	
    close($connectsock);
}
close($reserver);

# loads the list of machines from the given file
sub loadmachines {
    undef %missing;
    undef $machines; 
    open(LIST,$repolist) or die "Unable to open list $repolist:$!\n";
    while(<LIST>){
      chomp;
      $missing{$_}=1;
      $machines++;
    }
}

# prints a ready to go mail message. The first line is the subject, 
# subsequent lines are all the body of the message
sub printmail{
    ($socket) = $_[0];

    # counters that were never incremented should read as 0
    $succeeded ||= 0;
    $failed    ||= 0;
    if ($succeeded == $machines){
      $subject = "[report] Success: $machines";
    }
    elsif ($failed == $machines or scalar(keys %missing) >= $machines) {
      $subject = "[report] Fail: $machines";
    }
    else {
      $subject = "[report] Partial: $succeeded ACK, $failed NACK".
        ((%missing) ? ", ".scalar(keys %missing)." MIA" : "");
    }

    print $socket "$subject\n";
    
    print $socket "Run report from $0 on ".scalar localtime(time)."\n";

    if (keys %success){
      print $socket "\n==Succeeded==\n";
      foreach $hostname (sort keys %success){
        print $socket "$hostname: $success{$hostname}\n";
      }
    }

    if (keys %fail){
      print $socket "\n==Failed==\n";
      foreach $hostname (sort keys %fail){
        print $socket "$hostname: $fail{$hostname}\n";
      }
    }

    if (keys %missing){
      print $socket "\n==Missing==\n";
      print $socket wrap("","",join(" ",sort keys %missing)),"\n";
    }
}

Besides moving some of the code sections to their own subroutines, the key change is the addition of the networking code. The IO::Socket module makes the process of opening and using sockets pretty painless. Sockets are usually described using a telephone metaphor. We start by setting up our side of the socket (IO::Socket::INET->new( )), essentially turning on our phone, and then wait for a call from a network client ($reserver->accept( )). Our program will pause (or "block") until a connection comes in. As soon as it arrives, we note the name of the connecting client. We then read a line of input from the socket.

This line of input is expected to look just like those we read from individual files in our previous example. The one difference is the magic hostname DUMPNOW. If we see this hostname, we print the subject and body of a ready-to-mail message to the connecting client and reset all of our counters and hash tables. The client is then responsible for actually sending the mail it receives from the server. Let's look at our sample client and what it can do with this message:

use IO::Socket;

# the port number clients should connect to
$serverport = "9967";
# and the name of the server
$servername = "reportserver";    
# name to IP address
$serveraddr = inet_ntoa(scalar gethostbyname($servername)); 
$reporttoaddr  = "project\@example.com";
$reportfromaddr  = "project\@example.com";

$reserver = IO::Socket::INET->new(PeerAddr => $serveraddr,
                                  PeerPort => $serverport,
                                  Proto    => "tcp",
                                  Type     => SOCK_STREAM)
  or die "Unable to build our socket half: $!\n";


if ($ARGV[0] ne "-m"){
    print $reserver $ARGV[0];
}
else {
    use Mail::Mailer;

    print $reserver "DUMPNOW\n";
    chomp($subject = <$reserver>);
    $body = join("",<$reserver>);

    $type="sendmail";
    my $mailer = Mail::Mailer->new($type) or
      die "Unable to create new mailer object:$!\n";

    $mailer->open({
                   From    => $reportfromaddr,
                   To      => $reporttoaddr,
                   Subject => $subject
                  }) or
      die "Unable to populate mailer object:$!\n";

    print $mailer $body;
    $mailer->close;
}

close($reserver);

This code is simpler. First, we open up a socket to the server. In most cases, we pass it our status information (received on the command line as $ARGV[0]) and drop the connection. If we were really going to set up a logging client-server like this, we would probably encapsulate this client code in a subroutine and call it from within a much larger program after its processing had been completed.
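A minimal sketch of that encapsulation follows. The name send_report and its return convention are assumptions of this sketch; the socket calls are the same ones used in the client above, with a newline added after the status line.

```perl
use strict;
use warnings;
use IO::Socket;

# Hypothetical wrapper around the client code: send one status line to the
# report server. Returns 1 on success, 0 (after a warning) on failure.
sub send_report {
    my ($server, $port, $line) = @_;
    my $sock = IO::Socket::INET->new(PeerAddr => $server,
                                     PeerPort => $port,
                                     Proto    => "tcp")
      or do { warn "Unable to connect to $server:$port: $@\n"; return 0 };
    print $sock "$line\n";
    close($sock);
    return 1;
}
```

A larger program could then finish its processing and call something like send_report($servername, $serverport, "$hostname success $details"), carrying on even if the report server is unreachable.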

If this script is passed an -m flag, it instead sends "DUMPNOW" to the server and reads the subject line and body returned by the server. Then this output is fed to Mail::Mailer and sent out via mail using the same code we've seen earlier.

To limit the example code size and keep the discussion on track, the server and client code presented here is as bare bones as possible. There's no error or input checking, access control or authentication (anyone on the Net who can get to our server can feed and receive data from it), persistent storage (what if the machine goes down?), or any of a number of routine precautions in place. On top of this, we can only handle a single request at a time. If a client should stall in the middle of a transaction, we're sunk. For more sophisticated server examples, I recommend the client-server treatments in Sriram Srinivasan's Advanced Perl Programming, and Tom Christiansen and Nathan Torkington's Perl Cookbook, both published by O'Reilly. Jochen Wiedmann's Net::Daemon module will also help you write more sophisticated daemon programs.
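As one small step toward coping with a stalled client, the sketch below waits a limited time for data before reading. The helper name getline_with_timeout is hypothetical. It uses the standard IO::Select module, and it only guards against a client that sends nothing at all; a client that sends part of a line and then stalls would still block the read.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: wait up to $timeout seconds for data on $handle,
# then read one line. Returns undef if nothing arrived in time.
sub getline_with_timeout {
    my ($handle, $timeout) = @_;
    my $select = IO::Select->new($handle);
    my @ready  = $select->can_read($timeout);
    return undef unless @ready;
    my $line = <$handle>;
    return $line;
}
```

In the server loop, the getline call could be replaced by a call to this helper, closing the connection and moving on to the next client whenever it returns undef.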

Let's move on to the other common mistakes made when writing system administration programs that send mail.

8.2.2. Subject Line Waste

A Subject: line is a terrible thing to waste. When sending mail automatically, it is possible to generate a useful Subject: line on the fly for each message. This means there is very little excuse to leave someone with a mailbox that looks like this:

Super-User     File history database merge report
Super-User     File history database merge report
Super-User     File history database merge report
Super-User     File history database merge report
Super-User     File history database merge report
Super-User     File history database merge report
Super-User     File history database merge report

when it could look like this:

Super-User     Backup OK, 1 tape, 1.400 GB written.
Super-User     Backup OK, 1 tape, 1.768 GB written.
Super-User     Backup OK, 1 tape, 2.294 GB written.
Super-User     Backup OK, 1 tape, 2.817 GB written.
Super-User     Backup OK, 1 tape, 3.438 GB written. 
Super-User     Backup OK, 3 tapes, 75.40 GB written.

Your Subject: line should be a concise and explicit summary of the situation. It should be very clear from the subject line whether the program generating the message is reporting success, failure, or something in between. A little more programming effort will pay off handsomely in reduced time reading mail.
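Here is a minimal sketch of generating such a subject line on the fly, modeled on the report script earlier in this section. The helper name report_subject is hypothetical, and it simplifies the earlier logic slightly: any run with no successes at all is reported as a failure.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a run in the Subject: line, given the
# number of machines that succeeded, failed, and never reported in.
sub report_subject {
    my ($succeeded, $failed, $missing) = @_;
    my $total = $succeeded + $failed + $missing;
    return "[report] Success: $total" if $succeeded == $total;
    return "[report] Fail: $total"    if $succeeded == 0;
    my $subject = "[report] Partial: $succeeded ACK, $failed NACK";
    $subject .= ", $missing MIA" if $missing;
    return $subject;
}

print report_subject(3, 4, 1), "\n";
# prints: [report] Partial: 3 ACK, 4 NACK, 1 MIA
```

Keeping the tag, the outcome, and the counts in a fixed order also makes the messages easy to sort and filter in a mail reader.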

8.2.3. Insufficient Information in the Message Body

This falls into the same "a little verbosity goes a long way" category as the previous mistake. If your script is going to complain about problems or error conditions in email, there are certain pieces of information it should provide in that mail. They boil down to the canonical questions of journalism:

Who?

Which script is complaining? Include the contents of $0 (if you haven't set it explicitly) to show the name the current script was invoked under, which is often its full path. Mention the version of your script if it has one.

Where?

Give some indication of the place in your script where trouble occurred. The Perl function caller( ) returns all sorts of useful information for this:

# note: what caller(  ) returns can be specific to a 
# particular Perl version, be sure to see the perlfunc docs
($package, $filename, $line, $subroutine, $hasargs, $wantarray, 
 $evaltext, $is_require) = caller($frames);

$frames above is the number of stack frames (if you've called subroutines from within subroutines) desired. Most often you'll want $frames set to 1. Here's a sample list returned by the caller( ) function when called in the middle of the server code from our last full code example:

('main','repserver',32,'main::printmail',1,undef)

This shows the script was in the main package while running from the filename repserver at line 32 in the script. At that point it was executing code in the main::printmail subroutine (which was passed arguments and was called in a void context, which is why $wantarray is undefined).

If you want to use caller( ) without doing it by hand, the Carp module also provides an excellent problem report.

When?

Describe the program state at the time of the error. For instance, what was the last line of input read?

Why?

If you can, answer the reader's unspoken question: "Why are you bothering me with a mail message?" The answer may be as simple as "the accounting data has not been fully collated," "DNS service is not available now," or "the machine room is on fire." This provides context to the reader (and perhaps some motivation to investigate).

What?

Finally, don't forget to mention what went wrong in the first place.
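Returning to the "Where?" question for a moment, here is a minimal sketch that uses caller( ) and the Carp module together. The subroutine names whereami, collate, and nightlyrun are hypothetical. confess( ) is Carp's version of die( ) that appends a stack backtrace, and the eval block traps it so the text can be placed in a message body.

```perl
use strict;
use warnings;
use Carp qw(confess);

# Hypothetical helper: describe the place this subroutine was called from.
sub whereami {
    my ($package, $filename, $line) = caller(0);
    my $subroutine = (caller(1))[3];       # undef when called from top level
    $subroutine = "(top level)" unless defined $subroutine;
    return "line $line of $filename in $subroutine";
}

# Hypothetical failing routine: confess() dies with a full stack backtrace.
sub collate {
    my $where = whereami();
    confess("accounting data not collated ($where)");
}

sub nightlyrun { collate() }

# trap the error so the backtrace can go into a mail message body
my $backtrace;
eval { nightlyrun(); 1 } or $backtrace = $@;
print $backtrace;
```

The first line of the trapped text carries the message and location; the lines after it list each subroutine call that led there.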

Here's some simple Perl code that covers all of these bases:

use Text::Wrap;

sub problemreport {
# $shortcontext should be a one-line description of the problem
# $usercontext should be a detailed description of the problem
# $nextstep should be the best suggestion for how to remedy the problem 
    my($shortcontext,$usercontext,$nextstep) = @_;
    my($filename, $line, $subroutine) = (caller(1))[1,2,3];
    my @return;  # start each report with an empty list
    
    push(@return,"Problem with $filename: $shortcontext\n");

    push(@return,"*** Problem report for $filename ***\n\n");
    push(@return,fill("","","- Problem: $usercontext")."\n\n");
    push(@return,"- Location: line $line of file $filename in ".
                 "$subroutine\n\n");
    push(@return,"- Occurred: ".scalar localtime(time)."\n\n");

    push(@return,"- Next step: $nextstep\n");

    \@return;
}
    
sub fireperson {
    $report = &problemreport("the computer is on fire",<<EOR,<<EON);
While running the accounting report, smoke started pouring out of the 
back of the machine. This occurred right after we processed the ORA 
pension plan.
EOR
Please put fire out before continuing.
EON

  print @{$report};

}

&fireperson;

&problemreport returns a reference to a list holding a problem report, subject line first, suitable for feeding to Mail::Mailer as per our previous examples. &fireperson is an example test of this subroutine; it simply prints the report.
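Before such a report can be mailed, the subject line has to be separated from the body. A minimal sketch of that step follows; the helper name splitreport is hypothetical, and it assumes only that the report is a reference to a list of lines with the subject first.

```perl
use strict;
use warnings;

# Hypothetical helper: split a report (a reference to a list of lines,
# subject first) into a subject string and a body string.
sub splitreport {
    my ($report) = @_;
    my ($subject, @body) = @{$report};
    chomp($subject);
    return ($subject, join("", @body));
}
```

The returned subject would then go into the Subject header passed to $mailer->open( ), and the body would be printed to $mailer, just as in the earlier Mail::Mailer examples.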

Now that we've explored sending mail, let's see the other edge of the sword.