Chapter 16. Interprocess Communication

Computer processes have almost as many ways of communicating as people do. The difficulties of interprocess communication should not be underestimated. It doesn't do you any good to listen for verbal cues when your friend is using only body language. Likewise, two processes can communicate only when they agree on the means of communication, and on the conventions built on top of that. As with any kind of communication, the conventions to be agreed upon range from lexical to pragmatic: everything from which lingo you'll use, up to whose turn it is to talk. These conventions are necessary because it's very difficult to communicate bare semantics in the absence of context.

In our lingo, interprocess communication is usually pronounced IPC. The IPC facilities of Perl range from the very simple to the very complex. Which facility you should use depends on the complexity of the information to be communicated. The simplest kind of information is almost no information at all: just the awareness that a particular event has happened at a particular point in time. In Perl, these events are communicated via a signal mechanism modeled on the Unix signal system.

At the other extreme, the socket facilities of Perl allow you to communicate with any other process on the Internet using any mutually supported protocol you like. Naturally, this freedom comes at a price: you have to go through a number of steps to set up the connections and make sure you're talking the same language as the process on the other end. This may in turn require you to adhere to any number of other strange customs, depending on local conventions. To be protocoligorically correct, you might even be required to speak a language like XML, or Java, or Perl. Horrors.

Sandwiched in between are some facilities intended primarily for communicating with processes on the same machine. These include good old-fashioned files, pipes, FIFOs, and the various System V IPC syscalls. Support for these facilities varies across platforms; modern Unix systems (including Apple's Mac OS X) should support all of them, and, except for signals and SysV IPC, most of the rest are supported on any recent Microsoft operating systems, including pipes, forking, file locking, and sockets.[1]

[1] Well, except for AF_UNIX sockets.

More information about porting in general can be found in the standard Perl documentation set (in whatever format your system displays it) under perlport. Microsoft-specific information can be found under perlwin32 and perlfork, which are installed even on non-Microsoft systems. For textbooks, we suggest the following:

The Perl Cookbook, by Tom Christiansen and Nathan Torkington (O'Reilly and Associates, 1998), chapters 16 through 18.
Advanced Programming in the UNIX Environment, by W. Richard Stevens (Addison-Wesley, 1992).
TCP/IP Illustrated, by W. Richard Stevens, Volumes I-III (Addison-Wesley, 1992-1996).

16.1. Signals

Perl uses a simple signal-handling model: the %SIG hash contains references (either symbolic or hard) to user-defined signal handlers. Certain events cause the operating system to deliver a signal to the affected process. The handler corresponding to that event is called with one argument containing the name of the signal that triggered it. To send a signal to another process, you use the kill function. Think of it as sending a one-bit piece of information to the other process.[2] If that process has installed a signal handler for that signal, it can execute code when it receives the signal. But there's no way for the sending process to get any sort of return value, other than knowing that the signal was legally sent. The sender receives no feedback saying what, if anything, the receiving process did with the signal.

[2] Actually, it's more like five or six bits, depending on how many signals your OS defines and on whether the other process makes use of the fact that you didn't send a different signal.

We've classified this facility as a form of IPC, but in fact, signals can come from various sources, not just other processes. A signal might also come from your own process, or it might be generated when the user at the keyboard types a particular sequence like Control-C or Control-Z, or it might be manufactured by the kernel when a special event transpires, such as when a child process exits, or when your process runs out of stack space or hits a file size or memory limit. But your own process can't easily distinguish among these cases. A signal is like a package that arrives mysteriously on your doorstep with no return address. You'd best open it carefully.

Since entries in the %SIG array can be hard references, it's common practice to use anonymous functions for simple signal handlers:

$SIG{INT}  = sub { die "\nOutta here!\n" };
$SIG{ALRM} = sub { die "Your alarm clock went off" };

Or you could create a named function and assign its name or reference to the appropriate slot in the hash. For example, to intercept interrupt and quit signals (often bound to Control-C and Control-\ on your keyboard), set up a handler like this:

sub catch_zap {
    my $signame = shift;
    our $shucks++;
    die "Somebody sent me a SIG$signame!";
} 
$shucks = 0;
$SIG{INT} = 'catch_zap';   # always means &main::catch_zap
$SIG{INT} = \&catch_zap;   # best strategy
$SIG{QUIT} = \&catch_zap;  # catch another, too

Notice how all we do in the signal handler is set a global variable and then raise an exception with die. Whenever possible, try to avoid anything more complicated than that, because on most systems the C library is not re-entrant. Signals are delivered asynchronously,[3] so calling any print functions (or even anything that needs to malloc(3) more memory) could in theory trigger a memory fault and subsequent core dump if you were already in a related C library routine when the signal was delivered. (Even the die routine is a bit unsafe unless the process is executing within an eval, which suppresses the I/O from die, which keeps it from calling the C library. Probably.)

[3]Synchronizing signal delivery with Perl-level opcodes is scheduled for a future release of Perl, which should solve the matter of signals and core dumps.

An even easier way to trap signals is to use the sigtrap pragma to install simple, default signal handlers:

use sigtrap qw(die INT QUIT);
use sigtrap qw(die untrapped normal-signals 
   stack-trace any error-signals);

The pragma is useful when you don't want to bother writing your own handler, but you still want to catch dangerous signals and perform an orderly shutdown. By default, some of these signals are so fatal to your process that your program will just stop in its tracks when it receives one. Unfortunately, that means that any END functions for at-exit handling and DESTROY methods for object finalization are not called. But they are called on ordinary Perl exceptions (such as when you call die), so you can use this pragma to painlessly convert the signals into exceptions. Even though you aren't dealing with the signals yourself, your program still behaves correctly. See the description of use sigtrap in Chapter 31, "Pragmatic Modules", for many more features of this pragma.

You may also set the %SIG handler to either of the strings "IGNORE" or "DEFAULT", in which case Perl will try to discard the signal or allow the default action for that signal to occur (though some signals can be neither trapped nor ignored, such as the KILL and STOP signals; see signal(3), if you have it, for a list of signals available on your system and their default behaviors).

The operating system thinks of signals as numbers rather than names, but Perl, like most people, prefers symbolic names to magic numbers. To find the names of the signals, list out the keys of the %SIG hash, or use the kill -l command if you have one on your system. You can also use Perl's standard Config module to determine your operating system's mapping between signal names and signal numbers. See Config(3) for an example of this.

Because %SIG is a global hash, assignments to it affect your entire program. It's often more considerate to the rest of your program to confine your signal catching to a restricted scope. Do this with a local signal handler assignment, which goes out of effect once the enclosing block is exited. (But remember that local values are visible in functions called from within that block.)

{  
    local $SIG{INT} = 'IGNORE'; 
    ...     # Do whatever you want here, ignoring all SIGINTs.
    fn();   # SIGINTs ignored inside fn() too!
    ...     # And here.
}           # Block exit restores previous $SIG{INT} value.

fn();       # SIGINTs not ignored inside fn() (presumably).

16.1.1. Signaling Process Groups

Processes (under Unix, at least) are organized into process groups, generally corresponding to an entire job. For example, when you fire off a single shell command that consists of a series of filter commands that pipe data from one to the other, those processes (and their child processes) all belong to the same process group. That process group has a number corresponding to the process number of the process group leader. If you send a signal to a positive process number, it just sends the signal to the process, but if you send a signal to a negative number, it sends that signal to every process whose process group number is the corresponding positive number, that is, the process number of the process group leader. (Conveniently for the process group leader, the process group ID is just $$.)

Suppose your program wants to send a hang-up signal to all child processes it started directly, plus any grandchildren started by those children, plus any greatgrandchildren started by those grandchildren, and so on. To do this, your program first calls setpgrp(0,0) to become the leader of a new process group, and any processes it creates will be part of the new group. It doesn't matter whether these processes were started manually via fork, automaticaly via piped opens, or as backgrounded jobs with system("cmd &"). Even if those processes had children of their own, sending a hang-up signal to your entire process group will find them all (except for processes that have set their own process group or changed their UID to give themselves diplomatic immunity to your signals).

{
    local $SIG{HUP} = 'IGNORE';   # exempt myself
    kill(HUP, -$$);               # signal my own process group
}

Another interesting signal is signal number 0. This doesn't actually affect the target process, but instead checks that it's alive and hasn't changed its UID. That is, it checks whether it's legal to send a signal, without actually sending one.

unless (kill 0 => $kid_pid) {
    warn "something wicked happened to $kid_pid";
}

Signal number 0 is the only signal that works the same under Microsoft ports of Perl as it does in Unix. On Microsoft systems, kill does not actually deliver a signal. Instead, it forces the target process to exit with the status indicated by the signal number. This may be fixed someday. The magic 0 signal, however, still behaves in the standard, nondestructive fashion.

16.1.2. Reaping Zombies

When a process exits, its parent is sent a CHLD signal by the kernel and the process becomes a zombie[4] until the parent calls wait or waitpid. If you start another process in Perl using anything except fork, Perl takes care of reaping your zombied children, but if you use a raw fork, you're expected to clean up after yourself. On many but not all kernels, a simple hack for autoreaping zombies is to set $SIG{CHLD} to 'IGNORE'. A more flexible (but tedious) approach is to reap them yourself. Because more than one child may have died before you get around to dealing with them, you must gather your zombies in a loop until there aren't any more:

use POSIX ":sys_wait_h";
sub REAPER { 1 until waitpid(-1, WNOHANG) == -1) }

To run this code as needed, you can either set a CHLD signal handler for it:

$SIG{CHLD} = \&REAPER;

or, if you're running in a loop, just arrange to call the reaper every so often. This is the best approach because it isn't subject to the occasional core dump that signals can sometimes trigger in the C library. However, it's expensive if called in a tight loop, so a reasonable compromise is to use a hybrid strategy where you minimize the risk within the handler by doing as little as possible and waiting until outside to reap zombies:

our $zombies = 0;
$SIG{CHLD} = sub { $zombies++ };  
sub reaper {
    my $zombie;
    our %Kid_Status;  # store each exit status
    $zombies = 0;  
    while (($zombie = waitpid(-1, WNOHANG)) != -1) {
        $Kid_Status{$zombie} = $?;
    } 
}
while (1) {
    reaper() if $zombies;
    ...
}

This code assumes your kernel supports reliable signals. Old SysV traditionally didn't, which made it impossible to write correct signal handlers there. Ever since way back in the 5.003 release, Perl has used the sigaction(2) syscall where available, which is a lot more dependable. This means that unless you're running on an ancient operating system or with an ancient Perl, you won't have to reinstall your handlers and risk missing signals. Fortunately, all BSD-flavored systems (including Linux, Solaris, and Mac OS X) plus all POSIX-compliant systems provide reliable signals, so the old broken SysV behavior is more a matter of historical note than of current concern.

[4]Yes, that really is the technical term.

With these newer kernels, many other things will work better, too. For example, "slow" syscalls (those that can block, like read, wait, and accept) will restart automatically if interrupted by a signal. In the bad old days, user code had to remember to check explicitly whether each slow syscall failed with $! ($ERRNO) set to EINTR and, if so, restart. This wouldn't happen just from INT signals; even innocuous signals like TSTP (from a Control-Z) or CONT (from foregrounding the job) would abort the syscall. Perl now restarts the syscall for you automatically if the operating system allows it to. This is generally construed to be a feature.

You can check whether you have the more rigorous POSIX-style signal behavior by loading the Config module and checking whether $Config{d_sigaction} has a true value. To find out whether slow syscalls are restartable, check your system documentation on sigaction(2) or sigvec(3), or scrounge around your C sys/signal.h file for SV_INTERRUPT or SA_RESTART. If one or both symbols are found, you probably have restartable syscalls.

16.1.3. Timing Out Slow Operations

A common use for signals is to impose time limits on long-running operations. If you're on a Unix system (or any other POSIX-conforming system that supports the ALRM signal), you can ask the kernel to send your process an ALRM at some point in the future:

use Fcntl ':flock';
eval { 
    local $SIG{ALRM} = sub { die "alarm clock restart" };
    alarm 10;               # schedule alarm in 10 seconds 
    eval { 
        flock(FH, LOCK_EX)  # a blocking, exclusive lock
            or die "can't flock: $!";
    };
    alarm 0;                # cancel the alarm
};
alarm 0;               # race condition protection
die if $@ && $@ !~ /alarm clock restart/; # reraise

If the alarm hits while you're waiting for the lock, and you simply catch the signal and return, you'll go right back into the flock because Perl automatically restarts syscalls where it can. The only way out is to raise an exception through die and then let eval catch it. (This works because the exception winds up calling the C library's longjmp(3) function, which is what really gets you out of the restarting syscall.)

The nested exception trap is included because calling flock would raise an exception if flock is not implemented on your platform, and you need to make sure to clear the alarm anyway. The second alarm 0 is provided in case the signal comes in after running the flock but before getting to the first alarm 0. Without the second alarm, you would risk a tiny race condition--but size doesn't matter in race conditions; they either exist or they don't. And we prefer that they don't.

16.1.4. Blocking Signals

Now and then, you'd like to delay receipt of a signal during some critical section of code. You don't want to blindly ignore the signal, but what you're doing is too important to interrupt. Perl's %SIG hash doesn't implement signal blocking, but the POSIX module does, through its interface to the sigprocmask(2) syscall:

use POSIX qw(:signal_h);
$sigset   = POSIX::SigSet->new; 
$blockset = POSIX::SigSet->new(SIGINT, SIGQUIT, SIGCHLD);       
sigprocmask(SIG_BLOCK, $blockset, $sigset) 
    or die "Could not block INT,QUIT,CHLD signals: $!\n";

Once the three signals are all blocked, you can do whatever you want without fear of being bothered. When you're done with your critical section, unblock the signals by restoring the old signal mask:

sigprocmask(SIG_SETMASK, $sigset)
    or die "Could not restore INT,QUIT,CHLD signals: $!\n";

If any of the three signals came in while blocked, they are delivered immediately. If two or more different signals are pending, the order of delivery is not defined. Additionally, no distinction is made between having received a particular signal once while blocked and having received it many times.[5] For example, if nine child processes exited while you were blocking CHLD signals, your handler (if you had one) would still be called only once after you unblocked. That's why, when you reap zombies, you should always loop until they're all gone.

[5] Traditionally, that is. Countable signals may be implemented on some real-time systems according to the latest specs, but we haven't seen these yet.