16. Process Management and Communication

It is quite a three-pipe problem, and I beg that you won't speak to me for fifty minutes.

- Sherlock Holmes The Red-Headed League

16.0. Introduction

Perl may be many things to many people, but to most of us it is the glue that connects diverse components. This chapter is about launching commands and connecting separate processes together. It's about managing their creation, communication, and ultimate demise. It's about systems programming.

When it comes to systems programming, Perl, as usual, makes easy things easy and hard things possible. If you want to use it as you would the shell, Perl is happy to assist you. If you want to roll up your sleeves for low-level hacking like a hardcore C programmer, you can do that, too.

Because Perl lets you get so close to the system, portability issues can sneak in. This chapter is the most Unix-centric chapter of the book. It will be tremendously useful to those on Unix systems, but only of limited use to others. We deal with features that aren't as universal as strings and numbers and basic arithmetic. Most basic operations work more or less the same everywhere. But if you're not using some kind of Unix or other POSIX conformant system, most of the interesting features in this chapter may work differently for you - or not at all. Check the documentation that came with your Perl port if you aren't sure.

Process Creation

In this chapter, we cover the proper care and feeding of your own child processes. Sometimes this means launching a stand-alone command and letting it have its own way with the world (using system ). Other times it means keeping a tight rein on your child, feeding it carefully filtered input or taking hold of its output stream (backticks and piped open s). Without even starting a new process, you can use exec to replace your current program with a completely different program.

We first show how to use the most portable and commonly used operations for managing processes: backticks, system , open , and the manipulation of the %SIG hash. Those are the easy things, but we don't stop there. We also show what to do when the simple approach isn't good enough.

For example, you might want to interrupt your program while it's running a different program. Maybe you need to process a child program's standard error separately from its standard output. Perhaps you need to control both the input and output of a program simultaneously. When you tire of having just one thread of control and begin to take advantage of multitasking, you'll want to learn how to split your current program into several, simultaneously running processes that all talk to each other.

For tasks like these, you have to drop back to the underlying system calls: pipe , fork , and exec . The pipe function creates two connected filehandles, a reader and writer, whereby anything written to the writer can be read from the reader. The fork function is the basis of multitasking, but unfortunately it has not been supported on all non-Unix systems. It clones off a duplicate process identical in virtually every aspect to its parent, including variable settings and open files. The most noticable changes are the process ID and parent process ID. New programs are started by forking, then using exec to replace the program in the child process with a new one. You don't always both fork and exec together, so having them as separate primitives is more expressive and powerful than if all you could do is run system . In practice, you're more apt to use fork by itself than exec by itself.

When a child process dies, its memory is returned to the operating system, but its entry in the process table isn't freed. This lets a parent check the exit status of its child processes. Processes that have died but haven't been removed from the process table are called zombie s, and you should clean them up lest they fill the whole process table. Backticks and the system and open functions automatically take care of this, and will work on most non-Unix systems. You have more to worry about when you go beyond these simple portable functions and use low-level primitives to launch programs. One thing to worry about is signals.

Signals

Your process is notified of the death of a child it created with a signal . Signals are a kind of notification delivered by the operating system. They are used for errors (when the kernel says, "Hey, you can't touch that area of memory!") and for events (death of a child, expiration of a per-process timer, interrupt with Ctrl-C). If you're launching processes manually, you normally arrange for a subroutine of your choosing to be called whenever one of your children exits.

Each process has a default disposition for each possible signal. You may install your own handler or otherwise change the disposition of most signals. Only SIGKILL and SIGSTOP cannot be changed. The rest you can ignore, trap, or block.

Briefly, here's a rundown of the more important signals:

SIGINT: is normally triggered by Ctrl-C. This requests that a process interrupt what it's doing. Simple programs like filters usually just die, but more important ones like shells, editors, or FTP programs usually use SIGINT to stop long-running operations so you can tell them to do something else.
SIGQUIT: is also normally generated by a terminal, usually Ctrl-\. Its default behavior is to generate a core dump.
SIGTERM: is sent by the kill shell command when no signal name is explicitly given. Think of it as a polite request for a process to die.
SIGUSR1 and SIGUSR2: are never caused by system events, so user applications can safely use them for their own purposes.
SIGPIPE: is sent by the kernel when your process tries to write to a pipe or socket when the process on the other end has closed its connection, usually because it no longer exists.
SIGALRM: is sent when the timer set by the alarm function expires, as described in Recipe 16.21 .
SIGHUP: is sent to a process when its controlling terminal gets a hang-up (e.g., the modem lost its carrier), but it also often indicates that a program should restart or reread its configuration.
SIGCHLD: is probably the most important signal when it comes to low-level systems programming. The system sends your process a SIGCHLD when one of its child processes stops running - or, more likely, when that child exits. See Recipe 16.19 for more on SIGCHLD.

Signal names are a convenience for humans. Each signal has an associated number that the operating system uses instead of names. Although we talk about SIGCHLD, your operating system only knows it as a number, like 20 (these numbers vary across operating systems). Perl translates between signal names and numbers for you, so you can think in terms of signal names.

Recipes Recipe 16.15 , Recipe 16.17 , Recipe 16.21 , Recipe 16.18 , and Recipe 16.20 run the full gamut of signal handling.