Processes as Filehandles (Learning Perl, 3rd Edition)

14.5. Processes as Filehandles

So far, we've been looking at ways to deal with synchronous processes, where Perl stays in charge, launches a command, (usually) waits for it to finish, then possibly grabs its output. But Perl can also launch a child process that stays alive, communicating [322] to Perl on an ongoing basis until the task is complete.

[322]Via pipes, or whatever your operating system provides for simple interprocess communication.

The syntax for launching a concurrent (parallel) child process is to put the command as the "filename" for an open call, and either precede the command or follow the command with a vertical bar, which is the "pipe" character. For that reason, this is often called a piped open:

open DATE, "date|" or die "cannot pipe from date: $!";
open MAIL, "|mail merlyn" or die "cannot pipe to mail: $!";

In the first example, with the vertical bar on the right, the command is launched with its standard output connected to the DATE filehandle opened for reading, similar to the way that the command date | your_program would work from the shell. In the second example, with the vertical bar on the left, the command's standard input is connected to the MAIL filehandle opened for writing, similar to what happens with the command your_program | mail merlyn. In either case, the command is now launched and continues independently of the Perl process.[323]

[323]If the Perl process exits before the command is complete, a command that's been reading will see end-of-file, while a command that's been writing will get a "broken pipe" error signal on the next write, by default.

The open fails if the child process cannot be created. If the command itself does not exist or exits erroneously, this will (generally) not be seen as an error when opening, but as an error when closing. We'll get to that in a moment.

For all intents and purposes, the rest of the program doesn't know, doesn't care, and would have to work pretty hard to figure out that this is a filehandle opened on a process rather than on a file. So, to get data from a filehandle opened for reading, we'll just do the normal read:

my $now = <DATE>;

And to send data to the mail process (waiting for the body of a message to deliver to merlyn on standard input), a simple print-with-a-filehandle will do:

print MAIL "The time is now $now"; # presume $now ends in newline

In short, you can pretend that these filehandles are hooked up to magical files, one that contains the output of the date command, and one that will automatically be mailed by the mail command.

If a process is connected to a filehandle that is open for reading, and then exits, the filehandle returns end-of-file, just like reading up to the end of a normal file. When you close a filehandle open for writing to a process, the process will see end-of-file. So, to finish sending the email, close the handle:

close MAIL;
die "mail: non-zero exit of $?" if $?;

Closing a filehandle attached to a process waits for the process to complete, so that Perl can get the process's exit status. The exit status is then available in the $? variable (reminiscent of the same variable in the Bourne Shell), and is the same kind of number as the value returned by the system function: zero for success, nonzero for failure. Each new exited process overwrites the previous value though, so save it quickly if you want it. (The $? variable also holds the exit status of the most recent system or backquoted command, if you're curious.)

The processes are synchronized just like a pipelined command. If you try to read and no data is available, the process is suspended (without consuming additional CPU time) until the sending program has started speaking again. Similarly, if a writing process gets ahead of the reading process, the writing process is slowed down until the reader starts to catch up. There's a buffer (usually 4K bytes or so) in between so they don't have to stay precisely in lock step.

Why use processes as filehandles? Well, it's the only easy way to write to a process based on the results of a computation. But if you're just reading, backquotes are often much easier to manage, unless you want to have the results as they come in.

For example, the Unix find command locates files based on their attributes, and it can take quite a while if used on a fairly large number of files (such as starting from the root directory). You can put a find command inside backquotes, but it's often nicer to see the results as they are found:

open F, "find / -atime +90 -size +1000 -print|" or die "fork: $!";
while (<F>) {
  chomp;
  printf "%s size %dK last accessed on %s\n",
    $_, (1023 + -s $_)/1024, -A $_;
}

The find command here is looking for all the files that were not accessed within the past 90 days and that are larger than 1000 blocks. (These are good candidates to be moved off to longer-term storage.) While find is searching and searching, Perl can wait. As each file is found, Perl responds to the incoming name and displays some information about that file for further research. Had this been written with backquotes, we'd not see any output until the find commmand had finished, and it's comforting to see that it's actually doing the job even before it's done.