
16.3. Pipes

A pipe is a unidirectional I/O channel that can transfer a stream of bytes from one process to another. Pipes come in both named and nameless varieties. You may be more familiar with nameless pipes, so we'll talk about those first.

16.3.1. Anonymous Pipes

Perl's open function opens a pipe instead of a file when you append or prepend a pipe symbol to the second argument to open. This turns the rest of the arguments into a command, which will be interpreted as a process (or set of processes) that you want to pipe a stream of data either into or out of. Here's how to start up a child process that you intend to write to:

open SPOOLER, "| cat -v | lpr -h 2>/dev/null"
    or die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER or die "bad spool: $! $?";
This example actually starts up two processes, the first of which (running cat) we print to directly. The second process (running lpr) then receives the output of the first process. In shell programming, this is often called a pipeline. A pipeline can have as many processes in a row as you like, as long as the ones in the middle know how to behave like filters; that is, they read standard input and write standard output.

Perl uses your default system shell (/bin/sh on Unix) whenever a pipe command contains special characters that the shell cares about. If you're only starting one command, and you don't need--or don't want--to use the shell, you can use the multi-argument form of a piped open instead:

open SPOOLER, "|-", "lpr", "-h"    # requires 5.6.1
    or die "can't run lpr: $!";
If you reopen your program's standard output as a pipe to another program, anything you subsequently print to STDOUT will be standard input for the new program. So to page your program's output,[8] you'd use:
if (-t STDOUT) {             # only if stdout is a terminal
    my $pager = $ENV{PAGER} || 'more';  
    open(STDOUT, "| $pager")    or die "can't fork a pager: $!";
}
END { 
    close(STDOUT)               or die "can't close STDOUT: $!" 
}
When you're writing to a filehandle connected to a pipe, always explicitly close that handle when you're done with it. That way your main program doesn't exit before its offspring.

[8]That is, let them view it one screenful at a time, not set off random bird calls.

Here's how to start up a child process that you intend to read from:

open STATUS, "netstat -an 2>/dev/null |"
    or die "can't fork: $!";
while (<STATUS>) {
    next if /^(tcp|udp)/;
    print;
} 
close STATUS or die "bad netstat: $! $?";
You can open a multistage pipeline for input just as you can for output. And as before, you can avoid the shell by using an alternate form of open:
open STATUS, "-|", "netstat", "-an"      # requires 5.6.1
    or die "can't run netstat: $!";
But then you don't get I/O redirection, wildcard expansion, or multistage pipes, since Perl relies on your shell to do those.

You might have noticed that you can use backticks to accomplish the same effect as opening a pipe for reading:

print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
die "bad netstat" if $?;
While backticks are extremely handy, they have to read the whole thing into memory at once, so it's often more efficient to open your own piped filehandle and process the file one line or record at a time. This gives you finer control over the whole operation, letting you kill off the child process early if you like. You can also be more efficient by processing the input as it's coming in, since computers can interleave various operations when two or more processes are running at the same time. (Even on a single-CPU machine, input and output operations can happen while the CPU is doing something else.)
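As a sketch of that early-kill control, assuming nothing beyond core Perl, the parent below reads only the first line of a long-winded child and then closes the pipe, which ends the still-writing child with a SIGPIPE:

```perl
# Read just the first line of a verbose command, then stop it early.
# $^X is the perl running this script; the child just counts out loud.
open(my $kid, "-|", $^X, "-le", "print for 1 .. 1_000_000")
    or die "can't fork: $!";
chomp(my $first = <$kid>);
close($kid);   # child is still writing, so it dies of SIGPIPE here
print "got: $first\n";
```

We don't check this particular close, since killing the child early is the whole point; its nonzero exit status is expected.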

Because you're running two or more processes concurrently, disaster can strike the child process any time between the open and the close. This means that the parent must check the return values of both open and close. Checking the open isn't good enough, since that will only tell you whether the fork was successful, and possibly whether the subsequent command was successfully launched. (It can tell you this only in recent versions of Perl, and only if the command is executed directly by the forked child, not via the shell.) Any disaster that happens after that is reported from the child to the parent as a nonzero exit status. When the close function sees that, it knows to return a false value, indicating that the actual status value should be read from the $? ($CHILD_ERROR) variable. So checking the return value of close is just as important as checking open. If you're writing to a pipe, you should also be prepared to handle the PIPE signal, which is sent to you if the process on the other end dies before you're done sending to it.
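Here's a small sketch putting all three checks together; sort -n stands in for any filter you might write to:

```perl
local $SIG{PIPE} = sub { die "pipe broke: filter exited early" };

open(my $filter, "|-", "sort", "-n")     # open reports fork/exec trouble
    or die "can't start sort: $!";
print $filter "$_\n" for 3, 1, 2;
unless (close($filter)) {                # close reports everything later
    die $! ? "error closing pipe: $!"
           : "filter exited with status " . ($? >> 8);
}
```

The list form of the piped open here needs 5.6.1 or later, as described earlier.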

16.3.2. Talking to Yourself

Another approach to IPC is to make your program talk to itself, in a manner of speaking. Actually, your process talks over pipes to a forked copy of itself. It works much like the piped open we talked about in the last section, except that the child process continues executing your script instead of some other command.

To represent this to the open function, you use a pseudocommand consisting of a minus. So the second argument to open looks like either "-|" or "|-", depending on whether you want to pipe from yourself or to yourself. As with an ordinary fork command, the open function returns the child's process ID in the parent process but 0 in the child process. Another asymmetry is that the filehandle named by the open is used only in the parent process. The child's end of the pipe is hooked to either STDIN or STDOUT as appropriate. That is, if you open a pipe to minus with |-, you can write to the filehandle you opened, and your kid will find this in STDIN:

if (open(TO, "|-")) {
    print TO $fromparent;
}
else {
    $tochild = <STDIN>;
    exit;
}
If you open a pipe from minus with -|, you can read from the filehandle you opened, which will return whatever your kid writes to STDOUT:
if (open(FROM, "-|")) {
    $toparent = <FROM>;
}
else {
    print STDOUT $fromchild;
    exit;
}
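Filling in that skeleton, a complete read-from-yourself exchange might look like this sketch:

```perl
my $pid = open(my $from_kid, "-|");      # fork, with a pipe between us
die "cannot fork: $!" unless defined $pid;

unless ($pid) {                          # child: prints into the pipe
    print "greetings from pid $$\n";
    exit;
}
chomp(my $line = <$from_kid>);           # parent: reads what the kid wrote
close($from_kid);                        # also reaps the child
print "parent read: $line\n";
```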

One common application of this construct is to bypass the shell when you want to open a pipe from a command. You might want to do this because you don't want the shell to interpret any possible metacharacters in the filenames you're trying to pass to the command. If you're running release 5.6.1 or greater of Perl, you can use the multi-argument form of open to get the same result.
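For instance, here's a hedged sketch (the filename and its contents are invented) of a name the shell would have mangled passing through the list form intact:

```perl
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/notes from today.txt";     # spaces: shell bait
open(my $out, ">", $path) or die "can't write $path: $!";
print $out "TODO: finish chapter\n";
close($out) or die "can't close $path: $!";

# list-form piped open: no shell, so the spaces are harmless
open(my $fh, "-|", "grep", "-l", "TODO", $path)
    or die "can't run grep: $!";
chomp(my @matches = <$fh>);
close($fh) or die "grep failed: $! $?";
print "matched: $matches[0]\n";
```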

Another use of a forking open is to safely open a file or command even while you're running under an assumed UID or GID. The child you fork drops any special access rights, then safely opens the file or command and acts as an intermediary, passing data between its more powerful parent and the file or command it opened. Examples can be found in the section "Accessing Commands and Files Under Reduced Privileges" in Chapter 23, "Security".

One creative use of a forking open is to filter your own output. Some algorithms are much easier to implement in two separate passes than they are in just one pass. Here's a simple example in which we emulate the Unix tee(1) program by sending our normal output down a pipe. The agent on the other end of the pipe (one of our own subroutines) distributes our output to all the files specified:

tee("/tmp/foo", "/tmp/bar", "/tmp/glarch");

while (<>) {
    print "$ARGV at line $. => $_";
}
close(STDOUT)  or die "can't close STDOUT: $!";
    
sub tee {
    my @output = @_;
    my @handles = ();
    for my $path (@output) {
        my $fh;  # open will fill this in
        unless (open ($fh, ">", $path)) {
            warn "cannot write to $path: $!";
            next;
        }
        push @handles, $fh;
    }
    
    # reopen STDOUT in parent and return
    return if my $pid = open(STDOUT, "|-");
    die "cannot fork: $!" unless defined $pid;
    
    # process STDIN in child
    while (<STDIN>) {
        for my $fh (@handles) {
            print $fh $_ or die "tee output failed: $!";
        }
    }
    for my $fh (@handles) {
        close($fh) or die "tee closing failed: $!";
    }
    exit;  # don't let the child return to main!
}
This technique can be applied repeatedly to push as many filters on your output stream as you wish. Just keep calling functions that fork-open STDOUT, and have the child read from its parent (which it sees as STDIN) and pass the massaged output along to the next function in the stream.
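A sketch of that stacking, with two made-up filter functions; each fork-opens STDOUT, so its child reads the parent's output on STDIN, massages it, and prints it to whatever STDOUT was before:

```perl
sub upcase {
    my $pid = open(STDOUT, "|-");
    die "cannot fork: $!" unless defined $pid;
    return if $pid;                  # parent resumes, now filtered
    print uc while <STDIN>;          # child: shout everything
    exit;
}

sub number {
    my $pid = open(STDOUT, "|-");
    die "cannot fork: $!" unless defined $pid;
    return if $pid;
    print "$.: $_" while <STDIN>;    # child: number each line
    exit;
}

number();                # pushed first, so it filters last
upcase();                # pushed second, so it sees our output first
print "hello\n";
print "world\n";
close(STDOUT) or die "can't close STDOUT: $!";
```

The lines come out upcased, then numbered, because each new filter is interposed between you and the filters pushed before it.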

Another interesting application of talking to yourself with fork-open is to capture the output from an ill-mannered function that always splats its results to STDOUT. Imagine if Perl only had printf and no sprintf. What you'd need would be something that worked like backticks, but with Perl functions instead of external commands:

badfunc("arg");                       # drat, escaped!
$string = forksub(\&badfunc, "arg");  # caught it as string
@lines  = forksub(\&badfunc, "arg");  # as separate lines

sub forksub {
    my $kidpid = open my $self, "-|";
    defined $kidpid         or die "cannot fork: $!";
    shift->(@_), exit       unless $kidpid;
    local $/                unless wantarray;
    return <$self>;         # closes on scope exit
}
We're not claiming this is efficient; a tied filehandle would probably be a good bit faster. But it's a lot easier to code up if you're in more of a hurry than your computer is.

16.3.3. Bidirectional Communication

Although using open to connect to another command over a pipe works reasonably well for unidirectional communication, what about bidirectional communication? The obvious approach doesn't actually work:

open(PROG_TO_READ_AND_WRITE, "| some program |")  # WRONG!

If you forget to enable warnings, then you'll miss out entirely on the diagnostic message:

Can't do bidirectional pipe at myprog line 3.

The open function doesn't allow this because it's rather prone to deadlock unless you're quite careful. But if you're determined, you can use the standard IPC::Open2 library module to attach two pipes to a subprocess's STDIN and STDOUT. There's also an IPC::Open3 module for tridirectional I/O (allowing you to also catch your child's STDERR), but this requires either an awkward select loop or the somewhat more convenient IO::Select module. But then you'll have to avoid Perl's buffered input operations like <> (readline).

Here's an example using open2:

use IPC::Open2;
local (*Reader, *Writer);
$pid = open2(\*Reader, \*Writer, "bc -l");
$sum = 2;
for (1 .. 5) {
    print Writer "$sum * $sum\n";
    chomp($sum = <Reader>);
}
close Writer;
close Reader;
waitpid($pid, 0);
print "sum is $sum\n";
You can also autovivify lexical filehandles:
my ($fhread, $fhwrite);
$pid = open2($fhread, $fhwrite, "cat -u -n");
The problem with this in general is that standard I/O buffering is really going to ruin your day. Even though your output filehandle is autoflushed (the library does this for you) so that the process on the other end will get your data in a timely manner, you can't usually do anything to force it to return the favor. In this particular case, we were lucky: bc expects to operate over a pipe and knows to flush each output line. But few commands are so designed, so this seldom works out unless you yourself wrote the program on the other end of the double-ended pipe. Even simple, apparently interactive programs like ftp fail here because they won't do line buffering on a pipe. They'll only do it on a tty device.

The IO::Pty and Expect modules from CPAN can help with this because they provide a real tty (actually, a real pseudo-tty, but it acts like a real one). This gets you line buffering in the other process without modifying its program.

If you split your program into several processes and want these to all have a conversation that goes both ways, you can't use Perl's high-level pipe interfaces, because these are all unidirectional. You'll need to use two low-level pipe function calls, each handling one direction of the conversation:

pipe(FROM_PARENT, TO_CHILD)     or die "pipe: $!";
pipe(FROM_CHILD,  TO_PARENT)    or die "pipe: $!";
select((select(TO_CHILD), $| = 1)[0]);    # autoflush
select((select(TO_PARENT), $| = 1)[0]);   # autoflush

if ($pid = fork) {
    close FROM_PARENT; close TO_PARENT;
    print TO_CHILD "Parent Pid $$ is sending this\n";
    chomp($line = <FROM_CHILD>);
    print "Parent Pid $$ just read this: `$line'\n";
    close FROM_CHILD; close TO_CHILD;
    waitpid($pid,0);
} else {
    die "cannot fork: $!" unless defined $pid;
    close FROM_CHILD; close TO_CHILD;
    chomp($line = <FROM_PARENT>);
    print "Child Pid $$ just read this: `$line'\n";
    print TO_PARENT "Child Pid $$ is sending this\n";
    close FROM_PARENT; close TO_PARENT;
    exit;
}
On many Unix systems, you don't actually have to make two separate pipe calls to achieve full duplex communication between parent and child. The socketpair syscall provides bidirectional connections between related processes on the same machine. So instead of two pipes, you only need one socketpair.
use Socket;     
socketpair(Child, Parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# or letting perl pick filehandles for you
my ($kidfh, $dadfh);
socketpair($kidfh, $dadfh, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
After the fork, the parent closes the Parent handle, then reads and writes via the Child handle. Meanwhile, the child closes the Child handle, then reads and writes via the Parent handle.
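That dance looks like this sketch: one socketpair, one fork, and a close on each side:

```perl
use Socket;

socketpair(my $child, my $parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
select((select($child),  $| = 1)[0]);   # autoflush both ends
select((select($parent), $| = 1)[0]);

if (my $pid = fork) {                   # parent: talks on $child
    close $parent;
    print $child "ping from $$\n";
    chomp(my $reply = <$child>);
    close $child;
    waitpid($pid, 0);
    print "parent got: $reply\n";
} else {
    die "cannot fork: $!" unless defined $pid;
    close $child;                       # child: talks on $parent
    chomp(my $msg = <$parent>);
    print $parent "pong, re: $msg\n";
    close $parent;
    exit;
}
```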

If you're looking into bidirectional communications because the process you'd like to talk to implements a standard Internet service, you should usually just skip the middleman and use a CPAN module designed for that exact purpose. (See the section "Sockets" later in this chapter for a list of some of these.)

16.3.4. Named Pipes

A named pipe (often called a FIFO) is a mechanism for setting up a conversation between unrelated processes on the same machine. The names in a "named" pipe exist in the filesystem, which is just a funny way to say that you can put a special file in the filesystem namespace that has another process behind it instead of a disk.[9]

[9]You can do the same thing with Unix-domain sockets, but you can't use open on those.

A FIFO is convenient when you want to connect a process to an unrelated one. When you open a FIFO, your process will block until there's a process on the other end. So if a reader opens the FIFO first, it blocks until the writer shows up--and vice versa.

To create a named pipe, use the POSIX mkfifo function--if you're on a POSIX system, that is. On Microsoft systems, you'll instead want to look into the Win32::Pipe module, which, despite its possible appearance to the contrary, creates named pipes. (Win32 users create anonymous pipes using pipe just like the rest of us.)

For example, let's say you'd like to have your .signature file produce a different answer each time it's read. Just make it a named pipe with a Perl program on the other end that spits out random quips. Now every time any program (like a mailer, newsreader, finger program, and so on) tries to read from that file, that program will connect to your program and read in a dynamic signature.

In the following example, we use the rarely seen -p file test operator to determine whether anyone (or anything) has accidentally removed our FIFO.[10] If they have, there's no reason to try to open it, so we treat this as a request to exit. If we'd used a simple open function with a mode of "> $fpath", there would have been a tiny race condition that would have risked accidentally creating the signature as a plain file if it disappeared between the -p test and the open. We couldn't use a "+< $fpath" mode, either, because opening a FIFO for read-write is a nonblocking open (this is only true of FIFOs). By using sysopen and omitting the O_CREAT flag, we avoid this problem by never creating a file by accident.

use Fcntl;             # for sysopen
chdir;                 # go home
$fpath = '.signature';
$ENV{PATH} .= ":/usr/games";

unless (-p $fpath) {   # not a pipe
    if (-e _) {        # but a something else
        die "$0: won't overwrite .signature\n";
    } else {
        require POSIX;
        POSIX::mkfifo($fpath, 0666) or die "can't mkfifo $fpath: $!";
        warn "$0: created $fpath as a named pipe\n";
    }
}

while (1) {
    # exit if signature file manually removed
    die "Pipe file disappeared" unless -p $fpath;
    # next line blocks until there's a reader
    sysopen(FIFO, $fpath, O_WRONLY)
        or die "can't write $fpath: $!";
    print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
    close FIFO;
    select(undef, undef, undef, 0.2);  # sleep 1/5th second
}
The short sleep after the close is needed to give the reader a chance to read what was written. If we just immediately loop back up around and open the FIFO again before our reader has finished reading the data we just sent, then no end-of-file is seen because there's once again a writer. We'll both go round and round until during one iteration, the writer falls a little behind and the reader finally sees that elusive end-of-file. (And we were worried about race conditions?)
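The reading side needs nothing special; an ordinary open for reading simply blocks until the writer shows up. Here's a self-contained sketch, with a throwaway forked writer standing in for the signature program above:

```perl
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $fpath = "$dir/sig";                 # stand-in for ~/.signature
mkfifo($fpath, 0666) or die "can't mkfifo $fpath: $!";

my $pid = fork;
die "cannot fork: $!" unless defined $pid;
unless ($pid) {                         # child: plays the writer's part
    open(my $w, ">", $fpath) or die "can't write $fpath: $!";
    print $w "Jane Doe (jane\@example.org)\n";
    close($w);
    exit;
}

die "not a fifo" unless -p $fpath;      # the -p test again
open(my $fifo, "<", $fpath)             # blocks until the writer opens
    or die "can't read $fpath: $!";
my $sig = <$fifo>;
close($fifo);
waitpid($pid, 0);
print $sig;
```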

[10]Another use is to see if a filehandle is connected to a pipe, named or anonymous, as in -p STDIN.




Copyright © 2001 O'Reilly & Associates. All rights reserved.