Flushing Output (Perl Cookbook, 2nd Edition)

7.19.3. Discussion

In most stdio implementations, buffering varies with the type of output device. Disk files are block buffered, often with a buffer size of more than 2K. Pipes and sockets are often buffered with a buffer size between ½K and 2K. Serial devices, including terminals, modems, mice, and joysticks, are normally line-buffered; stdio sends the entire line out only when it gets the newline.

Perl's print function does not directly support truly unbuffered output, i.e., a physical write for each individual character. Instead, it supports command buffering, in which one physical write is made after every separate output command. This isn't as hard on your system as no buffering at all, and it still gets the output where you want it, when you want it.

Control output buffering through the $| special variable. Enable command buffering on output handles by setting it to a true value. This does not affect input handles at all; see Recipe 15.6 and Recipe 15.8 for unbuffered input. Set this variable to a false value to use default stdio buffering. Example 7-7 illustrates the difference.

Example 7-7. seeme

  #!/usr/bin/perl -w
  # seeme - demo stdio output buffering
  $| = (@ARGV > 0);      # command buffered if arguments given
  print "Now you don't see it...";
  sleep 2;
  print "now you do\n";

If you call this program with no arguments, STDOUT is not command buffered. Your terminal (console, window, telnet session, whatever) doesn't receive output until the entire line is completed, so you see nothing for two seconds and then get the full line "Now you don't see it...now you do". If you call the program with at least one argument, STDOUT is command buffered. That means you first see "Now you don't see it...", and then after two seconds you finally see "now you do".
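The behavior just described depends on STDOUT being a terminal. As a rough sketch, the -t file test can tell you which default to expect; default_buffering is a hypothetical helper name, not part of the recipe, and the labels it returns describe typical stdio and PerlIO behavior rather than a guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): describe the buffering a
# handle is likely to get while $| is left false.
sub default_buffering {
    my ($fh) = @_;
    return (-t $fh) ? "line buffered (terminal)"
                    : "block buffered (file, pipe, or socket)";
}

# Report on STDERR so the message shows up even when STDOUT is redirected.
print STDERR "STDOUT is probably ", default_buffering(\*STDOUT), "\n";
```

Running seeme with its output piped through another program, for instance `seeme | cat`, should therefore hold back both pieces of text until the program exits, unless an argument is given.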

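Because $| applies only to the currently selected output handle, setting it for any other handle means selecting that handle, setting $|, and then selecting the original handle again. The dubious quest for increasingly compact code has led programmers to fold this into a single statement that uses the return value of the inner select, the filehandle that was previously selected, as the argument of a second select: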

select((select(OUTPUT_HANDLE), $| = 1)[0]);
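For comparison, here is a sketch of the same operation written out one step at a time. The subroutine name unbuffer is made up for illustration, and STDERR merely stands in for OUTPUT_HANDLE.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): turn on command buffering
# for one handle and leave the default output handle as it was.
sub unbuffer {
    my ($fh) = @_;
    my $old_fh = select($fh);   # select returns the previously selected handle
    $| = 1;                     # $| now refers to $fh
    select($old_fh);            # put the earlier default handle back
    return;
}

unbuffer(\*STDERR);
print "STDOUT is still the default output handle\n";
```

The one-line form does exactly this: the list slice `(...)[0]` picks out the value returned by the inner select, and the outer select restores it.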

There's another way. The IO::Handle module and any modules that inherit from that class provide three methods for flushing: flush, autoflush, and printflush. All are invoked on filehandles, either as literals or as variables containing a filehandle or reasonable facsimile.

The flush method causes all unwritten output in the buffer to be written out, returning true on success and false on failure. The printflush method is a print followed by a one-time flush. The autoflush method is syntactic sugar for the convoluted antics just shown. It sets the command-buffering property on that filehandle (or clears it if passed an explicit false value), and returns the previous value for that property on that handle. For example:

use FileHandle;

STDERR->autoflush;          # already unbuffered in stdio
$filehandle->autoflush(0);
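To watch flush and printflush at work, the following sketch checks the size of a scratch file after each step. The temporary file from File::Temp and the variable names are assumptions made for the demonstration; any buffered output handle would do. The byte counts assume no newline translation, as on Unix.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempfile);

# Scratch file standing in for any output handle; removed at exit.
my ($out, $path) = tempfile(UNLINK => 1);

$out->autoflush(0);                     # make sure ordinary buffering is on
print $out "first line\n";              # 11 bytes, normally still in the buffer
my $size_buffered = (stat($path))[7];

$out->flush or die "flush failed: $!";  # push the buffer out now
my $size_flushed = (stat($path))[7];

$out->printflush("second line\n");      # print plus a one-time flush
my $size_printflush = (stat($path))[7];

print "before flush: $size_buffered, after flush: $size_flushed, ",
      "after printflush: $size_printflush\n";
close($out) or die "can't close $path: $!";
```

The file should stay empty until flush is called, and printflush should add its line without changing the handle's buffering for later prints.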

If you're willing to accept the oddities of indirect object notation covered in Chapter 13, you can even write something reasonably close to English:

use IO::Handle;
# assume REMOTE_CONN is an interactive socket handle,
# but DISK_FILE is a handle to a regular file.
autoflush REMOTE_CONN  1;           # unbuffer for clarity
autoflush DISK_FILE    0;           # buffer this for speed

This avoids the bizarre select business and makes your code much more readable. Unfortunately, your program takes longer to compile because now you're including the IO::Handle module, so dozens of files need to be opened and thousands and thousands of lines must first be read and compiled. For short and simple applications, you might as well learn to manipulate $| directly, and you'll be happy. But for larger applications that already use a class derived from the IO::Handle class, you've already paid the price for the ticket, so you might as well see the show.

To ensure that your output gets where you want it, when you want it, buffer flushing is important. It's particularly important with sockets, pipes, and devices, because you may be trying to do interactive I/O with these, and all the more so because you can't assume line buffering. Consider the program in Example 7-8.

Example 7-8. getpcomidx

  #!/usr/bin/perl -w
  # getpcomidx - fetch www.perl.com's index.html document
  use IO::Socket;
  $sock = new IO::Socket::INET (PeerAddr => "www.perl.com",
                                PeerPort => "http(80)");
  die "Couldn't create socket: $@" unless $sock;
  # the library doesn't support $! setting; it uses $@
  $sock->autoflush(1);
  # The spec requires \015\012\015\012 to end the request, and the Mac
  # *must* send that instead of \n\n. Implementations are encouraged
  # to accept "\cJ\cJ" too, and as far as we've seen, they do.
  # HTTP/1.0 is requested because HTTP/1.1 demands a Host header and
  # may keep the connection open, which can stall getlines.
  $sock->print("GET /index.html HTTP/1.0\015\012\015\012");
  $document = join("", $sock->getlines( ));
  print "DOC IS: $document\n";
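Example 7-8 needs a network connection. The effect of autoflush on a handle that is not a terminal can also be observed locally, as in this sketch, which uses a pipe within a single process and IO::Select to poll the reading end. The pipe and the variable names are assumptions chosen for the demonstration, not part of the recipe.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $watch = IO::Select->new($reader);

print $writer "hello\n";                  # stays in the writer's buffer
my @ready_before = $watch->can_read(0);   # poll without waiting

$writer->autoflush(1);                    # setting $| also flushes pending output
my @ready_after = $watch->can_read(0);

printf "readable before autoflush: %d, after: %d\n",
    scalar @ready_before, scalar @ready_after;

my $line = <$reader>;
print "reader got: $line";
```

Until autoflush is turned on, the reader has nothing to read even though print has already returned. A program waiting on the other end of a socket or pipe would be stuck in the same way.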

If you're running at least v5.8 Perl, you can use the new I/O layers mechanism to force unbuffered output. This is available through the :unix layer. If the handle is already open, you can do this:

binmode(STDOUT, ":unix")
    || die "can't binmode STDOUT to :unix: $!";

or you can specify the I/O layer when initially calling open:

open(TTY, ">:unix", "/dev/tty")
    || die "can't open /dev/tty: $!";
print TTY "54321";
sleep 2;
print TTY "\n";
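If no terminal is handy, the difference the :unix layer makes can be sketched with scratch files instead. The helper size_after_print is hypothetical; it relies on File::Temp for a file name and on PerlIO::get_layers to report the layers on the handle. The comparison assumes the default layers have not been changed through the PERLIO environment variable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open a scratch file with the given mode, print
# five characters, and report the file's size before any flush or close.
sub size_after_print {
    my ($mode) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);   # used only for its name
    close($tmp_fh) or die "can't close $path: $!";

    open(my $fh, $mode, $path) or die "can't open $path with $mode: $!";
    print $fh "54321";
    my $size   = (stat($path))[7];
    my $layers = join(" ", PerlIO::get_layers($fh));
    close($fh) or die "can't close $path: $!";
    return ($size, $layers);
}

my ($unix_size, $unix_layers) = size_after_print(">:unix");
my ($std_size,  $std_layers)  = size_after_print(">");

print "mode >:unix  layers: $unix_layers  size after print: $unix_size\n";
print "mode >       layers: $std_layers  size after print: $std_size\n";
```

With :unix the five characters should already be in the file when print returns. With the default layers they normally wait in the buffer until the handle is flushed or closed.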

There's no way to control input buffering using the sorts of flushing discussed so far. For that, you need to see Recipe 15.6 and Recipe 15.8.
