Reading an Entire Line Without Blocking (Perl Cookbook, 2nd Edition)

7.23. Reading an Entire Line Without Blocking

7.23.1. Problem

You need to read a line of data from a handle that select says is ready for reading, but you can't use Perl's normal <FH> operation (readline) in conjunction with select because <FH> may buffer extra data and select doesn't know about those buffers.

7.23.2. Solution

Use the following sysreadline function, like this:

$line = sysreadline(SOME_HANDLE);

In case only a partial line has been sent, include a number of seconds to wait:

$line = sysreadline(SOME_HANDLE, TIMEOUT);

Here's the function to do that:

use IO::Handle;
use IO::Select;
use Symbol qw(qualify_to_ref);

sub sysreadline(*;$) {
    my($handle, $timeout) = @_;
    $handle = qualify_to_ref($handle, caller( ));
    my $infinitely_patient = (@_ == 1 || $timeout < 0);
    my $start_time = time( );
    my $selector = IO::Select->new( );
    $selector->add($handle);
    my $line = "";
SLEEP:
    until (at_eol($line)) {
        unless ($infinitely_patient) {
            return $line if time( ) > ($start_time + $timeout);
        }
        # sleep only 1 second before checking again
        next SLEEP unless $selector->can_read(1.0);
INPUT_READY:
        while ($selector->can_read(0.0)) {
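            my $was_blocking = $handle->blocking(0);
            my $got;    # sysread result: byte count, 0 at end-of-file, undef on error
CHAR:       while ($got = sysread($handle, my $nextbyte, 1)) {
                $line .= $nextbyte;
                last CHAR if $nextbyte eq "\n";
            }
            $handle->blocking($was_blocking);
            # at end-of-file nothing more can arrive, so return what we have
            return $line if defined($got) && $got == 0;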
            # if incomplete line, keep trying
            next SLEEP unless at_eol($line);
            last INPUT_READY;
        }
    }
    return $line;
}
sub at_eol($) { $_[0] =~ /\n\z/ }

7.23.3. Discussion

As described in Recipe 7.22, to determine whether the operating system has data on a particular handle for your process to read, you can use either Perl's built-in select function or the can_read method from the standard IO::Select module.
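
As a rough illustration of the two approaches, the following sketch polls one handle both ways. The pipe is only a stand-in for a socket or other handle, and the variable names are made up for this example; it is not code from Recipe 7.22.

```perl
use strict;
use warnings;
use IO::Select;

# Assumption: a pipe stands in for whatever handle you are watching.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $selector = IO::Select->new($reader);

# Nothing has been written yet, so a zero-timeout poll finds no ready handles.
my @ready_before = $selector->can_read(0);

defined(syswrite($writer, "hello\n")) or die "syswrite failed: $!";

# Poll with the built-in four-argument select: build a bit vector
# with the reader's file descriptor set, then ask with a zero timeout.
my $rin = '';
vec($rin, fileno($reader), 1) = 1;
my $rout = $rin;
my $nfound = select($rout, undef, undef, 0);

# Poll the same handle through IO::Select.
my @ready_after = $selector->can_read(0);

printf "before write: %d, select: %d, can_read: %d\n",
    scalar(@ready_before), $nfound, scalar(@ready_after);
```

Both calls ask the kernel the same question; IO::Select simply hides the bit-vector bookkeeping.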

Although you can reasonably use functions like sysread and recv to get data, you can't use the buffered functions like readline (that is, <FH>), read, or getc. Also, even the unbuffered input functions might still block. If someone connects and sends a character but never sends a newline, your program will block in a <FH>, which expects its input to end in a newline—or in whatever you've assigned to the $/ variable.

We circumvent this by setting the handle to non-blocking mode and then reading in characters until we find "\n". This removes the need for the blocking <FH> call. The sysreadline function in the Solution takes an optional second argument so you don't have to wait forever in case you get a partial line and nothing more.
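
The core of that idea can be sketched on its own, without the timeout logic. The helper name drain_available and its return values are hypothetical, chosen for this illustration, and a pipe again stands in for a real connection.

```perl
use strict;
use warnings;
use IO::Handle;

pipe(my $reader, my $writer) or die "pipe failed: $!";
$reader->blocking(0);    # sysread now returns at once when no data is waiting

# Hypothetical helper: append bytes to $$bufref until a newline arrives
# or the handle has nothing more to give right now.
sub drain_available {
    my ($fh, $bufref) = @_;
    while (1) {
        my $n = sysread($fh, my $byte, 1);
        if (!defined $n) {
            # No data yet on a non-blocking handle is not a real error.
            return 'would_block' if $!{EAGAIN} || $!{EWOULDBLOCK};
            die "sysread failed: $!";
        }
        return 'eof' if $n == 0;
        $$bufref .= $byte;
        return 'line' if $byte eq "\n";
    }
}

my $line = '';

# Only part of a line has been sent: the helper returns instead of hanging.
defined(syswrite($writer, "partial")) or die "syswrite failed: $!";
my $first = drain_available($reader, \$line);

# The rest of the line arrives, followed by the start of another.
defined(syswrite($writer, " line\nextra")) or die "syswrite failed: $!";
my $second = drain_available($reader, \$line);

print "$first / $second / $line";
```

Reading one byte at a time is slow, but it guarantees that nothing past the newline is consumed, so the bytes of the next line stay in the kernel where select can still see them.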

A far more serious issue is that select reports only whether the operating system's low-level file descriptor is available for I/O. It's not reliable in the general case to mix calls to four-argument select with calls to any of the buffered I/O functions listed in this chapter's Introduction (read, <FH>, seek, tell, etc.). Instead, you must use sysread—and sysseek if you want to reposition the filehandle within the file.
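
For a seekable file, the unbuffered pair looks like this. The temporary file and its contents are invented for the example; the point is only that sysseek and sysread act on the file descriptor directly, with no Perl-level buffer in between.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Assumption: a scratch file with two short lines, created just for this demo.
my ($fh, $path) = tempfile(UNLINK => 1);
defined(syswrite($fh, "first\nsecond\n")) or die "syswrite failed: $!";

# Skip the 6 bytes of "first\n" using the unbuffered seek.
sysseek($fh, 6, SEEK_SET) or die "sysseek failed: $!";

# Read the next 6 bytes straight from the descriptor.
my $got = sysread($fh, my $chunk, 6);
defined $got or die "sysread failed: $!";

print "read $got bytes: $chunk\n";
```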

The reason is that select's response does not reflect any user-level buffering in your own process's address space once the kernel has transferred the data. But <FH> (really Perl's readline( ) function) still uses your underlying buffered I/O system. If two lines were waiting, select would report true only once, because the first buffered read would pull both lines out of the kernel. You'd read the first line and leave the second one in the buffer. But the next call to select would block because, as far as the kernel is concerned, it's already given you all of the data it had. That second line, now invisible to select, sits unread in an input buffer that's solely in user space.
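
The effect is easy to reproduce. This sketch assumes a typical Unix system and Perl's default buffered I/O layer, with a pipe standing in for the network connection; the variable names are illustrative only.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $selector = IO::Select->new($reader);

# Two complete lines arrive in a single write.
defined(syswrite($writer, "one\ntwo\n")) or die "syswrite failed: $!";

# The kernel has data, so the handle is reported as ready.
my @ready_at_start = $selector->can_read(0);

# The buffered read hands back one line, but it has pulled
# both lines out of the kernel and into Perl's own buffer.
my $first = <$reader>;

# The kernel now has nothing left, so the handle no longer looks ready...
my @ready_after = $selector->can_read(0);

# ...even though a whole line is still waiting in user space.
my $second = <$reader>;

printf "ready at start: %d, ready after first line: %d, buffered line: %s",
    scalar(@ready_at_start), scalar(@ready_after), $second;
```

A program that waited in select before every read would stall here with "two" still unread, which is the situation sysreadline is written to avoid.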

7.23.4. See Also

The sysread function in perlfunc(1) and in Chapter 29 of Programming Perl; the documentation for the standard modules Symbol, IO::Handle, and IO::Select (also in Chapter 32 of Programming Perl); Recipe 7.22.