home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Perl CookbookPerl CookbookSearch this book

17.3. Communicating over TCP

17.3.3. Discussion

Sockets handle two completely different types of I/O, each with attendant pitfalls and benefits. The normal Perl I/O functions used on files (except for seek and sysseek) work for stream sockets, but datagram sockets require the system calls send and recv, which work on complete records.

Awareness of buffering issues is particularly important in socket programming. That's because buffering, while designed to enhance performance, can interfere with the interactive feel that some programs require. Gathering input with <> may try to read more data from the socket than is yet available as it looks for a record separator. Both print and <> use stdio buffers, so unless you've changed autoflushing (see the Introduction to Chapter 7) on the socket handle, your data won't be sent to the other end as soon as you print it. Instead, it will wait until a buffer fills up.

For line-based clients and servers, this is probably okay, so long as you turn on autoflushing for output. Newer versions of IO::Socket do this automatically on the anonymous filehandles returned by IO::Socket->new.

But stdio isn't the only source of buffering. Output (print, printf, or syswrite—or send on a TCP socket) is further subject to buffering at the operating system level under a strategy called the Nagle Algorithm. When a packet of data has been sent but not acknowledged, further to-be-sent data is queued and is sent as soon as another complete packet's worth is collected or the outstanding acknowledgment is received. In some situations (mouse events being sent to a windowing system, keystrokes to a real-time application) this buffering is inconvenient or downright wrong. You can disable the Nagle Algorithm with the TCP_NODELAY socket option:

use Socket;
require "sys/socket.ph";    # for &TCP_NODELAY

setsockopt(SERVER, SOL_SOCKET, &TCP_NODELAY, 1)
    or die "Couldn't disable Nagle's algorithm: $!\n";

Re-enable it with:

setsockopt(SERVER, SOL_SOCKET, &TCP_NODELAY, 0)
    or die "Couldn't enable Nagle's algorithm: $!\n";

In most cases, TCP_NODELAY isn't something you need. TCP buffering is there for a reason, so don't disable it unless your application is one of the few real-time packet-intensive situations that need to.

Load in TCP_NODELAY from sys/socket.ph, a file that isn't automatically installed with Perl, but can be easily built. See Recipe 12.17 for details.

Because buffering is such an issue, you have the select function to determine which filehandles have unread input, which can be written to, and which have "exceptional conditions" pending. The select function takes three strings interpreted as binary data, each bit corresponding to a filehandle. A typical call to select looks like this:

$rin = '';                          # initialize bitmask
vec($rin, fileno(SOCKET), 1) = 1;   # mark SOCKET in $rin
# repeat calls to vec( ) for each socket to check

$timeout = 10;                      # wait ten seconds

$nfound = select($rout = $rin, undef, undef, $timeout);
if (vec($rout, fileno(SOCKET),1)){
    # data to be read on SOCKET
}

The four arguments to select are: a bitmask indicating which filehandles to check for unread data; a bitmask indicating which filehandles to check for safety to write without blocking; a bitmask indicating which filehandles to check for exceptional conditions; and a time in seconds indicating the maximum time to wait (this can be a floating-point number).

The function changes the bitmask arguments passed to it, so that when it returns, the only bits set correspond to filehandles ready for I/O. This leads to the common strategy of assigning an input mask ($rin in the previous example) to an output one ($rout in the example), so that select can only affect $rout, leaving $rin alone.

You can specify a timeout of 0 to poll (check without blocking). Some beginning programmers think that blocking is bad, so they write programs that "busy-wait"—they poll and poll and poll and poll. When a program blocks, the operating system recognizes that the process is pending on input and gives CPU time to other programs until input is available. When a program busy-waits, the system can't let it sleep, because it's always doing something—checking for input! Occasionally, polling is the right thing to do, but far more often it's not. A timeout of undef to select means "no timeout," and your program will patiently block until input becomes available.

Because select uses bitmasks, which are tiresome to create and difficult to interpret, we use the standard IO::Select module in the Solution. It bypasses bitmasks and is generally the easier route.

A full explanation of the exceptional data that is tested for with the third bitmask in select is beyond the scope of this book. Consult Stevens's UNIX Network Programming for a discussion of out-of-band and urgent data.

Other send and recv flags are listed in the manpages for those system calls.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.