Recipe 17.3. Communicating over TCP

Problem

You want to read or write data over a TCP connection.

Solution

This recipe assumes you're using the Internet to communicate. For TCP-like communication within a single machine, see Recipe 17.6.

Use print or <>:

print SERVER "What is your name?\n";
chomp ($response = <SERVER>);
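The print/<> approach can be tried without a remote host. The following is a minimal, self-contained sketch that assumes a loopback connection built with IO::Socket::INET on a port the operating system picks. The handle names $listener, $client, and $server are illustrative, and one process plays both ends here; a real client and server would normally run separately.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listener on loopback; with no LocalPort, the OS picks a free port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    Proto     => 'tcp',
    Listen    => 1,
) or die "Can't listen: $@";

# Connecting before accept() works: the kernel completes the handshake
# into the listen queue, so a single process can play both ends.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

# IO::Socket turns on autoflush, so each print is sent immediately.
print $client "What is your name?\n";
chomp(my $question = <$server>);

print $server "Larry\n";
chomp(my $response = <$client>);

print "server got: $question\nclient got: $response\n";
```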

Or, use send and recv:

defined(send(SERVER, $data_to_send, $flags))
    or die "Can't send: $!\n";

defined(recv(SERVER, $data_read, $maxlen, $flags))
    or die "Can't receive: $!\n";
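send returns the number of bytes sent, and recv returns the sender's address. For a connected TCP socket that address is usually an empty string, which is false even though the call succeeded, so it is safest to check both results with defined: only undef signals an error. Below is a minimal loopback sketch of the built-in functions. It assumes a throwaway IO::Socket::INET connection, and all variable names are hypothetical.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Throwaway loopback connection; the OS picks the port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

my ($data_to_send, $flags, $maxlen) = ("ping", 0, 1024);

# send: number of bytes sent, or undef on error.
defined(send($client, $data_to_send, $flags))
    or die "Can't send: $!\n";

# recv: sender's address ('' if the protocol supplies none), or undef on error.
my $data_read;
my $from = recv($server, $data_read, $maxlen, $flags);
defined($from) or die "Can't receive: $!\n";

printf "got '%s' (address string is %d bytes)\n", $data_read, length $from;
```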

Or, use the corresponding methods on an IO::Socket object:

use IO::Socket;

defined($server->send($data_to_send, $flags))
    or die "Can't send: $!\n";

defined($server->recv($data_read, $maxlen, $flags))
    or die "Can't recv: $!\n";
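IO::Socket's recv method expects a buffer length before the optional flags (its usage is $sock->recv(BUF, LEN [, FLAGS])), and both methods return undef on error. The sketch below shows the methods on a throwaway loopback connection. The handle names are hypothetical and one process plays both ends.

```perl
use strict;
use warnings;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

# $sock->send(MSG [, FLAGS]): bytes sent, or undef on error.
defined($client->send("hello, server", 0))
    or die "Can't send: $!\n";

# $sock->recv(BUF, LEN [, FLAGS]): fills BUF; undef means error.
my $data_read;
defined($server->recv($data_read, 1024, 0))
    or die "Can't recv: $!\n";

print "server read: $data_read\n";
```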

To find out whether data can be read or written, use the select function, which is nicely wrapped by the standard IO::Select module:

use IO::Select;

$select = IO::Select->new();
$select->add(*FROM_SERVER);
$select->add($to_client);

@read_from = $select->can_read($timeout);
foreach $socket (@read_from) {
    # read the pending data from $socket
}
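To see can_read at work, the following sketch registers one end of a throwaway loopback connection with IO::Select. It checks readiness before and after the other end writes a line. The connection setup and handle names are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

my $select = IO::Select->new();
$select->add($server);

# Nothing has been sent yet, so a short timeout yields an empty list.
my @before = $select->can_read(0.1);

print $client "hello\n";                    # autoflushed by IO::Socket

# Now the server end is readable; read from each ready handle.
my @read_from = $select->can_read(5);
my @lines;
foreach my $socket (@read_from) {
    my $line = <$socket>;
    push @lines, $line if defined $line;
}
print "ready handles: ", scalar(@read_from), ", read: @lines";
```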

Discussion

Sockets handle two completely different types of I/O, each with attendant pitfalls and benefits. The normal Perl I/O functions used on files (except for seek and sysseek) work for stream sockets, but datagram sockets require the system calls send and recv, which work on complete records.
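The record-oriented behavior of datagram sockets can be seen with two UDP messages over loopback. Each recv returns exactly one message, even though the buffer is large enough for both. This is a minimal sketch, assuming IO::Socket::INET UDP sockets on 127.0.0.1. The names $receiver and $sender are hypothetical, and it relies on loopback not dropping datagrams.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Bound UDP receiver on loopback; the OS picks the port.
my $receiver = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    Proto     => 'udp',
) or die "Can't bind: $@";

# "Connected" UDP sender, so send needs no destination argument.
my $sender = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $receiver->sockport,
    Proto    => 'udp',
) or die "Can't create sender: $@";

defined($sender->send("first"))  or die "Can't send: $!\n";
defined($sender->send("second")) or die "Can't send: $!\n";

# Each recv returns one whole datagram, never a merge of the two.
my @records;
for (1 .. 2) {
    my $buf;
    defined($receiver->recv($buf, 1024)) or die "Can't recv: $!\n";
    push @records, $buf;
}
print "records: @records\n";
```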

Awareness of buffering issues is particularly important in socket programming. That's because buffering, while designed to enhance performance, can interfere with the interactive feel that some programs require. Gathering input with <> may try to read more data from the socket than is yet available as it looks for a record separator. Both print and <> use stdio buffers, so unless you've changed autoflushing (see the Introduction to Chapter 7, File Access) on the socket handle, your data won't be sent to the other end as soon as you print it. Instead, it will wait until a buffer fills up.

For line-based clients and servers, this is probably okay, so long as you turn on autoflushing for output. Newer versions of IO::Socket do this automatically on the anonymous filehandles returned by IO::Socket->new.
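The effect of output buffering can be observed directly. The sketch below turns IO::Socket's default autoflush back off and prints a line. It then uses IO::Select to check whether the peer can read that line before and after an explicit flush. It assumes a throwaway loopback connection with hypothetical handle names. For a bareword handle such as SERVER, the IO::Handle autoflush method or the select((select(SERVER), $| = 1)[0]) idiom turns autoflush back on.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

$client->autoflush(0);        # undo IO::Socket's default for this demo
print $client "buffered line\n";

my $select = IO::Select->new($server);

# The line is still in the client's Perl-level buffer, so nothing is readable.
my @ready_unflushed = $select->can_read(0.5);

$client->flush;               # hand the buffered data to the kernel
my @ready_flushed = $select->can_read(5);

printf "readable before flush: %d, after flush: %d\n",
    scalar @ready_unflushed, scalar @ready_flushed;
```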

But stdio isn't the only source of buffering. Output (print, printf, or syswrite, or send on a TCP socket) is further subject to buffering at the operating system level under a strategy called the Nagle algorithm. When a packet of data has been sent but not yet acknowledged, further to-be-sent data is queued and is sent as soon as another complete packet's worth is collected or the outstanding acknowledgment is received. In some situations (mouse events being sent to a windowing system, keystrokes to a real-time application) this buffering is inconvenient or downright wrong. You can disable the Nagle algorithm with the TCP_NODELAY socket option:

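use Socket qw(IPPROTO_TCP TCP_NODELAY);

setsockopt(SERVER, IPPROTO_TCP, TCP_NODELAY, 1)
    or die "Couldn't disable Nagle's algorithm: $!\n";

Re-enable it with:

setsockopt(SERVER, IPPROTO_TCP, TCP_NODELAY, 0)
    or die "Couldn't enable Nagle's algorithm: $!\n";

In most cases, TCP_NODELAY isn't something you need. TCP buffering is there for a reason, so don't disable it unless your application is one of the few real-time, packet-intensive situations that need to.

TCP_NODELAY is a TCP-level option, so it must be set at level IPPROTO_TCP, not SOL_SOCKET; passing its value at the SOL_SOCKET level would set whatever socket-level option happens to share its number. Modern versions of the standard Socket module export both constants on request. If yours doesn't, load them from Perl's .ph translations of the C headers (TCP_NODELAY is defined in netinet/tcp.h, IPPROTO_TCP in netinet/in.h). These files aren't automatically installed with Perl, but can be easily built. See Recipe 12.14 for details.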
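This is a minimal sketch that sets TCP_NODELAY on a throwaway loopback connection and reads it back with getsockopt. It assumes a Perl whose standard Socket module exports IPPROTO_TCP and TCP_NODELAY. TCP_NODELAY is set at the IPPROTO_TCP level. The helper nodelay_state and the handle names are hypothetical. Some systems report an enabled option as a nonzero value other than 1, so only "nonzero" should be relied on.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(IPPROTO_TCP TCP_NODELAY);

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";

# Hypothetical helper: getsockopt returns a packed int, or undef on error.
sub nodelay_state {
    my ($sock) = @_;
    my $packed = getsockopt($sock, IPPROTO_TCP, TCP_NODELAY);
    defined $packed or die "getsockopt failed: $!\n";
    return unpack("i", $packed);
}

setsockopt($client, IPPROTO_TCP, TCP_NODELAY, 1)
    or die "Couldn't disable Nagle's algorithm: $!\n";
my $on = nodelay_state($client);

setsockopt($client, IPPROTO_TCP, TCP_NODELAY, 0)
    or die "Couldn't enable Nagle's algorithm: $!\n";
my $off = nodelay_state($client);

print "TCP_NODELAY after setting 1: $on; after setting 0: $off\n";
```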

Because buffering is such an issue, you have the select function to determine which filehandles have unread input, which can be written to, and which have "exceptional conditions" pending. The four-argument form of select takes three strings interpreted as bit vectors, each bit corresponding to a file descriptor number, plus a timeout. A typical call to select looks like this:

$rin = '';                          # initialize bitmask
vec($rin, fileno(SOCKET), 1) = 1;   # mark SOCKET in $rin
# repeat calls to vec() for each socket to check

$timeout = 10;                      # wait ten seconds

$nfound = select($rout = $rin, undef, undef, $timeout);
if ($nfound > 0 && vec($rout, fileno(SOCKET), 1)) {
    # data to be read on SOCKET
}
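The fragment above assumes an already-connected SOCKET. Here is a self-contained sketch that uses a throwaway loopback connection made with IO::Socket::INET (names hypothetical). The client writes a line first, so the bit for the server end is still set in $rout when select returns.

```perl
use strict;
use warnings;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

print $client "data\n";                # give the server end something to read

my $rin = '';
vec($rin, fileno($server), 1) = 1;     # mark the server end in $rin

my $timeout = 10;                      # wait at most ten seconds
my $rout;
my $nfound = select($rout = $rin, undef, undef, $timeout);
die "select failed: $!\n" if $nfound < 0;

my $readable = ($nfound > 0 && vec($rout, fileno($server), 1)) ? 1 : 0;
print "nfound=$nfound readable=$readable\n";
```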

The four arguments to select are: a bitmask of filehandles to check for unread data; a bitmask of filehandles to check for whether they can be written to without blocking; a bitmask of filehandles to check for exceptional conditions; and the maximum time to wait, in seconds (this can be a floating-point number). select returns the number of ready filehandles, 0 if the timeout expired, or -1 on error.

The function changes the bitmask arguments passed to it, so that when it returns, the only bits set correspond to filehandles ready for I/O. This leads to the common strategy of assigning an input mask ($rin above) to an output one ($rout above), so that select modifies only $rout, leaving $rin intact for the next call.
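The effect of the $rout = $rin assignment shows up when two descriptors are watched and only one becomes ready. After select returns, $rin still has both bits set, while $rout keeps only the ready one. This sketch assumes two throwaway loopback connections. The helper make_pair and all handle names are hypothetical.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: one connected (client, server) pair over loopback.
sub make_pair {
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
    ) or die "Can't listen: $@";
    my $client = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1', PeerPort => $listener->sockport,
        Proto    => 'tcp',
    ) or die "Can't connect: $@";
    my $server = $listener->accept or die "Can't accept: $!";
    return ($client, $server);
}

my ($busy_client, $busy_server) = make_pair();
my ($idle_client, $idle_server) = make_pair();
print $busy_client "ping\n";          # only the busy pair gets data

my $rin = '';
vec($rin, fileno($_), 1) = 1 for $busy_server, $idle_server;

my $rout;
my $nfound = select($rout = $rin, undef, undef, 5);

printf "in \$rin:  busy=%d idle=%d\n",
    vec($rin, fileno($busy_server), 1), vec($rin, fileno($idle_server), 1);
printf "in \$rout: busy=%d idle=%d\n",
    vec($rout, fileno($busy_server), 1), vec($rout, fileno($idle_server), 1);
```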

You can specify a timeout of 0 to poll (check without blocking). Some beginning programmers think that blocking is bad, so they write programs that "busy wait": they poll and poll and poll and poll. When a program blocks, the operating system recognizes that the process is pending on input and gives CPU time to other programs until input is available. When a program busy-waits, the system can't let it sleep because it's always doing something: checking for input! Occasionally, polling is the right thing to do, but far more often it's not. Passing undef as the timeout means "no timeout": your program will patiently block until input becomes available.
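The two timeout extremes can be compared on one socket. A timeout of 0 returns at once with nothing ready, and an undef timeout sleeps until the peer's data arrives. This is a minimal sketch, assuming a throwaway loopback connection with hypothetical handle names.

```perl
use strict;
use warnings;
use IO::Socket::INET;

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

my $rin = '';
vec($rin, fileno($server), 1) = 1;
my $rout;

# Poll: timeout 0 checks once and returns immediately (0 = nothing ready).
my $polled = select($rout = $rin, undef, undef, 0);

print $client "wake up\n";

# Block: an undef timeout sleeps in the kernel until input arrives.
my $blocked = select($rout = $rin, undef, undef, undef);

print "poll found $polled ready; blocking select found $blocked ready\n";
```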

Because select uses bitmasks, which are tiresome to create and difficult to interpret, we use the standard IO::Select module in the Solution section. It bypasses bitmasks and is, generally, the easier route.

A full explanation of the exceptional data tested for with the third bitmask in select is beyond the scope of this book. Consult Stevens's Unix Network Programming for a discussion of out-of-band and urgent data.

The flags accepted by send and recv are listed in the manpages for those system calls, send(2) and recv(2).
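One such flag is MSG_PEEK, which returns pending data without removing it from the socket's receive queue. Here is a loopback sketch, assuming a throwaway IO::Socket::INET connection and the MSG_PEEK constant from the standard Socket module; the handle names are hypothetical.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(MSG_PEEK);

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', Proto => 'tcp', Listen => 1,
) or die "Can't listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport, Proto => 'tcp',
) or die "Can't connect: $@";
my $server = $listener->accept or die "Can't accept: $!";

print $client "hello";                 # autoflushed by IO::Socket

my ($peeked, $data_read);
defined(recv($server, $peeked, 1024, MSG_PEEK))
    or die "Can't peek: $!\n";

# The peeked bytes are still queued, so a normal recv returns them again.
defined(recv($server, $data_read, 1024, 0))
    or die "Can't receive: $!\n";

print "peeked '$peeked', then read '$data_read'\n";
```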
