Sockets (Programming Perl)

16.5. Sockets

The IPC mechanisms discussed earlier all have one severe restriction: they're designed for communication between processes running on the same computer. (Even though files can sometimes be shared across machines through mechanisms like NFS, locking fails miserably on many NFS implementations, which takes away most of the fun of concurrent access.) For general-purpose networking, sockets are the way to go. Although sockets were invented under BSD, they quickly spread to other forms of Unix, and nowadays you can find a socket interface on nearly every viable operating system out there. If you don't have sockets on your machine, you're going to have tremendous difficulty using the Internet.

With sockets, you can do both virtual circuits (as TCP streams) and datagrams (as UDP packets). You may be able to do even more, depending on your system. But the most common sort of socket programming uses TCP over Internet-domain sockets, so that's the kind we cover here. Such sockets provide reliable connections that work a little bit like bidirectional pipes that aren't restricted to the local machine. The two killer apps of the Internet, email and web browsing, both rely almost exclusively on TCP sockets.

You also use UDP heavily without knowing it. Every time your machine tries to find a site on the Internet, it sends UDP packets to your DNS server asking it for the actual IP address. You might use UDP yourself when you want to send and receive datagrams. Datagrams are cheaper than TCP connections precisely because they aren't connection oriented; that is, they're less like making a telephone call and more like dropping a letter in the mailbox. But UDP also lacks the reliability that TCP provides, making it more suitable for situations where you don't care whether a packet or two gets folded, spindled, or mutilated. Or for when you know that a higher-level protocol will enforce some degree of redundancy or fail-softness (which is what DNS does).

Other choices are available but far less common. You can use Unix-domain sockets, but they only work for local communication. Various systems support various other non-IP-based protocols. Doubtless these are somewhat interesting to someone somewhere, but we'll restrain ourselves from talking about them somehow.

The Perl functions that deal with sockets have the same names as the corresponding syscalls in C, but their arguments tend to differ for two reasons: first, Perl filehandles work differently from C file descriptors; and second, Perl already knows the length of its strings, so you don't need to pass that information. See Chapter 29, "Functions" for details on each socket-related syscall.

Like most syscalls, the socket-related ones quietly but politely return undef when they fail, instead of raising an exception. It is therefore essential to check these functions' return values, since if you pass them garbage, they aren't going to be very noisy about it. One problem with ancient socket code in Perl was that people would use hard-coded values for constants passed into socket functions, which destroys portability. If you ever see code that does anything like explicitly setting $AF_INET = 2, you know you're in for big trouble. An immeasurably superior approach is to use the Socket module or the even friendlier IO::Socket module, both of which are standard. These modules provide various constants and helper functions you'll need for setting up clients and servers. For optimal success, your socket programs should always start out like this (and don't forget to add the -T taint-checking switch to the shebang line for servers):

#!/usr/bin/perl -w
use strict;
use sigtrap;
use Socket;  # or IO::Socket
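
To make the point about return values concrete, here is a minimal sketch (not from the original text) that takes its constants from the Socket module and shows a socket call failing quietly. The fallback to IPPROTO_TCP is our own assumption, added for systems that lack a protocols database; calling getpeername on an unconnected socket is simply a convenient way to provoke a failure.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);   # symbolic constants, never literals

# Look up the protocol number by name; fall back to the Socket
# constant if the system has no protocols database (an assumption
# made so the sketch runs in minimal environments).
my $proto = getprotobyname('tcp');
$proto = IPPROTO_TCP unless defined $proto;

socket(my $sock, PF_INET, SOCK_STREAM, $proto)
    or die "socket: $!";

# This socket was never connected, so getpeername fails. It does not
# throw an exception: it returns undef and leaves the reason in $!.
my $peer  = getpeername($sock);
my $error = defined $peer ? '' : "$!";
print "getpeername returned undef: $error\n" unless defined $peer;

close($sock) or die "close: $!";
```

Had the return value gone unchecked, the program would have carried on with an undefined address and failed somewhere less obvious.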

As noted elsewhere, Perl is at the mercy of your C libraries for much of its system behavior, and not all systems support all sorts of sockets. It's probably safest to stick with normal TCP and UDP socket operations. For example, if you want your code to stand a chance of being portable to systems you haven't thought of, don't expect there to be support for a reliable sequenced-packet protocol. Nor should you expect to pass open file descriptors between unrelated processes over a local Unix-domain socket. (Yes, you can really do that on many Unix machines--see your local recvmsg(2) manpage.)

If you just want to use a standard Internet service like mail, news, domain name service, FTP, Telnet, the Web, and so on, then instead of starting from scratch, try using existing CPAN modules for these. Prepackaged modules designed for these include Net::SMTP (or Mail::Mailer), Net::NNTP, Net::DNS, Net::FTP, Net::Telnet, and the various HTTP-related modules. The libnet and libwww module suites both comprise many individual networking modules. Module areas on CPAN you'll want to look at are section 5 on Networking and IPC, section 15 on WWW-related modules, and section 16 on Server and Daemon Utilities.

In the sections that follow, we present several sample clients and servers without a great deal of explanation of each function used, as that would mostly duplicate the descriptions we've already provided in Chapter 29, "Functions".

16.5.1. Networking Clients

Use Internet-domain sockets when you want reliable client-server communication between potentially different machines.

To create a TCP client that connects to a server somewhere, it's usually easiest to use the standard IO::Socket::INET module:

use IO::Socket::INET;

$socket = IO::Socket::INET->new(PeerAddr => $remote_host,
                                PeerPort => $remote_port,
                                Proto    => "tcp",
                                Type     => SOCK_STREAM)
    or die "Couldn't connect to $remote_host:$remote_port : $!\n";

# send something over the socket,
print $socket "Why don't you call me anymore?\n";

# read the remote answer,
$answer = <$socket>;

# and terminate the connection when we're done.
close($socket);

A shorthand form of the call is good enough when you just have a host and port combination to connect to, and are willing to use defaults for all other fields:

$socket = IO::Socket::INET->new("www.yahoo.com:80")
    or die "Couldn't connect to port 80 of yahoo: $!";
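
If you have no remote server handy, the client calls above can be tried against a listener in the same process. This is only a sketch for experimentation: the loopback address, the use of port 0 (which asks the kernel for any free port), and the reply text are our assumptions, not part of the original example.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# A throwaway listener on the loopback interface.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,              # 0 = let the kernel choose a free port
    Proto     => 'tcp',
    Listen    => 5,
) or die "Couldn't listen on loopback: $!\n";
my $port = $listener->sockport;

# The client side, written as in the text above.
my $socket = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "Couldn't connect to 127.0.0.1:$port : $!\n";

# The kernel has already queued the connection, so accept returns at once.
my $peer = $listener->accept or die "accept: $!\n";

print $socket "Why don't you call me anymore?\n";
my $request = <$peer>;           # "server" reads one line
print $peer "Sorry, I was busy.\n";
my $answer = <$socket>;          # client reads the reply

close($_) for $socket, $peer, $listener;
print "server saw: $request", "client got: $answer";
```

Handles created by IO::Socket have autoflush turned on by default, which is why each line reaches the other end without an explicit flush.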

To connect using the basic Socket module:

use Socket;

# create a socket
socket(Server, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "Couldn't create socket: $!\n";

# build the address of the remote machine
$internet_addr = inet_aton($remote_host)
    or die "Couldn't convert $remote_host into an Internet address: $!\n";
$paddr = sockaddr_in($remote_port, $internet_addr);

# connect
connect(Server, $paddr)
    or die "Couldn't connect to $remote_host:$remote_port: $!\n";

select((select(Server), $| = 1)[0]);  # enable command buffering (autoflush)

# send something over the socket
print Server "Why don't you call me anymore?\n";

# read the remote answer
$answer = <Server>;

# terminate the connection when done
close(Server);

If you want to close only your side of the connection, so that the remote end gets an end-of-file, but you can still read data coming from the server, use the shutdown syscall for a half-close:

# no more writing to server
shutdown(Server, 1);    # Socket::SHUT_WR constant in v5.6
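
The effect of a half-close is easiest to see when both ends are visible. The following sketch, which again uses a loopback listener of our own invention, has the client send three lines and shut down its writing side; the peer reads until end-of-file and can still send a reply back.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw(SHUT_WR);

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0,
    Proto     => 'tcp',       Listen    => 5,
) or die "listen: $!\n";

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect: $!\n";

my $server_side = $listener->accept or die "accept: $!\n";

# Client: send everything, then close only the writing direction.
print $client "line $_\n" for 1 .. 3;
shutdown($client, SHUT_WR) or die "shutdown: $!\n";

# Peer: the read loop ends because the half-close delivers end-of-file.
my $count = 0;
while (defined(my $line = <$server_side>)) {
    $count++;
}
print $server_side "got $count lines\n";
close($server_side);

# Client: the reading direction is still open.
my $answer = <$client>;
close($client);
close($listener);
print "reply after half-close: $answer";
```

Without the shutdown call, the peer's read loop would wait indefinitely for more input, since the client would still be holding its writing side open.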

16.5.2. Networking Servers

Here's a corresponding server to go along with it. It's pretty easy with the standard IO::Socket::INET class:

use IO::Socket::INET;

$server = IO::Socket::INET->new(LocalPort => $server_port,
                                Type      => SOCK_STREAM,
                                Reuse     => 1,
                                Listen    => 10 )   # or SOMAXCONN
    or die "Couldn't be a tcp server on port $server_port: $!\n";

while ($client = $server->accept()) {
    # $client is the new connection
}

close($server);

You can also write that using the lower-level Socket module:

use Socket;

# make the socket
socket(Server, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "Couldn't create socket: $!\n";

# so we can restart our server quickly
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, 1)
    or die "Couldn't set SO_REUSEADDR: $!\n";

# build up my socket address
$my_addr = sockaddr_in($server_port, INADDR_ANY);
bind(Server, $my_addr)
    or die "Couldn't bind to port $server_port: $!\n";

# establish a queue for incoming connections
listen(Server, SOMAXCONN)
    or die "Couldn't listen on port $server_port: $!\n";

# accept and process connections
while (accept(Client, Server)) {
    # do something with new Client connection
}

close(Server);

The client doesn't need to bind to any address, but the server does. We've specified its address as INADDR_ANY, which means that clients can connect from any available network interface. If you want to sit on a particular interface (like the external side of a gateway or firewall machine), use that interface's real address instead. (Clients can do this, too, but rarely need to.)
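
As an illustration of sitting on one particular interface, this sketch binds to the loopback address instead of INADDR_ANY, so that only clients on the same machine can connect. The choice of INADDR_LOOPBACK and of port 0 (any free port) is ours, made so the example runs anywhere; a gateway machine would use the address of the interface it wants to serve.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);

my $proto = getprotobyname('tcp');
$proto = IPPROTO_TCP unless defined $proto;   # fallback is an assumption

socket(my $server, PF_INET, SOCK_STREAM, $proto)
    or die "socket: $!\n";
setsockopt($server, SOL_SOCKET, SO_REUSEADDR, 1)
    or die "setsockopt: $!\n";

# INADDR_LOOPBACK instead of INADDR_ANY: listen on one interface only.
bind($server, sockaddr_in(0, INADDR_LOOPBACK))
    or die "bind: $!\n";
listen($server, SOMAXCONN)
    or die "listen: $!\n";

# Ask the kernel which address and port we actually ended up with.
my $my_addr = getsockname($server)
    or die "getsockname: $!\n";
my ($port, $iaddr) = unpack_sockaddr_in($my_addr);
my $bound_ip = inet_ntoa($iaddr);
print "listening on $bound_ip:$port\n";

close($server);
```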

If you want to know which machine connected to you, call getpeername on the client connection. This returns a packed socket address holding the port and IP address; the IP address you'll have to translate into a name on your own (if you can):

use Socket;
$other_end = getpeername(Client)
    or die "Couldn't identify other end: $!\n";
($port, $iaddr) = unpack_sockaddr_in($other_end);
$actual_ip = inet_ntoa($iaddr);
$claimed_hostname = gethostbyaddr($iaddr, AF_INET);

This is trivially spoofable because the owner of that IP address can set up their reverse tables to say anything they want. For a small measure of additional confidence, translate back the other way again:

@name_lookup = gethostbyname($claimed_hostname)
    or die "Could not reverse $claimed_hostname: $!\n";
@resolved_ips = map { inet_ntoa($_) } @name_lookup[ 4 .. $#name_lookup ];
$might_spoof = !grep { $actual_ip eq $_ } @resolved_ips;

Once a client connects to your server, your server can do I/O both to and from that client handle. But while the server is so engaged, it can't service any further incoming requests from other clients. To avoid getting locked down to just one client at a time, many servers immediately fork a clone of themselves to handle each incoming connection. (Others fork in advance, or multiplex I/O between several clients using the select syscall.)

REQUEST:
while (accept(Client, Server)) {
    if ($kidpid = fork) {
        close Client;         # parent closes unused handle
        next REQUEST;
    } 
    defined($kidpid)   or die "cannot fork: $!" ;
    
    close Server;             # child closes unused handle
    
    select(Client);           # new default for prints
    $| = 1;                   # autoflush
    
    # per-connection child code does I/O with Client handle
    $input = <Client>;
    print Client "output\n";  # or STDOUT, same thing
    
    open(STDIN, "<&Client")     or die "can't dup client: $!";
    open(STDOUT, ">&Client")    or die "can't dup client: $!";
    open(STDERR, ">&Client")    or die "can't dup client: $!";
    
    # run the calculator, just as an example
    system("bc -l");     # or whatever you'd like, so long as
                         # it doesn't have shell escapes!
    print "done\n";      # still to client
    
    close Client;
    exit;  # don't let the child back to accept!
}

This server clones off a child with fork for each incoming request. That way it can handle many requests at once, as long as you can create more processes. (You might want to limit this.) Even if you don't fork, the listen will allow up to SOMAXCONN (usually five or more) pending connections. Each connection uses up some resources, although not as much as a process. Forking servers have to be careful about cleaning up after their expired children (called "zombies" in Unix-speak) because otherwise they'd quickly fill up your process table. The REAPER code discussed in Section 16.1, "Signals" will take care of that for you, or you may be able to assign $SIG{CHLD} = 'IGNORE'.
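
For reference, here is a small self-contained sketch of zombie reaping. It is written in the spirit of a REAPER handler but is not a copy of the code from the "Signals" section; the %exit_status hash and the three short-lived children exist only to make the demonstration observable.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);    # supplies WNOHANG

my %exit_status;              # pid => exit code, filled in by the handler

sub REAPER {
    local ($!, $?);           # don't disturb the interrupted code
    # One signal may stand for several exited children, so loop
    # until no more are waiting to be collected.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $? >> 8;
    }
}
$SIG{CHLD} = \&REAPER;

my @kids;
for my $code (1 .. 3) {
    my $pid = fork;
    defined($pid) or die "cannot fork: $!";
    exit $code if $pid == 0;  # child: exit immediately
    push @kids, $pid;
}

# The parent would normally be busy in accept; here it just naps
# (for at most five seconds) until every child has been reaped.
for (1 .. 50) {
    last if keys(%exit_status) == @kids;
    select(undef, undef, undef, 0.1);
}
REAPER();                     # collect any straggler

printf "child %d exited with status %d\n", $_, $exit_status{$_}
    for @kids;
```

A real server would install the handler once, before entering its accept loop, and would be prepared for accept to be interrupted by the signal.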

Before running another command, we connect the standard input and output (and error) up to the client connection. This way any command that reads from STDIN and writes to STDOUT can also talk to the remote machine. Without the reassignment, the command couldn't find the client handle--which by default gets closed across the exec boundary, anyway.
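
The select-based multiplexing mentioned earlier as an alternative to forking can be sketched with the standard IO::Select module, which wraps the select syscall. Using IO::Select here is our choice, not something the original example does. To keep the sketch self-contained, the two clients live in the same process as the server loop, and the "protocol" (reply with the request in uppercase) is made up.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0,
    Proto     => 'tcp',       Listen    => 5,
) or die "listen: $!\n";

# Two clients in this same process, purely for demonstration.
my @clients;
for my $n (0 .. 1) {
    my $c = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1', PeerPort => $listener->sockport,
        Proto    => 'tcp',
    ) or die "connect: $!\n";
    $c->syswrite("hello from client $n\n");
    push @clients, $c;
}

# One process watches the listener and every accepted connection.
my $watched = IO::Select->new($listener);
my $served  = 0;
while ($served < @clients) {
    my @ready = $watched->can_read(5) or die "timed out\n";
    for my $fh (@ready) {
        if ($fh == $listener) {           # new connection waiting
            $watched->add($listener->accept);
            next;
        }
        my $n = sysread($fh, my $buf, 4096);
        if ($n) {                         # got a request: answer it
            syswrite($fh, uc $buf);
            $served++;
        } else {                          # end-of-file or error
            $watched->remove($fh);
            close($fh);
        }
    }
}

my @replies = map { $_->getline } @clients;
print @replies;
close($_) for @clients, $watched->handles;
```

The loop uses sysread and syswrite rather than buffered I/O, because buffered reads can hold data that select knows nothing about. A production server would also have to cope with requests that arrive in several pieces.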

When you write a networking server, we strongly suggest that you use the -T switch to enable taint checking even if you aren't running setuid or setgid. This is always a good idea for servers and any other program that runs on behalf of someone else (like all CGI scripts), because it lessens the chances that people from the outside will be able to compromise your system. See Section 23.1, "Handling Insecure Data" in Chapter 23, "Security" for much more about all this.

One additional consideration when writing Internet programs: many protocols specify that the line terminator should be CRLF, which can be specified various ways: "\015\12", or "\xd\xa", or even chr(13).chr(10). As of version 5.6 of Perl, saying v13.10 also produces the same string. (On many machines, you can also use "\r\n" to mean CRLF, but don't use "\r\n" if you want to be portable to Macs, where the meanings of \r and \n are reversed!) Many Internet programs will in fact accept a bare "\012" as a line terminator, but that's because Internet programs usually try to be liberal in what they accept and strict in what they emit. (Now if only we could get people to do the same...)
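
The equivalence of these spellings is easy to check, and the Socket module will also hand you the terminator ready-made through its :crlf export tag. In this sketch the sample command and reply strings are invented for illustration.

```perl
use strict;
use warnings;
use Socket qw(:crlf);    # exports $CR, $LF, and $CRLF

# Every spelling from the text, plus the one Socket provides.
my @spellings = ("\015\12", "\xd\xa", chr(13) . chr(10), v13.10, $CRLF);
my $all_same  = !grep { $_ ne "\015\012" } @spellings;
print "all spellings agree\n" if $all_same;

# Strict in what you emit: always terminate with CRLF.
my $wire = "HELO example.org" . $CRLF;    # made-up sample command

# Liberal in what you accept: strip CRLF or a bare LF.
my @received = ("250 OK\015\012", "250 OK\012");
s/\015?\012\z// for @received;
print "cleaned: $_\n" for @received;
```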

16.5.3. Message Passing

As we mentioned earlier, UDP communication involves much lower overhead but provides no reliability, since there are no promises that messages will arrive in a proper order--or even that they will arrive at all. UDP is often said to stand for Unreliable Datagram Protocol, although the U officially stands for User.

Still, UDP offers some advantages over TCP, including the ability to broadcast or multicast to a whole bunch of destination hosts at once (usually on your local subnet). If you find yourself getting overly concerned about reliability and starting to build checks into your message system, then you probably should just use TCP to start with. True, it costs more to set up and tear down a TCP connection, but if you can amortize that over many messages (or one long message), it doesn't much matter.

Anyway, here's an example of a UDP program. It contacts the UDP time port of the machines given on the command line, or everybody it can find using the universal broadcast address if no arguments were supplied.[13] Not all machines have a time server enabled, especially across firewall boundaries, but those that do will send you back a 4-byte integer packed in network byte order that represents what time that machine thinks it is. The time returned, however, is in the number of seconds since 1900. You have to subtract the number of seconds between 1900 and 1970 to feed that time to the localtime or gmtime conversion functions.

#!/usr/bin/perl
# clockdrift - compare other systems' clocks with this one
#              without arguments, broadcast to anyone listening.
#              wait one-half second for an answer.

use v5.6.0;  # or better
use warnings;
use strict;
use Socket;

unshift(@ARGV, inet_ntoa(INADDR_BROADCAST))
    unless @ARGV;

socket(my $msgsock, PF_INET, SOCK_DGRAM, getprotobyname("udp"))
    or die "socket: $!";

# Some borked machines need this.  Shouldn't hurt anyone else.
setsockopt($msgsock, SOL_SOCKET, SO_BROADCAST, 1)
    or die "setsockopt: $!";

my $portno = getservbyname("time", "udp")    
    or die "no udp time port";

for my $target (@ARGV) {
    print "Sending to $target:$portno\n";
    my $destpaddr = sockaddr_in($portno, inet_aton($target));
    send($msgsock, "x", 0, $destpaddr)
        or die "send: $!";
}

# time service returns 32-bit time in seconds since 1900
my $FROM_1900_TO_EPOCH = 2_208_988_800;
my $time_fmt = "N";   # and it does so in this binary format
my $time_len = length(pack($time_fmt, 1));  # any number's fine

my $inmask = '';  # string to store the fileno bits for select
vec($inmask, fileno($msgsock), 1) = 1;

# wait only half a second for input to show up
while (select(my $outmask = $inmask, undef, undef, 0.5)) {
    defined(my $srcpaddr = recv($msgsock, my $bintime, $time_len, 0))
        or die "recv: $!";
    my($port, $ipaddr) = sockaddr_in($srcpaddr);
    my $sendhost = sprintf "%s [%s]",
                    gethostbyaddr($ipaddr, AF_INET) || 'UNKNOWN',
                    inet_ntoa($ipaddr);
    my $delta = unpack($time_fmt, $bintime) -
                      $FROM_1900_TO_EPOCH - time();
    print "Clock on $sendhost is $delta seconds ahead of this one.\n";
}

[13] If that doesn't work, run ifconfig -a to find the proper local broadcast address.
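
The 1900-to-1970 arithmetic used by clockdrift can be checked without any network at all. This sketch simulates a 4-byte reply rather than receiving one; the helper name decode_time_reply and the sample timestamp are our own.

```perl
use strict;
use warnings;

# 70 years separate 1900 and 1970, 17 of them leap years.
my $FROM_1900_TO_EPOCH = (70 * 365 + 17) * 86_400;   # 2_208_988_800

# Convert a 4-byte network-order reply into Unix epoch seconds.
sub decode_time_reply {
    my ($bintime) = @_;
    return unpack("N", $bintime) - $FROM_1900_TO_EPOCH;
}

# Simulate the packet a time server would send for Unix time 1e9.
my $reply     = pack("N", $FROM_1900_TO_EPOCH + 1_000_000_000);
my $unix_time = decode_time_reply($reply);

print "reply is ", length($reply), " bytes\n";
print "decoded: $unix_time = ", scalar(gmtime($unix_time)), " UTC\n";
```

Because the count is an unsigned 32-bit number of seconds since 1900, it wraps around in 2036, which is something to keep in mind if you rely on this service.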