Danger on the Wire (Perl for System Administration)

10.4. Danger on the Wire

SNMP is good for proactive monitoring (and some reactive monitoring situations when using SNMP traps), but it doesn't always help with unplanned situations like network emergencies. In these situations, you may need to monitor the network in ways that are not covered by the available SNMP variables.

10.4.1. Perl Saves the Day

Here's a true story that shows how Perl can help in these times. One Saturday evening I casually logged into a machine on my network to read my email. Much to my surprise, I found our mail and web servers near death and fading fast. Attempts to read and send mail or look at web content yielded slow responses, hung connections, and outright connection failures. Our mail queue was starting to reach critical mass.

I looked first at the state of the servers. Interactive response was fine, and the CPU load was high, but not deadly. One sign of trouble was the number of mail processes running. According to the mail logs, there were more processes running than expected because many transactions were not completing. Processes that had started up to handle incoming connections from the outside were hanging, driving up the load. This load was then capping any new outgoing connections from initiating. This strange network behavior led me to examine the current connection table of the server using netstat.

The last column of the netstat outputtold me that there were indeed many connections in progress on that machine from many different hosts. The big shocker was the state of those connections. Instead of looking like this:

tcp    0      0  mailhub.3322    mail.mel.aone.ne.smtp  ESTABLISHED
tcp    0      0  mailhub.3320    edunet.edunet.dk.smtp  CLOSE_WAIT
tcp    0      0  mailhub.1723    kraken.mvnet.wne.smtp  ESTABLISHED
tcp    0      0  mailhub.1709    plover.net.bridg.smtp  CLOSE_WAIT

they looked more like this:

tcp    0      0  mailhub.3322    mail.mel.aone.ne.smtp  SYN_RCVD
tcp    0      0  mailhub.3320    edunet.edunet.dk.smtp  SYN_RCVD
tcp    0      0  mailhub.1723    kraken.mvnet.wne.smtp  SYN_RCVD
tcp    0      0  mailhub.1709    plover.net.bridg.smtp  CLOSE_WAIT

At first, this looked like a classic Denial of Service attack called a SYN Flood or a SYN-ACK attack. To understand these attacks, we have to digress for a moment and talk a little bit about how the TCP/IP protocol works.

Every TCP/IP connection begins with a handshake between the participants. This little dance lets both the initiator and the recipient signal their readiness to enter into a conversation. The first step is taken by the initiating network entity. It sends a SYN (for SYNchronize) packet to the recipient. If the recipient wishes to talk, it will send back a SYN-ACK, an ACKnowledgment of the request, and record that a conversation is about to begin in its pending connection table. The initiator then replies to the SYN-ACK with an ACK packet, confirming the SYN-ACK was heard. The recipient hears the ACK, removes the entry from its pending table, and away they go.

At least, that's what should happen. In a SYN Flood situation, a nogoodnik will send a flood of SYN packets to a machine, often with spoofed source addresses. The unsuspecting machine will send SYN-ACK s to the spoofed source addresses and open an entry in its pending communication table for each SYN packet it has received. These bogus connection entries will stay in the pending table until the OS ages them out using some default timeout value. If enough packets are sent, the pending communication table will fill up and no legitimate connection attempts will succeed. This leads to symptoms like those I was experiencing at the time, and similar netstat output.

The one anomaly in the netstat output that made me question this diagnosis was the variety of hosts represented in the table. It was possible that someone had a program with superb spoofing capabilities, but you usually expect to see many connections from a smaller set of bogus hosts. Many of these hosts also seemed perfectly legitimate. Further clouding the situation was the result of a few connectivity tests I ran. Sometimes I could ping or traceroute to a randomly selected host listed in my netstat output, sometimes I couldn't. I needed more data. I needed to get a better grasp on the connectivity to these remote hosts. That's where Perl came in.

Because I was writing code under the gun, I wrote a very simple script that relied on the output of two other external network programs to handle the hard parts of the task. Let me show you that version, and then we'll use this task as a springboard for some more advanced programming.

The task in this case boiled down to one question: could I reach the hosts that appeared to be trying to connect to me? To find out which hosts were trying to contact my machine, I turned to a program called clog written by Brian Mitchell, found at ftp://coast.cs.purdue.edu/pub/mirrors/ftp.saturn.net/clog. clog uses the Unix libpcap library from Lawrence Berkeley National Laboratory's Network Research Group to sniff the network for TCP connection requests, i.e., SYN packets. This is the same library used by the seminal network monitoring program tcpdump. Found at ftp://ftp.ee.lbl.gov/libpcap.tar.Z, libpcap works for Linux machines as well. A libpcap port for NT/2000 can be found at http://netgroup-serv.polito.it/windump/or http://www.ntop.org/libpcap.html, but I have yet to see one for MacOS.

clog reports SYN packets like this:

Mar 02 11:21|192.168.1.51|1074|192.168.1.104|113
Mar 02 11:21|192.168.1.51|1094|192.168.1.104|23

The output above shows two connection requests from 192.168.1.51 to 192.168.1.104. The first was an attempt to connect to port 113 (ident), the second to port 23 (telnet).

With clog I was able to learn which hosts were attempting connections to me. And now I needed to know whether I could also reach them. That task was left to a program called fping by Roland J. Schemers III. fping, which can be found at http://www.stanford.edu/~schemers/docs/fping/fping.html, is a fast and fancy ping program for testing network connectivity on Unix and variants. Putting these external commands together, we get this little Perl program:

$clogex   = "/usr/local/bin/clog";      # location/switches for clog
$fpingex  = "/usr/local/bin/fping -r1"; # location/switches for fping

$localnet = "192.168.1";                # local network prefix

open CLOG, "$clogex|" or die "Unable to run clog:$!\n";
while(<CLOG>){
    ($date,$orighost,$origport,$desthost,$destport) = split(/\|/);
    next if ($orighost =~ /^$localnet/);
    next if (exists $cache{$orighost});
    print `$fpingex $orighost`;
    $cache{$orighost}=1;
}

This program runs the clog command and reads its output ad infinitum. Since our internal network connectivity wasn't suspect, each originating host is checked against our local network's addressing prefix. Traffic from our local network is ignored.

Like our last SNMP example, we perform some rudimentary caching. To be a good net citizen we want to avoid hammering outside machines with multiple ping packets, so we keep track of every host we've already queried. The -r1 flag to fping is used to restrict the number of times fping will retry a host (the default is three retries).

This program has to be run with elevated privileges, since both clog and fping need privileged access to the computer's network interface. This program printed output like this:

199.174.175.99 is unreachable
128.148.157.143 is unreachable
204.241.60.5 is alive
199.2.26.116 is unreachable
199.172.62.5 is unreachable
130.111.39.100 is alive
207.70.7.25 is unreachable
198.214.63.11 is alive
129.186.1.10 is alive

Clearly something fishy was going on here. Why would half of the sites be reachable, and the other half unreachable? Before we answer that question, let's look at what we could do to improve this program. A first step would be to remove the external program dependencies. Learning how to sniff the network and send ping packets from Perl could open up a whole range of possibilities. Let's take care of removing the easy dependency first.

The Net::Ping module by Russell Mosemann, found in the Perl distribution, can help us with testing connectivity to network hosts. Net::Ping allows us to send three different flavors of ping packets and check for a return response: ICMP, TCP, and UDP. Internet Control Message Protocol (ICMP) echo packets are "ping classic," the kind of packet sent by the vast majority of the command-line ping programs. This particular packet flavor has two disadvantages:

Like our previous clog /fping code, any Net::Ping scripts using ICMP need to be run with elevated privileges.
Perl on MacOS does not currently support ICMP. This may be remedied in the future, but you should be aware of this portability constraint.

The other two choices for Net::Ping packets are TCP and UDP. Both of these choices send packets to a remote machine's echo service port. Using these two options gains you portability, but you may find them less reliable than ICMP. ICMP is built into all standard TCP/IP stacks, but all machines may not be running the echo service. As a result, unless ICMP is deliberately filtered, you are more likely to receive a response to an ICMP packet than to the other types.

Net::Ping uses the standard object-oriented programming model, so the first step is the creation of a new ping object instance:

use Net::Ping;
$p = new Net::Ping("icmp");

Using this object is simple:

if ($p->ping("host")){
    print "ping succeeded.\n";
else{
    print "ping failed\n";
}

Now let's dig into the hard part of our initial script, the network sniffing handled by clog. Unfortunately, at this point we may need to let our MacOS readers off the bus. The Perl code we are about to explore is tied to the libpcap library we mentioned earlier, so using it on anything but a Unix variant may be dicey or impossible.

The first step is to build libpcap on your machine. I recommend you also build tcpdump as well. Like our use of the UCD-SNMP command-line utilities earlier, tcpdump can be used to explore libpcap functionality before coding Perl or to double-check that code.

With libpcap built, it is easy to build the Net::Pcap module, originally by Peter Lister and later completely rewritten by Tim Potter. This module gives full access to the power of libpcap. Let's see how we can use it to find SYN packets, à la clog.

Our code begins by querying the machine for an available/sniffable network interface and the settings for that interface:

use Net::Pcap;

# find the sniffable network device
$dev = Net::Pcap::lookupdev(\$err) ;
die "can't find suitable device: $err\n" unless $dev;

# figure out the network number and mask of that device
die "can't figure out net info for dev:$err\n"
  if (Net::Pcap::lookupnet($dev,\$netnum,\$netmask,\$err));

Most of the libpcap functions use the C convention of returning 0 for success, or -1 for failure, hence the die if . . .idiom is used often in Net::Pcap Perl code. The meaning of the arguments fed to each of the functions we'll be using can be found in the pcap(3) manual page.

Given the network interface information, we can tell libpcap we want to sniff the live network (as opposed to reading packets from a previously saved packet file). Net::Pcap::open_live will hand us back a packet capture descriptor to refer to this session:

# open that interface for live capture
$descript = Net::Pcap::open_live($dev,100,1,1000,\$err) ;
die "can't obtain pcap descriptor:$err\n" unless $descript;

libpcap gives you the ability to capture all network traffic or a select subset based on filter criteria of your choosing. Its filtering mechanism is very efficient, so it is often best to invoke it up front, rather than sifting through all of the packets via Perl code. In our case, we need to only look at SYN packets.

So what's a SYN packet? To understand that, you need to know a little bit about how TCP packets are put together. Figure 10-2 shows a picture from RFC793 of a TCP packet and its header.

Figure 10.2. Diagram of a TCP packet

A SYN packet, for our purposes, is simply one that has only the SYN flag (highlighted in Figure 10-2) in the packet header set. In order to tell libpcap to capture packets like this, we need to specify which byte it should look at in the packet. Each tick mark above is a bit, so let's count bytes. Figure 10-3 shows the same packet with byte numbers.

Figure 10.3. Finding the right byte in a TCP packet

We'll need to check if byte 13 is set to binary 00000010, or 2. The filter string we'll need is tcp[13] = 2. If we wanted to check for packets which had at least the SYN flag set, we could use tcp[13] & 2 != 0. This filter string then gets compiled into a filter program and set:

$prog = "tcp[13] = 2"; 

# compile and set our "filter program" 
die "unable to compile $prog\n" 
  if (Net::Pcap::compile($descript ,\$compprog,$prog,0,$netmask)) ;
die "unable to set filter\n" 
  if (Net::Pcap::setfilter($descript,$compprog));

We're seconds away from letting libpcap do its stuff. Before we can, we need to tell it what to do with the packets it retrieves for us. For each packet it sees that matches our filter program, it will execute a callback subroutine of our choice. This subroutine is handed three arguments:

A user ID string, optionally set when starting a capture, that allows a callback procedure to distinguish between several open packet capture sessions.
A reference to a hash describing the packet header (timestamps, etc.).
A copy of the entire packet.

We'll start with a very simple callback subroutine that prints the length of the packet we receive:

sub printpacketlength {
  print length($_[2]),"\n";
}

With our callback subroutine in place, we begin watching the wire for SYN packets:

die "Unable to perform capture:".Net::Pcap::geterr($descript)."\n"
  if (Net::Pcap::loop($descript,-1,\&printpacketlength, ''));

die "Unable to close device nicely\n" 
  if (Net::Pcap::close($descript));

The second argument of -1 to Net::Pcap::loop( ) specifies the number of packets we wish to capture before exiting. In this case we've signaled it to capture packets ad infinitum.

The code you've just seen captures SYN packets and prints their lengths, but that's not quite where we wanted be when we started this section. We need a program that watches for SYN packets from another network and attempts to ping the originating hosts. We have almost all of the pieces; the only thing we are missing is a way to take the SYN packets we've received and determine their source.

Like our nitty-gritty DNS example in Chapter 5, "TCP/IP Name Services", we'll need to take a raw packet and dissect it. Usually this entails reading the specifications (RFCs) and constructing the necessary unpack( ) templates. Tim Potter has done this hard work, producing a set of NetPacket modules: NetPacket::Ethernet, NetPacket::IP, NetPacket::TCP, NetPacket::ICMP, and so on. Each of these modules provides two methods: strip( ) and decode( ).

strip( ) simply returns the packet data with that network layer stripped from it. Remember, a TCP/IP packet on an Ethernet network is really just a TCP packet embedded in an IP packet embedded in an Ethernet packet. So if $pkt holds a TCP/IP packet, NetPacket::Ethernet::strip($pkt) would return an IP packet (having stripped off the Ethernet layer). If you needed to get at the TCP portion of $pkt, you could use NetPacket::IP::strip(NetPacket::Ethernet::strip($packet)) to strip off both the IP and Ethernet layers.

decode( ) takes this one step further. It actually breaks a packet into its component parts and returns an instance of an object that contains all of these parts. For instance:

NetPacket::TCP->decode(
   NetPacket::IP::strip(NetPacket::Ethernet::strip($packet)))

This returns an object instance with the following fields:

Field Name	Description
`src_port`	Source TCP port
`dest_port`	Destination TCP port
`Seqnum`	TCP sequence number
`Acknum`	TCP acknowledgment number
`Hlen`	Header length
`Reserved`	6-bit "reserved" space in the TCP header
`Flags`	URG, ACK, PSH, RST, SYN, and FIN flags
`Winsize`	TCP window size
`Cksum`	TCP checksum
`Urg`	TCP urgent pointer
`Options`	Any TCP options in binary form
`Data`	Encapsulated data (payload) for this packet

These should look familiar to you from Figure 10-2. To get the destination TCP port for a packet, we can use:

$pt = NetPacket::TCP->decode(
         NetPacket::IP::strip(
             NetPacket::Ethernet::strip($packet)))->{dest_port};

Let's tie this all together and throw in one more dash of variety. Potter has created a small wrapper for the Net::Pcap initialization and loop code and released it in his Net::PcapUtils module. It handles several of the steps we performed, making our code shorter. Here it is in action, along with everything else we've learned along the way in the last section:

use Net::PcapUtils;

use NetPacket::Ethernet;
use NetPacket::IP;

use Net::Ping;

# local network 
$localnet = "192.168.1";
# filter string that looks for SYN-only packets not originating from 
# local network
$prog = "tcp[13] = 2 and src net not $localnet"; 

$| = 1; # unbuffer STDIO

# construct the ping object we'll use later
$p = new Net::Ping("icmp");

# and away we go
die "Unable to perform capture:".Net::Pcap::geterr($descript)."\n"
  if (Net::PcapUtils::open_live(\&grab_ip_and_ping, FILTER => $prog));

# find the source IP address of a packet, and ping it (once per run)
sub grab_ip_and_ping{
    my ($arg,$hdr,$pkt) = @_ ;

    # get the source IP adrress
    $src_ip = NetPacket::IP->decode(
                 NetPacket::Ethernet::strip($pkt))->{src_ip};

    print "$src_ip is ".(($p->ping($src_ip)) ? 
                         "alive" : "unreachable")."\n" 
      unless $cache{$src_ip}++;
}

Now that we've achieved our goal of writing a program completely in Perl that would have helped diagnose my server problem (albeit using some modules that are Perl wrappers around C code), let me tell you the end of the story.

On Sunday morning, the central support group outside of my department discovered an error in their router configuration. A student in one of the dorms had installed Linux on his machine and misconfigured the network routing daemon. This machine was broadcasting to the rest of the university that it was a default route to the Internet. The misconfigured router that fed our department was happy to listen to this broadcast and promptly changed its routing table to add a second route to the rest of the universe. Packets would come to us from the outside world, and this router dutifully doled out our response packets evenly between both destinations. This "a packet for the real router to the Internet, a packet for the student's machine, a packet for the real router, a packet for the student's machine..." distribution created an asymmetric routing situation. Once the bogus route was cleared and filters put in place to prevent it from returning, our life returned to normal. I won't tell you what happened to the student who caused the problem.

In this section, you have now seen one diagnostic application of the Net::Pcap, Net::PcapUtils, and NetPacket::* family of modules. Don't stop there! These modules give you the flexibility to construct a whole variety of programs that can help you debug network problems or actively watch your wire for danger.