10.2. Noticing Suspicious Activities

A good night watchman needs more than just the ability to monitor for change. She or he also needs to be able to spot suspicious activities and circumstances. A hole in the perimeter fence or unexplained bumps in the night need to be brought to someone's attention. We can write programs to play this role.

10.2.1. Local Signs of Peril

It's unfortunate, but learning to be good at spotting signs of suspicious activity often comes as a result of pain and the desire to avoid it in the future. After the first few security breaches, you'll start to notice that intruders often follow certain patterns and leave behind telltale clues. Spotting these signs, once you know what they are, is often easy in Perl.
For instance, intruders, especially the less-sophisticated kind, often try to hide their activities by creating "hidden" directories to store their data. On Unix and Linux systems they will put exploit code and sniffer output in directories with names like "..." (dot dot dot), ". " (dot space), or " Mail" (space Mail). These names are likely to be passed over in a cursory inspection of ls output. We can easily write a program to search for these names using the tools we learned about in Chapter 2, "Filesystems". Here's a program based on the File::Find module (as called by find.pl) that looks for anomalous directory names.
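A minimal sketch of such a program, using the File::Find interface directly rather than the find.pl compatibility wrapper; the starting directory and the exact set of name patterns are assumptions you would tune for your own site:

    use strict;
    use File::Find;

    # directory names intruders commonly use to hide their files:
    # "..." (dot dot dot), ". " (dot space), and " Mail" (space Mail)
    my @suspicious = (qr/^\.{3,}$/, qr/^\.\s+$/, qr/^\s+Mail$/);

    my $start = @ARGV ? shift @ARGV : '.';   # where to begin the search

    find(
        sub {
            return unless -d $_;             # only directories interest us
            for my $pat (@suspicious) {
                if ($_ =~ $pat) {
                    print "suspicious directory: $File::Find::name\n";
                    last;
                }
            }
        },
        $start
    );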
Remember the sidebar "Regular Expressions" in Chapter 9, "Log Files"? Filesystem-sifting programs like these are another example of where this holds true. The effectiveness of these programs often hinges on the quality and quantity of their regular expressions. Too few regexps and you miss things you might want to catch. Too many regexps, or inefficient ones, and your program's runtime and resource usage become exorbitant. If your regexps are too loose, the program will generate many false positives. It is a delicate balance.

10.2.2. Finding Problematic Patterns

Let's use some of the things we learned in Chapter 9, "Log Files", to move us along in our discussion. We've just talked about looking for suspicious objects; now let's move on to looking for patterns that may indicate suspicious activity. We can demonstrate this with a program that does some primitive logfile analysis to determine potential break-ins.

This example is based on the following premise: most users logging in remotely do so consistently from the same place or a small list of places. They usually log in remotely from a single machine, or from the same ISP modem bank, each time. If you find an account that has logged in from more than a handful of domains, it's a good indicator that the account has been compromised and the password widely distributed. Obviously this premise does not hold for populations of highly mobile users, but if you find an account that has been logged into from Brazil and Finland in the same two-hour period, that's a pretty good indicator that something is fishy.

Let's walk through some code that looks for this indicator. This code is Unix-centric, but the techniques demonstrated in it are platform independent. First, here's our built-in documentation. It's not a bad idea to put something like this near the top of your program for the sake of other people who will read your code. Before we move on, be sure to take a quick look at the arguments the rest of the program will support:
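Here's a sketch of what such embedded documentation might look like; the program name and the flag letters are illustrative assumptions, not the original text:

    # lastcheck - scan the output of "last" for accounts that have
    #             logged in from a suspicious number of distinct domains
    #
    # USAGE: lastcheck [-l path to last] [-m max domains allowed]
    #
    #   -l  use an alternate "last" program to read login records
    #   -m  maximum number of distinct domains allowed before we
    #       report an account as suspicious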
First we parse the user's command-line arguments. The getopts line below will look at the arguments to the program and set $opt_<flag letter> appropriately. The colon after the letter means that option takes an argument:
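    use Getopt::Std;

    # -l overrides the path to the last program; -m sets the maximum
    # number of domains allowed per user (both flag letters are
    # illustrative assumptions; match them to your documentation above)
    getopts('l:m:');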
The following lines reflect the portability versus efficiency decision we discussed in Chapter 9, "Log Files". Here we're opting to call an external program. If you wanted to make the program less portable and a little more efficient, you could use unpack() as discussed in that chapter:

    $lastex = (defined $opt_l) ? $opt_l : "/usr/ucb/last";

    open(LAST,"$lastex|") || die "Can't run the program $lastex:$!\n";

Before we get any further into the program, let's take a quick look at the hash-of-lists data structure this program uses as it processes the data from last. This hash will have a username as its key and, as its value, a reference to a list of the unique domains that user has logged in from. For instance, a sample entry might be:
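    # an illustrative entry, matching the description in the next paragraph
    $userinfo{laf} = ['ccs.neu.edu', 'xerox.com', 'foobar.edu'];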
This entry shows the account laf has logged in from the ccs.neu.edu, xerox.com, and foobar.edu domains. We begin by iterating over the input we get from last; the output on our system looks like this:

    cindy    pts/10  sinai.ccs.neu.ed  Fri Mar 27 13:51   still logged in
    michael  pts/3   regulus.ccs.neu.  Fri Mar 27 13:51   still logged in
    david    pts/5   fruity-pebbles.c  Fri Mar 27 13:48   still logged in
    deborah  pts/5   grape-nuts.ccs.n  Fri Mar 27 11:43 - 11:53  (00:09)
    barbara  pts/3   152.148.23.66     Fri Mar 27 10:48 - 13:20  (02:31)
    jerry    pts/3   nat16.aspentec.c  Fri Mar 27 09:24 - 09:26  (00:01)

You'll notice that the hostnames (column 3) in our last output have truncated names. We've seen this hostname length restriction before in Chapter 9, "Log Files", but up until now we've sidestepped the challenge it represents. We'll stare danger right in the face in a moment, when we start populating our data structure.

Early on in the while loop, we try to skip lines that contain cases we don't care about. In general, it is a good idea to check for special cases like this at the beginning of your loops, before any actual processing of the data (e.g., a split()) takes place. This lets the program quickly identify when it can skip a particular line and continue reading input:
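    while (defined ($_ = <LAST>)) {
        # dispense with the special cases first; exactly which
        # pseudo-users and lines to skip is a site-specific guess here
        next if /^(reboot|shutdown|ftp)\s/;   # pseudo-user entries
        next if /\s+console\s+/;              # local console logins
        next if /^wtmp begins/;               # trailer line from last

        my ($user, $tty, $host) = split;
        next unless defined $host and $host =~ /\./;  # no remote host shown

        &AddToInfo($user, &domain($host));
    }
    close(LAST);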
Now let's take a look at the individual subroutines that handle the tricky parts of this program. Our first subroutine, &domain(), takes a Fully Qualified Domain Name (FQDN), i.e., a hostname with the full domain name attached, and returns its best guess at the domain name of that host. It has to be a little smart for two reasons, both visible in the sample output above: a host may be listed by IP address rather than by name, and a long hostname may have been truncated, taking part of its domain with it.
Back to the code:
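    # a sketch consistent with the description above, not the original listing
    sub domain {
        my $host = lc shift;

        # if the "hostname" is really an IP address, treat the first
        # three octets as a stand-in for the domain
        if ($host =~ /^(\d+\.\d+\.\d+)/) {
            return $1;
        }

        # otherwise, strip off the host portion and return the rest;
        # a truncated hostname yields a truncated domain, which
        # &AddToInfo compensates for below
        $host =~ s/^[^.]+\.//;
        return $host;
    }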
This next subroutine, short as it is, encapsulates the hardest part of this program. Our &AddToInfo() subroutine has to deal with truncated hostnames and the storing of information into our hash table. We're going to use a substring-matching technique that you may find useful in other contexts. In this case, we'd really like all of the following domain names, each a progressively more truncated form of the same domain, to be treated and stored as the same domain name in our array of unique domains for a user:

    ccs.neu.edu
    ccs.neu.ed
    ccs.neu.e
    ccs.neu
When the uniqueness of a domain name is in question, we check three things:

1. Is the new domain name an exact match of a domain name we've already stored for this user?

2. Is the new domain name a substring of an already-stored domain name?

3. Is an already-stored domain name a substring of the new domain name?
If any of these cases holds, we don't need to add a new entry to our data structure, because we already have a substring equivalent stored in the user's domain list. If case #3 is true, we'll want to replace the stored entry with our current entry, ensuring we've stored the largest string possible. Astute readers will also note that cases #1 and #2 can be checked simultaneously, since an exact match is equivalent to a substring match where all the characters match. If all of these cases are false, we do need to store the new entry. Let's take a look at the code first and then talk about how it works:
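    # a reconstruction matching the walkthrough below, not the original listing
    sub AddToInfo {
        my ($user, $dn) = @_;

        for (@{$userinfo{$user}}) {
            # cases #1 and #2: a substring equivalent is already stored
            return if (index($_, $dn) > -1);

            # case #3: the stored entry is a substring of the new domain;
            # assigning to $_ writes through the loop alias, overwriting
            # the list entry with the longer string
            if (index($dn, $_) > -1) {
                $_ = $dn;
                return;
            }
        }

        # no substring equivalent found, so store the new domain
        push @{$userinfo{$user}}, $dn;
    }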
@{$userinfo{$user}} returns the list of domains we've stored for the specified user. We iterate over each item in this list to see if $dn can be found in any item. If it can, we have a substring equivalent already stored, so we exit the subroutine.

If we pass this test, we look for case #3 above. Each entry in the list is checked to see if it can be found in our current domain. If there is a match, we overwrite the list entry with the current domain data, thus storing the larger of the two strings. This happens even when there is an exact match, since it does no harm. We overwrite the entry using a special property of Perl's for and foreach operators: assigning to $_ in the middle of a for loop like this actually assigns to the current element of the list at that point in the loop, because the loop variable is an alias for the list element. If we've made this swap, we can leave the subroutine. If we pass all three tests, the final line adds the domain name in question to the user's domain list.

That's it for the gory details of iterating over the file and building our data structure. To wrap this program up, let's run through all of the users we found and check how many unique domains each has logged in from (i.e., the size of the list we've stored for each). For any entry that has more domains than our comfort level, we print its contents:
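    # $opt_m is an assumed flag for the comfort threshold (default 3);
    # the exact output formatting here is a guess
    $maxdomains = (defined $opt_m) ? $opt_m : 3;

    for my $user (sort keys %userinfo) {
        if (scalar @{$userinfo{$user}} > $maxdomains) {
            print "$user has logged in from:\n";
            print join(' ', sort @{$userinfo{$user}}), "\n";
        }
    }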
Now that you've seen the code, you might wonder if this approach really works. Here's some real sample output from our program for a user who had her password sniffed at another site:

    username has logged in from:
    38.254.131 bu.edu ccs.neu.ed dac.neu.ed hials.no ipt.a tnt1.bos1
    tnt1.bost tnt1.dia tnt2.bos tnt3.bos tnt4.bo toronto4.di

Some of these entries look normal for a user in the Boston area. However, the toronto4.di entry is a bit suspect, and the hials.no site is in Norway. Busted!

This program could be further refined to include the element of time, or correlations with another log file like that from tcpwrappers. But as you can see, pattern detection is often very useful by itself.