26.2. Monitoring Your System
Another important aspect of firewall maintenance involves monitoring your system. Monitoring is intended to tell you several things:
26.2.1. Special-Purpose Monitoring Devices
You'll do most of your monitoring using the tools and the logging provided by the existing parts of your firewall, but you may find it convenient to have some dedicated monitoring devices as well. For example, you may want to put a monitoring station on your perimeter net so you can be sure only the packets you expect are going across it. You can use a general-purpose computer with network snooping software on it, or you can use a special-purpose network sniffer.
How can you make certain that this monitoring machine can't be used by an intruder? In fact, you'd prefer that an intruder not even detect its existence. On some network hardware, you can disable transmission in the network interface (with sufficient expertise and a pair of wire cutters), which will make the machine impossible to detect and extremely difficult for an intruder to use (because it can't reply). If you have the source for your operating system, you can always disable transmission there; however, in this case, it's much harder to be certain you've been successful. In most cases, you'll have to settle for extremely cautious configuration of the machine. Treat it like a bastion host that needs to do less and be more secure.
In particular, you should note that Microsoft's Network Monitor registers a special NetBIOS name when it starts up. It is a blatant advertisement of the fact that you are monitoring the network. The only way to stop it from happening is to unbind NetBIOS from the network interface you're monitoring, which will also make it impossible to use the Network Monitor Agent on it (not to mention most other Microsoft-native network applications!).
Other forms of network sniffing are more subtle, but also often detectable. Most obviously, network sniffers tend to do a lot of name service requests, in order to find hostnames for the IP addresses. Less obviously, a machine that is accepting all packets often slightly changes the way it handles incoming requests. A group called The L0pht (pronounced "loft") released an anti-sniffer-sniffer to detect network sniffers, which has led to an arms race, with people developing less and less detectable sniffers, not to mention an anti-sniffer-sniffer-sniffer. These technologies are rapidly evolving; we advise that you deploy the most up-to-date, least detectable sniffer technology available and an anti-sniffer-sniffer on critical networks to help you find sniffers installed by attackers. (Information about getting The L0pht's anti-sniffer-sniffer can be found in Appendix B, "Tools".)
Clever attackers will assume that you are intercepting packets and will take steps to conceal traffic, whether or not they can actually find your monitoring system. There's nothing you can do but stay alert.
26.2.2. Intrusion Detection Systems
An intrusion detection system is a piece of special-purpose software designed to figure out if somebody has broken into your site. Intrusion detection systems can range from relatively simple, passive programs that read log files and look for bad things, to extremely complex systems that use special-purpose monitoring devices spread out across a large network, inject fake traffic into the network to see if anybody uses the information, and/or employ sophisticated artificial intelligence techniques to detect subtle forms of abnormal behavior.
Intrusion detection is a big subject, and we can't cover it fully here. More information can be found in other places (for instance, Stephen Northcutt's Network Intrusion Detection: An Analyst's Handbook, New Riders, 1999).
There are two basic techniques for intrusion detection: systems can either know what kind of behavior is bad and set off alarms when it happens, or know what kind of behavior is good and set off alarms when anything else happens. Relatively speaking, it's easier to recognize bad behavior than good behavior. Unfortunately, it's more effective to recognize good behavior.
Systems that recognize bad behavior use attack signatures, information about what particular attacks look like. For instance, this sort of system would recognize a port scan as an attack because it would know that a series of attempts to contact different ports on the same host was a sign of an attack.
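As a rough illustration of the port scan signature just described, the following Python sketch flags a source that contacts many different ports on one host within a short time. The function name, the event tuple layout, and the thresholds (10 ports in 60 seconds) are assumptions made for this example; they are not taken from any particular intrusion detection product.

```python
from collections import defaultdict

def find_port_scans(events, min_ports=10, window_seconds=60):
    """Return the (source, target) pairs that match a simple scan signature.

    events: iterable of (timestamp_seconds, source, target, port) tuples,
    assumed to be sorted by timestamp. A pair is flagged when the source
    contacts at least min_ports distinct ports on the target within
    window_seconds. Both thresholds are illustrative guesses.
    """
    recent = defaultdict(list)  # (source, target) -> [(timestamp, port), ...]
    flagged = set()
    for timestamp, source, target, port in events:
        key = (source, target)
        # Keep only the contacts that are still inside the sliding window.
        recent[key] = [(t, p) for t, p in recent[key]
                       if timestamp - t <= window_seconds]
        recent[key].append((timestamp, port))
        if len({p for _, p in recent[key]}) >= min_ports:
            flagged.add(key)
    return flagged
```

Called with twelve contacts to twelve different ports in twelve seconds, the function reports the pair; called with the same contacts spaced two minutes apart, it reports nothing, because the window never holds more than one port at a time.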
Systems that recognize good behavior use usage profiles, information about what normally happens. For instance, this sort of system would recognize a port scan as an attack because it would know that normally, when somebody connects to a port, the next thing that happens is that they use that port for its normal purpose. In a port scan, the attacker will connect, get a response, and immediately disconnect and try another port. This behavior is outside of the normal pattern and therefore will be detected as an attack.
The difficulty with recognizing attack signatures is that the system can detect only attacks that it knows about. When new attacks are created, the system won't know about them until a signature is created and added. In addition, it's often possible to disguise signatures. For instance, the signatures for port scans used to look for multiple connections to different ports from the same source host, so attackers now use multiple collaborating source hosts.
Systems that rely on usage profiles have problems with what are called "false positives", or cases where they think an attack is occurring but it isn't. Usage of systems changes over time, and any profile that's specific enough to catch any significant number of attacks will set off a large number of alarms. Good systems now have false positive rates in the range of about 1-3 percent; unfortunately, that's 1-3 percent of the events they look at, which in the case of a network usually means 1-3 percent of packets. Since many sites have millions of incoming packets a day, this apparently small error rate can still result in tens of thousands of false alarms a day, which is rarely acceptable.
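The arithmetic behind that concern can be worked through in a few lines of Python. The figure of one million packets a day is a hypothetical round number chosen for the example, and the sketch assumes every packet is legitimate, so every alarm is a false one.

```python
def expected_false_alarms(events_per_day, false_positive_rate):
    """Expected number of false alarms if every event is actually benign."""
    return events_per_day * false_positive_rate

# Hypothetical site: one million incoming packets a day.
PACKETS_PER_DAY = 1_000_000

for rate in (0.01, 0.03):
    alarms = expected_false_alarms(PACKETS_PER_DAY, rate)
    print(f"{rate:.0%} of {PACKETS_PER_DAY:,} packets/day "
          f"is about {alarms:,.0f} false alarms/day")
```

Even at the low end of the range, such a site would see on the order of ten thousand alarms a day, far more than anyone can investigate by hand.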
Intruders also have ways of hiding attacks that will defeat almost all intrusion detection systems. For instance, a patient intruder can scan a network very slowly; most intrusion detection systems look at a few minutes', or maybe a few hours', worth of context at a time. An impatient intruder can bury an attack in a large amount of network traffic, and few systems will be able to keep up.
Similarly, some techniques can be used that defeat almost all attempts at hiding attacks. The most powerful and popular of these is the honeypot, the tempting bait with nothing but a trap behind it. For instance, if you put a machine on your perimeter network that you don't use for any services, you know that any attempt to connect to it is an attack. It doesn't matter whether or not it matches an attack signature, or whether or not it fits a normal usage pattern. It's just obviously wrong.
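The honeypot test is simple enough to sketch directly: no signatures and no profiles, only a list of addresses that nothing legitimate should ever contact. The address below and the connection tuple layout are hypothetical, chosen for this example.

```python
# Hypothetical address of a perimeter machine that offers no real services.
HONEYPOT_ADDRESSES = {"192.168.20.99"}

def honeypot_alerts(connections, honeypots=HONEYPOT_ADDRESSES):
    """Return every connection attempt aimed at a honeypot address.

    connections: iterable of (source, destination, port) tuples.
    Any match is treated as an attack, whatever it looks like.
    """
    return [conn for conn in connections if conn[1] in honeypots]
```

Because the check ignores what the traffic looks like, a scan that is too slow or too well disguised for other techniques still shows up the moment it touches the bait.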
How much an intrusion detection system can do for you depends mostly on how much time, money, and development effort you can invest in the system. Although intrusion detection is theoretically a very effective technology, actually making it work in practice is not an easy proposition, and it requires constant maintenance and attention. There is no point in having an intrusion detection system unless you have the personnel to keep it up to date and to deal with the alarms that it produces.
26.2.3. What Should You Watch For?
In a perfect world, you'd like to know absolutely everything that goes through your firewall -- every packet dropped or accepted and every connection requested. In the real world, neither the firewall nor your brain can cope with that much information. To come up with a practical compromise, you'll want to turn on the most verbose logging that doesn't slow down your machines too much and doesn't fill up your disks too fast; then, you'll want to summarize the logs that are produced.
You can ease the disk space problem by keeping verbose logs offline, on some form of removable media (for instance, tapes, writable CDs, or writable DVDs). Tapes are cheap and hold a lot of data, but they have some drawbacks. They're not particularly fast under the best circumstances, and log entries are generally too short to achieve maximum performance. It's also annoying to read data back from them. If you're interested in using them, write summary logs to disk and write everything to tape. If you find a situation where you need more data, you can go back to the tape for it. A tape drive can probably keep up with the packets on an average Internet connection, but it won't keep up with an internal connection at full LAN speeds or even with a T-1 connection to the Internet that's at close to its maximum performance. CD and DVD writers are even slower, but they're much easier to read data from. If you have large amounts of disk space to use as temporary storage, they may be an effective solution.
Whatever you write logs to, you should protect the logs. The data that's in them may be useful to attackers, and it may be confidential for other reasons. For instance, if you log the contents of packets, you may well be logging unencrypted sensitive information. Even if you don't log packet contents, the information about what packets went where may be private; it's one thing to log the 434 times that somebody tried to go to an embarrassing web site, and another to have it become public knowledge.
No matter how you're storing logs, you want to log the following cases:
What are you watching for? You want to know what your usual pattern is (and what trends there are in it), and you want to be alerted to any exceptions to that pattern. To recognize when things are going wrong, you have to understand what happens when things are going right. It's important to know what messages you get when everything is working. Most systems produce error messages that sound peculiar and threatening even when they're working perfectly well. For example, in the sample syslog output in Example 26-1, messages 10, 14, and 17 all look vaguely threatening, but are in fact perfectly OK. (Although these examples are taken from a Unix syslog, exactly the same phenomena can be seen in the Windows NT Event Log; information about setting up logging can be found in Chapter 10, "Bastion Hosts", Chapter 11, "Unix and Linux Bastion Hosts", and Chapter 12, "Windows NT and Windows 2000 Bastion Hosts".)
Message 10 is a common network failure that will result in a retry, and how good do you expect your connection to Cameroon to be? Message 14 is traceroute running. Message 17 says there are no synonyms defined, which you presumably already know.
If you see those messages for the first time when you're trying to debug a problem, you're likely to leap to the conclusion that the messages have something to do with your problem and get thoroughly sidetracked. Even if you never do figure out what the messages are and why they're appearing, just knowing that certain messages appear even when things are working fine will save you time.
Example 26-1. A Sample syslog File (Line Numbers Added)
1: May 29 00:00:58 localhost wn: noc.nca.or.bv - -  "GET /long/consulting.html HTTP/1.0" 200 1074 <Sent file: >
2: May 29 00:00:58 localhost wn: <User_Agent: Mozilla/1.0N (X11; SunOS 4.1.3-KL sun4m)> <Referrer: http://www.longitude.example/>
3: May 29 00:02:38 localhost ftpd: 26086: 05/29/95 0:02:38 spoke.cst.cnes.vg(gupta@) retrieved /pub/firewalls/digest/v04.n278.Z(15788 bytes)
4: May 29 00:15:57 localhost ftpd: 27195: 05/29/95 0:01:52 client42.sct.io connected, duration 845 seconds
5: May 29 00:18:04 localhost ftpd: 26086: 05/29/95 23:26:32 spoke.cst.cnes.vg connected, duration 3092 seconds
6: May 27 01:13:38 mv-gw.longitude.example user: host naismith.longitude.com admin login failed
7: May 27 01:13:47 mv-gw.longitude.example last message repeated 2 times
8: May 27 01:15:17 mv-gw.longitude.example user: host naismith.longitude.example admin login succeeded
9: May 27 01:19:18 mv-gw.longitude.example 16 permit: TCP from 192.168.20.35.2591 to 172.16.1.3.53 seq 324EE800, ack 0x0, win 4096, SYN
10: May 29 02:20:09 naismith sendmail: CAA27366: SYSERR(root): collect: I/O error on connection from atx.eb.cm, from=<Mailer-Daemon@eb.cm>: Connection reset by peer during collect with atx.eb.cm
11: May 29 02:30:28 naismith named: sysquery: server name mismatch for [172.16.8.25]: (sun.nhs-relay.ac.cv != nhs-relay.ac.cv) (server for cus.ox.ac.cv)
12: May 29 02:31:00 naismith named: sysquery: server name mismatch for [172.16.8.25]: (nhs-relay.ac.cv != sun.nhs-relay.ac.cv) (server for PANSY.CSV.WARWICK.AC.CV)
13: May 29 02:47:04 naismith named: sysquery: server name mismatch for [172.16.8.25]: (nhs-relay.ac.cv != sun.nhs-relay.ac.cv) (server for LUPUS.CNS.UMIST.AC.CV)
14: May 29 07:50:59 mv-gw.longitude.example 8 deny: UDP from 192.168.69.250.33072 to 192.168.20.34.33467
15: May 29 08:06:16 naismith popper: (v1.831beta) Servicing request from "penta.longitude.example" at 192.168.20.36
16: May 29 08:06:56 naismith popper: (v1.831beta) Ending request from "penta.longitude.com" at 192.168.20.36
17: May 29 10:04:02 localhost waisserver1: -2: Warning: couldn't open wais-sources/firewalls-digest.syn - synonym translation disabled
18: May 29 16:26:46 mv-gw.longitude.example 8 deny: UDP from 192.168.186.11.20 to 192.168.20.34.1937
As you can see, the log file is verbose and not particularly readable (even with better linebreaks inserted). An unimportant error condition on a distant host (the server name mismatch on nhs-relay.ac.cv) is producing multiple error messages (11, 12, and 13, in this highly condensed version). The log file is also in chronological order, which is not necessarily the order of importance.
Example 26-2 shows a report based on a log file, with messages arranged in a more useful order and somewhat summarized.
Example 26-2. A Report Based on a syslog File
May 27 06:42:07 localhost ftpd: securityalert: refused passwd file to email@example.com from chen.dialup.zarf.net
May 27 06:42:10 localhost ftpd: securityalert: refused passwd file to firstname.lastname@example.org from chen.dialup.zarf.net
-----------------------------------------------------------------
May 26 12:33:39 localhost su: nxn to root on /dev/ttyp1
May 27 01:23:17 naismith su: bart to root on /dev/ttyp3
-----------------------------------------------------------------
May 26 12:29:44 naismith kernel: uid 31 on /naismith_b: file system full
May 26 12:31:33 naismith kernel: uid 31 on /naismith_b: file system full
-----------------------------------------------------------------
May 26 02:49:03 naismith named: Malformed response from [192.168.192.2].53 (ran out of data in answer)
-----------------------------------------------------------------
May 26 12:14:36 mv-gw.longitude.example 16 deny: UDP from 192.168.69.1.58899 to 192.168.20.35.33459
May 26 12:15:15 mv-gw.longitude.example 16 deny: UDP from 192.168.69.1.58962 to 192.168.20.35.33459
May 27 01:24:05 mv-gw.longitude.example 16 permit: TCP from 192.168.20.34.2637 to 192.168.54.72.23 seq BE793A01, ack 0x0, win 4096, SYN
May 27 01:24:11 mv-gw.longitude.example 16 permit: TCP from 192.168.20.34.2637 to 192.168.54.72.23 seq BE793A01, ack 0x0, win 4096, SYN
-----------------------------------------------------------------
FTP: Connections: 240 Files: 733 Bytes: 32,747,429 (31.23 M) Seconds: 92,787 (25.77 hours)
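A first step toward a summarized report of this kind is to collapse repeated messages into counts. The Python sketch below groups lines by host and by the tag that precedes the first colon. It is not the tool that produced Example 26-2; the regular expression assumes the classic "month day time host tag: message" layout seen in Example 26-1, and the names are invented for the example.

```python
import re
from collections import Counter

# Assumed layout: "May 29 02:30:28 naismith named: message text"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+(?P<tag>[^:]+):\s*(?P<message>.*)$"
)

def summarize(lines):
    """Count log lines per (host, tag) pair, most frequent first."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line)
        if match:
            counts[(match["host"], match["tag"])] += 1
        else:
            # Lines such as "last message repeated 2 times" have no tag.
            counts[("?", "unparsed")] += 1
    return counts.most_common()
```

Sorting by frequency is only a starting point; as with chronological order, the most frequent message is not necessarily the most important one, so a real report would also rank groups by how serious they are.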
Log messages fall into three categories:
Log entries often must be considered in context. A message that's mildly mysterious if it occurs once is cause for serious worry if it occurs every minute. For example, "login succeeded for user smith" is good, unless it's preceded by three "login failed" messages for every user above "smith" in your password file; in that case, it's very bad indeed. In Example 26-1, message 9 shows an unexceptional outbound TCP connection, logged just on general principles. It wouldn't be at all worrying if it weren't preceded by messages 6 through 8. In context, you know that someone made three failed tries at logging in as "admin", finally succeeded, and then immediately started an outbound connection. This looks extremely suspicious. Message 7 doesn't mean anything at all without context.
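The "failed, failed, failed, succeeded" pattern is one piece of context that is easy to check mechanically. The sketch below assumes the log has already been parsed into (user, outcome) pairs in chronological order, with any "last message repeated N times" line expanded into N copies of the preceding event; the function name and the threshold of three failures are assumptions made for this example.

```python
def suspicious_logins(events, threshold=3):
    """Return (user, failures) for each worrying successful login.

    events: iterable of (user, outcome) pairs in chronological order,
    where outcome is "failed" or "succeeded". A success is reported when
    it directly follows at least `threshold` failures for the same user.
    """
    failures = {}   # user -> consecutive failed attempts so far
    findings = []
    for user, outcome in events:
        if outcome == "failed":
            failures[user] = failures.get(user, 0) + 1
        elif outcome == "succeeded":
            count = failures.pop(user, 0)  # a success resets the streak
            if count >= threshold:
                findings.append((user, count))
    return findings
```

Fed the events behind messages 6 through 8 of Example 26-1 (three failures for "admin", then a success), the function reports ("admin", 3); a lone successful login is not reported.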
In a large system, getting context may require correlating log files from multiple hosts. This is one reason for keeping consistent time settings; it is also a reason why people use intrusion detection systems. If you have high volumes of traffic, a complex firewall, or a requirement for strict security, you will probably need an intrusion detection system to help you with the log analysis.
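For a small site, a basic form of correlation is to merge the logs from several hosts into one timeline. The sketch below is one way to do that in Python, under several assumptions: each log is already in time order, every line starts with a 15-character syslog timestamp such as "May 29 02:20:09", month names are in English, and the hosts' clocks agree. Because these timestamps carry no year, the `year` parameter is an assumed value supplied by the caller.

```python
import heapq
from datetime import datetime

def parse_time(line, year=1995):
    """Parse the leading syslog timestamp; the year is assumed, not logged."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def merge_logs(*logs, year=1995):
    """Merge per-host logs (each already in time order) into one timeline."""
    return list(heapq.merge(*logs, key=lambda line: parse_time(line, year)))
```

If the clocks on the hosts disagree, the merged order is misleading, which is exactly why consistent time settings matter.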
26.2.4. The Good, the Bad, and the Ugly
Once you go beyond the obvious (for example, it's OK for users to log in; it's not OK for the disk to be bad), how can you tell when you're in trouble? Some rules of thumb:
You should suspect that someone's been probing your site if you see:
26.2.5. Responding to Probes
Inevitably, you're going to detect apparent probes of your firewall -- packets sent to services you don't offer to the Internet, attempts to log in to nonexistent accounts, and so on. Probes are the Internet equivalent of someone walking down a line of doors and checking every door knob to see if it's locked. Probers generally try one or two things, and if they don't get an interesting response, they move on. If you're inclined to do so, you can spend a lot of time chasing down such incidents, attempting to figure out where the probes are coming from and who is behind them. However, in most situations, it probably isn't worth the effort. The novelty of chasing down probes of this kind fades quickly. If you're getting persistent probes from some site, you might contact the management of that site to let it know what's going on (in general, it has been broken into and needs to know), but that's usually about as far as folks need to go in responding to these probes.
It's unfortunate that on the Internet today, probes are so frequent that the laissez-faire attitude we've described is often an appropriate one. In good neighborhoods, people don't get away with trying door knobs. You have a right to be unhappy with people who behave this way, and trying to get them to stop is perfectly reasonable. However, you do need to decide where you're going to spend your energy. Save extreme responses for extreme situations. Treating probers with maximum harshness is just going to convince people that you are unreasonable.
Some people amuse themselves by setting up firewall machines to lead on people who try common probes. For example, they put a password file in the anonymous FTP area that appears to contain user account data. However, if the prober breaks the encrypted passwords, he or she sees a snide message. This is a harmless way to spend your spare time and provides a satisfactory feeling of revenge, but it doesn't actually improve your security much. It simply annoys attackers, and doing so may cause them to take a personal interest in breaking into your site.
Some people and some firewalls prefer a more active approach to probes, engaging in counter-attacks. A fairly wide variety of things are sold as counter-attacks (firewall marketers find the concept extremely sexy). They range from attempts to find out further information about where the probes are coming from (giving you a start on tracking them down), to automatically configuring the firewall to reject all connections from the probing site, to actual counter-attacks. Regardless of their often remarkably aggressive marketing material, no commercial firewall does anything that can reasonably be described as a counter-attack, for obvious reasons involving liability. This is good because misidentification, loops, and bad feelings often result from even mild automated responses.
For instance, a forged probe purporting to be from a site that does information-gathering when it gets probes and aimed at another such site is almost guaranteed to suck the site administrators into a "Yeah, but you hit me first!" argument. Why would somebody forge such a packet? Because they did something and were annoyed at the response they got. A little forgery can bring a lot of community pressure to bear on somebody who has an overly vigorous probe response. A notable example occurred with somebody whose response to anonymous FTP attempts was to send mail threatening to sue; a widely broadcast email contained a URL that attempted anonymous FTP to the site, causing both resource problems and publicity problems.
Different sites have different opinions about what constitutes a probe, and what constitutes a full-fledged attack. Most people call something a probe as long as they know it's not going to work, even if it is determined and drawn out. For example, somebody who determinedly tries every possible combination of lowercase alphabetic characters as your root password is not going to succeed, provided your root password contains anything other than lowercase letters, and can probably be ignored as a probe until you get tired of reading the log messages. However, if you have the time and the energy, it's probably worth pursuing people who are making determined attempts, even when you know they'll fail.
26.2.6. Responding to Attacks
If your logs show that someone is making a determined attack against your system (see the rules of thumb we presented in Section 26.2.4, "The Good, the Bad, and the Ugly", earlier in this chapter), you probably want to do a little more than sit back and watch. Chapter 27, "Responding to Security Incidents", describes in detail how you should respond to a real security incident.
Copyright © 2002 O'Reilly & Associates. All rights reserved.