20.3 Detecting Changes After the Fact
As we saw in the last section,
there may be circumstances in which we cannot use read-only media to
protect files and directories. Or, we may have a case in which some
of the important files are relatively volatile and need to change on
a regular basis. In cases such as these, we want to be able to detect
whether unauthorized changes occur.
There are basically three approaches to detecting changes to files
and inodes:
Use comparison copies of the data to be monitored. This is the most
reliable way.
Monitor metadata about the items to be protected. This includes
monitoring the modification time of entries as kept by the operating
system, and monitoring any logs or audit trails that show alterations
to files.
Use some form of signature of the data to be
monitored, and periodically recompute and compare the
signature against a stored
value.
Each of these approaches has drawbacks and benefits, as we discuss in
the following sections. But before we explain them in detail, we need
to explain a fundamental problem common to all of these schemes.
20.3.1 The Achilles Heel of Integrity Management Systems
The remainder of this chapter describes several different integrity
management systems. All of these systems perform more or less the
same function: they examine files on a computer's
disk drive to determine whether the files have been changed in any
significant way.
Although there are many reasons that you might want to examine the
integrity of your system's files, one of the most
common is to determine what has changed after a computer has been
attacked, broken into, and compromised.
If you suspect that a system has been compromised, there are many
ways that you can examine its files for evidence of this fact:
Physically remove the hard disk from the computer in question, attach
the disk to a second computer as an auxiliary disk, boot the second
computer, mount the disk read-only, and use the second
computer's operating system to examine the disk.
(For extra credit, you can use a tool like dd on
the second computer to make a block-for-block copy of the [unmounted]
disk in question on a spare drive, as sketched after this list. This will minimize the
chance that the drive might be inadvertently modified as part of the
analysis process.)
Leave the suspect disk in the suspect computer, but boot the suspect
computer with a clean operating system from a CD-ROM or a floppy
disk. Then, using only the tools on the CD-ROM or floppy, you could
proceed to mount the suspect disk read-only and analyze the possibly
compromised filesystem.
Log into the suspect computer and run whatever integrity-checking
tools happen to be installed.
Try to determine what hole the attacker used, close it, and continue
operations as normal.
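For reference, here is a minimal sketch of the disk-imaging and read-only
mounting steps from the first two techniques. The device names and mount
point are hypothetical and must be verified on your own hardware before
anything is run:
# Block-for-block image of the unmounted suspect disk onto a spare
# drive (hypothetical device names; double-check them before running).
dd if=/dev/sdb of=/dev/sdc bs=65536 conv=noerror,sync

# Mount the suspect disk (or the image) read-only for analysis.
mkdir -p /forensics
mount -o ro /dev/sdb1 /forensics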
Clearly, the most thorough way to examine the suspect system is the
first technique. In practice, the third and fourth techniques are the
most common. And to all of the people who have simply treated the
symptoms of a compromised system, rather than taken a more thorough
approach, we have one question:
- Which part of the word "compromised" do you not understand?
If an attacker truly compromises your computer system, all bets are
off. Nothing should be trusted. It is possible that the attacker has
done nothing to affect the integrity of critical system programs such
as login, ps,
ls, and netstat. On the
other hand, it is possible that the attacker has replaced all of
these programs with modified programs that contain Trojan horses and
back doors, and then modified your computer's kernel
so that integrity-checking tools cannot tell the
difference!
Sadly, it takes a lot of extra time to do things the right way.
It's much easier to log into a suspect computer and
run a copy of Tripwire or AIDE to check for
modifications—rather than going to the trouble of booting a
kernel from CD-ROM that is known to be good. That's
why many people—the authors included—will occasionally
run automated tools on possibly compromised machines before breaking
out the CD-ROMs and the screwdrivers. But beware: if it looks like
nothing is wrong, everything could be wrong.
20.3.2 Comparison Copies
The most direct and assured
method of detecting changes to data is to keep a copy of the
unaltered data and do a byte-by-byte comparison when needed. If there
is a difference, this indicates not only that a change occurred, but
also exactly what that change involved. There is no more reliable and
complete method of detecting changes.
Comparison copies,
however, are unwieldy. They require that you keep copies of every
file of interest. Not only does such a method require twice as much
storage as the original files, it also may involve a violation of the
licenses or copyrights of the files. (Copyright law allows one copy
for archival purposes, and your distribution media is that one
copy.)
To use a comparison copy means that both the original and the copy
must be read through, byte by byte, each time a check is made. And,
of course, the comparison copy needs to be saved in a protected
location.
Even with these drawbacks, comparison copies have a particular
benefit: if you discover an unauthorized change, you can simply
replace the altered version with the saved comparison copy, thus
restoring the system to normal. These copies can be made locally, at
remote sites, or over the network, as we describe in the following
sections.
20.3.2.1 Local copies
One standard method of storing comparison copies is to put them on
another disk. Many people report success with storing copies of
critical system files on removable media drives. If there is any
question about a particular file, the appropriate disk is placed in
the drive, mounted, and compared. If you are careful about how you
configure these disks, you get the added (and valuable) benefit of
having a known good version of the system to boot up if the system is
compromised by accident or attack. Making regular backups to
removable or write-once media such as tapes and CDs can provide
similar benefits.
A second standard method of storing comparison copies is to make
on-disk copies somewhere else on the system. For instance, you might
keep a copy of /bin/login in
/usr/adm/.hidden/.bin/login. Furthermore, you
can compress and/or encrypt the copy to help reduce disk use and to keep
it safe from tampering; otherwise, an attacker could alter both the
original /bin/login and the copy to match, and any
comparison you made would show no change. The disadvantage to
compression and encryption is that it then requires extra processing
to recover the files if you want to compare them against the working
copies. This extra effort may be significant if you wish to do
comparisons daily (or more often!). If you make these copies in
single-user mode and mark them as immutable (as described earlier),
you prevent them from being altered or removed by an attacker.
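As a rough sketch of the compress-and-encrypt idea, here is one way to do
it with gzip and OpenSSL; the hidden pathname follows the example above,
and any equivalent compression and encryption tools would serve as well:
# Make a compressed, encrypted comparison copy of /bin/login.
# openssl will prompt for a passphrase; do not store it on this host.
gzip -9c /bin/login | openssl enc -aes-256-cbc -salt \
    -out /usr/adm/.hidden/.bin/login.gz.enc

# Later, recover the copy and compare it against the working file.
openssl enc -d -aes-256-cbc -in /usr/adm/.hidden/.bin/login.gz.enc |
    gzip -dc | cmp - /bin/login && echo "/bin/login unchanged"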
20.3.2.2 Remote copies
A third method of using comparison copies is to store them on a
remote site and make them available remotely in some manner. For
instance, you might place copies of all the system files on a disk
partition on a secured server, and export that partition read-only
using NFS or some similar protocol. All the client hosts could then
mount that partition and use the copies in local comparisons. Of
course, you need to ensure that the programs used in the
comparison (e.g., cmp,
find, and diff) are taken
from the remote partition and not from the local disk. Otherwise, an
attacker could modify those files to not report changes!
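Here is a minimal sketch of such an arrangement. The host and directory
names are hypothetical, and the export syntax shown is the Linux
/etc/exports form; other systems use different export files and options:
# On the secured server, export the baseline partition read-only
# (an /etc/exports entry on Linux):
/export/baseline    client1.example.com(ro) client2.example.com(ro)

# On each client, mount the baseline and run the comparison tools
# from the mounted partition rather than from the local disk:
mount -t nfs -o ro server.example.com:/export/baseline /baseline
/baseline/usr/bin/cmp /baseline/bin/login /bin/login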
Remember that it is not enough to keep copies of executable programs.
Shared libraries and configuration files must usually be compared as
well.
20.3.2.3 rdist
Another method of remote comparison involves using a program to do
the comparison across the network. The
rdist utility is one such program that works
well in this context. The drawback to using
rdist, however, is the same as with using full
comparison copies: you need to read both versions of each file, byte
by byte. The problem is compounded, however, because you need to
transfer one copy of each file across the network each time you
perform a check. (If you use rdist, always use
it with the -P ssh option rather than relying
on the Berkeley "r" commands.)
One scenario that works well with rdist is to
have a "master" configuration for
each architecture you support at your site. This master machine
should not generally support user accounts, and it should have extra
security measures in place. On this machine, you put your master
software copies, possibly installed on read-only disks.
Periodically, the master machine pushes a clean copy of the
rdist binary to the client machine to be
checked. The master machine then initiates an
rdist session using the
-b option (byte-by-byte compare) against the
client. Differences are reported or, optionally, fixed. In this
manner, you can scan and correct dozens or hundreds of machines
automatically. If you use the -R option, you can
also check for new files or directories that are not supposed to be
present on the client machine.
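A minimal sketch of this arrangement follows. The host and file names are
hypothetical, and the exact option spellings vary somewhat between rdist
versions, so check your own documentation:
# Distfile on the master machine
HOSTS = ( client1.example.com client2.example.com )
FILES = ( /bin /sbin /usr/lib /etc )

${FILES} -> ${HOSTS}
        install ;
        notify root@master.example.com ;

# Compare byte-by-byte over ssh and fix any differences; -R also
# removes files on the clients that are not present on the master.
rdist -b -R -P /usr/bin/ssh -f Distfile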
The normal mode of operation of rdist, without
the -b option, does not do a byte-by-byte
compare. Instead, it compares only the metadata in the inode
concerning times and file sizes. As we discuss in the next section,
this information can be spoofed.
An rdist master machine has other advantages. It
makes it much easier to install new and updated software on a large
set of client machines. This feature is especially helpful when you
are in a rush to install the latest security patch in software on
every one of your machines. It also provides a way to ensure that the
owners and modes of system files are set correctly on all the
clients. The downside of this is that if you are not careful, and an
attacker modifies your master machine, rdist
will just as efficiently install the same security hole on every one
of your clients automatically!
20.3.3 Checklists and Metadata
Saving an extra copy of each
critical file and performing a byte-by-byte comparison can be unduly
expensive. It requires substantial disk space to store the copies.
Furthermore, if the comparison is performed over the network, either
via rdist or NFS, it will involve substantial
disk and network overhead each time the comparisons are made.
A more efficient approach would be to store a summary of important
characteristics of each file and directory. When the time comes to do
a comparison, the characteristics are regenerated and compared with
the saved information. If the characteristics are comprehensive and
smaller than the file contents (on average), then this method is
clearly a more efficient way of doing the comparison.
Furthermore, this approach can capture changes that a simple
comparison copy cannot: comparison copies detect changes in the
contents of files, but do little to detect changes in metadata such
as file owners or protection modes. It is this data—the data
normally kept in the inodes of files and directories—that is
sometimes more important than the data within the files themselves.
For instance, changes in owner or protection bits may result in
disaster if they occur to the wrong file or directory.
Thus, we would like to compare the values in the
inodes of critical files and directories
with a database of comparison values. The values we wish to compare
and monitor for critical changes are owner, group, and protection
modes. We also wish to monitor the
mtime (modification time) and the file
size to determine if the file contents change in an unauthorized or
unexpected manner. We may also wish to monitor the link count, inode
number, and ctime as additional indicators of change. All of this
material can be listed with the ls command.
20.3.3.1 Simple listing
The simplest form of a checklist mechanism is to run the
ls command on a list of files and compare the output
against a saved version. The most primitive approach might be a shell
script such as this:
#!/bin/sh
cat /usr/adm/filelist | xargs ls -ild > /tmp/now
diff -b /usr/adm/savelist /tmp/now
The file /usr/adm/filelist would contain a list
of files to be monitored. The /usr/adm/savelist
file would contain a base listing of the same files, generated on a
known secure version of the system. The
-i option adds the inode number to the
listing. The -d option includes directory
properties, rather than contents, if the entry is a directory name.
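A line of the resulting listing might look like the following (the values
shown are made up); the first field is the inode number, followed by the
permissions, link count, owner, group, size, and modification time:
1203945 -rwsr-xr-x   1 root  wheel   27544 Nov  3 14:02 /bin/login
Note that ls reports the mtime by default; adding the -c option to a long
listing displays the ctime instead.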
This approach has some drawbacks. First of all, the output does not
contain all of the information we might want to monitor. A more
complete listing can be obtained by using the
find
command:
#!/bin/sh
find `cat /usr/adm/filelist` -ls > /tmp/now
diff -b /usr/adm/savelist /tmp/now
This will not only give us the data to compare on the entries, but it
will also disclose if files have been deleted or added to any of the
monitored directories.
Writing a script to perform this operation and running it
periodically from a cron file may seem tempting. The difficulty
with this approach is that an attacker may modify the
cron entry or the script itself to not report
any changes. Thus, be cautious if you take this approach and be sure
to review and then execute the script manually on a regular basis.
20.3.3.2 Ancestor directories
You must be sure to check the ancestor directories of
all critical files and directories—i.e., all the directories
between the root directory and the files being
monitored. These are often overlooked, but can present a significant
problem if their owners or permissions are altered. An attacker might
then be able to rename one of the directories and install a
replacement or a symbolic link to a replacement that contains
dangerous information. For instance, if the /etc
directory is set to mode 777, then anyone could temporarily rename
the password file, install a replacement containing a
root entry with no password, run
su, and reinstall the old password file. Any
commands or scripts you have that monitor the password file would
show no change unless they happen to run during the few seconds of
the actual attack—something the attacker can usually avoid.
The following script takes a list of absolute file pathnames,
determines the names of all of the directories that contain them (all
the way up to the root), and then prints them:
#!/bin/ksh
typeset pdir

function getdir         # Gets the real, physical pathname
{
    if [[ $1 != /* ]]
    then
        print -u2 "$1 is not an absolute pathname"
        return 1
    elif cd "${1%/*}"
    then
        pdir=$(pwd -P)
        cd ~-
    else
        print -u2 "Unable to attach to directory of $1"
        return 2
    fi
    return 0
}

cd /
print /                 # Ensure we always have the root directory included

while read name
do
    getdir $name || continue
    while [[ -n $pdir ]]
    do
        print $pdir
        pdir=${pdir%/*}
    done
done | sort -u
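One way to put the script to use (the pathnames here are only
illustrative) is to merge the ancestor directories it finds into the
monitored file list, so that the find-based checklist covers them as well:
#!/bin/sh
# Assumes the ksh script above is saved as /usr/adm/ancestors.
# Collect the ancestor directories of everything in the file list
# and merge them into the list, removing duplicates.
/usr/adm/ancestors < /usr/adm/filelist > /tmp/dirs
sort -u /usr/adm/filelist /tmp/dirs > /usr/adm/filelist.new
mv /usr/adm/filelist.new /usr/adm/filelist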
20.3.4 Checksums and Signatures
Unfortunately, the
approach we described for monitoring files can be defeated with a
little effort. Files can be modified in such a way that the
information we monitor will not disclose the change. For instance, a
file might be modified by writing to the raw disk device after the
appropriate block is known. As the modification did not go through
the filesystem, none of the information in the inodes will be
altered.
An attacker could also surreptitiously alter a file by setting the
system clock back to the time of the last legitimate change, making
the edits, and then setting the clock forward again. If this is done
quickly enough, no one will notice the change. Furthermore, all the
times on the file (including the ctime) will be set to the
"correct" values. Several so-called
"rootkits" in widespread use on the
Internet actually take this approach. It is easier and safer than
writing to the raw device. It is also more portable.
Thus, we need to have a stronger approach in place to check the
contents of files against a known good value. Obviously, we could use
comparison copies, but we have already noted that they are expensive.
A second approach would be to create a signature of the
file's contents to determine if a change occurred.
The first, naive approach using such a signature might involve the
use of a standard CRC checksum, as implemented by the
sum command. CRC polynomials are often used
to detect changes in message transmissions, so they could logically
be applied here. However, this application would be a mistake.
CRC checksums are designed to detect random bit changes, not
purposeful alterations. As such, CRC checksums are good at finding a
few bits changed at random. However, because they are generated with
well-known polynomials, an attacker can pad or adjust an edited file so
that it produces any desired CRC value. In fact, some of
the same attacker toolkits that allow files to be changed without
altering the time also contain code to adjust the file contents so that
the altered file generates the same sum output as
the original. These tools have been generally available
since at least 1992.
To generate a checksum that cannot be easily spoofed, we need to use
a stronger mechanism, such as the message digests described in Section 7.4. These are also dependent on the contents of
the file, but they are far more difficult to spoof after changes have
been made.
If we had a program to generate the MD5 checksum of a file, we might
alter our checklist script to be:
#!/bin/sh
find `cat /usr/adm/filelist` -ls -type f -exec md5 {} \; > /tmp/now
diff -b /usr/adm/savelist /tmp/now
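The corresponding baseline would be generated with the same command on a
known-good system and stored in a protected location; the pathnames follow
the earlier examples, and on systems without an md5 command you would
substitute md5sum:
#!/bin/sh
# Run on a known secure system (ideally in single-user mode) and keep
# the result on read-only or offline media.
find `cat /usr/adm/filelist` -ls -type f -exec md5 {} \; > /usr/adm/savelist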
Both the mtree command and the Tripwire system
(discussed later in this chapter) employ cryptographic checksums for
this purpose.