Chapter 27. Responding to Security Incidents

The CERT Coordination Center (CERT-CC) reports that, despite increased awareness, the first time many organizations start thinking about how to handle a computer security incident is after an intrusion has occurred. Obviously, this isn't a great approach. You need a plan for how you're going to respond to a computer security incident at your site, and you need to develop that plan well before an incident occurs.

There isn't room here to detail everything you need to know to deal with a security incident: attacks are many and varied and change constantly; responding to them can involve a byzantine assortment of legal and technical issues. This chapter is intended to give you an outline of the issues involved and the practical steps you can take ahead of time to smooth the process. Appendix A, "Resources", provides a list of resources that may provide additional help.

27.1. Responding to an Incident

This section discusses a number of steps you'll need to take when you respond to a security incident. You won't necessarily need to follow these steps in the order they're given, and not all of these steps are appropriate for all incidents. However, we recommend that you at least contemplate each of them when you find yourself dealing with an incident.

In Section 27.4, "Planning Your Response", later in this chapter, we'll look again at each of these steps and help you figure out how to work them into the overall response plan that you should develop before an incident actually occurs.

Rules for Incident Response
In their book Practical UNIX & Internet Security, Simson Garfinkel and Gene Spafford provide two excellent, overriding rules for incident response. Keep these rules in mind as you read this chapter and during any real-life incident response:

Rule 1: Don't Panic!
Rule 2: Document!

27.1.1. Evaluate the Situation

The first step in responding to a security incident is to decide what response, if any, needs to be made immediately. Ask these questions:

Has an attacker succeeded in getting into your systems?

If so, you have a genuine emergency on your hands, whether or not the attacker is currently active.

Is the attack currently in progress?

If so, you need to decide how you're going to react right now. If the attack isn't currently in progress, you may not be in such a hurry.

If the incident looks like an aggressive attack on your system, you probably want to take strong steps quickly. These steps might include shutting down the system or your Internet connection until you figure out how to deal with the situation.

On the other hand, if the incident is a less aggressive one -- perhaps someone has just opened a Telnet connection to your machine and is trying various login/password pairs -- then you may want to move more slowly. If you're reasonably confident that the attack won't succeed (e.g., you can see that the attacker is trying passwords that consist of all lowercase letters, and you know for certain that no account on the system has such a password), you might want to leave things alone and just watch for a while to see what the attacker does. This may give you an opportunity to trace the attack. (However, see Section 27.3, "Pursuing and Capturing the Intruder", later in this chapter, for a discussion of the issues involved in tracing an attack.)

Whatever you do, remember Rule 1: Don't panic!

27.1.2. Start Documenting

As soon as you determine that you actually have a problem that you need to respond to, start documenting what's going on. You don't need to get fancy at this point (you don't have time to, until you've taken the next step), but you should at least start a log by making a note of what time it is.
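If you start that log on a computer rather than on paper, even a few lines of code will make sure every note carries a time. The following is a minimal sketch, not a prescribed tool: the function names, the file path, and the line format are all our own assumptions for illustration. It appends entries and never rewrites earlier ones. Keep the log on a machine you trust, not on the one under attack, and remember that for legal purposes a printed, signed copy carries more weight than the file itself.

```python
from datetime import datetime, timezone

def format_entry(who, note, when=None):
    """Return one log line: UTC timestamp, author, and free-text note."""
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return f"{stamp} [{who}] {note}"

def append_entry(path, who, note, when=None):
    """Append an entry to the log file; earlier entries are never rewritten."""
    line = format_entry(who, note, when)
    with open(path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
    return line
```

A first entry might be as simple as `append_entry("incident.log", "your-name", "Suspected break-in; starting log")`, where the filename and author are placeholders.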

27.1.3. Disconnect or Shut Down, as Appropriate

Once you've evaluated the situation, your next priority is to give yourself the time to respond without risking your systems further. The least disruptive alternative is usually to disconnect the affected machine from all networks; this will shut down any active connections. Shutting down active connections may make it harder to trace the intruder, but it will allow the rest of the people at your site to continue to do their work, and it will leave the intruder's programs running. Examining those programs may help you to identify who the intruder might be.

If you're afraid that other machines have been compromised or are vulnerable to the same attack, you'll probably want to disconnect as many machines as you can as a unit. This may mean taking down your connection to the Internet, if possible. If your Internet connection is managed elsewhere in your organization, you may need to detach just your portion of the network, but you'll also need to talk to other parts of your organization as soon as possible to let them know what's happening.

In some situations, you may want to shut down the compromised system. However, this action should be a last resort for a number of reasons:

It destroys information you may need.
You won't be able to analyze or fix the machine while it's down; you'll have to disconnect it from the network eventually anyway to bring it back up again.
It's even more disruptive to legitimate users than removing the network connection.
It protects only one machine at a time. (It's much easier to cleanly disconnect a set of systems than to cleanly shut them down.)

Even if you're responding to an incident that has already ended, you still might want to disconnect or shut down the system, or at least close it to users, while you analyze what happened and make any changes necessary to keep it from happening again. This will keep you from being confused by things users are doing, and it will prevent the intruder from returning before you're done.

27.1.4. Analyze and Respond

Your next priority is to start to fix what's gone wrong. The first step in actually correcting the problem is to relax, think for a while, and make sure you really understand what's happening and what you're dealing with. The last thing you want to do is make the situation worse by doing something rash and ill considered. Whatever corrective actions you're contemplating, think them through carefully. Will they really solve the problem? Will they, in turn, cause other problems?

When you're working in an unusual, high-stress situation like this, the chances of making a major error increase. Because you're probably going to be working with system privileges (for example, working as root on a Unix system), the consequences of an error could be serious.

There are several ways you can reduce the chances of making an error. One good way is to work with a partner; each of you can check the other's commands after they're typed but before they're executed. Even if you're working alone, many people find that reading commands aloud and checking the arguments in reverse order before executing them helps avoid mistakes. Resist the temptation to try to work fast. You will go home sooner if you work slowly and carefully.

Try not to let your users get in the way of your response. You may want to give someone the specific job of dealing with user inquiries so the rest of your response team can concentrate on responding to the incident.

Also, try to keep your responders from tripping over each other. Make it clear which system managers and investigators are working on which task, so they won't step on each other's toes (or wind up unintentionally chasing each other as part of the investigation!).

27.1.5. Make "Incident in Progress" Notifications

You're not the only person who needs to know what's going on. A number of other people -- in a number of different places -- have to be kept informed.

27.1.5.1. Your own organization

Within your own organization are people who need to know that something is happening: management, users, and staff. At the very least, let them know that you are busy responding to an incident and that you may not be available to them for other matters. They usually need to know why they're being inconvenienced and what they should do to speed recovery (even if the only thing they can do is to go away and leave you alone).

It is particularly important that management and other staff know what's going on. Otherwise, you risk having them act in opposition to you. For instance, if you've disconnected the Internet connection, the chances are high that somebody's going to notice the service outage and try to fix it. That's a problem if it's another staff member, but it can be a disaster if it turns into a management requirement.

If people call management to complain about some side effect of your response, and the manager they get has been briefed about what's going on, the chances are that the manager will defend your need to make a response. At worst, the manager will make a reasoned decision about the importance of incident response versus other needs of the company. However, if the manager doesn't know what's going on, he or she will probably respond the same way he or she would to any other network outage: "Gee, that's terrible, we'll fix it as soon as possible." The manager has then promised the user something, and the chances are very small that the manager will go back on that promise. Instead, your response will be curtailed by the need to restore service as soon as possible.

Depending on the nature of your site and the incident in question, you may also need to inform your legal, audit, public relations, and security departments. You will always want to contact the security department if:

You want to involve law enforcement agencies.
You suspect an insider is involved.
You suspect physical access is involved.

If multiple computer facilities are at your site, you'll need to inform the other facilities as soon as possible; they are likely sources and future targets for similar attacks.

27.1.5.2. CERT-CC or other incident response teams

If your organization is served by an incident response team such as CERT-CC, or has its own such team, let them know what's going on and try to enlist their aid. (For instructions on how to contact CERT-CC or another response team, see Appendix A, "Resources".) What steps response teams can take to help you will depend on the charter and resources of the response team. Even if they can't help you directly, they can tell you whether the attack on your site looks as if it is part of a larger pattern of incidents. In that case, they may be able to coordinate your response with the responses of other sites.

27.1.5.3. Vendors and service providers

You might want to get in touch with your vendor support contacts or your Internet service provider(s) if you think they might be able to help or should be aware of the situation. For example, if the attackers appear to be exploiting an operating system bug, you should probably contact the vendor to see if they know about it and have a fix for it. At the very least, they'll be able to warn other sites about the bug. Similarly, your Internet provider is unlikely to be able to do much about your immediate problem, but they may be able to warn other customers. There is also a possibility that your Internet provider has itself been compromised, in which case, they need to know immediately. Your vendors and service provider may have special contacts or procedures for security incidents that will yield much faster results than going through normal support channels.

You may get little or no visible response when you make these reports. This might be because you're being ignored or because companies are putting self-defense before the interests of their customers. On the other hand, it's often due to sensible precautions that are intended to make certain that problems are not publicized before fixes are available (jeopardizing places not yet under attack), that the fixes that are made are appropriate to the problem, and that attackers don't get valuable information by pretending to be sites under attack. You might as well give your suppliers the benefit of the doubt, since it's almost impossible to tell which of these is going on.

27.1.5.4. Other sites

Finally, if the incident appears to involve other sites -- that is, if the attack appears to be coming from a particular site, or if it looks as if the attackers have gone after that site after breaking into yours -- you should inform those other sites. These sites are usually easy to identify as the sources or destinations of connections. It's often much harder to figure out how to find an actual human being with some responsibility for the computer in question, who is awake and reachable and has a common language with you.

Once again, you may get little or no apparent response for any number of different reasons, some of them annoying and reprehensible, and some of them perfectly sensible. The other site may not care whether their users are attacking you, or they may care desperately but have no way of telling you about it without revealing information to the attackers. While it's always nice to get somebody who makes an immediate, visibly effective response and thanks you promptly for the information, don't expect it and don't be upset when you don't get it.

If you don't know who to inform, talk to your response team (or CERT-CC). They will probably either know or know how to find out, and they have experience in calling strangers to tell them they have security problems.

27.1.6. Snapshot the System

Another early step to take is to make a "snapshot" of each compromised system. You might do so by doing a full backup to tape or by copying the whole system to another disk. In the latter case, if your site maintains its own spare parts inventory, you might consider using one of the spares for this purpose, instead of a disk that is already in use and might itself turn out to have been compromised.

The snapshot is important for several reasons:

If you misdiagnose the problem or blow the recovery, you can always get back to the time of the snapshot.
The snapshot may be vital for investigative and legal proceedings. It lets you get on with the work of recovering the system without fear of destroying evidence.
You can examine the snapshot later, after you're back in operation, to determine what happened and why.

Because the snapshot may become important for legal proceedings, you need to secure the evidence trail. Here are some guidelines:[187]

[187]See Computer Crime: A Crimefighter's Handbook, by David Icove, Karl Seger, and William VonStorch (O'Reilly & Associates, 1995), for a detailed discussion of labeling and protecting evidence.

Uniquely identify (label) the snapshot media and put the date, time, your name, and your signature on it.
Write-protect the media -- permanently, if possible.
Safeguard the media against tampering (for example, put it in a locked container) so that if and when you hand it over to law-enforcement or other authorities, you can tell them whose custody the media has been in and why you're certain it hasn't been tampered with since it was first created.
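These guidelines cover physical custody. If the snapshot is a disk image or archive file, one supplementary measure (our suggestion, not part of the guidelines above) is to record a cryptographic digest of the file when you create it, so that a later copy can be checked against the signed label. The sketch below assumes the snapshot is a single file; `file_digest` and `custody_label` are hypothetical names, and the label layout is only an illustration.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large image need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def custody_label(media_id, path, taken_by, taken_at):
    """Build the text of a label to print, sign, and keep with the media."""
    return (
        f"Media ID: {media_id}\n"
        f"Taken by: {taken_by}\n"
        f"Taken at: {taken_at}\n"
        f"SHA-256:  {file_digest(path)}\n"
        "Signature: ____________________"
    )
```

The digest only shows that the file has not changed since the label was written. It does not replace the locked container or the record of who has held the media.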

It's a good idea to set aside an adequate supply of fresh media just for snapshots because you never know when you're going to need to produce one. It's very frustrating to respond to an incident, and be ready to do the snapshot, only to discover that the last blank tape got used for backups the day before and the new order hasn't come in yet.

27.1.7. Restore and Recover

Finally, you're at the point of actually dealing with the incident. What do you do? It depends on the circumstances. Here are some possibilities:

If the attacker didn't succeed in compromising your system, you may not need to do much. You may decide not to bother reacting to casual attempts. You may also find that your incident was actually something perfectly innocent, and you don't need to do anything at all.
If the attack was a particularly determined one, you may want to increase your monitoring (at least temporarily), and you'll probably want to inform other people to watch out for future attempts.
If the attacker became an intruder (that is, he or she actually managed to get into your computers), you're going to need to at least plug the hole the intruder used, and check to make certain the intruder hasn't damaged anything or left anything behind.

At worst, you may need to rebuild your system from scratch. Sometimes you end up doing this because the intruder damaged things, purposefully or accidentally. More often, you'll rebuild your system because it's the only way to ensure you have a clean system that hasn't been booby-trapped. Most intruders start by making sure they'll be able to get back into your system, even if you close their initial entry point. As a result, your systems may be compromised even if the intruder was present for only a short time.

TIP: Always assume that intruders have created back doors into your system so that they can get back in again easily. It's one of the first things many intruders do when they break in to a system.

If you need to rebuild your system, first ensure that your hardware is working properly. You want to make sure it passes all relevant self-tests and diagnostics; you don't want to restore onto a flaky system. A reinstall may reveal previously unnoticed hardware problems. For instance, a disk may have bad spots that are in unused files. When you reinstall the operating system, you will attempt to write over the bad parts, and the problem will suddenly become apparent.

Next, make sure you are using trusted media and programs, not necessarily your last backup, to restore the system. Unless you are absolutely sure that you can accurately date the first time the intruder accessed your system, you don't know whether or not programs had already been modified at the time the backups happened. It's often best to rebuild your system from vendor distribution media (that is, the tapes or CD-ROM your operating system release came on) and then reload only user data (not programs that multiple users share) from your backup tapes.

If you need programs you didn't get from your vendor (for instance, packages from the Internet), then do one of the following:

Rebuild and reinstall these programs from a trusted backup (one you're absolutely positive contains a clean copy).
Obtain and install fresh copies from the site you got the packages from in the first place.

Do not recompile software until you've reinstalled the operating system, including the compiler; you don't know whether the compiler itself, and the libraries it depends on, have been compromised.

This implies that if you're heavily customizing your system or installing a lot of extra software beyond what your vendor gives you, you need to work out a way of archiving those customizations and packages that you're sure can't be tampered with by an attacker. This way, you can easily restore those customizations and packages if you need to. One good way is to make a special backup tape of new software immediately after it's installed and configured, before an attacker has a chance to modify it.
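Alongside such an archive, it can help to keep a list of checksums for the files in it, so that you can later tell whether a copy still matches what you originally installed. The following is a rough sketch under several assumptions: the function names are ours, each file is small enough to read into memory at once, and the manifest itself is stored somewhere an attacker can't reach (on write-protected media, for example). A manifest kept on the compromised machine proves nothing.

```python
import hashlib
import os

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(full, root)] = digest
    return manifest

def compare(trusted, current):
    """Return sorted lists of changed, missing, and added paths."""
    changed = sorted(p for p in trusted if p in current and trusted[p] != current[p])
    missing = sorted(p for p in trusted if p not in current)
    added = sorted(p for p in current if p not in trusted)
    return changed, missing, added
```

Run the comparison from a system you trust, using a clean copy of the interpreter; on a compromised machine, the tools that compute the checksums may themselves have been altered.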

You may have programs that were locally written, and in these cases, you may not be able to find even source code that's guaranteed to be uncontaminated. In this situation, someone -- preferably the original author -- will need to look through the source code. People rarely bother to modify source code, and when they do, they aren't particularly subtle most of the time. That's because they don't need to be; almost nobody actually bothers to look at the source before recompiling it.

In one case, a programmer installed a back door into code he expected would run on only one machine, as a personal convenience. The program turned out to be fairly popular and was adopted in a number of different sites within his university. Years after he wrote it, and long after the original machine was running a version without the back door, he discovered that the back door was still present on all the other sites, despite the fact that it was clearly marked and commented and within the first page of code. You can't make a comprehensive search of a large program, but you can at least avoid humiliation by looking for obvious changes.

27.1.8. Document the Incident

Life gets very confusing when you're discovering, investigating, and recovering from a security incident. A good chain of communication is important in keeping people informed and preventing them from tripping over each other. Keeping a written (either hardcopy or electronic) record of your activities during the incident is also important. Such a record serves several purposes:

It can help keep people informed (and thereby help them to resolve the incident more quickly).
It tells you what you did and when, in responding, so that you can analyze your response later on (and maybe do better next time).
It will be vital if you intend to pursue any legal action.

From a legal standpoint, the best records are hardcopy records generated and identified at the time of occurrence. Just about anything else (particularly anything kept online) could be tampered with or falsified fairly easily -- or at least a judge and jury could be convinced of that. You need to produce records on paper, and then label, date, and sign them. Furthermore, unless the pages are actually bound together, so that pages can't be inserted or removed without indication, you'll need to date and sign every page. (And you thought continuous tractor-feed paper was useless these days!)

You need to have legal documentation even if you aren't completely certain you're going to need it. An incident that initially looks fairly simple may turn out to be serious. Don't assume it isn't going to be worth bringing in the police.

For both legal and practical reasons, it's useful to put in exact times when things occurred. Legally, this helps to show that entries were being made in order. Practically, it's extremely helpful when you need to correlate multiple sources of information (for instance, when you need to compare your logs against event logs on computers or against somebody else's actions).
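To see why exact times pay off, consider merging your own notes with a system log into a single timeline. The sketch below is illustrative only. It assumes every line begins with a timestamp of the form `YYYY-MM-DD HH:MM:SS`, that each source is already in time order, and that all the clocks involved agree; real logs rarely satisfy all three without some cleanup. The function names are hypothetical.

```python
import heapq
from datetime import datetime

def parse(source, lines):
    """Yield (timestamp, source, text) for lines shaped 'YYYY-MM-DD HH:MM:SS text'."""
    for line in lines:
        stamp, text = line[:19], line[20:].strip()
        yield datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), source, text

def merged_timeline(**sources):
    """Merge several already time-ordered logs into one chronological list."""
    streams = [parse(name, lines) for name, lines in sources.items()]
    return list(heapq.merge(*streams))
```

Calling `merged_timeline(notes=my_notes, syslog=system_lines)` with two lists of lines returns tuples in time order, each tagged with the source it came from, so you can see which log entries surround the moment you disconnected the network.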

Here are several useful documentation methods you might want to consider:

Notebooks -- carbon copy lab notebooks are especially useful because you can write a note, tear it out and give it to someone and still have a copy of the note. Another benefit is that the pages are usually numbered, so you can determine later on whether any pages have been removed or added.
Terminals running with attached printers or old-fashioned printing terminals.
A shell running under the Unix script command, with the resulting typescript immediately printed and identified.
A personal computer terminal program running in "capture" mode, with the resulting typescript immediately printed and identified.
A microcassette recorder for verbal notes.

You will probably want to use multiple methods, one to record what's happening online and one to record what's happening outside of the computer. For example, you might have a typescript of the commands you were typing, but a handwritten log for phone calls.

It's easy to decide what to record online; you simply record everything you do. Remember to use the terminal or session that's being recorded. (With some methods, like script, you can record every session you've got going; just make sure you record each session in a separate file.) It's harder to decide what to record of the events that don't just get automatically captured. You certainly want to record at least this much:

Who you called, when, and why.
A summary of what you told them.
A summary of what they told you. (That summary may end up being "see above" some of the time, but you still want to be able to figure out who you were talking to, and when and why.)
Meetings and important decisions and actions that aren't captured online (e.g., the time at which you disconnected the network).

In addition to the journal, a log of time spent for everyone working on an incident can be invaluable. You may need to justify some level of "loss" in order for some law enforcement agencies to be able to open an investigation, and if the intruder didn't do any damage to the machines, the time that was spent cleaning up is the main loss.

Time logs may also be useful if you are having difficulty in convincing management that the organization needs to allocate additional resources to be prepared to deal with incidents. It's a way of showing how much these incidents cost. It's particularly helpful if you can show which areas could have been anticipated and mitigated by planning.
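Turning a time log into a figure that management or law enforcement can use is simple arithmetic. Here is a small sketch, assuming each entry is a (person, hours, task) tuple and that you have an hourly cost for each person; the entry layout, the names, and the rates are invented for illustration.

```python
from collections import defaultdict

def hours_by_person(entries):
    """Total the hours each person logged across all tasks."""
    totals = defaultdict(float)
    for person, hours, _task in entries:
        totals[person] += hours
    return dict(totals)

def estimated_loss(entries, hourly_rates):
    """Multiply each entry's hours by that person's hourly cost and sum."""
    return sum(hours * hourly_rates[person] for person, hours, _task in entries)
```

For example, if one person logs 3.5 hours on the snapshot and 4 hours on the rebuild at a cost of 60 per hour, and another logs 2 hours of phone calls at 50 per hour, the estimated loss is 3.5 x 60 + 4 x 60 + 2 x 50 = 550. Keeping the task with each entry also lets you show which parts of the cleanup could have been shortened by planning.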