Chapter 27. Responding to Security Incidents
The CERT Coordination Center (CERT-CC) reports that, despite increased awareness, the first time many organizations start thinking about how to handle a computer security incident is after an intrusion has occurred. Obviously, this isn't a great approach. You need a plan for how you're going to respond to a computer security incident at your site, and you need to develop that plan well before an incident occurs.
Contents:
Responding to an Incident
What to Do After an Incident
Pursuing and Capturing the Intruder
Planning Your Response
There isn't room here to detail everything you need to know to deal with a security incident: attacks are many and varied and change constantly; responding to them can involve a byzantine assortment of legal and technical issues. This chapter is intended to give you an outline of the issues involved and the practical steps you can take ahead of time to smooth the process. Appendix A, "Resources", provides a list of resources that may provide additional help.
27.1. Responding to an Incident
This section discusses a number of steps you'll need to take when you respond to a security incident. You won't necessarily need to follow these steps in the order they're given, and not all of these steps are appropriate for all incidents. But we recommend that you at least contemplate each of them when you find yourself dealing with an incident.
In Section 27.4, "Planning Your Response", later in this chapter, we'll look again at each of these steps and help you figure out how to work them into the overall response plan that you should develop before an incident actually occurs.
27.1.1. Evaluate the Situation
The first step in responding to a security incident is to decide what response, if any, needs to be made immediately. Ask these questions:
On the other hand, if the incident is a less aggressive one -- perhaps someone has just opened a Telnet connection to your machine and is trying various login/password pairs -- then you may want to move more slowly. If you're reasonably confident that the attack won't succeed (e.g., you can see that the attacker is trying passwords that consist of all lowercase letters, and you know for certain that no account on the system has such a password), you might want to leave things alone and just watch for a while to see what the attacker does. This may give you an opportunity to trace the attack. (However, see Section 27.3, "Pursuing and Capturing the Intruder", later in this chapter, for a discussion of the issues involved in tracing an attack.)
Whatever you do, remember Rule 1: Don't panic!
27.1.2. Start Documenting
As soon as you determine that you actually have a problem that you need to respond to, start documenting what's going on. You don't need to get fancy at this point (you don't have time to, until you've taken the next step), but you should at least start a log by making a note of what time it is.
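If you keep the log online rather than on paper, a minimal sketch in Python might look like the following. The function name log_entry, the line layout, and the choice of UTC timestamps are illustrative assumptions, not something this chapter prescribes; a paper notebook and a wristwatch do the same job.

```python
from datetime import datetime, timezone
from pathlib import Path


def log_entry(log_path, note, author="unknown"):
    """Append one timestamped line to the incident log and return that line."""
    # Use UTC so entries can later be compared with logs from other machines.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    line = f"{stamp} [{author}] {note}"
    # Open in append mode so earlier entries are never rewritten.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Calling log_entry("incident.log", "Noticed repeated failed logins; starting log.", author="alice") appends a single line to the (hypothetical) file incident.log. Keep the log on a machine you believe is uncompromised, since an intruder on the affected system could read or alter it.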
27.1.3. Disconnect or Shut Down, as Appropriate
Once you've evaluated the situation, your next priority is to give yourself the time to respond without risking your systems further. The least disruptive alternative is usually to disconnect the affected machine from all networks; this will shut down any active connections. Shutting down active connections may make it harder to trace the intruder, but it will allow the rest of the people at your site to continue to do their work, and it will leave the intruder's programs running. This may help you to identify who the intruder might be.
If you're afraid that other machines have been compromised or are vulnerable to the same attack, you'll probably want to disconnect as many machines as you can as a unit. This may mean taking down your connection to the Internet, if possible. If your Internet connection is managed elsewhere in your organization, you may need to detach just your portion of the network, but you'll also need to talk to other parts of your organization as soon as possible to let them know what's happening.
In some situations, you may want to shut down the compromised system. However, this action should be a last resort for a number of reasons:
27.1.4. Analyze and Respond
Your next priority is to start to fix what's gone wrong. The first step in actually correcting the problem is to relax, think for a while, and make sure you really understand what's happening and what you're dealing with. The last thing you want to do is make the situation worse by doing something rash and ill-considered. Whatever corrective actions you're contemplating, think them through carefully. Will they really solve the problem? Will they, in turn, cause other problems?
When you're working in an unusual, high-stress situation like this, the chances of making a major error increase. Because you're probably going to be working with system privileges (for example, working as root on a Unix system), the consequences of an error could be serious.
There are several ways you can reduce the chances of making an error. One good way is to work with a partner; each of you can check the other's commands after they're typed but before they're executed. Even if you're working alone, many people find that reading commands aloud and checking the arguments in reverse order before executing them helps avoid mistakes. Resist the temptation to try to work fast. You will go home sooner if you work slowly and carefully.
Try not to let your users get in the way of your response. You may want to give someone the specific job of dealing with user inquiries so the rest of your response team can concentrate on responding to the incident.
Also, try to keep your responders from tripping over each other. Make it clear which system managers and investigators are working on which task, so they won't step on each other's toes (or wind up unintentionally chasing each other as part of the investigation!).
27.1.5. Make "Incident in Progress" Notifications
You're not the only person who needs to know what's going on. A number of other people -- in a number of different places -- have to be kept informed.
27.1.5.1. Your own organization
Within your own organization are people who need to know that something is happening: management, users, and staff. At the very least, let them know that you are busy responding to an incident and that you may not be available to them for other matters. They usually need to know why they're being inconvenienced and what they should do to speed recovery (even if the only thing they can do is to go away and leave you alone).
It is particularly important that management and other staff know what's going on. Otherwise, you risk having them act in opposition to you. For instance, if you've disconnected the Internet connection, the chances are high that somebody's going to notice the service outage and try to fix it. That's a problem if it's another staff member, but it can be a disaster if it turns into a management requirement.
If people call management to complain about some side effect of your response, and the manager they get has been briefed about what's going on, the chances are that the manager will defend your need to make a response. At worst, the manager will make a reasoned decision about the importance of incident response versus other needs of the company. However, if the manager doesn't know what's going on, he or she will probably respond the same way as to any other network outage: "Gee, that's terrible, we'll fix it as soon as possible." The manager has then promised the user something, and the chances are very small that the manager will go back on that promise. Instead, your response will be curtailed by the need to restore service as soon as possible.
Depending on the nature of your site and the incident in question, you may also need to inform your legal, audit, public relations, and security departments. You will always want to contact the security department if:
27.1.5.2. CERT-CC or other incident response teams
If your organization is served by an incident response team such as CERT-CC, or has its own such team, let them know what's going on and try to enlist their aid. (For instructions on how to contact CERT-CC or another response team, see Appendix A, "Resources".) What steps response teams can take to help you will depend on the charter and resources of the response team. Even if they can't help you directly, they can tell you whether the attack on your site looks as if it is part of a larger pattern of incidents. In that case, they may be able to coordinate your response with the responses of other sites.
27.1.5.3. Vendors and service providers
You might want to get in touch with your vendor support contacts or your Internet service provider(s) if you think they might be able to help or should be aware of the situation. For example, if the attackers appear to be exploiting an operating system bug, you should probably contact the vendor to see if they know about it and have a fix for it. At the very least, they'll be able to warn other sites about the bug. Similarly, your Internet provider is unlikely to be able to do much about your immediate problem, but they may be able to warn other customers. There is also a possibility that your Internet provider has itself been compromised, in which case, they need to know immediately. Your vendors and service provider may have special contacts or procedures for security incidents that will yield much faster results than going through normal support channels.
You may get little or no visible response when you make these reports. This might be because you're being ignored or because companies are putting self-defense before the interests of their customers. On the other hand, it's often due to sensible precautions that are intended to make certain that problems are not publicized before fixes are available (jeopardizing places not yet under attack), that the fixes that are made are appropriate to the problem, and that attackers don't get valuable information by pretending to be sites under attack. You might as well give your suppliers the benefit of the doubt, since it's almost impossible to tell which of these is going on.
27.1.5.4. Other sites
Finally, if the incident appears to involve other sites -- that is, if the attack appears to be coming from a particular site, or if it looks as if the attackers have gone after that site after breaking into yours -- you should inform those other sites. These sites are usually easy to identify as the sources or destinations of connections. It's often much harder to figure out how to find an actual human being with some responsibility for the computer in question, who is awake and reachable and has a common language with you.
Once again, you may get little or no apparent response for any number of different reasons, some of them annoying and reprehensible, and some of them perfectly sensible. The other site may not care whether their users are attacking you, or they may care desperately but have no way of telling you about it without revealing information to the attackers. While it's always nice to get somebody who makes an immediate, visibly effective response and thanks you promptly for the information, don't expect it and don't be upset when you don't get it.
If you don't know who to inform, talk to your response team (or CERT-CC). They will probably either know or know how to find out, and they have experience in calling strangers to tell them they have security problems.
27.1.6. Snapshot the System
Another early step to take is to make a "snapshot" of each compromised system. You might do so by doing a full backup to tape or by copying the whole system to another disk. In the latter case, if your site maintains its own spare parts inventory, you might consider using one of the spares for this purpose, instead of a disk that is already in use and might itself turn out to have been compromised.
The snapshot is important for several reasons:
See Computer Crime: A Crimefighter's Handbook, by David Icove, Karl Seger, and William VonStorch (O'Reilly & Associates, 1995), for a detailed discussion of labeling and protecting evidence.
27.1.7. Restore and Recover
Finally, you're at the point of actually dealing with the incident. What do you do? It depends on the circumstances. Here are some possibilities:
TIP: Always assume that intruders have created back doors into your system so that they can get back in again easily. It's one of the first things many intruders do when they break into a system.
If you need to rebuild your system, first ensure that your hardware is working properly. You want to make sure it passes all relevant self-tests and diagnostics; you don't want to restore onto a flaky system. A reinstall may reveal previously unnoticed hardware problems. For instance, a disk may have bad spots that are in unused files. When you reinstall the operating system, you will attempt to write over the bad parts, and the problem will suddenly become apparent.
Next, make sure you are using trusted media and programs, not necessarily your last backup, to restore the system. Unless you are absolutely sure that you can accurately date the first time the intruder accessed your system, you don't know whether or not programs had already been modified at the time the backups happened. It's often best to rebuild your system from vendor distribution media (that is, the tapes or CD-ROM your operating system release came on) and then reload only user data (not programs that multiple users share) from your backup tapes.
If you need programs you didn't get from your vendor (for instance, packages from the Internet), then do one of the following:
This implies that if you're heavily customizing your system or installing a lot of extra software beyond what your vendor gives you, you need to work out a way of archiving those customizations and packages that you're sure can't be tampered with by an attacker. This way, you can easily restore those customizations and packages if you need to. One good way is to make a special backup tape of new software immediately after it's installed and configured, before an attacker has a chance to modify it.
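Alongside such a backup, you could record a checksum manifest of the newly installed software so that you can later tell whether any file has changed. This is a complementary technique, not one the text above prescribes. The sketch below is in Python and rests on several assumptions: SHA-256 is an adequate digest for your purposes, the helper names (file_digest, build_manifest, compare_manifests) are hypothetical, and the manifest is kept on write-protected or offline media that an attacker can't reach.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map the relative path of each regular file under root to its digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(trusted, current):
    """Return sorted lists of (changed, missing, added) relative paths."""
    changed = sorted(p for p in trusted if p in current and trusted[p] != current[p])
    missing = sorted(p for p in trusted if p not in current)
    added = sorted(p for p in current if p not in trusted)
    return changed, missing, added
```

Build the trusted manifest immediately after installation, store it off the machine, and compare it against a fresh build_manifest run when you need to decide which packages can be trusted. A manifest stored on the compromised system itself proves nothing, because the intruder could have regenerated it.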
You may have programs that were locally written, and in these cases, you may not be able to find even source code that's guaranteed to be uncontaminated. In this situation, someone -- preferably the original author -- will need to look through the source code. Intruders rarely bother to modify source code, and when they do, they aren't particularly subtle most of the time. That's because they don't need to be; almost nobody actually bothers to look at the source before recompiling it.
In one case, a programmer installed a back door into code he expected would run on only one machine, as a personal convenience. The program turned out to be fairly popular and was adopted in a number of different sites within his university. Years after he wrote it, and long after the original machine was running a version without the back door, he discovered that the back door was still present on all the other sites, despite the fact that it was clearly marked and commented and within the first page of code. You can't make a comprehensive search of a large program, but you can at least avoid humiliation by looking for obvious changes.
27.1.8. Document the Incident
Life gets very confusing when you're discovering, investigating, and recovering from a security incident. A good chain of communication is important in keeping people informed and preventing them from tripping over each other. Keeping a written (either hardcopy or electronic) record of your activities during the incident is also important. Such a record serves several purposes:
You need to have legal documentation even if you aren't completely certain you're going to need it. An incident that initially looks fairly simple may turn out to be serious. Don't assume it isn't going to be worth bringing in the police.
For both legal and practical reasons, it's useful to put in exact times when things occurred. Legally, this helps to show that entries were being made in order. Practically, it's extremely helpful when you need to correlate multiple sources of information (for instance, when you need to compare your logs against event logs on computers or against somebody else's actions).
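To illustrate why exact times matter, here is a minimal Python sketch that merges timestamped entries from several sources into one chronological timeline. It assumes every line starts with a timestamp in the form YYYY-MM-DD HH:MM:SS, that each source is already in time order, and that all clocks are synchronized and in the same time zone. Real logs rarely satisfy all of these without some cleanup. The function names are hypothetical.

```python
import heapq
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the leading timestamp


def parse_entries(source, lines):
    """Yield (timestamp, source, message) tuples from timestamped lines."""
    for line in lines:
        # The first 19 characters hold the timestamp; the rest is the message.
        stamp, message = line[:19], line[20:].strip()
        yield datetime.strptime(stamp, TIME_FORMAT), source, message


def merge_timelines(sources):
    """Merge entries from several sources into one chronological list.

    sources maps a source name to a list of lines already in time order.
    """
    streams = [parse_entries(name, lines) for name, lines in sources.items()]
    # heapq.merge interleaves the already sorted streams by timestamp.
    return list(heapq.merge(*streams))
```

Given your own notes and a system log as two sources, merge_timelines returns a single list in which each entry is tagged with where it came from, so you can see, for example, whether a logged event happened before or after an action you took.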
Here are several useful documentation methods you might want to consider:
It's easy to decide what to record online; you simply record everything you do. Remember to use the terminal or session that's being recorded. (With some methods, like script, you can record every session you've got going; just make sure you record each session in a separate file.) It's harder to decide what to record of the events that don't just get automatically captured. You certainly want to record at least this much:
Time logs may also be useful if you are having difficulty in convincing management that the organization needs to allocate additional resources to be prepared to deal with incidents. It's a way of showing how much these incidents cost. It's particularly helpful if you can show which areas could have been anticipated and mitigated by planning.
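As a rough illustration of turning a time log into a cost figure, the Python sketch below totals the hours spent on each activity. The record layout (person, activity, start, end) and the function name hours_by_activity are assumptions made for this example; adapt them to whatever your own log actually records.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed layout of start and end times


def hours_by_activity(entries):
    """Sum elapsed hours per activity from (person, activity, start, end) records."""
    totals = defaultdict(float)
    for _person, activity, start, end in entries:
        elapsed = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
        totals[activity] += elapsed.total_seconds() / 3600
    return dict(totals)
```

Multiplying each total by a staff cost rate gives a figure that management can compare against the cost of preparing for incidents in advance.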
Copyright © 2002 O'Reilly & Associates. All rights reserved.