10.2. Receiving TrapsLet's start by discussing how to deal with incoming traps. Handling incoming traps is the responsibility of the NMS. Some NMSs do as little as display the incoming traps to standard output (stdout). However, an NMS server typically has the ability to react to SNMP traps it receives. For example, when an NMS receives a linkDown trap from a router, it might respond to the event by paging the contact person, displaying a pop-up message on a management console, or forwarding the event to another NMS. This procedure is streamlined in commercial packages but still can be achieved with freely available open source programs.10.2.1. HP OpenViewOpenView uses three pieces of software to receive and interpret traps: OpenView's main trap-handling daemon is called ovtrapd. This program listens for traps generated by devices on the network and hands them off to the Postmaster daemon (pmd ). In turn, pmd triggers what OpenView calls an event. Events can be configured to perform actions ranging from sending a pop-up window to NNM users, forwarding the event to other NMSs, or doing nothing at all. The configuration process uses xnmtrap, the Event Configurations GUI. The xnmevents program displays the events that have arrived, sorting them into user-configurable categories. OpenView keeps a history of all the traps it has received; to retrieve that history, use the command $OV_BIN/ovdumpevents. Older versions of OpenView kept an event logging file in $OV_LOG/trapd.log. By default, this file rolls over after it grows to 4 MB. It is then renamed trapd.log.old and a new trapd.log file is started. If you are having problems with traps, either because you don't know whether they are reaching the NMS or because your NMS is being bombarded by too many events, you can use tail -f to watch trapd.log so you can see the traps as they arrive. (You can also use ovdumpevents to create a new file.) To learn more about the format of this file, refer to OpenView's manual pages for trapd.conf (4) and ovdumpevents (1M). It might be helpful to define what exactly an OpenView event is. Think of it as a small record, similar to a database record. This record defines which trap OpenView should watch out for. It further defines what sort of action (send an email, page someone, etc.), if any, should be performed.10.2.2. Using NNM's Event ConfigurationsOpenView uses an internal definition file to determine how to react to particular situations. This definition file is maintained by the xnmtrap program. We can start xnmtrap by using the menu item "Options Event Configurations" (on the NNM GUI) or by giving the command $OV_BIN/xnmtrap. In the Enterprise Identification window, scroll down and click on the enterprise name "OpenView .1.3.6.1.4.1.11.2.17.1." This displays a list in the Event Identification window. Scroll down in this list until you reach "OV_Node_Down." Double-click on this event to bring up the Event Configurator (Figure 10-1).Figure 10-1. OpenView Event Configurator -- OV_Node_DownFigure 10-1 shows the OV_Node_Down event in the Event Configurator. When this event gets triggered, it inserts an entry containing the message "Node down," with a severity level of "Warning," into the Status Events category. OpenView likes to have a leading 0 (zero) in the Event Object Identifier, which indicates that this is an event or trap -- there is no way to change this value yourself. The number before the 0 is the enterprise OID; the number after the 0 is the specific trap number, in this case 58916865.[42] Later we will use these numbers as parameters when generating our own traps.[42]This is the default number that OpenView uses for this OV_Node_Down trap. 10.2.2.1. Selecting event sourcesThe Source option is useful when you want to receive traps from certain nodes and ignore traps from other nodes. For example, if you have a development router that people are taking up and down all day, you probably would rather not receive all the events generated by the router's activity. In this case, you could use the Source field to list all the nodes from which you would like to receive traps, and leave out the development router. To do this, you can either type each of the hostnames by hand and click "Add" after each one, or select each node (using the Ctrl and mouse-click sequence) on your OpenView Network Node Map and click "Add From Map." Unfortunately, the resulting list isn't easy to manage. Even if you take the time to add all the current routers to the Event Sources, you'll eventually add a new router (or some other hardware you want to manage). You then have to go back to all your events and add your new devices as sources. Newer versions of OpenView allow you to use pattern matching and source files, making it easier to tailor and maintain the source list.10.2.2.2. Setting event categoriesWhen NNM receives an event, it sorts the event into an event category. The Categories drop-down box lets you assign the event you're configuring to a category. The list of available categories will probably include the following predefined categories (you can customize this list by adding categories specific to your network and deleting categories, as we'll see later in this section):
[43]Again, in earlier releases of OpenView this log was located in $OV_LOG/trapd.log. New versions use the OpenView Event Database. This is backward-compatible using the ovdumpevents command to produce a " trapd.log file. TIP: Log only" is useful if you have some events that are primarily informational; you don't want to see them when they arrive, but you would like to record them for future reference. The Cisco event frDLCIStatusChange - .1.3.6.1.2.1.10.32.0.1 is a good example of such an event. It tells us when a Virtual Circuit has changed its operational state. If displayed, we will see notifications whenever a node goes down and whenever a circuit changes its operational state to down. This information is redundant because we have already gotten a status event of "node down" and a DLCI change.[44] With this event set to "Log only" we can go to the log file only when we think things are fishy.[44]Newer versions of OpenView have a feature called Event Correlation that groups certain events together to avoid flooding the user with redundant information. You can customize these settings with a developer's kit. 10.2.2.3. Forwarding events and event severitiesThe "Forward Event" radio button, once checked, allows you to forward an event to other NMSs. This feature is useful if you have multiple NMSs or a distributed network-management architecture. Say that you are based in Atlanta, but your network has a management station in New York in addition to the one on your desk. You don't want to receive all of New York's events, but you would like the node_down information forwarded to you. On New York's NMS, you could click "Forward Event" and insert the IP address of your NMS in Atlanta. When New York receives a node_down event, it will forward the event to Atlanta. The Severity drop-down list assigns a severity level to the event. OpenView supports six severity levels: Unknown, Normal, Warning, Minor, Major, and Critical. The severity levels are color-coded to make identification easier; Table 10-1 shows the color associated with each severity level. The levels are listed in order of increasing severity. For example, an event with a severity level of Minor has a higher precedence than an event with a severity of Warning.Table 10-1. OpenView Severity Levels
The colors are used both on OpenView's maps and in the Event Categories. Parent objects, which represent the starting point for a network, are displayed in the color of the highest severity level associated with any object underneath them.[45] For example, if an object represents a network with 250 nodes and one of those nodes is down (a Critical severity), the object will be colored red, regardless of how many nodes are up and functioning normally. The term for how OpenView displays colors in relation to objects is status source ; it is explained in more detail in Chapter 6, "Configuring Your NMS". [45]Parent objects can show status (colors) in four ways: Symbol, Object, Compound, or Propagated. 10.2.2.4. Log messages, notifications, and automatic actionsReturning to Figure 10-1, the Event Log Message and Popup Notification fields are similar, but serve different purposes. The Event Log Message is displayed when you view the Event Categories and select a category from the drop-down list. The Popup Notification, which is optional, displays its message in a window that appears on any server running OpenView's NNM. Figure 10-2 shows a typical pop-up message. The event name, delme in this case, appears in the title bar. The time and date at which the event occurred are followed by the event message, "Popup Message Here." To create a pop-up message like this, insert "Popup Message Here" in the Popup Notification section of the Event Configurator. Every time the event is called, a pop-up will appear.Figure 10-2. OpenView pop-up messageThe last section of the Event Configurator is the Command for Automatic Action. The automatic action allows you to specify a Unix command or script to execute when OpenView receives an event. You can run multiple commands by separating them with a semicolon, much as you would in a Unix shell. When configuring an automatic action, remember that rsh can be very useful. I like to use rsh sunserver1 audioplay -v50 /opt/local/sounds/siren.au, which causes a siren audio file to play. The automatic action can range from touching a file to opening a trouble ticket. In each Event Log Message, Popup Notification, and Command for Automatic Action, special variables can help you identify the values from your traps or events. These variables provide the user with additional information about the event. Here are some of the variables you can use; the online help has a complete list:
WARNING: Before you start running scripts for an event, find out the average number of traps you are likely to receive for that event. This is especially true for OV_Node_Down. If you write a script that opens a trouble ticket whenever a node goes down, you could end up with hundreds of tickets by the end of the day. Monitoring your network will make you painfully aware of how much your network "flaps," or goes up and down. Even if the network goes down for a second, for whatever reason, you'll get a trap, which will in turn generate an event, which might register a new ticket, send you a page, etc. The last thing you want is "The Network That Cried Down!" You and other people on your staff will start ignoring all the false warnings and may miss any serious problems that arise. One way to estimate how frequently you will receive events is to log events in a file ("Log only"). After a week or so, inspect the log file to see how many events accumulated (i.e., the number of traps received). This is by no means scientific, but it will give you an idea of what you can expect. 10.2.3. Custom Event CategoriesOpenView uses the default categories for all its default events. Look through the $OV_CONF/C/trapd.conf file to see how the default events are assigned to categories. You can add categories by going to "Event Configuration Edit Configure Event Categories." Figure 10-3 shows this menu, with some custom categories added.Figure 10-3. Adding event categories in OpenViewIt's worth your while to spend time thinking about what categories are appropriate for your environment. If you plow everything into the default categories you will be bothered by the Critical "Printer Needs Paper" event, when you really want to be notified of the Critical "Production Server on Fire" event. Either event will turn Status Events red. The categories in Figure 10-3 are a good start, but think about the types of events and activities that will be useful in your network. The Scheduled and Unscheduled (S/U) Downtime category is a great example of a category that is more for human intervention than for reporting network errors. Printer Events is a nice destination for your "Printer Needs Paper" and "Printer Jammed" messages. Even though none of the default categories are required (except for Error), we recommend that you don't delete them, precisely because they are used for all of the default events. Deleting the default categories without first reconfiguring all the default events will cause problems. Any event that does not have an event category available will be put into the default Error category. To edit the categories, copy the trapd.conf file into /tmp and modify /tmp/trapd.conf with your favorite editor. The file has some large warnings telling you never to edit it by hand, but sometimes a few simple edits are the best way to reassign events. An entry in the portion of the file that defines event behavior looks like this:It's fairly obvious what these lines do: they map a particular RMON event into the Threshold Events category with a severity of Warning; they also specify what should happen when the event occurs. To map this event into another category, change Threshold Events to the appropriate category. Once you've edited the file, use the following command to merge in your updates:EVENT RMON_Rise_Alarm .1.3.6.1.2.1.16.0.1 "Threshold Events" Warning FORMAT RMON Rising Alarm: $2 exceeded threshold $5; value = $4. (Sample type = \ $3; alarm index = $1) SDESC This event is sent when an RMON device exceeds a preconfigured threshold. EDESC $ $OV_BIN/xnmevents -l load /tmp/trapd.conf 10.2.4. The Event Categories DisplayThe Event Categories window (Figure 10-4) is displayed on the user's screen when NNM is started. It provides a very brief summary of what's happening on your network; if it is set up appropriately, you can tell at a glance whether there are any problems you should be worrying about.Figure 10-4. OpenView Event CategoriesIf the window gets closed during an OpenView session, you can restart it using the "Fault Events" menu item or by issuing the command $OV_BIN/xnmevents. The menu displays all the event categories, including any categories you have created. Two categories are special: the Error category is the default category used when an event is associated with a category that cannot be found; the All category is a placeholder for all events and cannot be configured by the Event Configurator. The window shows you the highest severity level of any event in each event category. The box to the left of Status Events is cyan (a light blue), showing that the highest unacknowledged severity in the Status Events category is Warning. Clicking on that box displays an alarm browser that lists all the events received in the category. A nice feature of the Event Categories display is the ability to restore a browser's state or reload events from the trapd.log and trapd.log.old files. Reloading events is useful if you find that you need to restore messages you deleted in the past.TIP: Newer versions of OpenView extend the abilities of Event Categories by keeping a common database of acknowledged and unacknowledged events. Thus, when a user acknowledges an event, all other users see this event updated.At the bottom of Figure 10-4, the phrase "[Read-Only]" means that you don't have write access to Event Categories. If this phrase isn't present, you have write access. OpenView keeps track of events on a per-user basis, using a special database located in $OV_LOG/xnmevents.username.[46] With write access, you have the ability to update this file whenever you exit. By default, you have write access to your own event category database, unless someone has already started the database by starting a session with your username. There may be only one write-access Event Categories per user, with the first one getting write access and all others getting read-only privileges. [46]Again, newer versions of OpenView have only one database that is common for all users. 10.2.5. The Alarm BrowserFigure 10-5 shows the alarm browser for the Status Events category. In it we see a single Warning event, which is causing the Status Events category to show cyan.Figure 10-5. OpenView alarm browserThe color of the Status Events box is determined by the highest-precedence event in the category. Therefore, the color won't change until either you acknowledge the highest-precedence event or an event arrives with an even higher precedence. Clicking in the far left column (Ack) acknowledges the message[47] and sets the severity to 0.[47]Newer versions of OpenView support Event Correlation, which has a column in this window as well.The Actions menu in the alarm browser allows you to acknowledge, deacknowledge, or delete some or all events. You can even change the severity of an event. Keep in mind that this does not change the severity of the event on other Event Categories sessions that are running. For example, if one user changes the severity of an event from Critical to Normal, the event will remain Critical for other users. The View menu lets you define filters, which allow you to include or discard messages that match the filter. When configuring events, keep in mind that you may receive more traps than you want. When this happens, you have two choices. First, you can go to the agent and turn off trap generation, if the agent supports this. Second, you can configure your trap view to ignore these traps. We saw how to do this earlier: you can set the event to "Log only" or try excluding the device from the Event Sources list. If bandwidth is a concern, you should investigate why the agent is sending out so many traps before trying to mask the problem. 10.2.6. Creating Events Within OpenViewOpenView gives you the option of creating additional (private) events. Private events are just like regular events, except that they belong to your private-enterprise subtree, rather than to a public MIB. To create your own events, launch the Event Configuration window from the Options menu of NNM. You will see a list of all currently loaded events (Figure 10-6).Figure 10-6. OpenView's Event ConfigurationThe window is divided into two panes. The top pane displays the Enterprise Identification, which is the leftmost part of an OID. Clicking on an enterprise ID displays all the events belonging to that enterprise in the lower pane. To add your own enterprise ID, select "Edit Add Enterprise Identification" and insert your enterprise name and a registered enterprise ID.[48] Now you're ready to create private events. Click on the enterprise name you just created; the enterprise ID you've associated with this name will be used to form the OID for the new event. Click "Edit Add Event"; then type the Event Name for your new event, making sure to use Enterprise Specific (the default) for the event type. Insert an Event Object Identifier. This identifier can be any number that hasn't already been assigned to an event in the currently selected enterprise. Finally, click "OK" and save the event configuration (using "File Save").[48]Refer to Chapter 2, "A Closer Look at SNMP" for information about obtaining your own enterprise ID.To copy an existing event, click on the event you wish to copy and select "Edit Copy Event"; you'll see a new window with the event you selected. From this point on, the process is the same. Traps with "no format" are traps for which nothing has been defined in the Event Configuration window. There are two ways to solve this problem: you can either create the necessary events on your own or you can load a MIB that contains the necessary trap definitions, as discussed in Chapter 6, "Configuring Your NMS". "No format" traps are frequently traps defined in a vendor-specific MIB that hasn't been loaded. Loading the appropriate MIB often fixes the problem by defining the vendor's traps and their associated names, IDs, comments, severity levels, etc. TIP: Before loading a MIB, review the types of traps the MIB supports. You will find that most traps you load come, by default, in LOGONLY mode. This means that you will not be notified when the traps come in. After you load the MIB you may want to edit the events it defines, specifying the local configuration that best fits your site. 10.2.7. Monitoring Traps with PerlIf you can't afford an expensive package like OpenView, you can use the Perl language to write your own monitoring and logging utility. You get what you pay for, since you will have to write almost everything from scratch. But you'll learn a lot and probably get a better appreciation for the finer points of network management. One of the most elementary, but effective, programs to receive traps is in a distribution of SNMP Support for Perl 5, written by Simon Leinen. Here's a modified version of Simon's program:This program displays traps as they are received from different devices in the network. Here's some output, showing two traps:#!/usr/local/bin/perl use SNMP_Session "0.60"; use BER; use Socket; $session = SNMPv1_Session->open_trap_session ( ); while (($trap, $sender, $sender_port) = $session->receive_trap ( )) { chomp ($DATE=`/bin/date \'+%a %b %e %T\'`); print STDERR "$DATE - " . inet_ntoa($sender) . " - port: $sender_port\n"; print_trap ($session, $trap); } 1; sub print_trap ($$) { ($this, $trap) = @_; ($community, $ent, $agent, $gen, $spec, $dt, @bindings) = \ $this->decode_trap_request ($trap); print " Community:\t".$community."\n"; print " Enterprise:\t".BER::pretty_oid ($ent)."\n"; print " Agent addr:\t".inet_ntoa ($agent)."\n"; print " Generic ID:\t$gen\n"; print " Specific ID:\t$spec\n"; print " Uptime:\t".BER::pretty_uptime_value ($dt)."\n"; $prefix = " bindings:\t"; foreach $encoded_pair (@bindings) { ($oid, $value) = decode_by_template ($encoded_pair, "%{%O%@"); #next unless defined $oid; print $prefix.BER::pretty_oid ($oid)." => ".pretty_print ($value)."\n"; $prefix = " "; } } The output format is the same for both traps. The first line shows the date and time at which the trap occurred, together with the IP address of the device that sent the trap. Most of the remaining output items should be familiar to you. The bindings output item lists the variable bindings that were sent in the trap PDU. In the example above, each trap contained one variable binding. The object ID is in numeric form, which isn't particularly friendly. If a trap has more than one variable binding, this program displays each binding, one after another. An ad hoc monitoring system can be fashioned by using this Perl script to collect traps and some other program to inspect the traps as they are received. Once the traps are parsed, the possibilities are endless. You can write user-defined rules that watch for significant traps and, when triggered, send an email alert, update an event database, send a message to a pager, etc. These kinds of solutions work well if you're in a business with little or no budget for commercially available NMS software or if you're on a small network and don't need a heavyweight management tool.Mon Apr 28 22:07:44 - 10.123.46.26 - port: 63968 community: public enterprise: 1.3.6.1.4.1.2789.2500 agent addr: 10.123.46.26 generic ID: 6 specific ID: 5247 uptime: 0:00:00 bindings: 1.3.6.1.4.1.2789.2500.1234 => 14264026886 Mon Apr 28 22:09:46 - 172.16.51.25 - port: 63970 community: public enterprise: 1.3.6.1.4.1.2789.2500 agent addr: 172.16.253.2 generic ID: 6 specific ID: 5247 uptime: 0:00:00 bindings: 1.3.6.1.4.1.2789.2500.2468 => Hot Swap Now In Sync 10.2.8. Using the Network Computing Technologies Trap ReceiverThe Trap Receiver by Network Computing Technologies is a freely available program that's worth trying.[49] This program, which currently runs only on Windows-based systems, displays trap information as it's received. It has a standard interface but can be configured to execute certain actions against traps, like OpenView's Command for Automatic Action function. Figure 10-7 shows Trap Receiver's user interface.[49]This software can be found on their web page at http://www.ncomtech.com. Figure 10-7. Trap ReceiverThere are ways to log and forward messages and traps, send email or a page in response to a trap, as well as execute commands. By writing some code in C or C++, you can gain access to an internal trap stream. This program can be a great starting place for Windows administrators who want to use SNMP but lack the resources to implement something like OpenView. It's simple to use, extensible, and free.10.2.9. Receiving Traps Using Net-SNMPThe last trap receiver we'll discuss is part of the Net-SNMP package, which is also freely available. snmptrapd allows you to send SNMP trap messages to facilities such as Unix syslog or stdout. For most applications the program works in the background, shipping messages to syslog(8). There are some configuration parameters for the syslog side of snmptrapd; these tell snmptrapd what facility level it should use for the syslog messages. The following command forwards traps to standard output (-P) rather than to syslog as they are received:By now the output should look familiar; it's similar to the reports generated by the other programs we've seen in this chapter. The Net-SNMP trap daemon is another great tool for scriptwriters. A simple Perl script can watch the file in which snmptrapd logs its traps, looking for important events and reacting accordingly. It's easy to build a powerful and flexible monitoring system at little or no expense. We have seen several packages that can receive traps and act on them, based on the traps' content. Keep in mind that all of these programs, whether they're free or cost tens of thousands of dollars, are basically doing the same thing: listening on some port (usually UDP port 162) and waiting for SNMP messages to arrive. What sets the various packages apart is their ability to do something constructive with the traps. Some let you program hooks that execute some other program when a certain trap is received. The simpler trap monitors just send a message logging the trap to one or more files or facilities. These packages are generally less expensive than the commercial trap monitors, but can be made to operate like full-fledged systems with some additional programming effort. Programs such as Perl give you the ability to extend these simpler packages.$ ./snmptrapd -P 2000-12-13 19:10:55 UCD-SNMP Version 4.1.2 Started. 2000-12-13 19:11:14 sunserver2.ora.com [12.1.45.26] enterprises.2789.2500: Enterprise Specific Trap (1224) Uptime: 5 days, 10:01:20.42 enterprises.2789.2500.1224 = 123123 2000-12-13 19:11:53 sunserver2.ora.com [12.1.45.26] enterprises.2789.2500: Enterprise Specific Trap (1445) Uptime: 5 days, 10:01:21.20 enterprises.2789.2500.1445 = "Fail Over Complete" Copyright © 2002 O'Reilly & Associates. All rights reserved. |
||||||||||||||
|