External Polling (Essential SNMP)

Add MIB Object." This takes you to a new screen. At the top, click on "MIB Object"[40] and click down through the tree until you find the object you would like to poll. To look at the status of our printer's paper tray, for example, we need to navigate down to .iso.org.dod.internet.private.enterprises.hp.nm.system.net-peripheral.net-printer.generalDeviceStatus.gdStatusEntry.gdStatusPaperOut (.1.3.6.1.4.1.11.2.3.9.1.1.2.8).[41] The object's description suggests that this is the item we want: it reads "This indicates that the peripheral is out of paper." (If you already know what you're looking for, you can enter the name or OID directly.) Once there, you can change the name of the collection to something that is easier to read. Click "OK" to move forward. This brings you to the menu shown in Figure 9-8.

[40]You can collect the value of an expression instead of a single MIB object. The topic of expressions is out of the scope of this book but is explained in the mibExpr.conf (4) manpage.

[41]This object is in HP's private MIB, so it won't be available unless you have HP printers and have installed the appropriate MIBs. Note that there is a standard printer MIB, RFC 1759, but HP's MIB has more useful information.

Figure 9-8. OpenView poll configuration menu

The "Source" field is where you specify the nodes from which you would like to collect data. Enter the hostnames or IP addresses you want to poll. You can use wildcards like 198.27.6.* in your IP addresses; you can also click "Add Map" to add any nodes currently selected. We suggest that you start with one node for testing purposes. Adding more nodes to a collection is easy once you have everything set up correctly; you just return to the window in Figure 9-8 and add the nodes to the Source list.

"Collection Mode" lets you specify what to do with the data NNM collects. There are four collection modes: "Exclude Collection," "Store, Check Thresholds," "Store, No Thresholds," and "Don't Store, Check Thresholds." Except for "Exclude Collection," which allows us to turn off individual collections for each device, the collection modes are fairly self-explanatory. ("Exclude Collection" may sound odd, but it is very useful if you want to exclude some devices from collection without stopping the entire process; for example, you may have a router with a hardware problem that is bombarding you with meaningless data.) Data collection without a threshold is easier than collection with a threshold, so we'll start there. Set the Collection Mode to "Store, No Thresholds." This disables (grays out) the bottom part of the menu, which is used for threshold parameters. (Select "Store, Check Thresholds" if you want both data collection and threshold monitoring.) Then click "OK" and save the new collection. You can now watch your collection grow in the $OV_DB/snmpCollect directory. Each collection consists of a binary datafile, plus a file with the same name preceded by an exclamation mark (!); this file stores the collection information. The data-collection files will grow without bound. To trim these files without disturbing the collector, delete all files whose names do not contain an "!" mark.

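The trimming step described above can be scripted. The following is only a sketch: trim_collect_data is a hypothetical helper name of our own, not an NNM command, and it assumes that every file in the directory whose name lacks an "!" is a datafile you are willing to lose.

```shell
# Sketch: remove collected datafiles but keep the "!" collection-information
# files. trim_collect_data is a hypothetical name, not part of NNM.
trim_collect_data() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and empty globs
        case ${f##*/} in
            *'!'*) ;;                    # name contains "!": keep it
            *) rm -f -- "$f" ;;          # plain datafile: delete it
        esac
    done
}

# Example call (assumes OV_DB is set by the OpenView environment):
# trim_collect_data "$OV_DB/snmpCollect"
```

Try it on a copy of the directory first; once a datafile is gone, so is the history it held.
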
Clicking on "Only Collect on Nodes with sysObjectID:" allows you to enter a value for sysObjectID. sysObjectID (iso.org.dod.internet.mgmt.mib-2.system.sysObjectID) lets you limit polling to devices made by a specific manufacturer. Its value is the enterprise number the device's manufacturer has registered with IANA. For example, Cisco's enterprise number is 9, and HP's is 11 (the complete list is available at http://www.isi.edu/in-notes/iana/assignments/enterprise-numbers); therefore, to restrict polling to devices manufactured by HP, set the sysObjectID to 11. RFC 1213 formally defines sysObjectID (1.3.6.1.2.1.1.2) as follows:

sysObjectID OBJECT-TYPE
    SYNTAX  OBJECT IDENTIFIER
    ACCESS  read-only
    STATUS  mandatory
    DESCRIPTION
        "The vendor's authoritative identification of the network
         management subsystem contained in the entity. This value
         is allocated within the SMI enterprises subtree (1.3.6.1.4.1)
         and provides an easy and unambiguous means for determining
         'what kind of box' is being managed. For example, if vendor
         'Flintstones, Inc.' was assigned the subtree 1.3.6.1.4.1.4242,
         it could assign the identifier 1.3.6.1.4.1.4242.1.1 to its
         'Fred Router'."
    ::= { system 2 }
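
As the RFC text says, a device reports its sysObjectID as an OID inside the enterprises subtree, so the enterprise number is the first subidentifier after 1.3.6.1.4.1. Here is a minimal sketch of pulling that number out of an OID string; enterprise_of is a hypothetical helper name used only for illustration.

```shell
# Sketch: print the enterprise number contained in a sysObjectID value.
# enterprise_of is a hypothetical helper, not an NNM or SNMP tool.
enterprise_of() {
    oid=${1#.}                           # tolerate a leading dot
    case $oid in
        1.3.6.1.4.1.*)
            rest=${oid#1.3.6.1.4.1.}     # drop the enterprises prefix
            printf '%s\n' "${rest%%.*}"  # keep only the next subidentifier
            ;;
        *)
            return 1                     # not under the enterprises subtree
            ;;
    esac
}

# enterprise_of 1.3.6.1.4.1.4242.1.1    prints 4242 (the RFC's 'Fred Router')
```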

The polling interval is the period at which polling occurs. You can use one-letter abbreviations to specify units: "s" for seconds, "m" for minutes, "h" for hours, "d" for days. For example, 32s indicates 32 seconds; 1.5d indicates one and a half days. When I'm designing a data collection, I usually start with a very short polling interval -- typically 7s (7 seconds between each poll). You probably wouldn't want to use a polling interval this short in practice (all the data you collect is going to have to be stored somewhere), but when you're setting up a collection, it's often convenient to use a short polling interval. You don't want to wait a long time to find out whether you're collecting the right data.
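
If you keep polling intervals in your own scripts or notes, it helps to convert the abbreviations to seconds. The function below is a sketch of that arithmetic only; interval_to_seconds is a hypothetical name, and rejecting a bare number with no unit is our assumption, not documented NNM behavior.

```shell
# Sketch: convert an interval such as 32s, 5m, 2h, or 1.5d to seconds.
# interval_to_seconds is a hypothetical helper; values without a unit
# letter are rejected here by assumption.
interval_to_seconds() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)           # last character is the unit
        num  = substr(v, 1, length(v) - 1)       # everything before it
        if (num !~ /^[0-9]+(\.[0-9]+)?$/) exit 1
        if      (unit == "s") mult = 1
        else if (unit == "m") mult = 60
        else if (unit == "h") mult = 3600
        else if (unit == "d") mult = 86400
        else exit 1
        printf "%d\n", num * mult
    }'
}

# interval_to_seconds 1.5d    prints 129600
```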

The next option is a drop-down menu that specifies what instances should be polled. The options are "All," "From List," and "From Regular Expression." In this case we're polling a scalar item, so we don't have to worry about instances; we can leave the setting at "All" or select "From List" and specify instance "0" (the instance number for all scalar objects). If you're polling a tabular object, you can either specify a comma-separated list of instances or choose the "From Regular Expression" option and write a regular expression that selects the instances you want. Save your changes ("File → Save"), and you're done.
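
Before typing a regular expression into the "From Regular Expression" field, it can be worth previewing which instance numbers it would select. The sketch below uses grep -E as a stand-in; that NNM's regular-expression dialect matches grep's extended syntax is an assumption, and select_instances is a hypothetical helper name.

```shell
# Sketch: show which instance identifiers a regular expression would match.
# Uses grep -E as a stand-in for NNM's matcher (an assumption).
select_instances() {
    regex=$1
    shift
    printf '%s\n' "$@" | grep -E -- "$regex"
}

# Instances 1, 2, 3, 10, and 11 of a table; keep only instances 1 and 2:
# select_instances '^(1|2)$' 1 2 3 10 11
```

Anchoring the expression with ^ and $ matters: without the anchors, the pattern above would also match instances 10 and 11.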

9.2.3.2. Creating a threshold

Once you've set all this up, you've configured NNM to periodically collect the status of your printer's paper tray. Now for something more interesting: let's use thresholds to generate some sort of notification when the traffic coming in through one of our network interfaces exceeds a certain level. To do this, we'll look at a Cisco-specific object, locIfInBitsSec (more formally iso.org.dod.internet.private.enterprises.cisco.local.linterfaces.lifTable.lifEntry.locIfInBitsSec), whose value is the five-minute average of the rate at which data arrives at the interface, in bits per second. (There's a corresponding object called locIfOutBitsSec, which measures the data leaving the interface.) The first part of the process should be familiar: start Data Collection and Thresholds by going to the Options menu of NNM; then click on "Edit → Add MIB Object." Navigate through the object tree until you get to locIfInBitsSec; click "OK" to get back to the screen shown in Figure 9-8. Specify the IP addresses of the interfaces you want to monitor and set the collection mode to "Store, Check Thresholds"; this allows you to retrieve and view the data at a later time. (I typically turn on the "Store" function so I can verify that the collector is actually working and view any data that has accumulated.) Pick a reasonable polling interval -- again, when you're testing it's reasonable to use a short interval -- then choose which instances you'd like to poll, and you're ready to set thresholds.

The "Threshold" field lets you specify the point at which the value you're monitoring becomes interesting. What "interesting" means is up to you. In this case, let's assume that we're monitoring a T1 connection, with a capacity of 1.544 Mbits/second. Let's say somewhat arbitrarily that we'll start worrying when the incoming traffic exceeds 75% of our capacity. So, after multiplying, we set the threshold to "> 1158000". Of course, network traffic is fundamentally bursty, so we won't worry about a single peak -- but if we have two or three consecutive readings that exceed the threshold, we want to be notified. So let's set "consecutive samples" to 3: that shields us from getting unwanted notifications, while providing ample notification if something goes wrong.
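
The arithmetic and the consecutive-samples rule above are easy to check by hand or in a few lines of shell. This is a sketch of the logic as described in the text, not of NNM's internals; crossed is a hypothetical function name, and the sample values in the comment are made up.

```shell
# 75% of a T1's 1,544,000 bits per second.
capacity=1544000
threshold=$((capacity * 75 / 100))       # 1158000

# Sketch: succeed if NEED samples in a row are greater than LIMIT.
# usage: crossed LIMIT NEED SAMPLE...
crossed() {
    limit=$1
    need=$2
    shift 2
    run=0
    for s in "$@"; do
        if [ "$s" -gt "$limit" ]; then
            run=$((run + 1))
            if [ "$run" -ge "$need" ]; then return 0; fi
        else
            run=0                        # a single low reading resets the count
        fi
    done
    return 1
}

# A lone peak is ignored; three readings in a row are not:
# crossed "$threshold" 3 1200000 900000 1300000 1250000    fails
# crossed "$threshold" 3 1200000 1300000 1250000           succeeds
```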

Setting an appropriate consecutive samples value will make your life much more pleasant, though picking the right value is something of an art. Another example is monitoring the /tmp partition of a Unix system. In this case, you may want to set the threshold to ">= 85", the number of consecutive samples to 2, and the poll interval to 5m. This will generate an event when the usage on /tmp is at or above 85% for two consecutive polls. This choice of settings means that you won't get a false alarm if a user copies a large file to /tmp and then deletes the file a few minutes later. If you set consecutive samples to 1, NNM will generate a Threshold event as soon as it notices that /tmp is filling up, even if the condition is only temporary and nothing to be concerned about. It will then generate a Rearm event after the user deletes the file. Since we are really only worried about /tmp filling up and staying full, setting consecutive samples to 2 can help reduce the number of false alarms. This is generally a good starting value for consecutive samples, unless your polling interval is very long.

The rearm parameters let us specify when everything is back to normal or is, at the very least, starting to return to normal. This state must occur before another Threshold event can be generated. You can specify either an absolute value or a percentage. When monitoring the traffic arriving at an interface, you might want to set the rearm threshold to something like 926,400 bits per second (an absolute value that happens to be 60% of the total capacity) or 80% of the threshold (also 60% of capacity). Likewise, if you're generating an alarm when /tmp exceeds 85% of capacity, you might want to rearm when usage drops back to 80% of your 85% threshold (68% of capacity). You can also specify the number of consecutive samples that need to fall below the rearm point before NNM will consider the rearm condition met.
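
To see how the threshold and rearm settings interact, it can help to walk a list of readings through the two states. The following is a sketch of the behavior as described in this section, not NNM's actual implementation; simulate_events is a hypothetical name, and the readings in the example are invented.

```shell
# Rearm point for the T1 example: 80% of the 1,158,000 threshold.
rearm=$((1158000 * 80 / 100))            # 926400

# Sketch: print the events a series of samples would produce.
# usage: simulate_events THRESH REARM NEED_HI NEED_LO SAMPLE...
simulate_events() {
    thresh=$1
    rearm_at=$2
    need_hi=$3
    need_lo=$4
    shift 4
    state=armed
    hi=0
    lo=0
    i=0
    for s in "$@"; do
        i=$((i + 1))
        if [ "$state" = armed ]; then
            # Count consecutive samples above the threshold.
            if [ "$s" -gt "$thresh" ]; then hi=$((hi + 1)); else hi=0; fi
            if [ "$hi" -ge "$need_hi" ]; then
                echo "sample $i: Threshold"
                state=tripped
                lo=0
            fi
        else
            # Count consecutive samples below the rearm point.
            if [ "$s" -lt "$rearm_at" ]; then lo=$((lo + 1)); else lo=0; fi
            if [ "$lo" -ge "$need_lo" ]; then
                echo "sample $i: Rearm"
                state=armed
                hi=0
            fi
        fi
    done
}

# simulate_events 1158000 "$rearm" 3 1 \
#     1200000 1300000 1250000 1000000 900000 1200000
# prints:
#   sample 3: Threshold
#   sample 5: Rearm
```

Note that the fourth reading (1,000,000) is below the threshold but above the rearm point, so nothing happens; that gap is what keeps a value hovering near the threshold from generating a stream of events.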

The final option, "Configure Threshold Event," asks what OpenView events you would like to execute for each state. You can leave the default event, or you can refer to Chapter 10, "Traps" for more on how to configure events. The "Threshold" state needs a specific event number that must reside in the HP enterprise. The default Threshold event is OV_DataCollectThresh - 58720263. Note that the Threshold event is always an odd number. The Rearm event is the next number after the Threshold event: in this case, 58720264. To configure events other than the default, click on "Configure Threshold Event" and, when the new menu comes up, add one event (with an odd number) to the HP section and a second event for the corresponding Rearm. After making the additions, save and return to the Collection windows to enter the new number.
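
The numbering rule is simple enough to check before you type a new event number into the Collection window. A minimal sketch, with rearm_event_for as a hypothetical helper name:

```shell
# Sketch: given a Threshold event number, print the matching Rearm event
# number. Fails if the Threshold number is not odd.
rearm_event_for() {
    n=$1
    if [ $((n % 2)) -ne 1 ]; then
        return 1                         # Threshold events must be odd
    fi
    echo $((n + 1))
}

# rearm_event_for 58720263    prints 58720264
```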

When you finish configuring the data collection, click "OK." This brings you back to the Data Collection and Thresholds menu. Click "File → Save" to make your current additions active. On the bottom half of the "MIB Object Collection Summary" window, click on your new object and then on "Actions → Test SNMP." This brings up a window showing the results of an SNMP test on that collection. After the test, wait long enough for your polling interval to have expired once or twice. Then click on the object collection again, but this time click on "Actions → Show Data." This window shows the data that has been gathered so far. Try blasting data through the interface to see if you can trigger a Threshold event. If the Threshold events are not occurring, verify that your threshold and polling intervals are set correctly. After you've seen a Threshold event occur, watch how the Rearm event gets executed. When you're finished testing, go back and set up realistic polling periods, add any additional nodes you would like to poll, and turn off storing if you don't want to collect data for trend analysis. Refer to the $OV_LOG/snmpCol.trace file if you are having any problems getting your data collection rolling. Your HP OpenView manual should describe how to use this trace file to troubleshoot most problems.

Once you have collected some data, you can use xnmgraph to display it. The xnmgraph command to use is similar to the ones we saw earlier; it's an awkward command that you'll want to save in a script. In the following script, the -browse option points the grapher at the stored data:

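#!/bin/sh
# filename: /opt/OV/local/scripts/graphSavedData
# syntax: graphSavedData <hostname>
# Graphs the stored locIfInBitsSec (...1.1.6) and locIfOutBitsSec (...1.1.8)
# data for every interface whose ifOperStatus (...2.2.1.8) is 1 (up),
# labeling each line with the interface's ifDescr (...2.2.1.2).
if [ $# -ne 1 ]; then
    echo "usage: $0 <hostname>" >&2
    exit 1
fi
/opt/OV/bin/xnmgraph -c public -title Bits_In_n_Out_For_All_Up_Interfaces \
-browse -mib \
".1.3.6.1.4.1.9.2.2.1.1.6:::.1.3.6.1.2.1.2.2.1.8:1:.1.3.6.1.2.1.2.2.1.2:::,\
.1.3.6.1.4.1.9.2.2.1.1.8:::.1.3.6.1.2.1.2.2.1.8:1:.1.3.6.1.2.1.2.2.1.2:::" \
"$1"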

Once the graph has started, no real (live) data will be graphed; the display is limited to the data that has been collected. You can click on "File → Update Data" to check for and insert any data that has been gathered since the start of the graph. Another option is to leave off -browse, which allows the graph to continue collecting and displaying the live data along with the collected data.

Finally, to graph all the data that has been collected for a specific node, go to NNM and select the node you would like to investigate. Then select "Performance → Graph SNMP Data → Select Nodes" from the menus. You will get a graph of all the data that has been collected for the node you selected. Alternatively, select the "All" option in "Performance → Graph SNMP Data." With the number of colors limited to 25, you will usually find that you can't fit everything into one graph.