Chapter 7. Device Monitoring with SNMP

This chapter is about monitoring devices with the Simple Network Management Protocol (SNMP). It describes how SNMP can be used to retrieve information from remote systems, to monitor systems, and to alert you to problems. Although other network management protocols exist, SNMP is currently the most commonly used. SNMP has other uses, but our primary focus will be on monitoring systems to ensure that they are functioning properly and to collect information when they aren't. The material in this chapter is expanded upon in Chapter 8, "Performance Measurement Tools".

This chapter begins with a brief review of SNMP. This description is somewhat informal but should serve to convey enough of the basic ideas to get you started if you are unfamiliar with SNMP. If you are already familiar with the basic concepts and vocabulary, you can safely skip over this section. Next I describe NET SNMP -- a wonderful tool for learning about SNMP that can be used for many simple tasks. Network monitoring using tkined is next, followed by a few pointers to tools for Microsoft Windows.

7.1. Overview of SNMP

SNMP is a management protocol that allows a management program to communicate with, configure, or control remote devices that have embedded SNMP agents. The basic idea behind SNMP is to have a program or agent running on the remote system that you can communicate with over the network. This agent can then monitor systems and collect information. Software on a management station sends messages to the remote agent requesting information or directing it to perform some specific task. While communication is usually initiated by the management station, under certain conditions the agent may send an unsolicited message or trap back to the management station.

SNMP provides a framework for network management. While SNMP is not the only management protocol or, arguably, even the best management protocol, SNMP is almost universal. It has a small footprint, can be implemented fairly quickly, is extensible, is well documented, and is an open standard. It resides at the application level of the TCP/IP protocol suite. On the other hand, SNMP, particularly Version 1, is not a secure protocol; it is poorly suited for real-time applications, and it can return an overwhelming amount of information.

SNMP is an evolving protocol with a confusing collection of abbreviations designating the various versions. Only the major versions are mentioned here. Understanding the major distinctions among versions can be important, because there are a few things you can't do with earlier versions and because of differences in security provided by the different versions. However, the original version, SNMPv1, is still widely used and will be the primary focus of this chapter. Generally, the later versions are backward compatible, so differences in versions shouldn't cause too many operational problems.

The second version has several competing variants. SNMPv2 Classic has been superseded by community-based SNMPv2 or SNMPv2c. Two more secure supersets of SNMPv2c are SNMPv2u and SNMPv2*. SNMPv2c is the most common of the second versions and is what is usually meant when you see a reference to SNMPv2. SNMPv2 has not been widely adopted, but its use is growing. SNMP-NG or SNMPv3 attempts to resolve the differences between SNMPv2u and SNMPv2*. It is too soon to predict how successful SNMPv3 will be, but it also appears to be growing in popularity.

Although there are usually legitimate reasons for the choice of terms, the nomenclature used to describe SNMP can be confusing. For example, parameters that are monitored are frequently referred to as objects, although variables might have been a better choice, and that term is sometimes used. Basically, objects can be thought of as data structures.

Sometimes, the specialized nomenclature doesn't seem to be worth the effort. For example, SNMP uses community strings to control access. In order to gain access to a device, you must give the community string. If this sounds a lot like a password to you, you are not alone. The primary difference is the way community strings are used. The same community strings are often shared by a group or community of devices, something frowned upon with passwords. Their purpose is more to logically group devices than to provide security.

An SNMP manager, software on a central management platform, communicates with an SNMP agent, software located in the managed device, through SNMP messages. With SNMPv1 there are five types of messages. GET_REQUEST, GET_NEXT_REQUEST, and SET_REQUEST are sent by the manager to the agent to request an action. In the first two cases, the agent is asked to supply information, such as the value of an object. The SET_REQUEST message asks the agent to change the value of an object.

The remaining messages, GET_RESPONSE and TRAP, originate at the agent. The agent replies to the first three messages with the GET_RESPONSE message. In each case, the exchange is initiated by the manager. With the TRAP message, the action is initiated by the agent. Like a hardware interrupt on a computer, the TRAP message is the agent's way of getting the attention of the manager. Traps play an essential role in network management in that they alert you to problems needing attention. Knowing that a device is down is, of course, the first step to correcting the problem. And it always helps to be able to tell a disgruntled user that you are aware of the problem and are working on it. Traps are as close as SNMP gets to real-time processing. Unfortunately, for many network problems (such as a crashed system) traps may not be sent. Even when traps are sent, they could be discarded by a busy router. UDP is the transport protocol, so delivery is not guaranteed and lost packets are not retransmitted. Figure 7-1 summarizes the direction messages take when traveling between the manager and agent.

Figure 7-1. SNMP messages
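To make the message flow concrete, here is a small Python table of the five SNMPv1 message types, pairing each with the BER tag that RFC 1157 assigns to its PDU and with the side that sends it. The names `Sender`, `SNMPV1_PDUS`, and `expected_reply` are my own illustrations, not part of any SNMP library; think of this as Figure 7-1 restated in code.

```python
from enum import Enum

class Sender(Enum):
    MANAGER = "manager"
    AGENT = "agent"

# SNMPv1 message types: PDU tag (RFC 1157) and the side that originates it.
SNMPV1_PDUS = {
    "GET_REQUEST":      (0xA0, Sender.MANAGER),
    "GET_NEXT_REQUEST": (0xA1, Sender.MANAGER),
    "GET_RESPONSE":     (0xA2, Sender.AGENT),
    "SET_REQUEST":      (0xA3, Sender.MANAGER),
    "TRAP":             (0xA4, Sender.AGENT),
}

def expected_reply(pdu_name):
    """What the other side sends back, or None if nothing is expected."""
    _, sender = SNMPV1_PDUS[pdu_name]
    if sender is Sender.MANAGER:
        return "GET_RESPONSE"  # the agent answers each of the three requests
    return None                # responses end the exchange; v1 traps are not acknowledged

for name, (tag, sender) in SNMPV1_PDUS.items():
    print(f"{name:<17} tag=0x{tag:02X}  from {sender.value:<8} reply: {expected_reply(name)}")
```

The missing acknowledgment for TRAP is the code-level view of the problem described above: if a trap is dropped, neither side finds out.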

For a management station to send a packet, it must know the IP address of the agent, the appropriate community string or password used by the agent, and the identifier of the variable or object referenced. Unfortunately, SNMPv1 is very relaxed about community strings. These are sent in clear text and can easily be captured by a packet sniffer. One of the motivating factors for SNMPv2 was to provide greater security. Be warned, however, that SNMPv2c still uses plain-text community strings.
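The following sketch assembles an SNMPv1 GET_REQUEST by hand to show what actually travels over the network: a version number, the community string, and the OID, wrapped in BER. It assumes a single variable binding and asks for sysDescr.0 (the trailing `.0` names the single instance of a scalar object). The helper names are hypothetical, negative integers are not handled, and for real work you would use an SNMP library rather than this code. Notice that nothing protects the community string.

```python
def encode_length(n):
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    octets = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(octets)]) + octets

def tlv(tag, value):
    """Wrap a value as tag, length, value."""
    return bytes([tag]) + encode_length(len(value)) + value

def encode_integer(n):
    """INTEGER (tag 0x02); this sketch only handles non-negative values."""
    return tlv(0x02, n.to_bytes(n.bit_length() // 8 + 1, "big"))

def encode_oid(dotted):
    """OBJECT IDENTIFIER (tag 0x06): first two arcs packed as 40*X+Y,
    the rest in base 128 with the high bit set on all but the last byte."""
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.insert(0, 0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(chunk)
    return tlv(0x06, bytes(body))

def build_get_request(community, oid, request_id=1):
    """SNMPv1 message: SEQUENCE { version, community, GetRequest-PDU }."""
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")  # value is NULL in a request
    varbinds = tlv(0x30, varbind)
    pdu = tlv(0xA0,                                     # [0] GetRequest-PDU
              encode_integer(request_id)
              + encode_integer(0)                      # error-status
              + encode_integer(0)                      # error-index
              + varbinds)
    version = encode_integer(0)                         # 0 means SNMPv1
    return tlv(0x30, version + tlv(0x04, community.encode("ascii")) + pdu)

packet = build_get_request("public", "1.3.6.1.2.1.1.1.0")
print(packet.hex())
print(b"public" in packet)  # True: the community string travels as plain bytes
```

To query an agent you would send these bytes in a UDP datagram to port 161 on the agent's address, for example with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, (agent_ip, 161))`, where `agent_ip` is a placeholder. Anyone sniffing that path sees `public` in the clear.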

TIP: Most systems, by default, use public for the read-only community string and private for the read/write community string. When you set up SNMP access on a device, you will be given the opportunity to change these. If you don't want your system to be reconfigurable by anyone on the Internet, you should change them. When communicating with devices, use read-only community strings whenever possible and read/write community strings only when necessary. Use filters to block all SNMP traffic into or out of your network. Most agents will also allow you to restrict which hosts they will exchange SNMP messages with. Do this! For simplicity and clarity, the examples in this chapter have been edited to use public and private. These are not the community strings I actually use.
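Here is a minimal sketch of the kind of check the advice above asks an agent to perform: refuse the factory defaults, allow writes only with a read/write string, and accept requests only from listed managers. The policy format, the community strings, and the `authorize` function are all hypothetical; real agents express this in their own configuration syntax. The addresses come from the 192.0.2.0/24 documentation range.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class CommunityRule:
    community: str
    writable: bool
    allowed_managers: tuple  # ip_network objects permitted to use this string

# Hypothetical policy: non-default strings, writes from a single host only.
POLICY = [
    CommunityRule("n0t-public", False, (ipaddress.ip_network("192.0.2.0/24"),)),
    CommunityRule("n0t-private", True, (ipaddress.ip_network("192.0.2.10/32"),)),
]
DEFAULT_COMMUNITIES = {"public", "private"}

def authorize(community, source_ip, wants_write):
    """Return True if a request should be honored under POLICY."""
    if community in DEFAULT_COMMUNITIES:
        return False  # refuse factory defaults outright
    addr = ipaddress.ip_address(source_ip)
    for rule in POLICY:
        if rule.community != community:
            continue
        if wants_write and not rule.writable:
            return False  # read-only string used for a SET
        return any(addr in net for net in rule.allowed_managers)
    return False  # unknown community string

print(authorize("n0t-public", "192.0.2.55", wants_write=False))
print(authorize("n0t-public", "192.0.2.55", wants_write=True))
```

Keep in mind that with SNMPv1 a source-address check is only as good as your filtering, since UDP source addresses can be forged; that is one more reason to block SNMP at the network edge.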

Another advantage of SNMPv2 is that two additional messages have been added. GET_BULK_REQUEST requests multiple pieces of data with a single query, whereas walking through a table with GET_NEXT_REQUEST requires a separate request and response for each value. This can considerably improve performance. The other new message, INFORM_REQUEST, allows one manager to send unsolicited information to another.
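The saving from GET_BULK_REQUEST is easiest to see by counting round trips. This Python sketch simulates an agent holding 20 rows of one table column (ifDescr, with made-up values) followed by one unrelated object, then walks the column first one object per request, as with GET_NEXT_REQUEST, and then ten objects per request. It models only a single repeating variable (no non-repeaters) and ignores message-size limits, so treat it as an illustration rather than an implementation of the SNMPv2 operation. All names are hypothetical.

```python
from bisect import bisect_right

def oid_key(dotted):
    """Compare OIDs arc by arc, as SNMP does, by turning them into int tuples."""
    return tuple(int(arc) for arc in dotted.split("."))

# Toy agent: 20 ifDescr rows (made-up values) followed by ipForwarding.0.
MIB_VIEW = sorted(
    [(f"1.3.6.1.2.1.2.2.1.2.{i}", f"if{i}") for i in range(1, 21)]
    + [("1.3.6.1.2.1.4.1.0", 1)],
    key=lambda item: oid_key(item[0]),
)
KEYS = [oid_key(oid) for oid, _ in MIB_VIEW]

def get_next(oid):
    """Agent side of GET_NEXT_REQUEST: the first object strictly after oid."""
    i = bisect_right(KEYS, oid_key(oid))
    return MIB_VIEW[i] if i < len(MIB_VIEW) else None

def get_bulk(oid, max_repetitions):
    """Agent side of GET_BULK_REQUEST for one repeating variable:
    up to max_repetitions successive GET_NEXT results in one reply."""
    results = []
    for _ in range(max_repetitions):
        nxt = get_next(oid)
        if nxt is None:
            break
        results.append(nxt)
        oid = nxt[0]
    return results

def walk(prefix, per_request=1):
    """Manager side: collect every object under prefix.
    per_request=1 behaves like GET_NEXT; larger values like GET_BULK.
    Returns (rows, number_of_requests)."""
    pkey = oid_key(prefix)
    rows, requests, oid = [], 0, prefix
    while True:
        requests += 1
        batch = get_bulk(oid, per_request)
        for name, value in batch:
            if oid_key(name)[:len(pkey)] != pkey:
                return rows, requests  # walked past the end of the subtree
            rows.append((name, value))
        if len(batch) < per_request:
            return rows, requests      # agent ran out of objects
        oid = batch[-1][0]

plain, n_plain = walk("1.3.6.1.2.1.2.2.1.2")
bulk, n_bulk = walk("1.3.6.1.2.1.2.2.1.2", per_request=10)
print(len(plain), n_plain, len(bulk), n_bulk)  # 20 21 20 3
```

With these values the plain walk takes 21 requests (the last one only discovers that the column has ended), while the bulk walk takes 3. Sorting by integer tuples matters: SNMP orders OIDs arc by arc, whereas a string sort would put ...1.2.10 before ...1.2.9.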

Collectively, the objects are variables defined in the Management Information Base (MIB). Unfortunately, MIB is an overused term that means slightly different things in different contexts. There are some formal rules for dealing with MIBs -- MIB formats are defined by Structure of Management Information (SMI), the syntax rules for MIB entries are described in Abstract Syntax Notation One (ASN.1), and how the syntax is encoded is given by Basic Encoding Rules (BER). Unless you are planning to delve into the implementation of SNMP or decode hex dumps, you can postpone learning SMI, ASN.1, and BER. And because of the complexity of these rules, I advise against looking at hex dumps. Fortunately, programs like ethereal do a good job of decoding these packets, so I won't discuss these rules in this book.

The actual objects that are manipulated are identified by a unique, authoritative object identifier (OID). Each OID is a sequence of integers separated by periods, sometimes called dotted notation. For example, the OID for a system's description is 1.3.6.1.2.1.1.1. This OID arises from the standardized organization of all such objects, part of which is shown in Figure 7-2. The actual objects are the leaves of the tree. To eliminate any possibility of ambiguity among objects, they are named by giving their complete path from the root of the tree to the leaf.

Figure 7-2. Partial OID structure
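The idea that an OID is simply the path from the root to a leaf can be sketched with a nested dictionary. The fragment below is hand-copied from the standard assignments along the lines of Figure 7-2; `OID_TREE`, `path_to_oid`, and `leaves` are hypothetical names used only for illustration.

```python
# Hand-copied fragment of the standard OID tree: name -> (number, children).
OID_TREE = {
    "iso": (1, {
        "org": (3, {
            "dod": (6, {
                "internet": (1, {
                    "mgmt": (2, {
                        "mib-2": (1, {
                            "system": (1, {
                                "sysDescr": (1, {}),
                                "sysObjectID": (2, {}),
                                "sysUpTime": (3, {}),
                            }),
                        }),
                    }),
                }),
            }),
        }),
    }),
}

def path_to_oid(path):
    """Walk a full name path from the root, collecting each node's number."""
    children, numbers = OID_TREE, []
    for name in path.split("."):
        number, children = children[name]
        numbers.append(str(number))
    return ".".join(numbers)

def leaves(children=None, prefix=()):
    """Yield (dotted OID, name) for each leaf, i.e. each actual object."""
    if children is None:
        children = OID_TREE
    for name, (number, kids) in children.items():
        oid = prefix + (number,)
        if kids:
            yield from leaves(kids, oid)
        else:
            yield ".".join(map(str, oid)), name

print(path_to_oid("iso.org.dod.internet.mgmt.mib-2.system.sysDescr"))  # 1.3.6.1.2.1.1.1
for oid, name in leaves():
    print(oid, name)
```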

As you can see from the figure, nodes are given both names and numbers. Thus, the OID can also be given by specifying the name of each node, or object descriptor. For example, iso.org.dod.internet.mgmt.mib-2.system.sysDescr is the object descriptor that corresponds to the object identifier 1.3.6.1.2.1.1.1. The more concise numerical names are used within the agents and within messages. The nonnumeric names are used at the management station for the convenience of users. Objects are coded directly into the agents and manipulated by object identifiers. While management stations can mechanically handle object identifiers, they must be explicitly given the mappings between object descriptors and object identifiers if you want to call objects by name. This is one role of the MIB files that ship with devices and are loaded onto the management station. These files also tell the management station which identifiers are valid.
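The following sketch plays the role of a management station with one small MIB loaded: it translates between object descriptors and object identifiers, keeping any trailing instance number. The table is a hand-typed excerpt standing in for a real MIB file, and `to_numeric` and `to_name` are hypothetical helpers, similar in spirit to what the NET SNMP snmptranslate utility does.

```python
# Hand-typed excerpt standing in for a loaded MIB file: descriptor -> OID.
MIB_NAMES = {
    "system":       "1.3.6.1.2.1.1",
    "sysDescr":     "1.3.6.1.2.1.1.1",
    "sysObjectID":  "1.3.6.1.2.1.1.2",
    "sysUpTime":    "1.3.6.1.2.1.1.3",
    "ip":           "1.3.6.1.2.1.4",
    "ipForwarding": "1.3.6.1.2.1.4.1",
    "ipDefaultTTL": "1.3.6.1.2.1.4.2",
}
# Reverse index keyed by integer tuples for prefix matching.
NUMBERS = {tuple(map(int, oid.split("."))): name for name, oid in MIB_NAMES.items()}

def to_numeric(name):
    """'sysUpTime.0' -> '1.3.6.1.2.1.1.3.0' (descriptor plus instance suffix)."""
    descriptor, _, suffix = name.partition(".")
    base = MIB_NAMES[descriptor]  # KeyError means the MIB was never loaded
    return f"{base}.{suffix}" if suffix else base

def to_name(oid):
    """'1.3.6.1.2.1.1.3.0' -> 'sysUpTime.0' via the longest known prefix."""
    arcs = tuple(int(a) for a in oid.split("."))
    for cut in range(len(arcs), 0, -1):
        name = NUMBERS.get(arcs[:cut])
        if name:
            return name + "".join(f".{a}" for a in arcs[cut:])
    return oid  # no mapping loaded: fall back to the raw numbers

print(to_numeric("sysUpTime.0"))
print(to_name("1.3.6.1.2.1.4.2.0"))
print(to_name("1.3.6.1.4.1.9.1"))
```

Falling back to the raw numbers when no mapping is known mirrors the point above: the agent never needs the names, only the management station does.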

As you might guess from Figure 7-2, this is not a randomly created tree. Through the standardization process, a number of identifiers have been specified. In particular, the mib-2 subtree has a number of subtrees or groups of interest. The system group, 1.3.6.1.2.1.1, has nodes used to describe the system such as sysDescr(1), sysObjectID(2), sysUpTime(3), and so on. These should be pretty self-explanatory. Although not shown in the figure, the ip(4) group has a number of objects such as ipForwarding(1), which indicates whether IP packets will be forwarded, and ipDefaultTTL(2), which gives the default TTL when it isn't specified by the transport layer. The ip group also has three tables, including the ipRouteTable(21). While this information can be gleaned from RFC 1213, which defines MIB-II, several books that present this material in a more accessible form are listed in Appendix B, "Resources and References". Fortunately, there are tools that can be used to investigate MIBs directly.

In addition to standard entries, companies may register private or enterprise MIBs. These have extensions specific to their equipment. Typically, these MIBs must be added to those on the management station if they are not already there. They are usually shipped with the device or can be downloaded over the Internet. Each company registers a node under the enterprises node (1.3.6.1.4.1), and its extensions are placed under that registered node.
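Because the tree is standardized, you can tell a good deal about an OID from its prefix alone. This sketch classifies an OID either as a standard mib-2 object, naming its group as assigned in RFC 1213, or as an enterprise extension, returning the registered enterprise number. The function name and the sample OIDs are illustrative only.

```python
MIB2 = (1, 3, 6, 1, 2, 1)
ENTERPRISES = (1, 3, 6, 1, 4, 1)
# mib-2 groups as assigned in RFC 1213.
MIB2_GROUPS = {1: "system", 2: "interfaces", 3: "at", 4: "ip", 5: "icmp",
               6: "tcp", 7: "udp", 8: "egp", 10: "transmission", 11: "snmp"}

def classify(oid):
    """Return ('mib-2', group), ('enterprise', number), or ('other', None)."""
    arcs = tuple(int(a) for a in oid.split("."))
    if arcs[:6] == MIB2 and len(arcs) > 6:
        return ("mib-2", MIB2_GROUPS.get(arcs[6], f"group {arcs[6]}"))
    if arcs[:6] == ENTERPRISES and len(arcs) > 6:
        return ("enterprise", arcs[6])  # the company's registered node
    return ("other", None)

print(classify("1.3.6.1.2.1.4.21"))   # ipRouteTable lives in the ip group
print(classify("1.3.6.1.4.1.9.9.1"))  # an enterprise extension
```

An OID classified as an enterprise extension is exactly the case where you will need the vendor's MIB file on the management station before you can refer to it by name.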

If you are new to SNMP, this probably seems pretty abstract. Appendix B, "Resources and References" also lists and discusses a number of sources that describe the theory and architecture of SNMP in greater detail. But you should know enough at this point to get started. The best way to come to terms with SNMP and the structure of managed objects is by experimentation, and that requires tools. I will try to clarify some of these concepts as we examine SNMP management tools.
