Chapter 8. Performance Measurement Tools

Everything on your network may be working, but using it can still be a frustrating experience. Often, a poorly performing system is worse than a broken system. As a user on a broken system, you know when to give up and find something else to do. And as an administrator, it is usually much easier to identify a component that isn't working at all than one that is still working but performing poorly. In this chapter, we will look at tools and techniques used to evaluate network performance.

This chapter begins with a brief overview of the types of tools available. Then we look at ntop, an excellent tool for watching traffic on your local network. Next, I describe mrtg, rrd, and cricket -- tools for collecting traffic data from remote devices over time. RMON, a set of monitoring extensions to SNMP, comes next. We conclude with tools for use on Microsoft Windows systems.

Don't overlook the obvious! Although we will look at tools for measuring traffic, user dissatisfaction is probably the best single indicator of the health of your network. If users are satisfied, you needn't worry about theoretical problems. And if users are screaming at your door, then it doesn't matter what the numbers prove.

8.1. What, When, and Where

Network performance will depend on many things -- on the applications you are using and how they are configured, on the hosts running these applications, on the networking devices, on the structure and design of the network as a whole, and on how these pieces interact with one another. Even though the focus of this chapter is restricted to network performance, you shouldn't ignore the other pieces of the puzzle. Problems may arise from the interaction of these pieces, or a problem with one of the pieces may look like a problem with another piece. A misconfigured or poorly designed application can significantly increase the amount of traffic on a network. For example, Version 1.1 of HTTP provides for persistent connections that can significantly reduce traffic. Not using this particular feature is unlikely to be a make-or-break issue. My point is that if you look only at the traffic on a network without considering software configurations, you may appear to have a hardware capacity problem when a simple change in software might lessen the problem and, at a minimum, buy you a little more time.

This chapter will focus on tools used to collect information on network performance. The first step in analyzing performance is measuring traffic. In addition to supporting problem identification and resolution, traffic measurement should be done as part of capacity planning and capacity management (tuning). Several books listed in Appendix B, "Resources and References", provide general discussions of application and host performance analysis.

Of the issues related to measuring network traffic, the most important ones are what to measure, how often, and where. Although there are no simple answers to any of these questions, what to measure is probably the hardest of the three. It is extremely easy to end up with so much data that you don't have time to analyze it. Or you may collect data that doesn't match your needs or that is in an unusable format. If you keep at it, eventually you will learn from experience what is most useful. Take the time to think about how you will use the data before you begin. Be as goal-directed as possible. Just realize that, even with the most careful planning, when faced with a new, unusual problem, you'll probably think of something you wish you had been measuring.

If you are looking at the performance of your system over time, then data at just one point in time will be of little value. You will need to collect data periodically. How often you collect will depend on the granularity or frequency of the events you want to watch. For many tasks, the ideal approach is one that periodically condenses and eventually discards older data.
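To make the idea of condensing and discarding concrete, here is a minimal Python sketch. It is only an illustration, not how any particular tool does it. The function name condense, its parameters, and the retention periods are all assumptions chosen for the example: recent samples are kept as is, older samples are averaged into fixed-width buckets, and anything older than the total retention period is dropped.

```python
from statistics import mean

def condense(samples, now, keep_raw=3600, bucket=300, keep_total=86400):
    """Condense a list of (timestamp, value) samples.

    Hypothetical retention policy (all values in seconds):
      - samples newer than keep_raw are kept unchanged,
      - older samples are averaged into bucket-second intervals,
      - samples at least keep_total old are discarded.
    """
    recent = []
    buckets = {}
    for ts, value in samples:
        age = now - ts
        if age < keep_raw:
            recent.append((ts, value))
        elif age < keep_total:
            # Align the sample to the start of its bucket.
            start = ts - (ts % bucket)
            buckets.setdefault(start, []).append(value)
        # Anything older than keep_total is simply dropped.
    condensed = [(start, mean(vals)) for start, vals in sorted(buckets.items())]
    return condensed + sorted(recent)
```

Running something like this on a schedule keeps storage bounded while preserving fine detail for the period you are most likely to examine closely.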

Unless your network is really unusual, the level of usage will vary with the time of day, the day of the week, and the time of the year. Most performance-related problems will be most severe at the busiest times. In telephony, the hour when traffic is heaviest is known as the busy hour, and planning centers around traffic at this time. In a data network, for example, the busy hour may be first thing in the morning when everyone is logging on and checking their email, or it could be at noon when everyone is surfing the Web over lunch.
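If you already have timestamped traffic counts, finding the busy hour is straightforward. The following Python sketch is one possible approach under stated assumptions: the input is a series of (Unix timestamp, byte count) pairs, samples are grouped by hour of day in UTC, and the busy hour is taken to be the hour with the highest average. The function name busy_hour is hypothetical, and you would want to substitute your local time zone in practice.

```python
from collections import defaultdict
from datetime import datetime, timezone

def busy_hour(samples):
    """Return (hour_of_day, average_bytes) for the busiest hour.

    samples is an iterable of (unix_timestamp, byte_count) pairs.
    Hours are computed in UTC, which is an assumption of this sketch.
    """
    by_hour = defaultdict(list)
    for ts, nbytes in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        by_hour[hour].append(nbytes)
    if not by_hour:
        raise ValueError("no samples supplied")
    # Average each hour across all days, then pick the largest.
    averages = {h: sum(v) / len(v) for h, v in by_hour.items()}
    peak = max(averages, key=averages.get)
    return peak, averages[peak]
```

Averaging across several days smooths out one-time spikes. If you plan for worst-case load instead, you could take the maximum for each hour rather than the average.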

Knowing usage patterns can simplify data collection since you'll need to do little collecting when the network is underutilized. Changes in usage patterns can indicate fundamental changes in your network that you'll want to be able to identify and explain. Finally, knowing when your network is least busy should give you an idea of the most convenient times to do maintenance.

I have divided traffic-measurement tools into three rough categories based on where they are used within a network. Tools that allow you to capture traffic coming into or going out of a particular machine are called host-monitoring tools. Tools that place an interface in promiscuous mode and allow you to capture all the traffic at an interface are called point-monitoring tools. Finally, tools that build a global picture of network traffic by querying other hosts (which are in turn running either host-monitoring or point-monitoring tools) are called network-monitoring tools. Both host monitoring and point monitoring should have a minimal impact on network traffic. With the exception of DNS traffic, they shouldn't be generating additional traffic. This is not true for network-monitoring tools.

Because of their roles within a network, devices such as switches and routers don't easily fit into this classification scheme. If a single switch interconnects all devices in a subnet, then it will see all the local traffic. If, however, multiple switches are used and you aren't mirroring traffic, each switch will see only part of the traffic. Routers will see only traffic moving between networks. While this is ideal for measuring traffic between local and remote devices, it is not helpful in understanding strictly local traffic. The problem should be obvious. If you monitor the wrong device, you may easily miss bottlenecks or other problems. Before collecting data, you need to understand the structure of your network so you can understand what traffic is actually being seen. This is one reason the information in Chapter 6, "Device Discovery and Mapping", is important.

Finally, you certainly won't want to deal with raw data on a routine basis. You will want tools that present the data in a useful manner. For time-series data, graphs and summary statistics are usually the best choice.
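As a small illustration of summary statistics, the Python sketch below reduces a series of measurements to a minimum, mean, maximum, and 95th percentile. The function name summarize and the choice of statistics are assumptions made for the example. The percentile uses the simple nearest-rank method, and other definitions will give slightly different values.

```python
from statistics import mean

def summarize(values):
    """Summarize a non-empty sequence of numeric measurements.

    The 95th percentile uses the nearest-rank method: the value at
    position ceil(0.95 * n) in the sorted data (counting from 1).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no values supplied")
    rank = (95 * n + 99) // 100  # integer form of ceil(0.95 * n)
    return {
        "min": ordered[0],
        "mean": mean(ordered),
        "max": ordered[-1],
        "p95": ordered[rank - 1],
    }
```

A high percentile is often more informative than the maximum, since a single brief spike can dominate the maximum without reflecting typical heavy load.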
