Chapter 17. Network Performance Analysis
This chapter explores network diagnostics and partitioning schemes aimed at reducing congestion and improving the local host's interface to the network.
Contents:Network congestion and network interfaces
Network partitioning hardware
Impact of partitioning
17.1. Network congestion and network interfacesA network that was designed to ensure transparent access to filesystems and to provide "plug-and-play" services for new clients is a prime candidate for regular expansion. Joining several independent networks with routers, switches, hubs, bridges, or repeaters may add to the traffic level on one or more of the networks. However, a network cannot grow indefinitely without eventually experiencing congestion problems. Therefore, don't grow a network without planning its physical topology (cable routing and limitations) as well as its logical design. After several spurts of growth, performance on the network may suffer due to excessive loading.
The problems discussed in this section affect NIS as well as NFS service. Adding network partitioning hardware affects the transmission of broadcast packets, and poorly placed bridges, switches, or routers can create new bottlenecks in frequently used network "virtual circuits." Throughout this chapter, the emphasis will be on planning and capacity evaluation, rather than on low-level electrical details.
17.1.1. Local network interfaceEthernet cabling problems, such as incorrect or poorly made Category-5 cabling, affect all of the machines on the network. Conversely, a local interface problem is visible only to the machine suffering from it. An Ethernet interface device driver that cannot handle the packet traffic is an example of such a local interface problem.
The first three columns show the network interface, the maximum transmission unit (MTU) for that interface, and the network to which the interface is connected. The Address column shows the local IP address (the hostname would have been shown had we not specified -n). The last five columns contain counts of the total number of packets sent and received, as well as errors encountered while handling packets. The collision count indicates the number of times a collision occurred when this host was transmitting.% netstat -in Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue lo0 8232 127.0.0.0 127.0.0.1 7188 0 7188 0 0 0 hme0 1500 126.96.36.199 188.8.131.52 139478 11 102155 0 3055 0
Input errors can be caused by:
Ideally, both the input and output error rates should be as close to zero as possible, although some short bursts of errors may occur as cables are unplugged and reconnected, or during periods of intense network traffic. After a power failure, for example, the flood of packets from every diskless client that automatically reboots may generate input errors on the servers that attempt to boot all of them in parallel. During normal operation, an error rate of more than a fraction of 1% deserves investigation. This rate seems incredibly small, but consider the data rates on a Fast Ethernet: at 100 Mb/sec, the maximum bandwidth of a network is about 150,000 minimum-sized packets each second. An error rate of 0.01% means that fifteen of those 150,000 packets get damaged each second. Diagnosis and resolution of low-level electrical problems such as CRC errors is beyond the scope of this book, although such an effort should be undertaken if high error rates are persistent.
17.1.2. Collisions and network saturationEthernet is similar to an old party-line telephone: everybody listens at once, everybody talks at once, and sometimes two talkers start at the same time. In a well-conditioned network, with only two hosts on it, it's possible to use close to the maximum network's bandwidth. However, NFS clients and servers live in a burst-filled environment, where many machines try to use the network at the same time. When you remove the well-behaved conditions, usable network bandwidth decreases rapidly.
On the Ethernet, a host first checks for a transmission in progress on the network before attempting one of its own. This process is known as carrier sense. When two or more hosts transmit packets at exactly the same time, neither can sense a carrier, and a collision results. Each host recognizes that a collision has occurred, and backs off for a period of time, t, before attempting to transmit again. For each successive retransmission attempt that results in a collision, t is increased exponentially, with a small random variation. The variation in back-off periods ensures that machines generating collisions do not fall into lock step and seize the network.
As machines are added to the network, the probability of a collision increases. Network utilization is measured as a percentage of the ideal bandwidth consumed by the traffic on the cable at the point of measurement. Various levels of utilization are usually compared on a logarithmic scale. The relative decrease in usable bandwidth going from 5% utilization to 10% utilization, is about the same as going from 10% all the way to 30% utilization.
Measuring network utilization requires a LAN analyzer or similar device. Instead of measuring the traffic load directly, you can use the average collision rate as seen by all hosts on the network as a good indication of whether the network is overloaded or not. The collision rate, as a percentage of output packets, is one of the best measures of network utilization. The Collis field in the output of netstat -in shows the number of collisions:
The collision rate for a host is the number of collisions seen by that host divided by the number of packets it writes, as shown in Figure 17-1.% netstat -in Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue lo0 8232 127.0.0.0 127.0.0.1 7188 0 7188 0 0 0 hme0 1500 184.108.40.206 220.127.116.11 139478 11 102155 0 3055 0
Figure 17-1. Collision rate calculationCollisions are counted only when the local host is transmitting; the collision rate experienced by the host is dependent on its network usage. Because network transmissions are random events, it's possible to see small numbers of collisions even on the most lightly loaded networks. A collision rate upwards of 5% is the first sign of network loading, and it's an indication that partitioning the network may be advisable.
Copyright © 2002 O'Reilly & Associates. All rights reserved.