18.3. Adjusting for network reliability problems
Even a lightly loaded network can suffer from
reliability problems if older bridges
or routers joining the network segments routinely drop parts of long
packet trains. Older bridges and routers are most likely to affect
NFS performance if their network interfaces cannot keep up with the
packet arrival rates generated by the NFS clients and servers on each
side.
Some NFS experts believe it is a bad idea to micro-manage NFS to
compensate for network problems, arguing instead that these problems
should be handled by the transport layer. We encourage you to use NFS
over TCP, and allow the TCP implementation to dynamically adapt to
network glitches and unreliable networks. TCP does a much better job
of adjusting transfer sizes, handling congestion, and generating
retransmissions to compensate for network problems.
Having said this, there may still be times when you choose to use UDP
instead of TCP to handle your NFS traffic.[57] In such cases, you will
need to determine the impact that an old bridge or router is having on
your network.
This requires another look at the client-side RPC statistics:
% nfsstat -rc
Client rpc:
Connection-oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs
1753569    1412       3          64         0          0
timers     cantconn   nomem      interrupts
0          1317       0          18
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds
12252      41         334        5          166        0
badverfs   timers     nomem      cantsend
0          4321       0          206
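As a quick sanity check on these counters, you can compute the retransmission and badxid rates directly from the connectionless statistics shown above. The 5% threshold used here is a common rule of thumb, not a figure from the nfsstat output itself:

```shell
# Compute retransmission and badxid rates from the connectionless
# counters above (calls=12252, retrans=334, badxid=5).
calls=12252 retrans=334 badxid=5
awk -v c="$calls" -v r="$retrans" -v b="$badxid" 'BEGIN {
    printf "retrans: %.1f%% of calls\n", 100 * r / c
    printf "badxid:  %.1f%% of calls\n", 100 * b / c
    # A retransmission rate above ~5% usually warrants investigation.
    if (100 * r / c > 5)
        print "retransmission rate above 5%: investigate the network"
}'
```

For this sample output the retransmission rate is about 2.7% with badxid near zero, the packet-dropping signature discussed below.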
When
timeouts is high and
badxid is close to zero, it implies that the
network or one of the network interfaces on the client, server, or
any intermediate routing hardware is dropping packets. Some older
host Ethernet interfaces are tuned to handle page-sized packets and
do not reliably handle larger packets; similarly, many older Ethernet
bridges cannot forward long bursts of packets. Older routers or hosts
acting as IP routers may have limited forwarding capacity, so
reducing the number of packets sent for any request reduces the
probability that these routers will drop packets that build up behind
their network interfaces.
The NFS buffer size determines how many packets are required to send
a single, large
read or
write request. The Solaris default buffer size
is 8KB for NFS Version 2 and 32KB for NFS Version 3. Linux[58]
uses a default buffer size of 1KB. The buffer size can be
negotiated down, at mount time, if the client determines that the
server prefers a smaller transfer size.
Compensating for unreliable networks involves changing the NFS buffer
size, controlled by the
rsize and
wsize mount options.
rsize
determines how many bytes are requested in each NFS read, and
wsize sets the number of bytes sent in each
NFS write operation. Reducing
rsize and
wsize eases the peak loads on the network by
sending shorter packet trains for each NFS request. By spacing the
requests out, and increasing the probability that the entire request
reaches the server or client intact on the first transmission, the
overall load on the network and server is smoothed out over time.
The read and write buffer sizes are specified in bytes. They are
generally made multiples of 512 bytes, based on the size of a disk
block. There is no requirement that either size be an integer
multiple of 512, although using an arbitrary size can make the disk
operations on the remote host less efficient. Write operations
performed on buffers that are not aligned on disk block boundaries
require the NFS server to read the block, modify it, and rewrite it.
This read-modify-write cycle is invisible to the client, but adds to
the overhead of each write() performed on the server.
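The alignment issue can be illustrated with a quick calculation; the offset and length used here are arbitrary examples, not values from the text:

```shell
# Sketch: which 512-byte disk blocks a write at (offset, length) touches.
# A partial first or last block forces the server to read, modify, and
# rewrite that block instead of simply overwriting it.
awk -v off=100 -v len=400 -v bs=512 'BEGIN {
    first = int(off / bs)
    last  = int((off + len - 1) / bs)
    printf "touches blocks %d..%d", first, last
    if (off % bs)         printf " (first block partial)"
    if ((off + len) % bs) printf " (last block partial)"
    print ""
}'
```

A 400-byte write at offset 100 touches only block 0, but since neither edge lands on a 512-byte boundary, the server must read-modify-write that block.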
These values are used by the NFS async threads and are completely
independent of buffer sizes internal to any client-side processes. An
application that writes 400-byte buffers to a filesystem mounted with
wsize=4096 does not cause an NFS
write request to be sent to the server until the
11th write is performed.
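A small simulation makes the arithmetic concrete. It assumes the client flushes one full wsize block as soon as the block fills, which is a simplification of real client write-behind behavior:

```shell
# Simulate 400-byte application writes against a 4096-byte wsize buffer.
# Ten writes leave 4000 bytes buffered; the 11th pushes the total to
# 4400 bytes, filling the 4096-byte block and triggering an NFS write.
awk 'BEGIN {
    wsize = 4096; app = 400; buffered = 0
    for (i = 1; i <= 12; i++) {
        buffered += app
        if (buffered >= wsize) {
            printf "application write %d fills the buffer; NFS write sent\n", i
            buffered -= wsize
        }
    }
}'
```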
Here is an example of mounting an NFS filesystem with the read and
write buffer sizes reduced to 2048 bytes:
# mount -o rsize=2048,wsize=2048 wahoo:/export/home /mnt
Decreasing the NFS buffer size has
the undesirable effect of increasing the
load on the server and sending more packets on the network to read or
write a given buffer. The size of the actual packets on the network
does not change, but the number of IP packets composing a single NFS
buffer decreases as the
rsize and
wsize are decreased. For example, an 8KB NFS
buffer is divided into five IP packets of about 1500 bytes, and a
sixth packet with the remaining data bytes. If the write size is set
to 2048 bytes, only two IP packets are needed.
The problem lies in the number of packets required to transfer the
same amount of data.
Table 18-2 shows the number of
IP packets required to copy a file for various NFS read buffer sizes.
Table 18-2. IP packets, RPC requests as function of NFS buffer size

File Size (kbytes)   rsize=1024   rsize=2048   rsize=4096   rsize=8192
1                    1/1          1/1          1/1          1/1
2                    2/2          2/1          2/1          2/1
4                    4/4          4/2          3/1          3/1
8                    8/8          8/4          6/2          6/1

Each entry is IP packets/RPC calls.
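The figures in Table 18-2 can be reproduced by assuming roughly 1480 data bytes per Ethernet IP fragment (the 1500-byte MTU less the 20-byte IP header); the 1480-byte figure is an approximation, since the first fragment also carries the UDP and RPC headers:

```shell
# Recompute Table 18-2: IP packets/RPC calls for each file size and
# rsize, assuming ~1480 data bytes per Ethernet IP fragment.
awk 'BEGIN {
    split("1024 2048 4096 8192", sizes)
    for (kb = 1; kb <= 8; kb *= 2) {
        bytes = kb * 1024
        line = sprintf("%dKB:", kb)
        for (s = 1; s <= 4; s++) {
            rsize = sizes[s]
            calls = int((bytes + rsize - 1) / rsize)      # RPC calls
            per = (rsize < bytes ? rsize : bytes)         # bytes per call
            pkts = calls * int((per + 1479) / 1480)       # IP fragments
            line = line sprintf("  %d/%d", pkts, calls)
        }
        print line
    }
}'
```

The output matches the table: for example, an 8KB file read with rsize=4096 takes two RPC calls of three fragments each (6/2), versus one call of six fragments (6/1) with rsize=8192.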
As the file size increases, transfers with smaller NFS buffer sizes
send more IP packets to the server. The number of packets is the
same for 4096- and 8192-byte buffers, but for file sizes over 4KB,
setting rsize=4096 requires twice as many
RPC calls to the server as rsize=8192. The increased network traffic adds to the
very problem for which the buffer size change was compensating, and
the additional RPC calls further load the server. Due to the
increased server load, it is sometimes necessary to increase the RPC
timeout parameter when decreasing NFS buffer sizes. Again, we
encourage you to use NFS over TCP when possible and avoid having to
worry about the NFS buffer sizes.