16.4. Identifying NFS performance bottlenecks
The stateless design of NFS makes crash recovery simple,
but
it also makes it impossible for a client to distinguish between a
server that is slow and one that has crashed. In either case, the
client does not receive an RPC reply before the RPC timeout period
expires. Clients can't tell why a server appears slow, either:
packets could be dropped by the network and never reach the server,
or the server could simply be overloaded. Using NFS performance
figures alone, it is hard to distinguish a slow server from an
unreliable network. Users complain that "the system is
slow," but there are several areas that contribute to system
sluggishness.
An overloaded server responds to all packets that it
enqueues for its
nfsd daemons, perhaps dropping some incoming
packets due to the high load. Those requests that are received
generate a response, albeit a response that arrives sometime after
the client has retransmitted the request. If the network itself is to
blame, then packets may not make it from the client or server onto
the wire, or they may vanish in transit between the two hosts.
16.4.1. Problem areas
The potential bottlenecks in the client-server relationship are:
- Client network interface
-
The client may not be able to transmit or receive
packets due to hardware or
configuration problems at its network interface. We will explore
client-side bottlenecks in Chapter 18, "Client-Side Performance Tuning".
- Network bandwidth
-
An overly congested network slows down
both client transmissions and
server replies. Network partitioning hardware installed to reduce
network saturation adds delays to roundtrip times, increasing the
effective time required to complete an RPC call. If the delays caused
by network congestion are serious, they contribute to RPC timeouts.
We explore network bottlenecks in detail in Chapter 17, "Network Performance Analysis".
- Server network interface
-
A busy server may be so flooded with
packets that it cannot receive all
of them, or it cannot queue the incoming requests in a
protocol-specific structure once the network interface receives the
packet. Interrupt handling limitations can also impact the ability of
the server to pull packets in from the network.
- Server CPU loading
-
NFS is rarely CPU-constrained. Once a server
has an NFS request, it has to
schedule an nfsd thread to have the appropriate
operation performed. If the server has adequate CPU cycles, then the
CPU does not affect server performance. However, if the server has
few free CPU cycles, then scheduling latencies may limit NFS
performance; conversely, a system that is providing its maximum NFS
service will not make a good CPU server. CPU loading also affects NIS
performance, since a heavily loaded system is slower to perform NIS
map lookups in response to client requests. (A quick first pass at
checking server CPU, memory, and disk load is sketched just after
this list.)
- Server memory usage
-
NFS performance is somewhat related to the
size
of the server's memory, if the server is doing nothing but NFS.
NFS will use either the local disk buffer cache (in systems that do
not have a page-mapped VM system) or free memory to cache disk pages
that have recently been read from disk. Running large processes on an
NFS server hurts NFS performance. As a server runs out of memory and
begins paging, its performance as either an NIS or NFS server
suffers. Disk bandwidth is wasted in a system that is paging local
applications, consumed by page fault handling rather than NFS
requests.
- Server disk bandwidth
-
This area is the most common bottleneck:
the server simply cannot get data
to or from the disks quickly enough. NFS requests tend to be random
in nature, exhibiting little locality of reference for a particular
disk. Many clients mounting filesystems from a server increase the
degree of randomness in the system. Furthermore, NFS is stateless, so
NFS Version 2 write operations on the server must be committed to
disk before the client is notified that the RPC call completed. This
synchronous nature of NFS write operations further impairs
performance, since caching and disk controller ordering will not be
utilized to their fullest extent. NFS Version 3 eases this constraint
with the use of safe asynchronous writes, which are described in
detail in the next section.
- Configuration effects
-
Loosely grouped in this category are
constrictive server kernel configurations, poor disk
balancing, and inefficient mount point naming schemes. With poor
configurations, all services operate properly but inefficiently.
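Before digging into any one of these areas, it helps to confirm
which server-side resource, if any, is actually saturated. As a
rough first pass (a sketch only: the 30-second interval is
arbitrary, and option letters vary slightly between operating
system releases), the standard utilities sample CPU, memory, and
disk activity on the server:
% vmstat 30     # idle CPU time and the page scan rate (sr) column
% iostat -x 30  # per-disk utilization and service times
Sustained low idle time, a nonzero scan rate, or disks that stay
busy point you toward the corresponding problem areas described
above.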
16.4.2. Throughput
The next two sections summarize NFS throughput issues.
16.4.2.1. NFS writes (NFS Version 2 versus NFS Version 3)
Write operations over NFS Version 2
are
synchronous, forcing servers to flush
data to disk [45] before a reply to the NFS client can be generated. This
severely limits the speed at which synchronous write requests can be
generated by the NFS client, since it has to wait for acknowledgment
from the server before it can generate the next request. NFS Version
3 overcomes this limitation by introducing a two-phased commit write
operation. The NFS Version 3 client generates asynchronous write
requests, allowing the server to acknowledge the requests without
requiring it to flush the data to disk. This results in a reduction
of the round-trip time between the client and server, allowing
requests to be sent more quickly. Since the server no longer flushes
the data to disk before it replies, the data may be lost if the
server crashes or reboots unexpectedly. The NFS Version 3 client
assumes the responsibility of recovering from these conditions by
caching a copy of the data. The client must first issue a commit
operation for the data to the server before it can flush its cached
copy of the data. In response to the commit request, the server
either ensures the data has been written to disk and responds
affirmatively, or in the case of a crash, responds with an error
causing the client to synchronously retransmit the cached copy of the
data to the server. In short, the client is still responsible for
holding on to the data until it receives acknowledgment from the
server indicating that the data has been flushed to disk.
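You can watch this two-step exchange on the wire. As a sketch only
(the network interface name hme0 is an assumption about your
hardware, and snoop must be run by the superuser), Solaris
snoop can be told to capture just the NFS Version 3 RPC traffic:
# snoop -d hme0 rpc nfs,3
During a large sequential write you should see a burst of write
calls that the server acknowledges immediately, followed by a
commit call once the client flushes its cached copy of the data.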
For all practical purposes, the NFS Version 3 protocol removes any
limitations on the size of the data block that can be transmitted,
although the data block size may still be limited by the underlying
transport. Most NFS Version 3 implementations use a 32 KB data block
size. The larger NFS writes reduce protocol overhead and disk seek
time, resulting in much higher sequential file access throughput.
16.4.2.2. NFS/TCP versus NFS/UDP
TCP handles retransmissions and flow
control for NFS, requiring only the individual
packets lost in transit to be retransmitted, and making NFS practical
over lossy and wide area networks. In contrast, UDP requires
the whole NFS operation to be retransmitted if one or more packets are
lost, making it impractical over lossy networks. TCP also allows the
read and write transfer sizes to be increased from 8 KB to 32 KB. By default,
Solaris clients will attempt to mount NFS filesystems using NFS
Version 3 over TCP when supported by the server. Note that workloads
that mainly access attributes or consist of short reads will benefit
less from the larger transfer size, and as such you may want to
reduce the default read block size by using the
rsize=n option of the mount
command. This is explored in more detail in Chapter 18, "Client-Side Performance Tuning".
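As an illustration only (the server name wahoo, the export path, and
the mount point are hypothetical, and option syntax varies slightly
between NFS implementations), an explicit NFS Version 3 mount over
TCP with 32 KB transfer sizes looks like this on a Solaris client:
# mount -o vers=3,proto=tcp,rsize=32768,wsize=32768 wahoo:/export/home /mnt/home
In most cases you can omit these options and let the client
negotiate the highest version and transfer size the server supports;
forcing them explicitly is mainly useful when you are trying to
isolate a performance problem.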
16.4.3. Locating bottlenecks
Given all of the areas in which NFS can break
down, it is hard to pick a starting point for performance analysis.
Inspecting server behavior, for example, may not tell you anything if
the network is overly congested or dropping packets. One approach is
to start with a typical NFS client, and evaluate its view of the
network's services. Tools that examine the local network
interface, the network load perceived by the client, and NFS timeout
and retransmission statistics indicate whether the bulk of your
performance problems are due to the network or the NFS servers.
In this and the next two chapters, we look at performance problems
from excessive server loading to network congestion, and offer
suggestions for easing constraints at each of the problem areas
outlined above. However, you may want to get a rough idea of whether
your NFS servers or your network is the biggest contributor to
performance problems before walking through all diagnostic steps. On
a typical NFS client, use the nfsstat tool to
compare the retransmission and duplicate reply rates:
% nfsstat -rc
Client rpc:
Connection oriented:
calls      badcalls   badxids    timeouts   newcreds   badverfs
1753584    1412       18         64         0          0
timers     cantconn   nomem      interrupts
0          1317       0          18
Connectionless:
calls      badcalls   retrans    badxids    timeouts   newcreds
12443      41         334        80         166        0
badverfs   timers     nomem      cantsend
0          4321       0          206
The timeout value indicates the number of NFS
RPC calls that did not complete within the RPC timeout period. Divide
timeout by calls to
determine the retransmission rate for this
client. We'll look at an equation for calculating the maximum
allowable retransmission rate on each client in Section 18.1.3, "Retransmission rate thresholds".
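Using the connectionless figures from the sample output above, 166
timeouts out of 12,443 calls is a retransmission rate of roughly
1.3%. A quick way to do the arithmetic from the shell (the counter
values are simply those shown above):
% awk 'BEGIN { printf("retransmission rate = %.2f%%\n", 100 * 166 / 12443) }'
retransmission rate = 1.33%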
If the client-side RPC counts for timeout and
badxid are close in value, the network is
healthy. Requests are making it to the server but the server cannot
handle them and generate replies before the client's RPC call
times out. The server eventually works its way through the backlog of
requests, generating duplicate replies that increment the
badxid count. In this case, the emphasis should
be on improving server response time.
Alternatively, nfsstat may show that
timeout is large while
badxid is zero or negligible. In this case,
packets are never making it to the server, and the network interfaces
of client and server, as well as the network itself, should be
examined. NFS does not query the lower protocol layers to determine
where packets are being consumed; to NFS the entire RPC and transport
mechanisms are a black box. Note that NFS is like
spray in this regard -- it doesn't
matter whether it's the local host's interface, network
congestion, or the remote host's interface that dropped the
packet -- the packets are simply lost. To eliminate all
network-related effects, you must examine each of these areas.
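If you want to automate this first check, a minimal sketch along the
following lines compares the two counters. The script name is
hypothetical, and the one-half threshold for "close in value" is an
arbitrary rule of thumb, not a hard rule:
#!/bin/sh
# check_nfsstat.sh -- hypothetical helper: given the timeout and badxid
# counters reported by "nfsstat -rc" on a client, suggest whether to
# concentrate on the server or on the network first.
t=${1:?usage: check_nfsstat.sh timeout badxid}
b=${2:?usage: check_nfsstat.sh timeout badxid}
awk -v t="$t" -v b="$b" 'BEGIN {
    verdict = "badxid is negligible: examine the interfaces and the network"
    if (t == 0)
        verdict = "no timeouts: this client is not retransmitting"
    if (t > 0 && b >= t / 2)
        verdict = "badxid is close to timeout: concentrate on server response"
    print verdict
}'
In practice you would watch how these counters grow over time rather
than rely on a single snapshot, but the comparison captures the rule
of thumb described above.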