18.2. Soft mount issues
Repeated retransmission cycles only occur
for hard-mounted filesystems. When the
soft option is supplied in a mount, the RPC
retransmission sequence ends at the first major timeout, producing
messages like:
NFS write failed for server wahoo: error 5 (RPC: Timed out)
NFS write error on host wahoo: error 145.
(file handle: 800000 2 a0000 114c9 55f29948 a0000 11494 5cf03971)
The message identifies the NFS operation that failed, the server that
failed to respond before the major timeout, and the filehandle of the
affected file. RPC timeouts may be caused by extremely slow servers, or
they can occur if a server crashes and is down or rebooting while an
RPC retransmission cycle is in progress.
With soft-mounted filesystems, you have to worry about damaging data
due to incomplete writes, losing access to the text segment of a
swapped process, and making soft-mounted filesystems more tolerant of
variances in server response time. If a client does not give the
server enough latitude in its response time, the first two problems
impair both the performance and correct operation of the client. If
write operations fail, data consistency on the server cannot be
guaranteed. The write error is reported to the application during some
later call to write( ) or close( ), which is consistent with the
behavior of a local filesystem residing on a failing or overflowing
disk. When the actual write to disk is attempted by the kernel device
driver, the failure is reported to the application as an error during
the next similar or related system call.
A well-conditioned application should exit abnormally after a failed
write, or retry the write if possible. If the application ignores the
return code from write( ) or close( ), then it is possible to corrupt
data on a soft-mounted
filesystem. Some write operations may fail and never be retried,
leaving holes in the open file.
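As a minimal sketch of what checking those return codes can look like, the following Python fragment treats a failure from write( ), fsync( ), or close( ) as a failure of the whole save. The helper names write_all and save_file are hypothetical and chosen only for illustration; Python reports a failed system call by raising OSError rather than returning a code, so letting the exception propagate is the equivalent of exiting abnormally.

```python
import os


def write_all(fd, data):
    """Write every byte of data to fd; OSError propagates on any failure."""
    view = memoryview(data)
    while len(view) > 0:
        # os.write may accept fewer bytes than requested, so loop until done.
        written = os.write(fd, view)
        view = view[written:]


def save_file(path, data):
    """Hypothetical helper: succeed only if write, fsync, and close all succeed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, data)
        # Ask the kernel to flush buffered data now, so that a deferred
        # write error is more likely to surface here than be lost.
        os.fsync(fd)
    except OSError:
        try:
            os.close(fd)
        except OSError:
            pass  # the original error is the one worth reporting
        raise
    # A deferred write error can still be reported by close, so this call
    # is deliberately left unguarded: its OSError reaches the caller.
    os.close(fd)
```

A caller that cannot retry should let the OSError terminate the program rather than catch and discard it; discarding it is exactly the case that can leave holes in the file.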
To guarantee data integrity, all filesystems mounted read-write should
be hard-mounted. Server performance and server reliability together
determine whether a request eventually succeeds on a soft-mounted
filesystem, and neither can be guaranteed.
Furthermore, any operating system that maps executable images
directly into memory (such as Solaris) should hard-mount filesystems
containing executables. If the filesystem is soft-mounted, and the
NFS server crashes while the client is paging in an executable
(during the initial load of the text segment or to refill a page
frame that was paged out), an RPC timeout will cause the paging to
fail. What happens next is system-dependent; the application may be
terminated or the system may panic with unrecoverable swap errors.
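The two recommendations above can be turned into a simple audit. The sketch below assumes a mount table whose lines have the form "device mountpoint type options ...", as in a Linux-style /proc/mounts; that format, and the function name soft_nfs_mounts, are assumptions made for illustration, and a Solaris vfstab or mnttab would need different parsing. It lists every soft-mounted NFS filesystem and notes whether it is mounted read-write.

```python
def soft_nfs_mounts(mount_lines):
    """Return (mount_point, is_read_write) for each soft-mounted NFS filesystem.

    Assumes whitespace-separated fields: device, mount point, type, options.
    """
    found = []
    for line in mount_lines:
        fields = line.split()
        # Skip blank lines, comments, and anything too short to parse.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        if not fs_type.startswith("nfs"):
            continue
        if "soft" in options:
            # Treat the mount as read-write unless "ro" is given explicitly.
            found.append((mount_point, "ro" not in options))
    return found
```

Entries flagged as read-write are the ones exposed to incomplete writes; read-only entries still deserve a look if they hold executables.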
A common objection to hard-mounting filesystems is that NFS clients
remain catatonic until a crashed server recovers, due to the infinite
loop of RPC retransmissions and timeouts. By default, Solaris clients
allow interrupts to break the retransmission loop. Use the intr mount
option if your client doesn't enable interrupts by default.
Unfortunately, some older implementations of NFS do not process
keyboard interrupts until a major timeout has occurred: with even a
small timeout period and retransmission count, the time required to
recognize an interrupt can be quite large.
If you choose to ignore this advice and use soft-mounted NFS
filesystems, you should at least make NFS clients more tolerant of
slow responses from NFS fileservers by increasing the value of the
retrans mount option. Increasing the number of attempts to reach the
server makes the client less likely to produce an RPC error during
brief periods of heavy server load.
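To get a feel for how much latitude a given retrans value buys, the following sketch estimates the time from the first transmission to the major timeout. It is a simplified model, not a description of any particular implementation: it assumes the initial timeout (timeo) is given in tenths of a second, that the timeout doubles after each unanswered transmission, that retrans counts the transmissions made before the major timeout, and that no upper cap is applied to the interval. The function name is hypothetical.

```python
def time_to_major_timeout(timeo_tenths, retrans):
    """Estimate seconds until a major timeout under a doubling-backoff model.

    Assumptions: timeo_tenths is in tenths of a second, each minor timeout
    is twice the previous one, and the interval is never capped.
    """
    total = 0.0
    interval = timeo_tenths / 10.0
    for _ in range(retrans):
        total += interval   # wait out this minor timeout
        interval *= 2       # back off before the next transmission
    return total


if __name__ == "__main__":
    for retrans in (3, 5, 8):
        seconds = time_to_major_timeout(11, retrans)
        print("timeo=11 retrans=%d: about %.1f seconds" % (retrans, seconds))
```

Under these assumptions, raising retrans from 3 to 5 with an initial timeout of 1.1 seconds stretches the window from under 8 seconds to roughly 34 seconds. The same arithmetic shows why a client that processes interrupts only at a major timeout can take a long time to respond to one.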