File locking (Managing NFS and NIS, 2nd Edition)

7.5. File locking

File locking allows one process to gain exclusive access to a file or part of a file, and forces other processes requiring access to the file to wait for the lock to be released. Locking is a stateful operation and does not mesh well with the stateless design of NFS. One of NFS's design goals is to maintain Unix filesystem semantics on all files, which includes supporting record locks on files.

Unix locks come in two flavors: BSD-style file locks and System V-style record locks. The BSD locking mechanism implemented in the flock( ) system call exists for whole file locking only, and on Solaris is implemented in terms of the more general System V-style locks. The System V-style locks are implemented through the fcntl( ) system call and the lockf( ) library routine, which uses fcntl( ). System V locking operations are separated from the NFS protocol and handled by an RPC lock daemon and a status monitoring daemon that recreate and verify state information when either a client or a server reboots.
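As a rough illustration of the two flavors, the sketch below uses Python's standard fcntl module, whose lockf( ) function wraps fcntl( )-style record locks and whose flock( ) function wraps the BSD call. The helper names update_record and append_line are hypothetical, chosen only for this example, and whether either lock is honored across an NFS mount depends on the client and server implementations.

```python
import fcntl
import os


def update_record(path, offset, payload):
    """Overwrite len(payload) bytes at offset under an exclusive record lock."""
    with open(path, "r+b") as f:
        # System V-style record lock: covers only the bytes
        # [offset, offset + len(payload)) and blocks until it is granted.
        fcntl.lockf(f, fcntl.LOCK_EX, len(payload), offset, os.SEEK_SET)
        try:
            f.seek(offset)
            f.write(payload)
            f.flush()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN, len(payload), offset, os.SEEK_SET)


def append_line(path, line):
    """Append one line while holding a BSD-style lock on the whole file."""
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # whole-file lock; there is no byte range
        try:
            f.write(line)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The record lock lets two processes update different byte ranges of the same file at once, while the whole-file lock serializes every writer regardless of which part of the file each one touches.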

7.5.1. Lock and status daemons

The RPC lock daemon, lockd, runs on both the client and server. When a lock request is made for an NFS-mounted file, lockd forwards the request to the server's lockd. The lock daemon asks the status monitor daemon, statd, to note that the client has requested a lock and to begin monitoring the client.

The file locking daemon and status monitor daemon keep two directories with lock "reminders" in them: /var/statmon/sm and /var/statmon/sm.bak. (On some systems, these directories are /etc/sm and /etc/sm.bak.) The first directory is used by the status monitor on an NFS server to track the names of hosts that have locked one or more of its files. The files in /var/statmon/sm are empty and are used primarily as pointers for lock renegotiation after a server or client crash. When statd is asked to monitor a system, it creates a file with that system's name in /var/statmon/sm.

If the system making the lock request must be notified of a server reboot, then an entry is made in /var/statmon/sm.bak as well. When the status monitor daemon starts up, it calls the status daemon on all of the systems whose names appear in /var/statmon/sm.bak to notify them that the NFS server has rebooted. Each client's status daemon tells its lock daemon that locks may have been lost due to a server crash. The client-side lock daemons resubmit all outstanding lock requests, recreating the file lock state (on the server) that existed before the server crashed.
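The bookkeeping described above can be modeled in a few lines. The following is a toy sketch only, not the real statd: the class ToyStatusMonitor, its method names, and the notify_on_reboot flag are all hypothetical, and it works in a scratch directory rather than in /var/statmon.

```python
import os


class ToyStatusMonitor:
    """Toy model of the sm and sm.bak reminder directories (not the real statd)."""

    def __init__(self, base_dir):
        self.sm = os.path.join(base_dir, "sm")
        self.sm_bak = os.path.join(base_dir, "sm.bak")
        os.makedirs(self.sm, exist_ok=True)
        os.makedirs(self.sm_bak, exist_ok=True)

    def monitor(self, host, notify_on_reboot=True):
        """Record that host holds a lock by creating an empty file named after it."""
        open(os.path.join(self.sm, host), "w").close()
        if notify_on_reboot:
            open(os.path.join(self.sm_bak, host), "w").close()

    def hosts_to_notify(self):
        """On startup, every name in sm.bak is a host to tell about the reboot."""
        return sorted(os.listdir(self.sm_bak))
```

Because the reminders are ordinary files on disk, they survive a crash, which is what lets a restarted status monitor rediscover the clients it must contact.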

7.5.2. Client lock recovery

If the server's statd cannot reach a client's status daemon to inform it of the crash recovery, it begins printing annoying messages on the server's console:

statd: cannot talk to statd at client, RPC: Timed out(5)

These messages indicate that the local statd process could not find the portmapper on the client to make an RPC call to its status daemon. If the client has also rebooted and is not quite back on the air, the server's status monitor should eventually find the client and update the file lock state. However, if the client was taken down, had its name changed, or was removed from the network altogether, these messages continue until statd is told to stop looking for the missing client.
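When many clients are involved, it can help to pull the unreachable host names out of a saved copy of the console output. The sketch below assumes the messages look exactly like the one quoted above; the pattern and the function name unreachable_clients are illustrative, and other releases may word the message differently.

```python
import re

# Pattern based on the console message quoted above; other releases
# may word the message differently.
STATD_TIMEOUT = re.compile(r"statd: cannot talk to statd at ([^,\s]+), RPC: (.+)")


def unreachable_clients(lines):
    """Return the sorted, de-duplicated client names found in statd complaints."""
    hosts = set()
    for line in lines:
        match = STATD_TIMEOUT.search(line)
        if match:
            hosts.add(match.group(1))
    return sorted(hosts)
```

Each name returned is a candidate for the cleanup procedure, once you have confirmed that the client really is gone for good.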

To silence statd, kill the status daemon process, remove the appropriate file in /var/statmon/sm.bak, and restart statd. For example, if server onaga cannot find the statd daemon on client noreaster, remove that client's entry in /var/statmon/sm.bak :

onaga# ps -eaf | fgrep statd 
root   133     1  0   Jan 16 ?        0:00 /usr/lib/nfs/statd
root  8364  6300  0 06:10:27 pts/13   0:00 fgrep statd
onaga# kill -9 133 
onaga# cd /var/statmon/sm.bak 
onaga# ls 
noreaster 
onaga# rm noreaster 
onaga# cd / 
onaga# /usr/lib/nfs/statd

Error messages from statd should be expected whenever an NFS client is removed from the network, or when clients and servers boot at the same time.

7.5.3. Recreating state information

Because permanent state (state that survives crashes) is maintained on the server host owning the locked file, the server is given the job of asking clients to re-establish their locks when state is lost. Only a server crash removes state from the system, and this missing state is impossible to regenerate without some external help.

When a client reboots, it by definition has given up all of its locks, but there is no state lost. Some state information may remain on the server and be out-of-date, but this "excess" state is flushed by the server's status monitor. After a client reboot, the server's status daemon notices the inconsistency between the locks held by the server and those the client thinks it holds. It informs the server lockd that locks from the rebooted client need reclaiming. The server's lockd sets a grace period -- 45 seconds by default -- during which the locks must be reclaimed or be lost. When a client reboots, it will not reclaim any locks, because there is no record of the locks in its local lockd. The server releases all of them, removing the old state from the client-server system.
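The grace-period rule can be sketched as a small state machine. This is a toy model under stated assumptions, not the lockd implementation: the class ToyLockServer and its methods are hypothetical, lock identifiers are arbitrary strings, and times are plain numbers of seconds supplied by the caller.

```python
class ToyLockServer:
    """Toy model of server-side lock state; not the real lockd."""

    GRACE_SECONDS = 45  # default grace period quoted in the text

    def __init__(self):
        self.held = {}       # client name -> set of lock identifiers
        self.deadline = {}   # client name -> end of its grace period
        self.reclaimed = {}  # client name -> locks reclaimed so far

    def grant(self, client, lock_id):
        self.held.setdefault(client, set()).add(lock_id)

    def client_rebooted(self, client, now):
        """The status monitor reports a reboot: start the grace period."""
        self.deadline[client] = now + self.GRACE_SECONDS
        self.reclaimed[client] = set()

    def reclaim(self, client, lock_id, now):
        """Return True if the reclaim arrives in time for a lock we know about."""
        in_grace = client in self.deadline and now <= self.deadline[client]
        if in_grace and lock_id in self.held.get(client, set()):
            self.reclaimed[client].add(lock_id)
            return True
        return False

    def expire(self, now):
        """Drop every lock that was not reclaimed before its deadline."""
        for client in [c for c, d in self.deadline.items() if now > d]:
            self.held[client] = self.reclaimed.pop(client)
            del self.deadline[client]
```

A rebooted client never calls reclaim, so once the deadline passes, expire leaves it holding nothing, which matches the cleanup described above.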

Think of this server-side responsibility as dealing with your checkbook and your local bank branch. You keep one set of records, tracking what your balance is, and the bank maintains its own information about your account. The bank's information is the "truth," no matter how good or bad your record keeping is. If you vanish from the earth or stop contacting the bank, then the bank tries to contact you for some finite grace period. After that, the bank releases its records and your money. On the other hand, if the bank were to lose its computer records in a disaster, it could ask you to submit checks and deposit slips to recreate the records of your account.