11.3. Troubleshooting locking problems
Lock problems will be evident when an NFS client tries
to lock a file, and it fails because
someone has it locked. For applications that share access to files,
the expectation is that locks will be short-lived. Thus, the pattern
your users will notice when something is awry is that yesterday an
application started up quite quickly, but today it hangs. Usually it
is because an NFS/NLM client holds a lock on a file that your
application needs to lock, and the holding client has crashed.
11.3.1. Diagnosing NFS lock hangs
On Solaris, you can use tools like pstack and truss to verify that processes are hanging in a lock request:
client1% ps -eaf | grep SuperApp
mre 23796 10031 0 11:13:22 pts/6 0:00 SuperApp
client1% pstack 23796
23796: SuperApp
ff313134 fcntl (1, 7, ffbef9dc)
ff30de48 fcntl (1, 7, ffbef9dc, 0, 0, 0) + 1c8
ff30e254 lockf (1, 1, 0, 2, ff332584, ff2a0140) + 98
0001086c main (1, ffbefac4, ffbefacc, 20800, 0, 0) + 1c
00010824 _start (0, 0, 0, 0, 0, 0) + dc
client1% truss -p 23796
fcntl(1, F_SETLKW, 0xFFBEF9DC) (sleeping...)
This verifies that the application is stuck in a lock request. We can use pfiles to see what is going on with the files of process 23796:
client1% pfiles 23796
23796: SuperApp
Current rlimit: 256 file descriptors
0: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37
O_RDWR
1: S_IFREG mode:0644 dev:208,1823 ino:5516985 uid:466 gid:300 size:0
O_WRONLY|O_LARGEFILE
advisory write lock set by process 3242
2: S_IFCHR mode:0620 dev:136,0 ino:37990 uid:466 gid:7 rdev:24,37
O_RDWR
Being told that another process, process ID 3242, holds an advisory write lock on file descriptor 1 is useful, but unfortunately it doesn't tell us whether 3242 is a local process or a process on another NFS client or on the NFS server. Nor does it tell us whether the file mapped to file descriptor 1 is a local file or an NFS file. We are, however, told that the major and minor device numbers of the filesystem are 208 and 1823, respectively. Running the mount command without any arguments displays the list of mounted filesystems. You should see a display similar to:
/ on /dev/dsk/c0t0d0s0 read/write/setuid/intr/largefiles/onerror=panic/dev=2200000
on Thu Dec 21 11:13:33 2000
/usr on /dev/dsk/c0t0d0s6 read/write/setuid/intr/largefiles/onerror=panic/
dev=2200006 on Thu Dec 21 11:13:34 2000
/proc on /proc read/write/setuid/dev=31c0000 on Thu Dec 21 11:13:29 2000
/dev/fd on fd read/write/setuid/dev=32c0000 on Thu Dec 21 11:13:34 2000
/etc/mnttab on mnttab read/write/setuid/dev=3380000 on Thu Dec 21 11:13:35 2000
/var on /dev/dsk/c0t0d0s7 read/write/setuid/intr/largefiles/onerror=panic/
dev=2200007 on Thu Dec 21 11:13:40 2000
/home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on
Thu Dec 28 08:51:30 2000
The numbers after dev= are in hexadecimal. Device numbers are constructed by taking the major number, shifting it left several bits, and then adding the minor number. Convert the minor number 1823 to hexadecimal, and look for it in the mount table:
client1% printf "%x\n" 1823
71f
client1% mount | grep 'dev=.*71f'
/home/mre on spike:/export/home/mre remote/read/write/setuid/intr/dev=340071f on
Thu Dec 28 08:51:30 2000
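If you want to double-check the arithmetic, you can reconstruct the full dev= value from the major and minor numbers. The following is a sketch that assumes Solaris's encoding of an 18-bit minor field, so the major number is multiplied by 262144 (2 to the 18th power) before the minor number is added:
client1% printf "%x\n" `expr 208 \* 262144 + 1823`
340071f
This matches the dev=340071f shown for spike:/export/home/mre in the mount output above.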
We now know four things:
- This is an NFS file we are blocking on.
- The NFS server name is spike.
- The filesystem on the server is /export/home/mre.
- The inode number of the file is 5516985.
One obvious cause you should eliminate first is that the NFS server spike has crashed. If it hasn't crashed, then the next step is to examine the server.
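A quick way to check whether spike is up at all is to see whether it answers pings and whether its network lock manager is still registered with the RPC binder. This is only a sketch; the exact rpcinfo output varies with the Solaris release and the NLM versions the server supports:
client1% ping spike
spike is alive
client1% rpcinfo -u spike nlockmgr
program 100021 version 1 ready and waiting
program 100021 version 2 ready and waiting
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting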
11.3.2. Examining lock state on NFS/NLM servers
Solaris and other System V-derived systems have a useful tool called crash for analyzing system state. Crash actually reads the Unix kernel's memory and formats its data structures in a more human-readable form. Continuing with the example from Section 11.3.1, "Diagnosing NFS lock hangs", assuming /export/home/mre is a directory on a UFS filesystem, which can be verified by doing:
spike# df -F ufs | grep /export
/export (/dev/dsk/c0t0d0s7 ): 503804 blocks 436848 files
then you can use crash to get more lock state. The crash command is like a shell, but with internal commands for examining kernel state. The internal command we will be using is lck:
spike# crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> lck
Active and Sleep Locks:
INO TYP START END PROC PID FLAGS STATE PREV NEXT LOCK
30000c3ee18 w 0 0 13 136 0021 3 48bf0f8 ae9008 6878d00
30000dd8710 w 0 MAXEND 17 212 0001 3 8f1a48 8f02d8 8f0e18
30001cce1c0 w 193 MAXEND -1 3242 2021 3 6878850 c43a08 2338a38
Summary From List:
TOTAL ACTIVE SLEEP
3 3 0
>
An important field is PROC. PROC is the "slot" number of
the process. If it is -1, that indicates that the lock is being held
by a nonlocal (i.e., an NFS client) process, and the PID field thus
indicates the process ID, relative to the NFS client. In the sample
display, we see one such entry:
30001cce1c0 w 193 MAXEND -1 3242 2021 3 6878850 c43a08 2338a38
Note that the process ID, 3242, matches the one that the pfiles command displayed earlier in this example. We can confirm that this lock is for the file in question via crash's uinode command:
> uinode 30001cce1c0
UFS INODE MAX TABLE SIZE = 34020
ADDR MAJ/MIN INUMB RCNT LINK UID GID SIZE MODE FLAGS
30001cce1c0 136, 7 5516985 2 1 466 300 403 f---644 mt rf
>
The inode numbers match what pfiles displayed earlier on the NFS client. However, inode numbers are unique only within a single filesystem, so we can make doubly sure this is the file by comparing the major and minor device numbers reported by the uinode command, 136 and 7, with those of the filesystem that is mounted on /export:
spike# ls -lL /dev/dsk/c0t0d0s7
brw------- 1 root sys 136, 7 May 6 2000 /dev/dsk/c0t0d0s7
spike#
11.3.3. Clearing lock state
Continuing with our example from Section 11.3.2, "Examining lock state on NFS/NLM servers", at this point we know that the file is locked by another NFS client. Unfortunately, we don't know which client it is, as crash won't give us that information. We do, however, have a potential list of clients in the server's /var/statmon/sm directory:
spike# cd /var/statmon/sm
spike# ls
client1 ipv4.10.1.0.25 ipv4.10.1.0.26 gonzo java
The entries prefixed with ipv4 are just symbolic links to other entries. The entries that are not symbolic links identify the hosts we want to check.
The most likely cause of the lock not getting released is that the holding NFS client has crashed. You can take the list of hosts from the /var/statmon/sm directory and check if any are dead, or not responding due to a network partition. Once you determine which are dead, you can use Solaris's clear_locks command to clear lock state.
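One way to make that check is a quick loop over the statmon entries, pinging each host. This is only a sketch: it assumes a Bourne shell and Solaris's ping hostname timeout syntax, where a host that does not answer within the timeout may be down or partitioned:
spike# cd /var/statmon/sm
spike# for host in `ls | grep -v '^ipv4\.'`
> do
>     ping $host 5 || echo "$host may be holding stale locks"
> done
client1 is alive
no answer from gonzo
gonzo may be holding stale locks
java is alive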
Let's suppose you determine that gonzo is dead. Then you would do:
spike# clear_locks gonzo
If clearing the lock state of dead clients doesn't fix the problem, then perhaps a now-live client crashed, but for some reason its status monitor did not notify the NLM server's status monitor after the reboot. You can log onto the live clients and check whether they are currently mounting the filesystem from the server (in our example, spike:/export). If they are not, then you should consider using clear_locks to clear any residual lock state those clients might have had.
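For example, using hosts from this chapter's example, the check on a live client and the cleanup on the server might look like the following sketch (the choice of java and the absence of the mount are assumptions for illustration):
java% mount | grep spike:/export
java%
spike# clear_locks java
If java no longer mounts spike:/export yet the server still holds locks on its behalf, clearing them this way releases the stale state.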
Ultimately, you may be forced to reboot your server. Short of that, there are other things you can do. Since you know the inode number and the filesystem of the file in question, you can determine the file's name:
spike# cd /export
spike# find . -inum 5516985 -print
./home/mre/database
You could rename the file database to something else, and copy it back to a file named database. Then kill and restart the SuperApp application on client1. Of course, such an approach requires intimate knowledge of, or experience with, the application to know whether this will be safe.
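A sketch of that rename-and-copy approach, using the pathname found above (the new name database.orig is just an illustration, and whether any of this is safe depends entirely on how SuperApp uses the file):
spike# cd /export/home/mre
spike# mv database database.orig
spike# cp database.orig database
The copy creates a brand-new inode for the name database, and the stale lock stays with the renamed original, so a restarted SuperApp on client1 can lock the new file.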