Caching (Managing NFS and NIS, 2nd Edition)

7.4.2. Client data caching

In the previous section, we looked at the async thread's management of an NFS client's buffer cache. The async threads perform read-ahead and write-behind for the NFS client processes. We also saw how NFS moves data in NFS buffers, rather than in page- or buffer cache-sized chunks. The use of NFS buffers allows NFS operations to utilize some of the sequential disk I/O optimizations of Unix disk device drivers.

Reading in buffers that are multiples of the local filesystem block size allows NFS to reduce the cost of getting file blocks from a server. The overhead of performing an RPC call to read just a few bytes from a file is significant compared to the cost of reading that data from the server's disk, so it is to the client's and server's advantage to spread the RPC cost over as many data bytes as possible. If an application sequentially reads data from a file in 128-byte buffers, the first read operation brings over a full (8 kilobytes for NFS Version 2, usually more for NFS Version 3) buffer from the filesystem. If the file is less than the buffer size, the entire file is read from the NFS server. The next read( ) picks up data that is in the buffer (or page) cache, and following reads walk through the entire buffer. When the application reads data that is not cached, another full NFS buffer is read from the server. If there are async threads performing read-ahead on the client, the next buffer may already be present on the NFS client by the time the process needs data from it. Performing reads in NFS buffer-sized operations improves NFS performance significantly by decoupling the client application's system call buffer size and the VFS implementation's buffer size.

Going the other way, small write operations to the same file are buffered until they fill a complete page or buffer. When a full buffer is written, the operating system gives it to an async thread, and async threads try to cluster write buffers together so they can be sent in NFS buffer-sized requests. The eventual write RPC call is performed synchronous to the async thread; that is, the async thread does not continue execution (and start another write or read operation) until the RPC call completes. What happens on the server depends on what version of NFS is being used.

For NFS Version 2, the write RPC operation does not return to the client's async thread until the file block has been committed to stable, nonvolatile storage. All write operations are performed synchronously on the server to ensure that no state information is left in volatile storage, where it would be lost if the server crashed.

For NFS Version 3, the write RPC operation typically is done with the stable flag set to off. The server will return as soon as the write is stored in volatile or nonvolatile storage. Recall from Section 7.2.6, "NFS Version 3" that the client can later force the server to synchronously write the data to stable storage via the commit operation.

There are elements of a write-back cache in the async threads. Queueing small write operations until they can be done in buffer-sized RPC calls leaves the client with data that is not present on a disk, and a client failure before the data is written to the server would leave the server with an old copy of the file. This behavior is similar to that of the Unix buffer cache or the page cache in memory-mapped systems. If a client is writing to a local file, blocks of the file are cached in memory and are not flushed to disk until the operating system schedules them. If the machine crashes between the time the data is updated in a file cache page and the time that page is flushed to disk, the file on disk is not changed by the write. This is also expected of systems with local disks -- applications running at the time of the crash may not leave disk files in well-known states.

Having file blocks cached on the server during writes poses a problem if the server crashes. The client cannot determine which RPC write operations completed before the crash, violating the stateless nature of NFS. Writes cannot be cached on the server side, as this would allow the client to think that the data was properly written when the server is still exposed to losing the cached request during a reboot.

Ensuring that writes are completed before they are acknowledged introduces a major bottleneck for NFS write operations, especially for NFS Version 2. A single Version 2 file write operation may require up to three disk writes on the server to update the file's inode, an indirect block pointer, and the data block being written. Each of these server write operations must complete before the NFS write RPC returns to the client. Some vendors eliminate most of this bottleneck by committing the data to nonvolatile, nondisk storage at memory speeds, and then moving data from the NFS write buffer memory to disk in large (64 kilobyte) buffers. Even when using NFS Version 3, the introduction of nonvolatile, nondisk storage can improve performance, though much less dramatically than with NFS Version 2.

Using the buffer cache and allowing async threads to cluster multiple buffers introduces some problems when several machines are reading from and writing to the same file. To prevent file inconsistency with multiple readers and writers of the same file, NFS institutes a flush-on-close policy:

All partially filled NFS buffers are written to the NFS server when a file is closed.
For NFS Version 3 clients, any writes that were done with the stable flag set to off are forced onto the server's stable storage via the commit operation.

This ensures that a process on another NFS client sees all changes to a file that it is opening for reading:

Client A	Client B
`open( )`
`write( )`
NFS Version 3 only: commit
`close( )`
	`open( )`
	`read( )`

The read( ) system call on Client B will see all of the data in a file just written by Client A, because Client A flushed out all of its buffers for that file when the close( ) system call was made. Note that file consistency is less certain if Client B opens the file before Client A has closed it. If overlapping read and write operations will be performed on a single file, file locking must be used to prevent cache consistency problems. When a file has been locked, the use of the buffer cache is disabled for that file, making it more of a write-through than a write-back cache. Instead of bundling small NFS requests together, each NFS write request for a locked file is sent to the NFS server immediately.

7.4. Caching

7.4.1. File attribute caching

7.4.2. Client data caching

7.4.3. Server-side caching