Contents:
Preparing NFS
Mounting an NFS Volume
The NFS Daemons
The exports File
Kernel-Based NFSv2 Server Support
Kernel-Based NFSv3 Server Support
The Network File System (NFS) is probably the most prominent network service using RPC. It allows you to access files on remote hosts in exactly the same way you would access local files. A mixture of kernel support and user-space daemons on the client side, along with an NFS server on the server side, makes this possible. This file access is completely transparent to the client and works across a variety of server and host architectures.
NFS offers a number of useful features:
-
Data accessed by all users can be kept on a central host, with clients mounting this directory at boot time. For example, you can keep all user accounts on one host and have all hosts on your network mount /home from that host. If NFS is installed beside NIS, users can log into any system and still work on one set of files.
-
Data consuming large amounts of disk space can be kept on a single host. For example, all files and programs relating to LaTeX and METAFONT can be kept and maintained in one place.
-
Administrative data can be kept on a single host. There is no need to use rcp to install the same stupid file on 20 different machines.
It's not too hard to set up basic NFS operation on both the client and server; this chapter tells you how.
Linux NFS is largely the work of Rick Sladkey, who wrote the NFS kernel code and large parts of the NFS server.[] The latter is derived from the unfsd user space NFS server, originally written by Mark Shand, and the hnfs Harris NFS server, written by Donald Becker.
Let's have a look at how NFS works. First, a client tries to mount a directory from a remote host on a local directory just the same way it does a physical device. However, the syntax used to specify the remote directory is different. For example, to mount /home from host vlager to /users on vale, the administrator issues the following command on vale:[]
#
mount -t nfs vlager:/home /users
mount will try to connect to the rpc.mountd mount daemon on vlager via RPC. The server will check if vale is permitted to mount the directory in question, and if so, return it a file handle. This file handle will be used in all subsequent requests to files below /users.
When someone accesses a file over NFS, the kernel places an RPC call to rpc.nfsd (the NFS daemon) on the server machine. This call takes the file handle, the name of the file to be accessed, and the user and group IDs of the user as parameters. These are used in determining access rights to the specified file. In order to prevent unauthorized users from reading or modifying files, user and group IDs must be the same on both hosts.
On most Unix implementations, the NFS functionality of both client and server is implemented as kernel-level daemons that are started from user space at system boot. These are the NFS Daemon (rpc.nfsd) on the server host, and the Block I/O Daemon (biod) on the client host. To improve throughput, biod performs asynchronous I/O using read-ahead and write-behind; also, several rpc.nfsd daemons are usually run concurrently.
The current NFS implementation of Linux is a little different from the classic NFS in that the server code runs entirely in user space, so running multiple copies simultaneously is more complicated. The current rpc.nfsd implementation offers an experimental feature that allows limited support for multiple servers. Olaf Kirch developed kernel-based NFS server support featured in 2.2 Version Linux kernels. Its performance is significantly better than the existing userspace implementation. We'll describe it later in this chapter.
Before you can use NFS, be it as server or client, you must make sure your kernel has NFS support compiled in. Newer kernels have a simple interface on the proc filesystem for this, the /proc/filesystems file, which you can display using cat:
$
cat /proc/filesystems
minix ext2 msdos nodev proc nodev nfs
If nfs is missing from this list, you have to compile your own kernel with NFS enabled, or perhaps you will need to load the kernel module if your NFS support was compiled as a module. Configuring the kernel network options is explained in the "Kernel Configuration" section of Chapter 3, Configuring the Networking Hardware.
The mounting of NFS volumes closely resembles regular file systems. Invoke mount using the following syntax:[]
#
mount -t nfs
nfs_volume local_dir options
nfs_volume is given as remote_host:remote_dir. Since this notation is unique to NFS filesystems, you can leave out the -t nfs option.
There are a number of additional options that you can specify to mount upon mounting an NFS volume. These may be given either following the -o switch on the command line or in the options field of the /etc/fstab entry for the volume. In both cases, multiple options are separated by commas and must not contain any whitespace characters. Options specified on the command line always override those given in the fstab file.
Here is a sample entry from /etc/fstab:
# volume mount point type options news:/var/spool/news /var/spool/news nfs timeo=14,intr
This volume can then be mounted using this command:
#
mount news:/var/spool/news
In the absence of an fstab entry, NFS mount invocations look a lot uglier. For instance, suppose you mount your users' home directories from a machine named moonshot, which uses a default block size of 4 K for read/write operations. You might increase the block size to 8 K to obtain better performance by issuing the command:
#
mount moonshot:/home /home -o rsize=8192,wsize=8192
The list of all valid options is described in its entirety in the nfs(5) manual page. The following is a partial list of options you would probably want to use:
-
rsize=n and wsize=n
-
These specify the datagram size used by the NFS clients on read and write requests, respectively. The default depends on the version of kernel, but is normally 1,024 bytes.
-
timeo=n
-
This sets the time (in tenths of a second) the NFS client will wait for a request to complete. The default value is 7 (0.7 seconds). What happens after a timeout depends on whether you use the hard or soft option.
-
hard
-
Explicitly mark this volume as hard-mounted. This is on by default. This option causes the server to report a message to the console when a major timeout occurs and continues trying indefinitely.
-
soft
-
Soft-mount (as opposed to hard-mount) the driver. This option causes an I/O error to be reported to the process attempting a file operation when a major timeout occurs.
-
intr
-
Allow signals to interrupt an NFS call. Useful for aborting when the server doesn't respond.
Except for rsize and wsize, all of these options apply to the client's behavior if the server should become temporarily inaccessible. They work together in the following way: Whenever the client sends a request to the NFS server, it expects the operation to have finished after a given interval (specified in the timeout option). If no confirmation is received within this time, a so-called minor timeout occurs, and the operation is retried with the timeout interval doubled. After reaching a maximum timeout of 60 seconds, a major timeout occurs.
By default, a major timeout causes the client to print a message to the console and start all over again, this time with an initial timeout interval twice that of the previous cascade. Potentially, this may go on forever. Volumes that stubbornly retry an operation until the server becomes available again are called hard-mounted. The opposite variety, called soft-mounted, generate an I/O error for the calling process whenever a major timeout occurs. Because of the write-behind introduced by the buffer cache, this error condition is not propagated to the process itself before it calls the write function the next time, so a program can never be sure that a write operation to a soft-mounted volume has succeeded at all.
Whether you hard- or soft-mount a volume depends partly on taste but also on the type of information you want to access from a volume. For example, if you mount your X programs by NFS, you certainly would not want your X session to go berserk just because someone brought the network to a grinding halt by firing up seven copies of Doom at the same time or by pulling the Ethernet plug for a moment. By hard-mounting the directory containing these programs, you make sure that your computer waits until it is able to re-establish contact with your NFS server. On the other hand, non-critical data such as NFS-mounted news partitions or FTP archives may also be soft-mounted, so if the remote machine is temporarily unreachable or down, it doesn't hang your session. If your network connection to the server is flaky or goes through a loaded router, you may either increase the initial timeout using the timeo option or hard-mount the volumes. NFS volumes are hard-mounted by default.
Hard mounts present a problem because, by default, the file operations are not interruptible. Thus, if a process attempts, for example, a write to a remote server and that server is unreachable, the user's application hangs and the user can't do anything to abort the operation. If you use the intr option in conjuction with a hard mount, any signals received by the process interrupt the NFS call so that users can still abort hanging file accesses and resume work (although without saving the file).
Usually, the rpc.mountd daemon in some way or other keeps track of which directories have been mounted by what hosts. This information can be displayed using the showmount program, which is also included in the NFS server package:
#
showmount -e moonshot
Export list for localhost: /home <anon clnt> #
showmount -d moonshot
Directories on localhost: /home #
showmount -a moonshot
All mount points on localhost: localhost:/home
If you want to provide NFS service to other hosts, you have to run the rpc.nfsd and rpc.mountd daemons on your machine. As RPC-based programs, they are not managed by inetd, but are started up at boot time and register themselves with the portmapper; therefore, you have to make sure to start them only after rpc.portmap is running. Usually, you'd use something like the following example in one of your network boot scripts:
if [ -x /usr/sbin/rpc.mountd ]; then /usr/sbin/rpc.mountd; echo -n " mountd" fi if [ -x /usr/sbin/rpc.nfsd ]; then /usr/sbin/rpc.nfsd; echo -n " nfsd" fi
The ownership information of the files an NFS daemon provides to its clients usually contains only numerical user and group IDs. If both client and server associate the same user and group names with these numerical IDs, they are said to their share uid/gid space. For example, this is the case when you use NIS to distribute the passwd information to all hosts on your LAN.
On some occasions, however, the IDs on different hosts do not match. Rather than updating the uids and gids of the client to match those of the server, you can use the rpc.ugidd mapping daemon to work around the disparity. Using the map_daemon option explained a little later, you can tell rpc.nfsd to map the server's uid/gid space to the client's uid/gid space with the aid of the rpc.ugidd on the client. Unfortunately, the rpc.ugidd daemon isn't supplied on all modern Linux distributions, so if you need it and yours doesn't have it, you will need to compile it from source.
rpc.ugidd is an RPC-based server that is started from your network boot scripts, just like rpc.nfsd and rpc.mountd:
if [ -x /usr/sbin/rpc.ugidd ]; then /usr/sbin/rpc.ugidd; echo -n " ugidd" fi
Now we'll look at how we configure the NFS server. Specifically, we'll look at how we tell the NFS server what filesystems it should make available for mounting, and the various parameters that control the access clients will have to the filesystem. The server determines the type of access that is allowed to the server's files. The /etc/exports file lists the filesystems that the server will make available for clients to mount and use.
By default, rpc.mountd disallows all directory mounts, which is a rather sensible attitude. If you wish to permit one or more hosts to NFS-mount a directory, you must export it, that is, specify it in the exports file. A sample file may look like this:
# exports file for vlager /home vale(rw) vstout(rw) vlight(rw) /usr/X11R6 vale(ro) vstout(ro) vlight(ro) /usr/TeX vale(ro) vstout(ro) vlight(ro) / vale(rw,no_root_squash) /home/ftp (ro)
Each line defines a directory and the hosts that are allowed to mount it. A hostname is usually a fully qualified domain name but may additionally contain the * and ? wildcards, which act the way they do with the Bourne shell. For instance, lab*.foo.com
matches lab01.foo.com as well as laboratory.foo.com. The host may also be specified using an IP address range in the form address/netmask. If no hostname is given, as with the /home/ftp directory in the previous example, any host matches and is allowed to mount the directory.
When checking a client host against the exports file, rpx.mountd looks up the client's hostname using the gethostbyaddr call. With DNS, this call returns the client's canonical hostname, so you must make sure not to use aliases in exports. In an NIS environment the returned name is the first match from the hosts database, and with neither DNS or NIS, the returned name is the first hostname found in the hosts file that matches the client's address.
The hostname is followed by an optional comma-separated list of flags, enclosed in parentheses. Some of the values these flags may take are:
-
secure
-
This flag insists that requests be made from a reserved source port, i.e., one that is less than 1,024. This flag is set by default.
-
insecure
-
This flag reverses the effect of the secure flag.
-
ro
-
This flag causes the NFS mount to be read-only. This flag is enabled by default.
-
rw
-
This option mounts file hierarchy read-write.
-
root_squash
-
This security feature denies the superusers on the specified hosts any special access rights by mapping requests from uid 0 on the client to the uid 65534 (that is, -2) on the server. This uid should be associated with the user nobody.
-
no_root_squash
-
Don't map requests from uid 0. This option is on by default, so superusers have superuser access to your system's exported directories.
-
link_relative
-
This option converts absolute symbolic links (where the link contents start with a slash) into relative links. This option makes sense only when a host's entire filesystem is mounted; otherwise, some of the links might point to nowhere, or even worse, to files they were never meant to point to. This option is on by default.
-
link_absolute
-
This option leaves all symbolic links as they are (the normal behavior for Sun-supplied NFS servers).
-
map_identity
-
This option tells the server to assume that the client uses the same uids and gids as the server. This option is on by default.
-
map_daemon
-
This option tells the NFS server to assume that client and server do not share the same uid/gid space. rpc.nfsd then builds a list that maps IDs between client and server by querying the client's rpc.ugidd daemon.
-
map_static
-
This option allows you to specify the name of a file that contains a static map of uids and gids. For example, map_static=/etc/nfs/vlight.map
would specify the /etc/nfs/vlight.map file as a uid/gid map. The syntax of the map file is described in the exports(5) manual page.
-
map_nis
-
This option causes the NIS server to do the uid and gid mapping.
-
anonuid and anongid
-
These options allow you to specify the uid and gid of the anonymous account. This is useful if you have a volume exported for public mounts.
Any error in parsing the exports file is reported to syslogd's daemon facility at level notice whenever rpc.nfsd or rpc.mountd is started up.
Note that hostnames are obtained from the client's IP address by reverse mapping, so the resolver must be configured properly. If you use BIND and are very security conscious, you should enable spoof checking in your host.conf file. We discuss these topics in Chapter 6, Name Service and Resolver Configuration.
The user-space NFS server traditionally used in Linux works reliably but suffers performance problems when overworked. This is primarily because of the overhead the system call interface adds to its operation, and because it must compete for time with other, potentially less important, user-space processes.
The 2.2.0 kernel supports an experimental kernel-based NFS server developed by Olaf Kirch and further developed by H.J. Lu, G. Allan Morris, and Trond Myklebust. The kernel-based NFS support provides a significant boost in server performance.
In current release distributions, you may find the server tools available in prepackaged form. If not, you can locate them at http://csua.berkeley.edu/~gam3/knfsd/. You need to build a 2.2.0 kernel with the kernel-based NFS daemon included in order to make use of the tools. You can check if your kernel has the NFS daemon included by looking to see if the /proc/sys/sunrpc/nfsd_debug file exists. If it's not there, you may have to load the rpc.nfsd module using the modprobe utility.
The kernel-based NFS daemon uses a standard /etc/exports configuration file. The package supplies replacement versions of the rpc.mountd and rpc.nfsd daemons that you start much the same way as their userspace daemon counterparts.
The version of NFS that has been most commonly used is NFS Version 2. Technology has rolled on ahead and it has begun to show weaknesses that only a revision of the protocol could overcome. Version 3 of the Network File System supports larger files and filesystems, adds significantly enhanced security, and offers a number of performance improvements that most users will find useful.
Olaf Kirch and Trond Myklebust are developing an experimental NFSv3 server. It is featured in the developer Version 2.3 kernels and a patch is available against the 2.2 kernel source. It builds on the Version 2 kernel-based NFS daemon.
The patches are available from the Linux Kernel based NFS server home page at http://csua.berkeley.edu/~gam3/knfsd/.