Diskless client boot process (Managing NFS and NIS, 2nd Edition)

8.3. Diskless client boot process

Debugging any sort of diskless client problems requires some knowledge of the boot process. When a diskless client is powered on, it knows almost nothing about its configuration. It doesn't know its hostname, since that's established in the boot scripts that it hasn't run yet. It has no concept of IP addresses, because it has no hosts file or hosts NIS map to read. The only piece of information it knows for certain is its 48-bit Ethernet address, which is in the hardware on the CPU (or Ethernet interface) board. To be able to boot, a diskless client must convert the 48-bit Ethernet address into more useful information such as a boot server name, a hostname, an IP address, and the location of its root and swap filesystems.

8.3.1. Reverse ARP requests

The heart of the boot process is mapping 48-bit Ethernet addresses to IP addresses. The Address Resolution Protocol (ARP) is used to locate a 48-bit Ethernet address for a known IP address. Its inverse, Reverse ARP (or RARP), is used by diskless clients to find their IP addresses given their Ethernet addresses. Servers run the rarpd daemon to accept and process RARP requests, which are broadcast on the network by diskless clients attempting to boot.
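A RARP request reuses the ARP packet layout. RFC 903 carries it in Ethernet frames of type 0x8035 and uses opcode 3 for a request and 4 for a reply. The Python sketch below builds the 28-byte body a client would broadcast. `build_rarp_request` is a hypothetical helper, the Ethernet address is made up, and the Ethernet header itself is not constructed.

```python
import struct

RARP_ETHERTYPE = 0x8035    # Ethernet frame type for RARP (RFC 903)
OP_REQUEST_REVERSE = 3     # "what is my IP address?"
OP_REPLY_REVERSE = 4       # a server's answer

# htype, ptype, hlen, plen, op, sha, spa, tha, tpa in network byte order
RARP_FORMAT = "!HHBBH6s4s6s4s"


def build_rarp_request(client_mac: bytes) -> bytes:
    """Return the 28-byte RARP body a diskless client broadcasts.

    Only the ARP-format body is built; a real sender would prepend an
    Ethernet header with a broadcast destination and type RARP_ETHERTYPE.
    """
    if len(client_mac) != 6:
        raise ValueError("an Ethernet address is 48 bits (6 bytes)")
    return struct.pack(
        RARP_FORMAT,
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IP
        6, 4,           # hardware and protocol address lengths
        OP_REQUEST_REVERSE,
        client_mac,     # sender hardware address: the client
        bytes(4),       # sender protocol address: unknown, sent as zero
        client_mac,     # target hardware address: also the client
        bytes(4),       # target protocol address: what the client wants to learn
    )


request = build_rarp_request(bytes.fromhex("080020aabbcc"))
print(len(request), request.hex())
```

The client puts its own Ethernet address in both hardware address fields; the only thing it does not know is the IP address a server will fill in.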

IP addresses are found in two steps. The 48-bit Ethernet address received in the RARP request is used as a key into the /etc/ethers file or ethers NIS map. rarpd locates the hostname associated with the Ethernet address in the ethers database and uses that name as a key into the hosts map to find the appropriate IP address.
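The two-step lookup can be sketched in Python as follows. This is an illustration under stated assumptions: it reads ethers- and hosts-format text directly rather than going through NIS, the function names are hypothetical, and the client entries are invented.

```python
from typing import Optional


def normalize_mac(mac: str) -> str:
    """ethers entries may omit leading zeros (8:0:20:a:b:c); pad every byte."""
    return ":".join(f"{int(part, 16):02x}" for part in mac.split(":"))


def parse_ethers(text: str) -> dict[str, str]:
    """Map normalized Ethernet address -> hostname."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            table[normalize_mac(fields[0])] = fields[1]
    return table


def parse_hosts(text: str) -> dict[str, str]:
    """Map every hostname and alias -> IP address (first entry wins)."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            for name in fields[1:]:
                table.setdefault(name, fields[0])
    return table


def rarp_lookup(mac: str, ethers: dict[str, str],
                hosts: dict[str, str]) -> Optional[str]:
    """Step 1: Ethernet address -> hostname. Step 2: hostname -> IP address."""
    hostname = ethers.get(normalize_mac(mac))
    if hostname is None:
        return None            # unknown client: nothing to answer with
    return hosts.get(hostname)


ETHERS = "8:0:20:a:b:c   client1\n"      # hypothetical entries
HOSTS = "130.141.14.9    client1\n"
ip = rarp_lookup("08:00:20:0a:0b:0c", parse_ethers(ETHERS), parse_hosts(HOSTS))
print(ip)  # 130.141.14.9
```

A missing entry in either table breaks the chain, which is why both the ethers and hosts data must be checked when a client gets no RARP answer.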

For the rarpd daemon to operate correctly, it must be able to get packets from the raw network interface. RARP packets are not passed up through the TCP or UDP layers of the protocol stack, so rarpd listens directly on each network interface (e.g., hme0) device node for RARP requests. Make sure that all boot servers are running rarpd before examining other possible points of failure. The best way to check is with ps, which should show the rarpd process:

% ps -eaf | grep rarpd
    root   274     1  0   Apr 16 ?        0:00 /usr/sbin/in.rarpd -a

Some implementations of rarpd are multithreaded, and some will fork child processes. Solaris rarpd implementations will create a process or thread for each network interface the server has, plus one extra process or thread. The purpose of the extra thread or child process is to act as a delayed responder. Sometimes, rarpd gets a request but decides to delay its response by passing the request to the delayed responder, which waits a few seconds before sending the response. A per-interface rarpd thread/process chooses to send a delayed response if it decides it is not the best candidate to answer the request. To understand how this decision is made, we need to look at the process of converting Ethernet addresses into IP addresses in more detail.

The client broadcasts a RARP request containing its 48-bit Ethernet address and waits for a reply. Using the ethers and hosts maps, any RARP server receiving the request attempts to match it to an IP address for the client. Before sending the reply to the client, the server verifies that it is the best candidate to boot the client by checking the /tftpboot directory (more on this soon). If the server has the client's boot parameters but might not be able to boot the client, it delays sending a reply (by giving the request to the delayed responder daemon) so that the correct server replies first. Because RARP requests are broadcast, they are received and processed in somewhat random order by all boot servers on the network. The reply delay compensates for the time skew in reply generation. The server that thinks it can boot the diskless client immediately sends its reply to the client; other machines may also send their replies a short time later.
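The split between per-interface responders and a delayed responder can be sketched with Python threads. This is not the Solaris rarpd code: "sending" a reply just records it, `is_best_candidate` stands in for the /tftpboot check, the delay is shortened, and all names are hypothetical.

```python
import queue
import threading
import time


def delayed_responder(pending: queue.Queue, send, delay: float) -> None:
    """Answer each request handed over, but only after waiting `delay` seconds."""
    while True:
        request = pending.get()
        if request is None:        # sentinel: shut down
            return
        time.sleep(delay)
        send(request)


def handle_request(request, is_best_candidate, send, pending: queue.Queue) -> None:
    """Per-interface logic: reply now if we can boot the client, else defer."""
    if is_best_candidate(request):
        send(request)
    else:
        pending.put(request)


replies = []
pending = queue.Queue()
worker = threading.Thread(target=delayed_responder,
                          args=(pending, replies.append, 0.2))
worker.start()
# Hypothetical: this server can boot "client2" but not "client1".
for client in ("client1", "client2"):
    handle_request(client, lambda c: c == "client2", replies.append, pending)
pending.put(None)
worker.join()
print(replies)
```

Even though the request for client1 arrived first, its reply leaves last, giving client1's real boot server time to answer before this one does.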

You may ask, "Why should a host other than the client's boot server answer its RARP request?" After all, if the boot server is down, the diskless client won't be able to boot even if it does have a hostname and IP address. The primary reason is that the "real" boot server may be very loaded, and it may not respond to the RARP request before the diskless client times out. Allowing other hosts to answer the broadcast prevents the client from getting locked into a cycle of sending a RARP request, timing out, and sending the request again. A related reason for having multiple RARP replies is that the client's boot server may miss the RARP packet entirely. This is functionally equivalent to the server not replying promptly: if no other host provides the correct answer, the client continues to broadcast RARP packets until its boot server is less heavily loaded and replies. Finally, RARP is used for other network services as well as for booting diskless clients, so RARP servers must be able to reply to RARP requests whether they are diskless client boot servers or not.

After receiving any one of the RARP replies, the client knows its IP address, as well as the IP address of a boot server (found by looking in the packet returned by the server). In some implementations, a diskless client announces its IP address with a message of the form:

Using IP address 192.9.200.1 = C009C801
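How the client pulls both addresses out of a reply can be sketched in Python. Per RFC 903, the server places the client's IP address in the target protocol address field and its own in the sender protocol address field. The reply below is fabricated for illustration, and `parse_rarp_reply` is a hypothetical name.

```python
import socket
import struct

RARP_FORMAT = "!HHBBH6s4s6s4s"   # same 28-byte layout as ARP
OP_REPLY_REVERSE = 4


def parse_rarp_reply(body: bytes) -> tuple[str, str]:
    """Return (client_ip, server_ip) from a RARP reply body."""
    fields = struct.unpack(RARP_FORMAT, body[:28])
    op, sender_ip, target_ip = fields[4], fields[6], fields[8]
    if op != OP_REPLY_REVERSE:
        raise ValueError(f"not a RARP reply (opcode {op})")
    # Target protocol address: the client; sender protocol address: the server.
    return socket.inet_ntoa(target_ip), socket.inet_ntoa(sender_ip)


# A made-up reply from a server at 192.9.200.2 to client 08:00:20:aa:bb:cc
reply = struct.pack(RARP_FORMAT, 1, 0x0800, 6, 4, OP_REPLY_REVERSE,
                    bytes.fromhex("080020010203"), socket.inet_aton("192.9.200.2"),
                    bytes.fromhex("080020aabbcc"), socket.inet_aton("192.9.200.1"))
client_ip, server_ip = parse_rarp_reply(reply)
print(f"Using IP address {client_ip} = {socket.inet_aton(client_ip).hex().upper()}")
# Using IP address 192.9.200.1 = C009C801
```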

A valid IP address is only the first step in booting; the client needs to be able to load the boot code if it wants to eventually get a Unix kernel running.

8.3.2. Getting a boot block

A local and a remote IP address are all the client needs to download its boot block using a simple file transfer program called tftp (for trivial ftp). This minimal file transfer utility does no user or password checking and is small enough to fit in the boot PROM. The boot block is downloaded to the client from the server's /tftpboot directory.

The server has no specific knowledge of the architecture of the client issuing a RARP or tftp request. It also needs a mechanism for determining whether it can boot the client using only the client's IP address, the first piece of information the client can discern. The server's /tftpboot directory therefore contains a boot block for each client architecture the server supports, plus a set of symbolic links that point to these boot blocks:

[wahoo]%  ls -l /tftpboot
total 282
lrwxrwxrwx  1 root  root    26 Feb 17 12:43 828D0E09 -> inetboot.sun4u.Solaris_2.7
lrwxrwxrwx  1 root  root    26 Feb 17 12:43 828D0E09.SUN4U -> inetboot.sun4u.Solaris_2.7
lrwxrwxrwx  1 root  root    26 Apr 27 18:14 828D0E0A -> inetboot.sun4u.Solaris_2.7
lrwxrwxrwx  1 root  root    26 Apr 27 18:14 828D0E0A.SUN4U -> inetboot.sun4u.Solaris_2.7
-rw-r--r--  1 root root 129632 Feb 17 12:21 inetboot.sun4u.Solaris_2.7 
lrwxrwxrwx  1 root root      1 Feb 17 12:17 tftpboot -> .

The link names are the IP addresses of the clients in hexadecimal. The first client link -- 828D0E09 -- corresponds to IP address 130.141.14.9:

828D0E09        Link name, in hexadecimal
82.8D.0E.09     Insert dots to split it into four bytes
130.141.14.9    Convert each byte to decimal
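The same conversion, in both directions, can be written as two small Python helpers (hypothetical names) using the standard `ipaddress` module:

```python
import ipaddress


def ip_to_link_name(ip: str) -> str:
    """Dotted-decimal IP address -> 8 uppercase hex digits."""
    return ipaddress.IPv4Address(ip).packed.hex().upper()


def link_name_to_ip(name: str) -> str:
    """Link name (optionally with a .ARCH suffix) -> dotted-decimal address."""
    hex_part = name.split(".", 1)[0]
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_part)))


print(ip_to_link_name("130.141.14.9"))      # 828D0E09
print(link_name_to_ip("828D0E0A.SUN4U"))    # 130.141.14.10
```

Uppercase output matters here because the link names in the listing above are uppercase, and the lookup is a plain filename match.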

Two links exist for each client: one named with the IP address in hexadecimal, and one with the IP address plus the machine architecture. The second link is used by client boot PROMs that specify their architecture when asking for a boot block. It doesn't hurt to have both, as long as they point to the correct boot block for the client.
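In practice the vendor's client setup tools create these links, but the layout is simple enough to sketch. The Python helper below is hypothetical and works in a temporary directory standing in for /tftpboot; it creates both links with a relative target, as in the listing above.

```python
import ipaddress
import os
import tempfile


def add_boot_links(tftpboot: str, client_ip: str, arch: str,
                   boot_file: str) -> list[str]:
    """Create (or replace) both /tftpboot links for one client."""
    hex_name = ipaddress.IPv4Address(client_ip).packed.hex().upper()
    names = [hex_name, f"{hex_name}.{arch.upper()}"]
    for name in names:
        link = os.path.join(tftpboot, name)
        if os.path.lexists(link):
            os.remove(link)            # drop a stale or wrong link
        # A relative target still resolves after tftpd has chrooted.
        os.symlink(boot_file, link)
    return names


with tempfile.TemporaryDirectory() as tftpboot:   # stands in for /tftpboot
    boot = "inetboot.sun4u.Solaris_2.7"
    open(os.path.join(tftpboot, boot), "wb").close()
    created = add_boot_links(tftpboot, "130.141.14.9", "sun4u", boot)
    targets = {n: os.readlink(os.path.join(tftpboot, n)) for n in created}
print(targets)
```

Using a relative link target is the important design choice: an absolute target such as /tftpboot/inetboot... would not exist inside the chroot that secure tftp sets up.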

The previous section stated that a server delays its response to a RARP request if it doesn't think it's the best candidate to boot the requesting client. The server makes this determination by matching the client IP address to a link in /tftpboot. If the link exists, the server is the best candidate to boot the client; if the link is missing, the server delays its response to allow another server to reply first.
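The decision reduces to a single existence test on the link. The Python sketch below illustrates it; the function name is hypothetical, and the 3-second delay is an assumed value standing in for the "few seconds" mentioned earlier.

```python
import ipaddress
import os
import tempfile


def rarp_reply_delay(tftpboot: str, client_ip: str, delay: float = 3.0) -> float:
    """0.0 if a /tftpboot link exists for the client, else `delay` seconds."""
    hex_name = ipaddress.IPv4Address(client_ip).packed.hex().upper()
    # lexists() asks whether the link itself is present, as described above.
    if os.path.lexists(os.path.join(tftpboot, hex_name)):
        return 0.0          # best candidate: reply immediately
    return delay            # hand the request to the delayed responder


with tempfile.TemporaryDirectory() as tftpboot:
    os.symlink("inetboot.sun4u.Solaris_2.7", os.path.join(tftpboot, "828D0E09"))
    local = rarp_reply_delay(tftpboot, "130.141.14.9")
    other = rarp_reply_delay(tftpboot, "130.141.14.10")
print(local, other)   # 0.0 3.0
```

Note that only the link's presence is checked, so a link pointing at a missing or wrong boot block still makes this server answer first; that is one reason to verify link targets when a client fails to boot.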

The client gets its boot block via tftp, sending its request to the server that answered its RARP request. When the inetd daemon on the server receives the tftp request, it starts an in.tftpd daemon that locates the right boot file by following the symbolic link representing the client's IP address. The in.tftpd daemon then sends the boot file to the client. In some implementations, when the client gets a valid boot file, it reports the address of its boot server:

Booting from tftp server at 130.141.14.2 = 828D0E02

It's possible that the first host to reply to the client's RARP request can't boot it: it may have valid ethers and hosts map entries for the machine but no boot file. If the first server chosen by the diskless client does not answer the tftp request, the client broadcasts the same request. If no server responds, the machine complains that it cannot find a tftp server.

The tftpd daemon should be run in secure mode using the -s option. This is usually the default configuration in its /etc/inetd.conf entry:

tftp dgram udp wait root /usr/sbin/in.tftpd in.tftpd -s /tftpboot

The argument after -s is the directory that tftp uses as its root: it does a chdir( ) into this directory and then a chroot( ) to make it the root of the filesystem visible to the tftp process. This measure prevents tftp from being used to retrieve any file outside /tftpboot.
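A quick audit of an inetd.conf entry can confirm that -s is present. The Python helper below is a hypothetical sketch; it assumes the standard whitespace-separated inetd.conf fields and only checks the argument list, not whether chroot( ) actually succeeds at run time.

```python
from typing import Optional


def tftp_secure_root(entry: str) -> Optional[str]:
    """Return the directory given to -s in an inetd.conf tftp entry, or None."""
    fields = entry.split()
    # service, socket type, protocol, wait flag, user, server program, argv...
    if len(fields) < 7 or fields[0] != "tftp":
        raise ValueError("not a tftp entry")
    argv = fields[6:]
    if "-s" in argv:
        i = argv.index("-s")
        if i + 1 < len(argv):
            return argv[i + 1]
    return None


secure = "tftp dgram udp wait root /usr/sbin/in.tftpd in.tftpd -s /tftpboot"
insecure = "tftp dgram udp wait root /usr/sbin/in.tftpd in.tftpd"
print(tftp_secure_root(secure), tftp_secure_root(insecure))   # /tftpboot None
```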

The last entry in the /tftpboot listing is a symbolic link named tftpboot that points back to the directory itself, using the current directory entry (.) instead of a full pathname. This link provides compatibility with older systems that passed a full pathname to tftp, such as /tftpboot/C009C801.SUN4U. Following the symbolic link effectively removes the /tftpboot component and allows a secure tftp to find the requested file in its root directory. Do not remove this symbolic link, or older diskless clients will not be able to download their boot files.
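The effect of the self-referencing link can be demonstrated in Python without actually calling chroot( ). This is an approximation: a temporary directory stands in for /tftpboot, a leading slash in the request is treated as the chroot'ed root, and `served_file` is a hypothetical helper. `os.path.realpath` knows nothing about the chroot, so only relative link targets (as used here) behave faithfully.

```python
import os
import tempfile
from typing import Optional


def served_file(tftp_root: str, request: str) -> Optional[str]:
    """Name of the file a chrooted tftpd would send for `request`, or None."""
    # After chroot(), "/" is tftp_root, so drop the leading slash.
    path = os.path.realpath(os.path.join(tftp_root, request.lstrip("/")))
    return os.path.basename(path) if os.path.isfile(path) else None


results = {}
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)                      # stands in for /tftpboot
    boot = "inetboot.sun4u.Solaris_2.7"
    open(os.path.join(root, boot), "wb").close()
    os.symlink(boot, os.path.join(root, "C009C801.SUN4U"))
    os.symlink(".", os.path.join(root, "tftpboot"))   # the self-referencing link
    results["short"] = served_file(root, "C009C801.SUN4U")
    results["full"] = served_file(root, "/tftpboot/C009C801.SUN4U")
    os.remove(os.path.join(root, "tftpboot"))         # now delete the self-link
    results["full, no self-link"] = served_file(root, "/tftpboot/C009C801.SUN4U")
print(results)
```

With the link in place, both request forms reach the same boot block; once it is removed, the full-pathname request finds nothing, which is exactly the failure older clients would see.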

8.3.3. Booting a kernel

Once the boot file is loaded, the diskless client jumps out of its PROM monitor and into the boot code. To do anything useful, boot needs a root and swap filesystem, preferably with a bootable kernel on the root device. To get this information, boot broadcasts a request for boot parameters. The bootparamd RPC server listens for these requests and returns a gift pack filled with the location of the root filesystem, the client's hostname, and the name of the boot server. The filesystem information is kept in /etc/bootparams or in the NIS bootparams map.
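For reference, an /etc/bootparams entry names the client followed by key=server:path pairs, and long entries are commonly continued with a backslash. The Python parser below is a hypothetical sketch that assumes exactly that form; the client name is invented and the server name is borrowed from the prompt in the listing above.

```python
def parse_bootparams(text: str) -> dict[str, dict[str, tuple[str, str]]]:
    """Map client -> {key: (server, path)} from bootparams-format text.

    Assumes the key=server:path form; backslash-continued lines are joined.
    """
    entries = {}
    for line in text.replace("\\\n", " ").splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        client, params = fields[0], {}
        for item in fields[1:]:
            key, _, value = item.partition("=")
            server, _, path = value.partition(":")
            params[key] = (server, path)
        entries[client] = params
    return entries


BOOTPARAMS = """\
client1  root=wahoo:/export/root/client1 \\
         swap=wahoo:/export/swap/client1
"""
params = parse_bootparams(BOOTPARAMS)["client1"]
print(params["root"])   # ('wahoo', '/export/root/client1')
```

The server half of each value is the boot server the client will mount from, so it must agree with the hosts data the client sees.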

The diskless client mounts its root filesystem from the named boot server and boots the kernel image found there. After configuring root and swap devices, the client begins single user startup and sets its hostname, IP addresses, and NIS domain name from information in its /etc files. It is imperative that the names and addresses returned by bootparamd match those in the client's configuration files, which must also match the contents of the NIS maps.

As part of the single user boot, the client mounts its /usr filesystem from the server listed in its /etc/vfstab file. At this point, the client has root and swap filesystems, and looks (to the Unix kernel) no different than a system booting from a local disk. The diskless client executes its boot script files, and eventually enters multi-user mode and displays a login prompt. Any breakdowns that occur after the /usr filesystem is mounted are caused by problems in the boot scripts, not in the diskless client boot process itself.

8.3.4. Managing boot parameters

Every diskless client boot server has an /etc/bootparams file and/or uses a bootparams NIS map. On Solaris, the /etc/nsswitch.conf file's bootparams entry controls whether the information is read from /etc/bootparams, NIS, or both, and in what order.
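The lookup order can be read straight from the bootparams line. The Python sketch below (hypothetical helper, sample file contents invented) extracts the source names in order and discards any [STATUS=action] criteria; the real parsing is done by the system's name service switch code, not by anything like this.

```python
import re


def bootparams_sources(nsswitch: str) -> list[str]:
    """Return the lookup order of the bootparams entry in nsswitch.conf text."""
    for line in nsswitch.splitlines():
        database, sep, sources = line.split("#", 1)[0].partition(":")
        if sep and database.strip() == "bootparams":
            # Drop [STATUS=action] criteria, keeping only the source names.
            return re.sub(r"\[[^\]]*\]", " ", sources).split()
    return []   # no bootparams line at all


NSSWITCH = """\
hosts:      nis [NOTFOUND=return] files
bootparams: nis files
"""
print(bootparams_sources(NSSWITCH))   # ['nis', 'files']
```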

Here are some suggestions for managing diskless client boot parameters:

Keep the boot parameters in the bootparams map if you are using NIS. Obviously, if your NIS master server is also a diskless client server, it will contain a complete /etc/bootparams file.

If you have diskless clients in more than one NIS domain, make sure you have a separate NIS bootparams map for each domain.

On networks with diskless clients from different vendors, make sure that the format of the boot parameter information used by each vendor is the same. If one system's bootparamd daemon returns a boot parameter packet that cannot be understood by another system, you will not be able to use the NIS bootparams map. We'll look at the problems caused by differing boot parameter packet formats in Section 15.3, "Boot parameter confusion".

Keeping the boot parameters in NIS and eliminating copies of /etc/bootparams on the other boot servers reduces the chances that you'll have out-of-date information on boot servers after you've made a configuration change.