13.3. Remote procedure call tools
Network failures on a grand scale are generally caused by problems at the MAC or IP level, and are immediately noticed by users. Problems involving higher layers of the network protocol stack manifest themselves in more subtle ways, affecting only a few machines or particular pairs of clients and servers. The utilities discussed in the following sections analyze functionality from the remote procedure call (RPC) layer up through the NFS or NIS application layer. The next section contains a detailed examination of the RPC mechanism at the heart of NFS and NIS.
13.3.1. RPC mechanics
The Remote Procedure Call (RPC) mechanism imposes a client/server relationship on machines in a network. A server is a host that physically owns some shared resource, such as a disk exported for NFS service or an NIS map. Clients operate on resources owned by servers by making RPC requests; these operations appear (to the client) to have been executed locally. For example, when performing a read RPC on an NFS-mounted disk, the reading application has no knowledge of where the read is actually executed. Many client-server relationships may be defined for each machine on a network; a server for one resource is often a client for many others in the same network.
13.3.1.1. Identifying RPC services
Services available through RPC are identified by four values: the RPC program number, the version number of the service, the procedure number within that version, and the transport protocol (such as UDP or TCP). Program numbers are mapped to service names in /etc/rpc and in the rpc NIS map:
Excerpt from /etc/rpc:
    nfs             100003  nfsprog
    ypserv          100004  ypprog
    mountd          100005  mount showmount
    ypbind          100007
Note that program 100005, mountd, has two aliases, mount and showmount, reflecting the fact that the mountd daemon services both mount requests and the showmount utility. Program numbers can also be expressed in hexadecimal. Well-known RPC services such as NFS and NIS are assigned reserved program numbers in the range 0x0 to 0x1FFFFFFF; numbers above this range may be assigned to local applications such as license servers. The well-known programs are commonly expressed in decimal, though.
A version number differentiates between various flavors of the same service. It is mostly used to evolve the service over time while, if desired, preserving backward compatibility. For example, there are two versions of the NFS service: Versions 2 and 3 (there is no Version 1).
Each version of a program may be composed of many procedures. Each version of the NFS service, program number 100003, consists of several procedures, each of which is assigned a procedure number. These procedures perform client requests on the NFS server, for example: read a directory, create a file, read a block from a file, write to a file, get a file's attributes, or get statistics about a filesystem. The procedure number is passed in an RPC request as an "op code" for the RPC server. Procedure numbers start with 1; procedure 0 is reserved for a "null" function. While RPC program numbers are well advertised, version and procedure numbers are particular to the service and are often contained in a header file that gets compiled into the client program. NFS procedure numbers, for example, are defined in the header file /usr/include/nfs/nfs.h.
RPC clients and servers deal exclusively with RPC program numbers. At the session layer in the protocol stack, the code doesn't really care what protocols are used to provide the session services. The UDP and TCP transport protocols, however, need port numbers to identify the local and remote ends of a connection. The portmapper is used to translate between the RPC program number-based view of the world and TCP/UDP port numbers.
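Because /etc/rpc is plain text with an official name, a program number, and optional aliases on each line, the name-to-number mapping that tools such as rpcinfo rely on can be illustrated with a few lines of Python. This is a minimal sketch, assuming the whitespace-separated layout shown in the excerpt with # introducing comments; parse_etc_rpc is a hypothetical helper, not a system library call, and it ignores the NIS rpc map entirely.

```python
def parse_etc_rpc(text):
    """Parse /etc/rpc-style text into two lookup tables (hypothetical helper).

    Each entry line is: official-name program-number [alias ...]
    Returns (by_number, by_name): by_number maps a program number to its
    names (official name first); by_name maps every name or alias to its number.
    """
    by_number = {}
    by_name = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip malformed entries rather than guessing
        official, number, aliases = fields[0], int(fields[1]), fields[2:]
        names = [official] + aliases
        by_number.setdefault(number, []).extend(names)
        for name in names:
            by_name[name] = number
    return by_number, by_name


# The excerpt from the text, used as sample input.
EXCERPT = """\
nfs        100003  nfsprog
ypserv     100004  ypprog
mountd     100005  mount showmount
ypbind     100007
"""

if __name__ == "__main__":
    by_number, by_name = parse_etc_rpc(EXCERPT)
    print(by_number[100005])      # all names for the mountd program
    print(by_name["showmount"])   # alias resolves to the same program number
    print(hex(by_name["nfs"]))    # NFS program number in hexadecimal
```

Looking up showmount yields 100005, the same program as mountd, and hex() shows the NFS program number 100003 in its hexadecimal form, 0x186a3.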
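13.3.1.2. RPC portmapper -- rpcbind
The rpcbind daemon (also known as the portmapper) exists to register RPC services and to provide their IP port numbers when given an RPC program number. rpcbind is itself an RPC service, but it resides at a well-known IP port (port 111) so that it may be contacted directly by remote hosts. For example, if host fred needs to mount a filesystem from host barney, it must send an RPC request to the mountd daemon on barney. The mechanics of making the RPC request are as follows: fred first sends an RPC request to rpcbind on barney at port 111, asking for the port registered for the mountd program (100005) with the desired version and transport protocol; rpcbind replies with that port number; fred then sends its mount request directly to mountd on barney at that port.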
The rpcbind daemon and the old portmapper provide the same RPC service. The portmapper implements Version 2 of the portmap protocol (RPC program number 100000), whereas the rpcbind daemon implements Versions 3 and 4 of the protocol in addition to Version 2. The rpcbind daemon therefore already provides all of the functionality of the old portmapper. Because of this overlap, and adding to the confusion, many people refer to the rpcbind daemon as the portmapper.
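Since rpcbind still answers Version 2 of the portmap protocol, the lookup fred performs can be sketched with the portmap GETPORT procedure (procedure 3 of program 100000, Version 2, as defined in RFC 1833). The Python below is a minimal sketch under stated assumptions: it uses null (AUTH_NONE) authentication, IPv4 and UDP only, and a single request with no retransmission, and the function names build_getport_call, parse_getport_reply and getport are hypothetical. It is not how Solaris clients are implemented; TI-RPC clients may use the Version 3 and 4 rpcbind procedures instead.

```python
import random
import socket
import struct

# Constants from the ONC RPC (RFC 5531) and portmap (RFC 1833) specifications.
CALL, REPLY = 0, 1
RPC_VERSION = 2                      # version of the RPC protocol itself
MSG_ACCEPTED, SUCCESS = 0, 0
AUTH_NONE = 0
PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
PMAP_PORT = 111
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def build_getport_call(xid, prog, vers, prot):
    """XDR-encode a portmap v2 GETPORT call for program prog, version vers."""
    header = struct.pack(">6I", xid, CALL, RPC_VERSION,
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    null_auth = struct.pack(">2I", AUTH_NONE, 0)       # flavor, zero-length body
    mapping = struct.pack(">4I", prog, vers, prot, 0)  # port is ignored in a query
    return header + null_auth + null_auth + mapping    # credential, verifier, args


def parse_getport_reply(data, xid):
    """Return the port from a GETPORT reply; 0 means the service is not registered."""
    rxid, mtype, reply_stat = struct.unpack_from(">3I", data, 0)
    if rxid != xid or mtype != REPLY or reply_stat != MSG_ACCEPTED:
        raise ValueError("not an accepted reply to this call")
    _flavor, verf_len = struct.unpack_from(">2I", data, 12)
    offset = 20 + (verf_len + 3) // 4 * 4              # skip the padded verifier body
    (accept_stat,) = struct.unpack_from(">I", data, offset)
    if accept_stat != SUCCESS:
        raise ValueError(f"call not executed, accept_stat={accept_stat}")
    (port,) = struct.unpack_from(">I", data, offset + 4)
    return port


def getport(host, prog, vers, prot=IPPROTO_UDP, timeout=5.0):
    """Ask the portmapper on host (UDP port 111) where prog/vers/prot is listening."""
    xid = random.getrandbits(32)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getport_call(xid, prog, vers, prot), (host, PMAP_PORT))
        data, _addr = sock.recvfrom(8192)
    return parse_getport_reply(data, xid)
```

With a reachable rpcbind, getport("barney", 100005, 1) would return the UDP port registered for mountd Version 1 on barney, or 0 if that version is not registered over UDP.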
13.3.1.3. RPC version numbers
As mentioned before, each new implementation of an RPC server has its own version number. Different version numbers are used to coordinate multiple implementations of the same service, each of which may have a different interface. As an RPC service matures, the service's author may find it necessary to add new procedures or add arguments to existing procedures. Changing the interface in this way requires incrementing the version number. The first version of an RPC program is version 1; subsequent releases of the server should use consecutive version numbers. For example, the mount service has several versions, each one supporting more options than its predecessors.
Multiple versions are implemented in a single server process; there doesn't need to be a separate instance of the RPC server daemon for each version supported. Each RPC server daemon registers its RPC program number and all versions it supports with the portmapper. It is helpful to think of dispatching a request through an RPC server as a two-level switch: the first level discriminates on the version number and chooses the set of procedure routines comprising that version of the RPC service. The second level invokes one of the routines in that set based on the procedure number in the RPC request.
When contacting the portmapper on a remote host, the local and remote sides must agree on the version number of the RPC service that will be used. The rule of thumb is to use the highest-numbered version that both parties understand. In cases where version numbers are not consecutively numbered, or no mutually agreeable version number can be found, the portmapper returns a version mismatch error like this one:
    mount: RPC: Program version mismatch
Even though Solaris supports Transport-Independent RPC (TI-RPC), in practice most RPC services use the TCP, UDP, and loopback transport protocols. Servers may register themselves for any of the protocols, depending upon the varieties of connections they need to support. UDP packets are unreliable and unsequenced and are often used for broadcast or stateless services. The RPC server for the spray utility, which "catches" packets thrown at the remote host, uses the UDP protocol to accept as many requests as it can without requiring retransmission of any missed packets. In contrast to UDP, TCP delivers packets reliably and presents them in the order in which they were transmitted, which makes it a requirement when requests must be processed by the server in the order in which the client sent them. The loopback transports are used for communication within the local host and can be connectionless or connection-oriented. For example, the automounter daemon uses RPC over a connection-oriented loopback transport to communicate with the local kernel.
RPC servers listen on the ports they have registered with the portmapper and are used repeatedly for short-lived sessions. Connections to an RPC server may exist for the duration of a single RPC call only, or may remain across calls. RPC servers do not usually fork a new process for each request, since the overhead of doing so would significantly impair the performance of RPC-intensive services such as NFS. Many RPC servers are multithreaded, such as NFS in Solaris, which allows the server to process multiple NFS requests in parallel. A multithreaded NFS server can take advantage of multiple disks and disk controllers; it also keeps "fast" NFS requests, such as attribute or name lookups, from getting trapped behind slower disk requests.
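The two-level dispatch described earlier, together with the highest-common-version rule, can be modeled in a few lines of Python. This is a toy sketch, not the Solaris RPC library: RpcServer, choose_version and the mount_v1/mount_v3 handlers are hypothetical, and replies are plain tuples rather than XDR-encoded messages. Only the status names PROG_MISMATCH and PROC_UNAVAIL are borrowed from the ONC RPC specification, where a PROG_MISMATCH reply carries the lowest and highest versions the server supports.

```python
# accept_stat names borrowed from the ONC RPC specification.
SUCCESS, PROG_MISMATCH, PROC_UNAVAIL = "SUCCESS", "PROG_MISMATCH", "PROC_UNAVAIL"


class RpcServer:
    """Toy model of one RPC program serving several versions in one process."""

    def __init__(self, versions):
        # versions maps version number -> {procedure number -> handler}
        self.versions = versions

    def dispatch(self, version, procedure, args):
        procedures = self.versions.get(version)   # level 1: select the version
        if procedures is None:
            # Report the supported range, as a PROG_MISMATCH reply does.
            return (PROG_MISMATCH, (min(self.versions), max(self.versions)))
        handler = procedures.get(procedure)       # level 2: select the procedure
        if handler is None:
            return (PROC_UNAVAIL, None)
        return (SUCCESS, handler(args))


def choose_version(client_versions, server_versions):
    """Highest version number both sides understand, or None if none is shared."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


# Hypothetical handlers; procedure 0 is the conventional null procedure.
def null_proc(args):
    return b""


def mount_v1(args):
    return b"v1 mount of " + args


def mount_v3(args):
    return b"v3 mount of " + args


# A hypothetical server supporting versions 1 and 3 but not 2 (non-consecutive).
server = RpcServer({
    1: {0: null_proc, 1: mount_v1},
    3: {0: null_proc, 1: mount_v3},
})

if __name__ == "__main__":
    version = choose_version([1, 2, 3], server.versions)
    print(version, server.dispatch(version, 1, b"/export/home"))
    print(server.dispatch(2, 1, b"/export/home"))
```

Note the gap at version 2: a client asking for version 2 gets a mismatch reply even though the reported range, 1 to 3, appears to include it.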
13.3.2. RPC registration
Making RPC calls is a reasonably complex affair because there are several places where the procedure can break down. The rpcinfo utility is an analog of ping that queries RPC servers and their registration with the portmapper. Like ping, rpcinfo provides a measure of basic connectivity, albeit at the session layer in the network protocol stack. Pinging a remote machine ensures that the underlying physical network and IP address handling are correct; using rpcinfo to perform a similar test verifies that the remote machine is capable of accepting and replying to an RPC request. rpcinfo can be used to detect and debug a variety of failures, including a portmapper that has died or is not accepting connections, an RPC service that is not registered, a server daemon that is registered but hung, and RPC version mismatches.
rpcinfo -p lists the RPC services registered with the portmapper on a remote host:
    % rpcinfo -p corvette
       program vers proto   port  service
        100000    4   tcp    111  portmapper
        100000    3   tcp    111  portmapper
        100000    2   tcp    111  portmapper
        100000    4   udp    111  portmapper
        100000    3   udp    111  portmapper
        100000    2   udp    111  portmapper
        100024    1   udp  32781  status
        100024    1   tcp  32775  status
        100011    1   udp  32787  rquotad
        100002    2   udp  32789  rusersd
        100002    3   udp  32789  rusersd
        100002    2   tcp  32777  rusersd
        100002    3   tcp  32777  rusersd
        100021    1   udp   4045  nlockmgr
        100021    2   udp   4045  nlockmgr
        100021    3   udp   4045  nlockmgr
        100021    4   udp   4045  nlockmgr
        100021    1   tcp   4045  nlockmgr
        100021    2   tcp   4045  nlockmgr
        100021    3   tcp   4045  nlockmgr
        100021    4   tcp   4045  nlockmgr
        100012    1   udp  32791  sprayd
        100008    1   udp  32793  walld
        100001    2   udp  32795  rstatd
        100001    3   udp  32795  rstatd
        100001    4   udp  32795  rstatd
        100068    2   udp  32796  cmsd
        100068    3   udp  32796  cmsd
        100068    4   udp  32796  cmsd
        100068    5   udp  32796  cmsd
        100005    1   udp  32810  mountd
        100005    2   udp  32810  mountd
        100005    3   udp  32810  mountd
        100005    1   tcp  32795  mountd
        100005    2   tcp  32795  mountd
        100005    3   tcp  32795  mountd
        100003    2   udp   2049  nfs
        100003    3   udp   2049  nfs
        100227    2   udp   2049
        100227    3   udp   2049
        100003    2   tcp   2049  nfs
        100003    3   tcp   2049  nfs
        100227    2   tcp   2049
        100227    3   tcp   2049
The output from rpcinfo shows the RPC program and version numbers, the protocols supported, the IP port used by the RPC server, and the name of the RPC service. Service names come from the rpc.bynumber NIS map (or from /etc/rpc, depending on /etc/nsswitch.conf); if no name is printed next to the registration information, then the RPC program number does not appear in the map. This may be expected for third-party packages that run RPC server daemons, since the hardware vendor creating the /etc/rpc file doesn't necessarily list all of the software vendors' RPC numbers. However, a well-known RPC service should be listed properly; missing RPC service names could indicate a corrupted or incomplete rpc.bynumber NIS map. One exception is the NFS ACL service, defined as RPC program 100227. Solaris does not list it in /etc/rpc, and therefore its name is not printed in the previous output. The NFS ACL service implements a protocol for exchanging ACL (Access Control List) information; at present it interoperates only between Solaris hosts. If the client or server does not implement the service, then traditional Unix file access control based on permission bits is used.
If the portmapper on the remote machine has died or is not accepting connections for some reason, rpcinfo times out attempting to reach it and reports the error. This is a good first step toward diagnosing any RPC-related problem: verify that the remote portmapper is alive and returning valid RPC service registrations.
rpcinfo can also be used like ping for a particular RPC server:
    rpcinfo -u host program version        UDP-based services
    rpcinfo -t host program version        TCP-based services
The -u or -t parameter specifies the transport protocol to be used: UDP or TCP, respectively. The hostname must be specified, even if the local host is being queried. Finally, the RPC program and version number are given; the program may be supplied by name (one reported by rpcinfo -p) or by explicit numerical value.
As a practical example, consider trying to mount an NFS filesystem from server mahimahi. You can mount it successfully, but attempts to operate on its files hang the client. You can use rpcinfo to check on the status of the NFS RPC daemons on mahimahi:
    % rpcinfo -u mahimahi nfs 2
    program 100003 version 2 ready and waiting
In this example, the NFS Version 2 RPC service is queried on remote host mahimahi. Since the service is specified by name, rpcinfo looks it up in the rpc NIS map. The -u flag tells rpcinfo to use the UDP protocol; had the -t option been specified instead, rpcinfo would have reported the status of the NFS over TCP service. At the time of this writing, a handful of vendors still do not support NFS over TCP, so a -t query to one of their servers would report that rpcinfo could not find a registration for the service using that protocol.
rpcinfo -u and rpcinfo -t call the null procedure (procedure 0) of the RPC server. The null procedure normally does nothing more than return a zero-length reply. If you cannot contact the null procedure of a server, then the health of the server daemon process is suspect. If the daemon never started running, rpcinfo would report that it couldn't find the server daemon at all. If rpcinfo finds the RPC server daemon but can't get a null procedure reply from it, then the daemon is probably hung.
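The check that rpcinfo -u performs, a call to procedure 0 that should come back with an empty successful reply, can be approximated in Python. This is a sketch under stated assumptions: the caller supplies the server's port (for example 2049 for NFS, as in the corvette listing, or a port obtained from the portmapper), the call uses AUTH_NONE over IPv4 UDP with a single attempt, and build_null_call, classify_reply and udp_ping are hypothetical names. A TIMEOUT result corresponds to the case the text attributes to a hung daemon.

```python
import random
import socket
import struct

# Message and status constants from the ONC RPC specification (RFC 5531).
CALL, REPLY, RPC_VERSION = 0, 1, 2
MSG_ACCEPTED, AUTH_NONE = 0, 0
NULL_PROC = 0
ACCEPT_STATUS = {0: "SUCCESS", 1: "PROG_UNAVAIL", 2: "PROG_MISMATCH",
                 3: "PROC_UNAVAIL", 4: "GARBAGE_ARGS", 5: "SYSTEM_ERR"}


def build_null_call(xid, prog, vers):
    """XDR-encode a call to procedure 0; the null procedure takes no arguments."""
    return struct.pack(">10I", xid, CALL, RPC_VERSION, prog, vers, NULL_PROC,
                       AUTH_NONE, 0,   # credential: flavor, zero-length body
                       AUTH_NONE, 0)   # verifier: flavor, zero-length body


def classify_reply(data, xid):
    """Reduce a reply to a status name such as 'SUCCESS' or 'PROG_MISMATCH'."""
    rxid, mtype, reply_stat = struct.unpack_from(">3I", data, 0)
    if rxid != xid or mtype != REPLY:
        return "UNEXPECTED"
    if reply_stat != MSG_ACCEPTED:
        return "MSG_DENIED"            # e.g. an authentication failure
    _flavor, verf_len = struct.unpack_from(">2I", data, 12)
    offset = 20 + (verf_len + 3) // 4 * 4   # skip the padded verifier body
    (accept_stat,) = struct.unpack_from(">I", data, offset)
    return ACCEPT_STATUS.get(accept_stat, "UNKNOWN")


def udp_ping(host, port, prog, vers, timeout=5.0):
    """Send one null call over UDP and report how (or whether) the server answered."""
    xid = random.getrandbits(32)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_null_call(xid, prog, vers), (host, port))
        try:
            data, _addr = sock.recvfrom(8192)
        except socket.timeout:
            return "TIMEOUT"           # no reply in time, e.g. a hung daemon
    return classify_reply(data, xid)
```

For example, udp_ping("mahimahi", 2049, 100003, 2) would be expected to return "SUCCESS" from a healthy NFS Version 2 server and "TIMEOUT" from one whose daemon is hung.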
13.3.3. Debugging RPC problems
In the previous examples, we used rpcinfo to see whether a particular service was registered. If the RPC service is not registered, or if you can't reach the RPC server daemon, it's likely there is a low-level problem in the network. However, sometimes you reach an RPC server, but you find the wrong one or it gives you the wrong answer. If you have a heterogeneous environment and are running multiple versions of each RPC service, it's possible to get RPC version number mismatch errors. These problems affect NIS and diskless client booting; they are best sorted out by using rpcinfo to emulate an RPC call and by observing server responses.
Networks with multiple, heterogeneous servers may produce multiple, conflicting responses to the same broadcast request. Debugging problems that arise from this behavior often requires knowing the order in which the responses are received. Here's an example: we'll perform a broadcast and then watch the order in which responses are received. When a diskless client boots, it may receive several replies to a request for boot parameters. The boot fails if the first reply contains incorrect or invalid boot parameter information. rpcinfo -b sends a broadcast request to the specified RPC program and version number. The RPC program can be specified either in numeric form (100026) or by its name (bootparam):
    % rpcinfo -b bootparam 1
    fe80::a00:20ff:feb5:1fba.128.67    unknown
    fe80::a00:20ff:feb9:2ad1.128.78    unknown
    126.96.36.199.128.67               mora
    188.8.131.52.128.68                kanawha
    184.108.40.206.128.79              holydev
    Next Broadcast
    % rpcinfo -b bootparam 1
    220.127.116.11.128.68              kanawha
    fe80::a00:20ff:feb5:1fba.128.67    unknown
    18.104.22.168.128.67               mora
    fe80::a00:20ff:feb9:2ad1.128.78    unknown
    22.214.171.124.128.79              holydev
    Next Broadcast
In this example, a broadcast packet is sent to the boot parameter server (bootparam). rpcinfo obtains the RPC program number (100026) from /etc/rpc or the rpc.bynumber NIS map (depending on /etc/nsswitch.conf). Any host that is running the boot parameter server replies to the broadcast with the standard null procedure "empty" reply. The requesting host prints the universal address of each replying RPC service in the order in which the replies are received (see the sidebar). After a short interval, another broadcast is sent.
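The addresses printed by rpcinfo -b are RPC universal addresses: the host address followed by two more dot-separated fields holding the high and low bytes of the port (the format specified for IPv4 and IPv6 in RFC 5665). A minimal Python sketch of the conversion, with hypothetical helper names parse_uaddr and format_uaddr:

```python
def parse_uaddr(uaddr):
    """Split an RPC universal address such as '10.1.2.3.8.1' into (host, port).

    The last two dot-separated fields are the high and low bytes of the port;
    everything before them is the IPv4 or IPv6 host address.
    """
    host, hi, lo = uaddr.rsplit(".", 2)
    hi, lo = int(hi), int(lo)
    if not (0 <= hi <= 255 and 0 <= lo <= 255):
        raise ValueError(f"bad port bytes in {uaddr!r}")
    return host, hi * 256 + lo


def format_uaddr(host, port):
    """Inverse of parse_uaddr: append the port as two decimal byte values."""
    return f"{host}.{port >> 8}.{port & 0xFF}"


if __name__ == "__main__":
    for reply in ("fe80::a00:20ff:feb5:1fba.128.67",
                  "fe80::a00:20ff:feb9:2ad1.128.78"):
        print(parse_uaddr(reply))
```

Decoded this way, the bootparam reply suffixes .128.67 and .128.78 correspond to ports 32835 (128 * 256 + 67) and 32846 (128 * 256 + 78).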
Server loading may cause the order of replies between successive broadcasts to vary significantly. A busy server takes longer to schedule the RPC server and process the request. Differing reply sequences from RPC servers are not in themselves indicative of a problem, if the servers all return correct information. If one or more servers have incorrect information, though, you will see irregular failures. A machine returning correct information may not always be the first to deliver a response to a client broadcast, so sometimes the client gets the wrong response. In the last example (diskless client booting), a client that gets the wrong response won't boot. The boot failures may be very intermittent due to variations in server loading: when the server returning an invalid reply is heavily loaded, the client boots without problem. However, when the servers with the correct information are loaded, the client gets an invalid set of boot parameters and cannot start booting a kernel.
Binding to the wrong NIS server causes another kind of problem. A renegade NIS server may be the first to answer a ypbind broadcast for NIS service, and its lack of information about the domain makes the client machine unusable. Sometimes, just looking at the list of servers that respond to a request may flag a problem, if you notice that one of the servers should not be answering the broadcast:
    % rpcinfo -b ypserv 1
    126.96.36.199.3.255     poi
    188.8.131.52.3.166      onaga
    184.108.40.206.3.163    mahimahi
In this example, all NIS servers on the local network answer the rpcinfo broadcast request to the null procedure of the ypserv daemon. If poi should not be an NIS server, then the network will be prone to periods of intermittent failure whenever clients bind to it. Failure to fully decommission a host as an NIS server (leaving empty NIS map directories, for example) may cause this problem.
There's another possibility for NIS failure that rpcinfo cannot detect: there may be NIS servers on the network, but no servers for the client's NIS domain. In the previous example, poi may be a valid NIS server in another domain, in which case it is operating properly by responding to the rpcinfo broadcast. You might not be able to get ypbind started on an NIS client because all of the servers are in the wrong domain, and therefore the client's broadcasts are not answered. The rpcinfo -b test is a little misleading because it doesn't ask the NIS RPC daemons what domains they are serving, although the client's requests will be domain-specific. Check the servers that reply to an rpcinfo -b and ensure that they serve the NIS domain used by the clients experiencing NIS failures. If a client cannot find an NIS server, ypbind hangs the boot sequence with errors of the form:
    WARNING: Timed out waiting for NIS to come up
Using rpcinfo as shown helps to determine why a particular client cannot start the NIS service: if no host replies to the rpcinfo request, then the broadcast packet is failing to reach any NIS servers. If the NIS domain name and the broadcast address are correct, then it may be necessary to override the broadcast-based search and hand ypbind the name and address of a valid NIS server. Tools for examining and altering NIS bindings are the subject of the next section.
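rpcinfo -b cannot tell which responders are legitimate; that judgment belongs to the administrator. A small filter can at least flag hosts that are missing from a list of known servers. This is a minimal sketch, assuming the two-column reply format shown in the examples (a universal address, then a host name or unknown); responders and unexpected_responders are hypothetical helpers. Like rpcinfo -b itself, it says nothing about which NIS domain a responder serves.

```python
def responders(rpcinfo_output):
    """Hosts that answered an 'rpcinfo -b' broadcast, in order of first reply.

    Each reply line is '<universal address> <host name>'; hosts printed as
    'unknown' are identified by their address (port bytes stripped) instead.
    """
    seen = []
    for line in rpcinfo_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("%") or stripped == "Next Broadcast":
            continue  # skip the command line and round separators
        fields = stripped.split()
        if len(fields) != 2:
            continue  # skip blank or unrecognized lines
        uaddr, name = fields
        host = name if name != "unknown" else uaddr.rsplit(".", 2)[0]
        if host not in seen:
            seen.append(host)
    return seen


def unexpected_responders(rpcinfo_output, expected):
    """Responders that are not on the administrator's list of legitimate servers."""
    return [host for host in responders(rpcinfo_output) if host not in expected]


# The ypserv broadcast from the text, used as sample input.
YPSERV_REPLIES = """\
% rpcinfo -b ypserv 1
126.96.36.199.3.255     poi
188.8.131.52.3.166      onaga
184.108.40.206.3.163    mahimahi
"""

if __name__ == "__main__":
    print(unexpected_responders(YPSERV_REPLIES, {"onaga", "mahimahi"}))
```

Run against the ypserv output with onaga and mahimahi listed as the real servers, the filter singles out poi, the host the text suggests investigating.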
Copyright © 2002 O'Reilly & Associates. All rights reserved.