14.2. Troubleshooting Tools and TechniquesWe went over nslookup, dig, and the name server's debugging output in the last two chapters. Before we go on, let's introduce some new tools that can be useful in troubleshooting: named-xfer, name server database dumps, and query logging.
14.2.1. How to Use named-xfernamed-xfer is the program that BIND 4 and 8 name servers start to perform zone transfers. (BIND 9 name servers, you'll remember, are multithreaded, so they don't need a separate program to do inbound zone transfers: they just start a new thread.) named-xfer checks whether the slave's copy of the zone data is up to date and transfers a new zone if necessary. (In Versions 4.9 and 8, named checks if a zone is up to date first, to avoid starting up a child process when no transfer is necessary.)
In Chapter 13, "Reading BIND Debugging Output", we showed you the debugging output a BIND 8 slave name server logged as it checked its zone. When the slave server transferred the zone, it started a child process (named-xfer) to pull the data to the local filesystem. We didn't tell you, however, that you can also start named-xfer manually instead of waiting for named to start it, and that you can tell it to produce debugging output independently of named.
This can be useful if you're tracking down a problem with zone transfers but don't want to wait for named to schedule one. To test a zone transfer manually, you need to specify a number of command-line options:
This is the output from a BIND 8.2.3 version of named-xfer. Earlier versions of named-xfer won't have all of these options.% /usr/sbin/named-xfer Usage error: no domain Usage: named-xfer -z zone_to_transfer -f db_file [-i ixfr_file] [-s serial_no] [-d debug_level] [-l debug_log_file] [-t trace_file] [-p port] [-S] [-Z] [-C class] [-x axfr-src] [-T tsig_info_file] servers [-ixfr|-axfr]...
When named starts named-xfer, it specifies the -z option (the zone named wants to check), the -f option (the name of the zone data file that corresponds to the zone, from named.boot or named.conf ), the -s option (the zone's serial number on the slave from the current SOA record), and the addresses of the servers the slave was instructed to load from (the IP addresses from the masters substatement in the zone statement in named.conf, or from the secondary directive in named.boot). If named is running in debug mode, it also specifies the debug level for named-xfer with the -d option. The other options aren't usually necessary to troubleshoot problems; they have to do with incremental zone transfers, TSIG signing zone transfers, and such.
When you run named-xfer manually, you can also specify the debug level on the command line with -d. (Don't forget, though, that debug levels above 3 will produce tons of debugging output if the transfer succeeds!) You can also specify an alternate filename for the debug file with the -l option. The default log file is /var/tmp/xfer.ddt.XXXXXX, where XXXXXX is a suffix appended to preserve uniqueness or a file by the same name in /usr/tmp. And you can specify the name of the host to load from instead of its IP address.
For example, with the following command line, you can see whether zone transfers from terminator.movie.edu are working:
In this command, we specified a serial number of zero because we wanted to force named-xfer to attempt a zone transfer even if it wasn't needed. Zero is a special serial number -- named-xfer will transfer the zone regardless of the actual zone serial number. Also, we told named-xfer to put the new zone data file in /tmp rather than overwriting the zone's working zone data file.% /usr/sbin/named-xfer -z movie.edu -f /tmp/db.movie -s 0 terminator % echo $? 4
We can tell if the transfer succeeded by looking at named-xfer 's return value. If you're running BIND Version 8.1.2 or older, your named-xfer has four possible return values:
Note that BIND 8.2 and later named-xfer s don't use return value 1 anymore. Return value 1 has been replaced by return values 4 through 6.
14.2.2. What If I Don't Have named-xfer?If you've upgraded to BIND 9 and don't have a named-xfer binary, you can still use nslookup or dig to do a zone transfer. Either query tool will give you some of the information that named-xfer would have given you.
For example, to use dig to do the same zone transfer we showed you earlier, you could run:
With nslookup, you could change your name server and use the ls -d command from interactive mode.% dig @terminator.movie.edu movie.edu axfr
Unfortunately, both dig and nslookup are more subtle than named-xfer is in reporting errors. If nslookup can't transfer a zone, it usually reports an "unspecified error":
This could be caused by an allow-transfer access list, the fact that terminator.movie.edu isn't actually authoritative for movie.edu, or a number of other problems. To tell which, you may just have to send other, related queries or check the syslog output on the master name server.> ls movie.edu [terminator.movie.edu] *** Can't list domain movie.edu: Unspecified error
14.2.3. How to Read a Database DumpPoring over a dump of the name server's internal database -- including cached information -- can also help you track down problems. The ndc dumpdb or rndc dumpdb command causes named to dump its authoritative data, cached data, and hints data to named_dump.db in BIND's working directory (or in /usr/tmp/named_dump.db or /var/tmp/named_dump.db, for BIND 4). An example of a named_dump.db file follows. The authoritative data and cached entries, mixed together, appear first in the file. At the end of the file is the hints data:
The name server that created this named_dump.db file was authoritative only for 0.0.127.in-addr.arpa. Only two names have been looked up by this server: galt.cs.purdue.edu and cujo.movie.edu. In the process of looking up galt.cs.purdue.edu, this server cached not only the address of galt, but also the list of name servers for purdue.edu and the addresses for those servers. The name cujo.movie.edu, however, doesn't really exist (nor does the zone movie.edu, except in our examples), so the server cached the negative response. In the dump file, the negative response is commented out (the line starts with a semicolon), and the reason is listed (NXDOMAIN) instead of real data. You'll notice the TTL is quite low (593). On BIND 8.2 and later name servers, negative responses are cached according to the last field in the SOA record, which is usually much smaller than the default TTL for the zone.; Dumped at Tue Jan 6 10:49:08 1998 ;; ++zone table++ ; 0.0.127.in-addr.arpa (type 1, class 1, source db.127.0.0) ; time=0, lastupdate=0, serial=1, ; refresh=0, retry=3600, expire=608400, minimum=86400 ; ftime=884015430, xaddr=[0.0.0.0], state=0041, pid=0 ;; --zone table-- ; Note: Cr=(auth,answer,addtnl,cache) tag only shown for non-auth RR's ; Note: NT=milliseconds for any A RR which we've used as a nameserver ; --- Cache & Data --- $ORIGIN . . 518375 IN NS G.ROOT-SERVERS.NET. ;Cr=auth [220.127.116.11] 518375 IN NS J.ROOT-SERVERS.NET. ;Cr=auth [18.104.22.168] 518375 IN NS K.ROOT-SERVERS.NET. ;Cr=auth [22.214.171.124] 518375 IN NS L.ROOT-SERVERS.NET. ;Cr=auth [126.96.36.199] 518375 IN NS M.ROOT-SERVERS.NET. ;Cr=auth [188.8.131.52] 518375 IN NS A.ROOT-SERVERS.NET. ;Cr=auth [184.108.40.206] 518375 IN NS H.ROOT-SERVERS.NET. ;Cr=auth [220.127.116.11] 518375 IN NS B.ROOT-SERVERS.NET. ;Cr=auth [18.104.22.168] 518375 IN NS C.ROOT-SERVERS.NET. ;Cr=auth [22.214.171.124] 518375 IN NS D.ROOT-SERVERS.NET. ;Cr=auth [126.96.36.199] 518375 IN NS E.ROOT-SERVERS.NET. ;Cr=auth [188.8.131.52] 518375 IN NS I.ROOT-SERVERS.NET. ;Cr=auth [184.108.40.206] 518375 IN NS F.ROOT-SERVERS.NET. ;Cr=auth [220.127.116.11] EDU 86393 IN SOA A.ROOT-SERVERS.NET. hostmaster.INTERNIC.NET. ( 1998010500 1800 900 604800 86400 ) ;Cr=addtnl [18.104.22.168] $ORIGIN 0.127.in-addr.arpa. 0 IN SOA cujo.movie.edu. root.cujo.movie.edu. ( 1998010600 10800 3600 608400 86400 ) ;Cl=5 IN NS cujo.movie.edu. ;Cl=5 $ORIGIN 0.0.127.in-addr.arpa. 1 IN PTR localhost. ;Cl=5 $ORIGIN EDU. PURDUE 172787 IN NS NS.PURDUE.EDU. ;Cr=addtnl [22.214.171.124] 172787 IN NS MOE.RICE.EDU. ;Cr=addtnl [126.96.36.199] 172787 IN NS PENDRAGON.CS.PURDUE.EDU. ;Cr=addtnl [188.8.131.52] 172787 IN NS HARBOR.ECN.PURDUE.EDU. ;Cr=addtnl [184.108.40.206] $ORIGIN movie.EDU. ;cujo 593 IN SOA A.ROOT-SERVERS.NET. hostmaster.INTERNIC. NET. ( ; 1998010500 1800 900 604800 86400 );EDU.; NXDOMAIN ;-$ ;Cr=auth [220.127.116.11] $ORIGIN RICE.EDU. MOE 172787 IN A 18.104.22.168 ;NT=84 Cr=addtnl [22.214.171.124] $ORIGIN PURDUE.EDU. CS 86387 IN NS pendragon.cs.PURDUE.edu. ;Cr=addtnl [126.96.36.199] 86387 IN NS ns.PURDUE.edu. ;Cr=addtnl [188.8.131.52] 86387 IN NS harbor.ecn.PURDUE.edu. ;Cr=addtnl [184.108.40.206] 86387 IN NS moe.rice.edu. ;Cr=addtnl [220.127.116.11] NS 172787 IN A 18.104.22.168 ;NT=4 Cr=addtnl [22.214.171.124] $ORIGIN ECN.PURDUE.EDU. HARBOR 172787 IN A 126.96.36.199 ;NT=6 Cr=addtnl [188.8.131.52] $ORIGIN CS.PURDUE.EDU. galt 86387 IN A 184.108.40.206 ;Cr=auth [220.127.116.11] PENDRAGON 172787 IN A 18.104.22.168 ;NT=20 Cr=addtnl [22.214.171.124] $ORIGIN ROOT-SERVERS.NET. K 604775 IN A 126.96.36.199 ;NT=10 Cr=answer [188.8.131.52] A 604775 IN A 184.108.40.206 ;NT=20 Cr=answer [220.127.116.11] L 604775 IN A 18.104.22.168 ;NT=8 Cr=answer [22.214.171.124] B 604775 IN A 126.96.36.199 ;NT=9 Cr=answer [188.8.131.52] M 604775 IN A 184.108.40.206 ;NT=20 Cr=answer [220.127.116.11] C 604775 IN A 18.104.22.168 ;NT=17 Cr=answer [22.214.171.124] D 604775 IN A 126.96.36.199 ;NT=11 Cr=answer [188.8.131.52] E 604775 IN A 184.108.40.206 ;NT=9 Cr=answer [220.127.116.11] F 604775 IN A 18.104.22.168 ;NT=73 Cr=answer [22.214.171.124] G 604775 IN A 126.96.36.199 ;NT=14 Cr=answer [188.8.131.52] H 604775 IN A 184.108.40.206 ;NT=160 Cr=answer [220.127.116.11] I 604775 IN A 18.104.22.168 ;NT=102 Cr=answer [22.214.171.124] J 604775 IN A 126.96.36.199 ;NT=21 Cr=answer [188.8.131.52] ; --- Hints --- $ORIGIN . . 3600 IN NS A.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS B.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS C.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS D.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS E.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS F.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS G.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS H.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS I.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS J.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS K.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS L.ROOT-SERVERS.NET. ;Cl=0 3600 IN NS M.ROOT-SERVERS.NET. ;Cl=0 $ORIGIN ROOT-SERVERS.NET. K 3600 IN A 184.108.40.206 ;NT=11 Cl=0 L 3600 IN A 220.127.116.11 ;NT=9 Cl=0 A 3600 IN A 18.104.22.168 ;NT=10 Cl=0 M 3600 IN A 22.214.171.124 ;NT=11 Cl=0 B 3600 IN A 126.96.36.199 ;NT=1288 Cl=0 C 3600 IN A 188.8.131.52 ;NT=21 Cl=0 D 3600 IN A 184.108.40.206 ;NT=1288 Cl=0 E 3600 IN A 220.127.116.11 ;NT=19 Cl=0 F 3600 IN A 18.104.22.168 ;NT=23 Cl=0 G 3600 IN A 22.214.171.124 ;NT=18 Cl=0 H 3600 IN A 126.96.36.199 ;NT=11 Cl=0 I 3600 IN A 188.8.131.52 ;NT=21 Cl=0 J 3600 IN A 184.108.40.206 ;NT=13 Cl=0
The hints section at the bottom of the file contains the data from the db.cache file. The TTL of the hints data is decremented, and it may go to zero, but the hints are never discarded.
Note that some of the resource records are followed by a semicolon and NT=. You will only see these on the address records of name servers. The number is the roundtrip time calculation that the name server keeps so that it knows which name servers have responded most quickly in the past; the name server with the lowest roundtrip time will be tried first the next time.
The cached data is easy to pick out -- those entries have a credibility tag (Cr =) and (sometimes) the IP address of the server the data came from. The zone data and hint data are tagged with Cl=, which is just a count of the level in the domain tree (the root is level 0, foo would be level 1, foo.foo would be level 2, etc.). Let's digress a moment to explain the concept of credibility.
The name server prints the IP address of the remote name server if it's available. On BIND 8.2 and later name servers, the IP address is available only if you've turned on host-statistics, which we introduced in Chapter 8, "Growing Your Domain". On earlier BIND 4.9 and BIND 8 name servers, it's on by default. host-statistics keeps impressive statistics on every name server and resolver you've ever communicated with, which is very useful for some purposes (like figuring out which name server your server got a record from), but consumes a fair amount of memory.One of the advances between Versions 4.8.3 and 4.9 was the addition of a credibility measure. This allows a name server to make more intelligent decisions about what to do with new data from a remote server.
A 4.8.3 name server had only two credibility levels -- locally authoritative data and everything else. The locally authoritative data was data from your zone data files -- your name server knew better than to update its internal copy of what came from your zone file. But all data from remote name servers was considered equal.
Here is a situation that could happen and the way a 4.8.3 server would deal with it. Suppose that your server looked up an address for terminator.movie.edu and received an authoritative answer from the movie.edu name server. (Remember, an authoritative answer is the best you can get.) Sometime later, while looking up foo.oreilly.com, your server receives another address record for terminator.movie.edu, but this time as part of the delegation info for oreilly.com (which terminator.movie.edu is a slave for). The 4.8.3 name server would update the cached address record for terminator.movie.edu, even though the data came from the com name server instead of the authoritative movie.edu name server. Of course, the com and movie.edu name servers will have exactly the same data for terminator.movie.edu, so this won't be a problem, right? Yeah, and it never rains in southern California, either.
A 4.9 or newer name server is more intelligent. Like a 4.8.3 name server, it still considers your zone data unassailable -- beyond any doubt. But a 4.9 or newer name server distinguishes among the different data from remote name servers. Here is the hierarchy of remote data credibility from most credible to least:
In the situation just described, a 4.9 or newer name server would not replace the authoritative data (credibility = auth) for terminator.movie.edu with the delegation data (credibility = addtnl ) because the authoritative answer has higher credibility.
14.2.4. Logging QueriesBIND Version 4.9 added a feature called query logging that can be used to help diagnose certain problems. When query logging is turned on, a running name server will log every query to syslog. This feature could help you find resolver configuration errors because you can verify that the name you think is being looked up really is the name being looked up.
First you must make sure that LOG_INFO messages are being logged by syslog for the facility daemon. Next, you need to turn on query logging. This can be done in several ways: for BIND 4.9, setoptions query-log in your name server boot file; for BIND 4.9 or BIND 8, start the name server with -q on the command line or send an ndc querylog command to a running name server. For BIND 9.1.0 or later (earlier versions don't support query logging), use rndc querylog. You'll start seeing syslog messages like this:
Or, if you're running BIND 9, like this:Feb 20 21:43:25 terminator named: XX+ /220.127.116.11/carrie.movie.edu/A Feb 20 21:43:32 terminator named: XX+ /18.104.22.168/22.214.171.124.in-addr.arpa/PTR
These messages include the IP address of the host that made the query, as well as the query itself. Since the first example comes from a BIND 8.2.3 name server and these queries are recursive, they begin with XX+. Iterative queries begin with just XX. (Name servers older than BIND 8.2.1 don't distinguish recursive from nonrecursive queries.) Inverse queries have a dash before the query type (e.g., an inverse query for an address record is logged as "-A" instead of just "A"). After enough queries have been logged, you can turn off query logging by sending another ndc querylog or rndc querylog command to your name server.Jan 13 18:32:25 terminator named: info: client 126.96.36.199#1702: query: carrie.movie.edu IN A Jan 13 18:32:42 terminator named: info: client 188.8.131.52#1702: query: 184.108.40.206.in-addr.arpa IN PTR
Copyright © 2002 O'Reilly & Associates. All rights reserved.