

DNS & BIND

Chapter 13
Troubleshooting DNS and BIND
 

13.3 Potential Problem List

Now that we've given you a nice set of tools, let's talk about how you can use them to diagnose real problems. There are some problems that are easy to recognize and correct. We should cover these as a matter of course - they're some of the most common problems because they're caused by some of the most common mistakes. Here are the contestants, in no particular order. We call 'em our "Unlucky Thirteen."

13.3.1 1. Forgot to Increment Serial Number

The main symptom of this problem is that slave name servers don't pick up any changes you make to the zone's db file on the primary. The slaves think the zone data hasn't changed, since the serial number is still the same.

How do you check whether or not you remembered to increment the serial number? Unfortunately, that's not so easy. If you don't remember what the old serial number was, and your serial number gives you no indication of when it was updated, there's no direct way to tell whether it's changed.[1] When you signal the primary, it will load the updated zone file regardless of whether you've changed the serial number. It will check the file's timestamp, see that it's been modified since it last loaded the data, and read the file. About the best you can do is to use nslookup to compare the data returned by the primary and by a slave. If they return different data, you probably forgot to increment the serial number. If you can remember a recent change you made, you can look for that data. If you can't remember a recent change, you could try transferring the zone from a primary and from a slave, sorting the results, and using diff to compare them.

[1] On the other hand, if you encode the date into the serial number, as many people do (e.g., 1998010500 is the first rev of data on January 5, 1998), you may be able to tell at a glance whether you updated the serial number when you made the change.
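As a rough illustration of the date-encoded convention in the footnote, here is a minimal Python sketch that picks the next serial number in YYYYMMDDNN form. The function name next_serial is hypothetical, and the rule of never letting the serial move backward is an assumption added for safety, not something the text prescribes.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDNN serial number after `current`.

    Hypothetical helper: assumes the zone uses date-encoded serials,
    e.g. 1998010500 for the first revision on January 5, 1998.
    """
    base = int(today.strftime("%Y%m%d")) * 100  # revision 00 for today
    # Start a fresh date-based serial if that moves forward; otherwise
    # just add one so the serial never goes backward.
    return base if base > current else current + 1
```

With this sketch, next_serial(1998010500, date(1998, 1, 5)) yields 1998010501, and an old h2n-style serial such as 112 jumps straight to 1998010500.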

The good news is that, although determining whether the zone was transferred is tricky, making sure the zone is transferred is simple. Just increment the serial number on the primary's copy of the db file and signal the primary to reload. The slaves should pick up the new data within their refresh interval, or sooner if they use NOTIFY. If you want to make sure the slaves can transfer the new data, you can execute named-xfer by hand (on the slaves, naturally):

# /etc/named-xfer -z movie.edu -f db.movie -s 0 terminator
# echo $?

If named-xfer returns 1, the zone was transferred successfully. Other return values indicate that no zone was transferred, either because of an error or because the slave thought the zone was up-to-date. (See Section 13.2.1, "How to Use named-xfer," earlier in this chapter, for more details.)

There's another variation on the "forgot to increment the serial number" line. We see it in environments where administrators use tools like h2n to create db files from the host table. With scripts like h2n, it's temptingly easy to delete old db files and create new ones from scratch. Some administrators do this occasionally because they mistakenly believe that data in the old db files can creep into the new ones. The problem with deleting the db files is that, without the old db file to read for the current serial number, h2n starts over at serial number 1. If your primary's serial number rolls all the way back to 1 from 598 or what-have-you, the slaves (versions 4.8.3 and earlier) don't complain; they just figure they're all caught up and don't need zone transfers. A 4.9 or later slave server, however, is ever watchful, and will emit a syslog error message warning you that something might be wrong:

Jun  7 20:14:26 wormhole named[29618]: Zone "movie.edu"
                (class 1) SOA serial# (1) rcvd from [192.249.249.3]
                is < ours (112)

So if the serial number on the primary looks suspiciously low, check the serial number on the slaves, too, and compare them:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> set q=soa
> movie.edu.
Server:  terminator.movie.edu
Address:  192.249.249.3

movie.edu
        origin = terminator.movie.edu
        mail addr = al.robocop.movie.edu
        serial = 1
        refresh = 10800 (3 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)
> server wormhole.movie.edu.
Default Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

> movie.edu.
Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

movie.edu
        origin = terminator.movie.edu
        mail addr = al.robocop.movie.edu
        serial = 112
        refresh = 10800 (3 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)

wormhole, as a movie.edu slave, should never have a larger serial number than the primary master, so clearly something's amiss.

This problem is really easy to spot, by the way, with the tool we'll write in Chapter 14, Programming with the Resolver and Name Server Library Routines, coming up next.
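Until then, the comparison in the nslookup session above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, it assumes you have saved the text nslookup printed for each server, and it compares the serials as plain integers, ignoring serial number wraparound.

```python
import re

# Matches the "serial = N" line in nslookup's SOA output.
SERIAL_RE = re.compile(r"^\s*serial\s*=\s*(\d+)", re.MULTILINE)


def soa_serial(nslookup_output: str) -> int:
    """Extract the SOA serial number from saved nslookup output."""
    match = SERIAL_RE.search(nslookup_output)
    if match is None:
        raise ValueError("no SOA serial found in output")
    return int(match.group(1))


def check_serials(primary_output: str, slave_output: str) -> str:
    """Compare serials; plain integer comparison, wraparound ignored."""
    primary = soa_serial(primary_output)
    slave = soa_serial(slave_output)
    if slave > primary:
        return "slave ahead of primary"  # the suspicious case described above
    if slave < primary:
        return "slave behind primary"    # normal until the next refresh
    return "in sync"
```

Fed the two answers shown above (serial 1 from terminator, serial 112 from wormhole), check_serials reports "slave ahead of primary".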

13.3.2 2. Forgot to Signal Primary Master Server

Occasionally, you may forget to signal your primary master name server after making a change to the conf file or to the db file. The name server won't know to load the new data - it doesn't automatically check the timestamp of the file and notice that it changed. Consequently, any changes you've made won't be reflected in the name server's data: new zones won't be loaded, and new records won't percolate out to the slaves.

To check when you last signaled the name server to reload, scan the syslog output for the last entry like this:

Mar  8 17:22:08 terminator named[22317]: reloading nameserver

This is the last time you sent a HUP signal to the name server. If you killed and then restarted the name server, you'll see an entry like this:

Mar  8 17:22:08 terminator named[22317]: restarted

or, on a 4.9 name server:

Mar  8 17:22:08 terminator named[22317]: starting

If the time of the restart doesn't correlate with the time you made the last change, signal the name server to reload its data again. And check that you incremented the serial numbers on db files you changed, too.
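Scanning syslog for the most recent reload or restart is easy to automate. The following Python sketch assumes the three message texts shown above and the classic 15-character syslog timestamp; the function name last_named_event is hypothetical, and other BIND versions may word these messages differently.

```python
import re

# Message texts taken from the syslog examples in this section.
EVENT_RE = re.compile(
    r"named\[\d+\]: (reloading nameserver|restarted|starting)\b"
)


def last_named_event(lines):
    """Return (timestamp, event) for the last reload/restart entry, or None."""
    last = None
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            # Classic syslog timestamps ("Mar  8 17:22:08") are 15 characters.
            last = (line[:15], match.group(1))
    return last
```

You would typically pass it the lines of the file your syslog daemon writes named's messages to, then compare the returned timestamp with the modification time of the file you edited.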

13.3.3 3. Slave Server Can't Load Zone Data

If a slave name server can't get the current serial number for a zone from its master name server, it'll log a message like the following via syslog:

Jan  6 11:55:25 wormhole named[544]: Err/TO getting serial# for "movie.edu"

On a BIND 4 name server, that looks like this:

Mar  3 8:19:34 wormhole named[22261]: zoneref: Masters for secondary
       zone movie.edu unreachable

If you let this problem fester, the slave will expire the zone:

Mar  8 17:12:43 wormhole named[22261]: secondary zone
       "movie.edu" expired

Once the zone has expired, you'll start getting SERVFAIL errors when you query the name server for data in the zone:

% 

nslookup robocop wormhole.movie.edu.


Server:  wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

*** wormhole.movie.edu can't find robocop.movie.edu: Server failed

There are three leading causes of this problem: a loss in connectivity to the master server due to network failure, an incorrect IP address for the master server in the conf file, and a syntax error in the zone data file on the master server. First check the conf file's entry for the zone and see what IP address the slave is attempting to load from:

zone "movie.edu" {
                type slave;
                file "db.movie";
                masters { 192.249.249.3; };
};

On a BIND 4 server, the directive would look like this:

secondary        movie.edu        192.249.249.3        db.movie

Make sure that's really the IP address of the master name server. If it is, check connectivity to that IP address:

% ping 192.249.249.3 -n 10
PING 192.249.249.3: 64 byte packets

----192.249.249.3 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss

If the master server isn't reachable, make sure that the server's host is really running (e.g., is powered on, etc.), or look for a network problem. If the server is reachable, make sure named is running on the host, and that you can manually transfer the zone:

# named-xfer -z movie.edu -f /tmp/db.movie -s 0 192.249.249.3
# echo $?
2

A return code of 2 means that an error occurred. Check to see if there is a syslog message. In this case there was a message:

Jan  6 14:56:07 zardoz named-xfer[695]: record too short from [192.249.249.3], zone movie.edu

At first glance, this error looks like a truncation problem. The real problem is easier to see if you use nslookup:

% nslookup - terminator.movie.edu
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> ls movie.edu         --This attempts a zone transfer

[terminator.movie.edu]
*** Can't list domain movie.edu: Query refused

What has happened here is that named is refusing to allow you to transfer its zone data. The remote server has secured its zone data with the allow-transfer substatement, the secure_zone resource record, or the xfrnets boot file directive.

If the master server is responding as not authoritative for the zone, you'll see a message like this:

Jan  6 11:58:36 zardoz named[544]: Err/TO getting serial# for "movie.edu"
Jan  6 11:58:36 zardoz named-xfer[793]: [192.249.249.3] not authoritative for
     movie.edu, SOA query got rcode 0, aa 0, ancount 0, aucount 0

If this is the correct master server, the server should be authoritative for the zone. This probably indicates that the master had a problem loading the zone, usually because of a syntax error in the zone data file. Contact the administrator of the master server and ask them to check their syslog output for indications of a syntax error (see problem 5, later in this chapter).

13.3.4 4. Added Name to Database File, but Forgot to Add PTR Record

Because the mappings from host names to IP addresses are disjoint from the mappings from IP addresses to host names in DNS, it's easy to forget to add a PTR record for a new host. Adding the A record is intuitive, but many people who are used to host tables assume that adding an address record takes care of the reverse mapping, too. That's not true - you need to add a PTR record for the host to the appropriate in-addr.arpa domain.

Forgetting to add the PTR record for a host usually causes that host to fail authentication checks. For example, users on the host won't be able to rlogin to other hosts without specifying a password, and rsh or rcp to other hosts simply won't work. The servers these commands talk to need to be able to map the connection's IP address to a domain name to check .rhosts and hosts.equiv. These users' connections will cause entries like this to be syslogged:

Aug 15 17:32:36 terminator inetd[23194]: login/tcp:
       Connection from unknown (192.249.249.23)

Also, many large ftp archives, including ftp.uu.net, refuse anonymous ftp access to hosts whose IP addresses don't map back to domain names. ftp.uu.net's ftp server emits a message that reads, in part:

530- Sorry, we're unable to map your IP address 140.186.66.1 to a hostname
530- in the DNS.  This is probably because your nameserver does not have a
530- PTR record for your address in its tables, or because your reverse
530- nameservers are not registered.  We refuse service to hosts whose
530- names we cannot resolve.

That makes the reason you can't use anonymous ftp pretty evident. Other ftp sites, however, don't bother printing informative messages; they simply deny service.

nslookup is handy for checking whether you've forgotten the PTR record or not:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> beetlejuice          --Check for a hostname-to-address mapping
Server:  terminator.movie.edu
Address:  192.249.249.3

Name:    beetlejuice.movie.edu
Address:  192.249.249.23

> 192.249.249.23       --Now check for a corresponding address-to-hostname mapping

Server:  terminator.movie.edu
Address:  192.249.249.3

*** terminator.movie.edu can't find 192.249.249.23: Non-existent domain

On the primary for 249.249.192.in-addr.arpa, a quick check of the db.192.249.249 file will tell you if the PTR record hasn't been added to the db file yet, or if the name server hasn't been signaled to load the file. If the name server having trouble is a slave for the zone, check that the serial number was incremented on the primary and that the slave has had enough time to load the zone.
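If you're unsure what the missing record should look like, the owner name can be derived mechanically from the address. The Python sketch below uses the standard library's ipaddress module to build the in-addr.arpa name and format a PTR record for it. The helper name ptr_record is hypothetical, and it assumes the host name you pass in is fully qualified.

```python
import ipaddress


def ptr_record(address: str, hostname: str) -> str:
    """Format the PTR record that maps `address` back to `hostname`.

    Hypothetical helper: `hostname` is assumed to be fully qualified.
    """
    # reverse_pointer gives e.g. "23.249.249.192.in-addr.arpa" (no trailing dot).
    owner = ipaddress.ip_address(address).reverse_pointer + "."
    # Dot-terminate the target so the zone's origin isn't appended to it.
    target = hostname if hostname.endswith(".") else hostname + "."
    return f"{owner}\tIN\tPTR\t{target}"
```

For beetlejuice, ptr_record("192.249.249.23", "beetlejuice.movie.edu") produces a fully qualified record for 23.249.249.192.in-addr.arpa. that could be added to db.192.249.249.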

13.3.5 5. Syntax Error in the Conf File or DNS Database File

Syntax errors in the conf file and in zone database files are also relatively common (more or less, depending on the experience of the administrator). Generally, an error in the conf file will cause the name server to fail to load one or more zones. Some typos in the options statement will cause the name server to fail to start at all, and to log an error like this via syslog:

Jan  6 11:59:29 terminator named[544]: can't change directory to /var/name: No
     such file or directory

Note that you won't see an error message when you try to start named on the command line, but named won't stay running for long.

If the syntax error is in a less important line in the conf file - say, in a zone statement - only that zone will be affected. Usually, the name server will not be able to load the zone at all (say, you misspell "master" or the name of the data file, or you forget to put quotes around the file name or domain name). This would produce syslog output like:

Jan  6 12:01:36 terminator named[841]: /etc/named.conf:10: syntax error near
     'movie.edu'

If a db file contains a syntax error, yet the name server succeeds in loading the zone, it will either answer as "non-authoritative" for all data in the zone or will return a SERVFAIL error for lookups in the zone:

% nslookup carrie
Server:  terminator.movie.edu
Address:  192.249.249.3

Non-authoritative answer:
Name:    carrie.movie.edu
Address:  192.253.253.4

Here's the syslog message produced by the syntax error that caused this problem:

Jan  6 15:07:46 huskymo named[693]: db.movie:11: Priority error
     (postmanrings2x.movie.edu.)
Jan  6 15:07:46 huskymo named[693]: master zone "movie.edu" (IN) rejected due
     to errors (serial 1997010600)
Jan  6 15:07:46 huskymo named[693]: slave zone "movie.edu" (IN) removed

If you looked in the db file for the problem, you'd find this record:

postmanrings2x     IN     MX     postmanrings2x.movie.edu.

The MX record is missing the preference field, which causes the error.
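A quick pre-flight check can catch this particular mistake before the name server does. The Python sketch below flags MX records whose first data field isn't a number. It is deliberately naive and rests on several assumptions: one record per line, the type written as uppercase MX, and no owner name that is itself "MX". The function name is hypothetical.

```python
def mx_missing_preference(lines):
    """Return 1-based line numbers of MX records lacking a preference."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Drop any comment, then split the record into whitespace-separated fields.
        fields = line.split(";", 1)[0].split()
        if "MX" not in fields:
            continue
        data = fields[fields.index("MX") + 1:]
        # A well-formed MX record has a numeric preference, then an exchanger.
        if len(data) < 2 or not data[0].isdigit():
            problems.append(number)
    return problems
```

Run over a db file's lines, it would report the postmanrings2x record above while passing a record such as "zorba IN MX 10 zelig.movie.edu.".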

Note that unless you correlate the lack of authority (when you expect the name server to be authoritative) with a problem, or scan your syslog file assiduously, you might never notice the syntax error!

Starting with BIND 4.9.4, an "invalid" host name can be a syntax error:

Jan  6 12:04:10 terminator named[841]: owner name "ID_4.movie.edu" IN (primary)
     is invalid - rejecting
Jan  6 12:04:10 terminator named[841]: db.movie:11: owner name error
Jan  6 12:04:10 terminator named[841]: db.movie:11: Database error (a)
Jan  6 12:04:10 terminator named[841]: master zone "movie.edu" (IN) rejected
     due to errors (serial 1997010600)

13.3.6 6. Missing Dot at the End of a Name in a DNS Database File

It's very easy to leave off trailing dots when editing a db file. Since the rules for when to use them change so often (don't use them in the conf file, don't use them in resolv.conf, do use them in db files to override $ORIGIN...), it's hard to keep them straight. These resource records:

zorba         IN     MX     10 zelig.movie.edu
movie.edu     IN     NS     terminator.movie.edu

really don't look that odd to the untrained eye, but they probably don't do what they're intended to. In the db.movie file, they'd be equivalent to:

zorba.movie.edu.        IN    MX    10 zelig.movie.edu.movie.edu.
movie.edu.movie.edu.    IN    NS    terminator.movie.edu.movie.edu.

unless the origin were explicitly changed.
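The expansion rule at work here is simple enough to state as code. This Python sketch mimics how a name in a db file is completed with the current origin; the function name qualify is hypothetical, and "@" is the standard zone file shorthand for the origin itself.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a db file name the way the name server would.

    `origin` must be dot-terminated, e.g. "movie.edu.".
    """
    if name == "@":          # "@" stands for the origin itself
        return origin
    if name.endswith("."):   # already fully qualified: left alone
        return name
    return f"{name}.{origin}"  # anything else gets the origin appended
```

So qualify("zelig.movie.edu", "movie.edu.") comes out as zelig.movie.edu.movie.edu., while the dot-terminated zelig.movie.edu. is returned unchanged.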

If you omit a trailing dot after a domain name in the resource record's data (as opposed to leaving off a trailing dot in the resource record's name), you usually end up with wacky NS or MX records:

% nslookup -type=mx zorba.movie.edu.
Server:  terminator.movie.edu
Address:  192.249.249.3

zorba.movie.edu      preference = 10, mail exchanger
                     = zelig.movie.edu.movie.edu
zorba.movie.edu      preference = 50, mail exchanger
                     = postmanrings2x.movie.edu.movie.edu

The cause of this should be fairly clear from the nslookup output. But if you forget the trailing dot on the domain name field in a record (as in the movie.edu NS record above), spotting your mistake might not be as easy. If you try to look up the record with nslookup, you won't find it under the name you thought you used. Dumping your name server's database may help you root it out:

$ORIGIN edu.movie.edu.
movie    IN    NS    terminator.movie.edu.movie.edu.

The $ORIGIN line looks odd enough to stand out.

13.3.7 7. Missing Cache Data

If, for some reason, you forget to install a cache file on your host, or if you accidentally delete it, your name server will be unable to resolve names outside of its authoritative data. This behavior is easy to recognize using nslookup, but be careful to use full, dot-terminated domain names, or else the search list may cause misleading failures.

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> ftp.uu.net.     - A lookup of a name outside your name server's authoritative data
                  - causes a SERVFAIL error...

Server:  terminator.movie.edu
Address:  192.249.249.3

*** terminator.movie.edu can't find ftp.uu.net.: Server failed

A lookup of a name in your name server's authoritative data returns a response:

> wormhole.movie.edu.
Server:  terminator.movie.edu
Address:  192.249.249.3

Name:    wormhole.movie.edu
Addresses:  192.249.249.1, 192.253.253.1

> ^D

To confirm your suspicion that the cache data are missing, check the syslog output for an error like this:

Jan  6 15:10:22 terminator named[764]: No root nameservers for class IN

IN, you'll remember, is the Internet class (class 1). This error indicates that because no cache data were available, no root name servers were found.

13.3.8 8. Loss of Network Connectivity

Though the Internet is more reliable today than it was back in the wild and woolly days of the ARPANET, network outages are still relatively common. Without "lifting the hood" and poking around in debugging output, these failures usually look like poor performance:

% nslookup nisc.sri.com.
Server:  terminator.movie.edu
Address:  192.249.249.3

*** Request to terminator.movie.edu timed out ***

If you turn on name server debugging, though, you'll see that your name server, anyway, is healthy. It received the query from the resolver, sent the necessary queries, and waited patiently for a response. It just didn't get one. Here's what the debugging output might look like:

Debug turned ON, Level 1

Here nslookup sends the first query to our local name server, for the IP address of nisc.sri.com. You can tell it's not another name server because the query is received from a port other than 53, the name server's port. Notice that the query is forwarded to another name server, and when no answer is received, it is resent to a different name server:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms retry 4 sec
resend(addr=1 n=0) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms

Now nslookup is getting impatient, and it queries our local name server again. Notice that it uses the same port. The local name server ignores the duplicate query and tries forwarding the query two more times:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=2 n=0) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=0) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms

nslookup queries the local name server again, and the name server fires off more queries:

datagram from [192.249.249.3].1051, fd 5, len 30
req: nlookup(nisc.sri.com) id 18470 type=1 class=1
req: missed 'nisc.sri.com' as 'com' (cname=0)
resend(addr=4 n=0) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=1) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=1 n=1) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=2 n=1) -> [192.33.4.12].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=3 n=1) -> [128.8.10.90].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=4 n=1) -> [192.203.230.10].53 ds=7 nsid=58732 id=18470 0ms
resend(addr=0 n=2) -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms
Debug turned OFF

From the debugging output, you can extract a list of the IP addresses of the name servers that your name server tried to query, and then check your connectivity to them. Odds are, ping won't have much better luck than your name server did:

% ping 198.41.0.4 -n 10     --ping first name server queried
PING 198.41.0.4: 64 byte packets

----198.41.0.4 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss
% ping 128.9.0.107 -n 10    --ping second name server queried
PING 128.9.0.107: 64 byte packets

----128.9.0.107 PING Statistics----
10 packets transmitted, 0 packets received, 100% packet loss

If ping does get through, you should check that the remote name servers are really running. You might also check whether your Internet firewall is inadvertently blocking your name server's queries. If you've upgraded to BIND 8 recently, see the sidebar "A Gotcha with BIND 8 and Packet Filtering Firewalls", and see if it applies to you.

If ping can't get through, either, all that's left to do is to locate the break in the network. Utilities like traceroute and ping's record route option can be very helpful in determining whether the problem is on your network, the destination network, or somewhere in the middle.

You should also use your own common sense when tracking down the break. In this trace, for example, the remote name servers your name server tried to query are all root name servers. (You might have had their PTR records cached somewhere, so you could find out their domain names.) Now it's not very likely that each root's local network went down, nor is it likely that the Internet's commercial backbone networks collapsed entirely. Occam's razor says that the simplest condition that could cause this behavior - namely, the loss of your network's link to the Internet - is the most likely cause.
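Pulling the list of queried name servers out of the debugging output, as suggested earlier in this section, is a natural job for a small script. The Python sketch below assumes the forw and resend line formats shown in the trace above, which may differ in other BIND versions; the function name queried_servers is hypothetical.

```python
import re

# Matches lines such as:
#   forw: forw -> [198.41.0.4].53 ds=7 nsid=58732 id=18470 0ms retry 4 sec
#   resend(addr=1 n=0) -> [128.9.0.107].53 ds=7 nsid=58732 id=18470 0ms
QUERY_RE = re.compile(
    r"^(?:forw: forw|resend\([^)]*\)) -> \[(\d{1,3}(?:\.\d{1,3}){3})\]\.53\b"
)


def queried_servers(debug_lines):
    """Return the distinct server addresses queried, in first-seen order."""
    seen = []
    for line in debug_lines:
        match = QUERY_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Each address in the result is a candidate for ping or traceroute. Lines that record incoming datagrams from the resolver are ignored, since they don't match either pattern.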

13.3.9 9. Missing Subdomain Delegation

Even though the InterNIC does its best to process your requests as quickly as possible, it may take a day or two for your domain's delegation to appear in the root name servers. If the InterNIC doesn't manage your parent domain, your mileage may vary. Some parents are quick and responsible, others are slow and inconsistent. Just like in real life, though, you're stuck with them.[2]

[2] Until the GTLD Memorandum of Understanding is adopted, that is. See http://www.gtld-mou.org/.

Until your delegation data appear in your parent domain's name servers, your name servers will be able to look up data in the Internet domain name space, but no one else on the Internet (outside of your domain) will know how to look up data in your name space.

That means that even though you can send mail outside of your domain, the recipients won't be able to reply to it. Furthermore, no one will be able to telnet to, ftp to, or even ping your hosts by name.

Remember that this applies equally to any in-addr.arpa subdomains you may run. Until the parent delegates those subdomains to your servers, name servers on the Internet won't be able to reverse map addresses on your networks.

To determine whether or not your zone's delegation has made it into your parent zone's name servers, query a parent name server for the NS records for your zone. If the parent name server has the data, any name server on the Internet can find it:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> server a.root-servers.net.     --Query a root name server
Default Server:  a.root-servers.net
Address:  198.41.0.4

> set norecurse                  - Instruct the server to answer out of its own data
> set type=ns                    - and to look for NS records
> 249.249.192.in-addr.arpa.      - for 249.249.192.in-addr.arpa

Server:  a.root-servers.net
Address:  198.41.0.4

*** a.root-servers.net can't find 249.249.192.in-addr.arpa.: Non-existent domain

Here, the delegation clearly hasn't been added yet. You can either wait patiently, or if an unreasonable amount of time has passed since you requested delegation from your parent, contact your parent and ask what's up.

13.3.10 10. Incorrect Subdomain Delegation

Incorrect subdomain delegation is another familiar problem on the Internet. Keeping delegation up to date requires human intervention - informing your parent zone's administrator of changes to your set of authoritative name servers. Consequently, delegation information often becomes inaccurate as administrators make changes without letting their parents know. Far too many administrators believe that setting up delegation is a one-shot deal: they let their parents know which name servers are authoritative once, when they set up their zone, and then they never talk to them again. They don't even call on Mother's Day.

An administrator may add a new name server, decommission another, and change the IP address of a third, all without telling the parent zone's administrator. Gradually, the number of name servers correctly delegated to by the parent zone dwindles. In the best case, this leads to long resolution times, as querying name servers struggle to find an authoritative name server for the zone. If the delegation information becomes badly out of date, and the last authoritative name server host is brought down for maintenance, the information within the zone will be inaccessible.

If you suspect bad delegation from your parent to your zone, from your zone to one of your children, or from a remote zone to one of its children, you can check with nslookup:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> server a.root-servers.net.     - Set server to the parent name server you suspect
                                   has bad delegation
Default Server:  a.root-servers.net
Address:  198.41.0.4

> set type=ns                    - Look for NS records
> hp.com.                        - for the zone in question
Server:  a.root-servers.net
Address:  198.41.0.4

Non-authoritative answer:
hp.com          nameserver = RELAY.HP.COM
hp.com          nameserver = HPLABS.HPL.HP.COM
hp.com          nameserver = NNSC.NSF.NET
hp.com          nameserver = HPSDLO.SDD.HP.COM

Authoritative answers can be found from:
hp.com          nameserver = RELAY.HP.COM
hp.com          nameserver = HPLABS.HPL.HP.COM
hp.com          nameserver = NNSC.NSF.NET
hp.com          nameserver = HPSDLO.SDD.HP.COM
RELAY.HP.COM    internet address = 15.255.152.2
HPLABS.HPL.HP.COM       internet address = 15.255.176.47
NNSC.NSF.NET    internet address = 128.89.1.178
HPSDLO.SDD.HP.COM       internet address = 15.255.160.64
HPSDLO.SDD.HP.COM       internet address = 15.26.112.11

Let's say you suspect that the delegation to hpsdlo.sdd.hp.com is incorrect. You now query hpsdlo for data in the hp.com zone and check the answer:

> server hpsdlo.sdd.hp.com.
Default Server:  hpsdlo.sdd.hp.com
Addresses:  15.255.160.64, 15.26.112.11

> set norecurse
> set type=soa
> hp.com.
Server:  hpsdlo.sdd.hp.com
Addresses:  15.255.160.64, 15.26.112.11

Non-authoritative answer:
hp.com
        origin = relay.hp.com
        mail addr = hostmaster.hp.com
        serial = 1001462
        refresh = 21600 (6 hours)
        retry   = 3600 (1 hour)
        expire  = 604800 (7 days)
        minimum ttl = 86400 (1 day)

Authoritative answers can be found from:
hp.com          nameserver = RELAY.HP.COM
hp.com          nameserver = HPLABS.HPL.HP.COM
hp.com          nameserver = NNSC.NSF.NET
RELAY.HP.COM    internet address = 15.255.152.2
HPLABS.HPL.HP.COM       internet address = 15.255.176.47
NNSC.NSF.NET    internet address = 128.89.1.178

If hpsdlo really were authoritative, it would have responded with an authoritative answer. The administrator of the hp.com zone can tell you whether hpsdlo should be an authoritative name server for hp.com, so that's who you should contact.

Another common symptom of this is a "lame server" error message:

Oct 1 04:43:38 terminator named[146]: Lame server on '40.234.23.210.in-addr.arpa' (in '210.in-addr.arpa'?): [198.41.0.5].53 'RS0.INTERNIC.NET': learnt(A=198.41.0.21,NS=128.63.2.53)

Here's how to read that: your name server was referred by the name server at 128.63.2.53 to the name server at 198.41.0.5 for a name in the domain 210.in-addr.arpa, specifically 40.234.23.210.in-addr.arpa. The server at 198.41.0.5's response indicated that it wasn't, in fact, authoritative for 210.in-addr.arpa, and therefore either the delegation that 128.63.2.53 gave you is wrong or the server at 198.41.0.5 is misconfigured.
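If your logs contain many of these, the same reading can be done mechanically. The Python sketch below picks apart a lame server message in the exact format shown above; that format is an assumption and other BIND versions may log it differently. The function name and dictionary keys are hypothetical, and the A= field is captured without interpretation.

```python
import re

# Pattern built from the single sample message in this section.
LAME_RE = re.compile(
    r"Lame server on '(?P<name>[^']+)' \(in '(?P<zone>[^']+)'\?\): "
    r"\[(?P<lame>[\d.]+)\]\.53 '(?P<ns_name>[^']+)': "
    r"learnt\(A=(?P<learnt_a>[\d.]+),NS=(?P<learnt_ns>[\d.]+)\)"
)


def parse_lame_server(line: str):
    """Return the fields of a lame server message, or None if no match."""
    match = LAME_RE.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),              # name being looked up
        "zone": match.group("zone"),              # zone the server should serve
        "lame_server": match.group("lame"),       # server that wasn't authoritative
        "server_name": match.group("ns_name"),    # its domain name
        "referred_by": match.group("learnt_ns"),  # server that gave the NS record
        "learnt_a": match.group("learnt_a"),      # A= field, kept as-is
    }
```

Grouping the results by lame_server or referred_by would show whether one bad delegation accounts for most of the messages.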

13.3.11 11. Syntax Error in resolv.conf

Despite the resolv.conf file's simple syntax, people do occasionally make mistakes when editing it. And, unfortunately, lines with syntax errors in resolv.conf are silently ignored by the resolver. The result is usually that some part of your intended configuration doesn't take effect: either your domain or search list isn't set correctly, or the resolver won't query one of the name servers you configured it to query. Commands that rely on the search list won't work, your resolver won't query the right name server(s), or it won't query a name server at all.

The easiest way to check whether your resolv.conf file is having the intended effect is to run nslookup. nslookup will kindly report the default domain and search list it derives from resolv.conf, plus the name server it's querying, when you type set all, as we showed you in Chapter 11, nslookup:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> set all
Default Server:  terminator.movie.edu
Address:  192.249.249.3

Set options:
  nodebug         defname          search         recurse
  nod2            novc             noignoretc     port=53
  querytype=A     class=IN         timeout=5      retry=4
  root=ns.nic.ddn.mil.
  domain=movie.edu
  srchlist=movie.edu

>

Check that the output of set all is what you expect, given your resolv.conf file. For example, if you'd set search fx.movie.edu movie.edu in resolv.conf, you'd expect to see:

domain=fx.movie.edu
srchlist=fx.movie.edu/movie.edu

in the output. If you don't see what you're expecting, look carefully at resolv.conf. If you don't see anything obvious, look for nonprinting characters (with vi's set list command, for example). Watch out for trailing spaces, especially; a trailing space after the domain name will set the default domain to include a space. No real domain names actually end with spaces, so all of your non-dot-terminated lookups will fail.
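Because the resolver won't tell you about these mistakes, a small lint script can. The Python sketch below flags trailing whitespace, nonprinting characters, and misspelled directives. The set of directive names and the treatment of lines starting with ";" or "#" as comments are assumptions about your resolver and should be adjusted to match it; the function name is hypothetical.

```python
# Directive names assumed for illustration; check your resolver's documentation.
KNOWN_DIRECTIVES = {"domain", "search", "nameserver", "sortlist", "options"}


def lint_resolv_conf(text: str):
    """Return (line number, problem) pairs for suspicious resolv.conf lines."""
    warnings = []
    # Split on "\n" only, so stray control characters stay in their line.
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip() or line.lstrip().startswith((";", "#")):
            continue  # skip blank lines and (assumed) comment lines
        if line != line.rstrip():
            warnings.append((number, "trailing whitespace"))
        if any(not ch.isprintable() and ch != "\t" for ch in line):
            warnings.append((number, "nonprinting character"))
        keyword = line.split()[0]
        if keyword not in KNOWN_DIRECTIVES:
            warnings.append((number, f"unknown directive {keyword!r}"))
    return warnings
```

A file containing "domain movie.edu " (note the trailing space) and a misspelled "namserver" line would produce one warning for each.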

13.3.12 12. Default Domain Not Set

Failing to set your default domain is another old standby gaffe. You can set it implicitly, by setting your hostname to your host's fully qualified domain name, or explicitly, in resolv.conf . The characteristics of an unset default domain are straightforward: folks who use single-label names (or abbreviated domain names) in commands get no joy:

% telnet br
br: No address associated with name
% telnet br.fx
br.fx: No address associated with name
% telnet br.fx.movie.edu
Trying...
Connected to bladerunner.fx.movie.edu.
Escape character is '^]'.

HP-UX bladerunner.fx.movie.edu A.08.07 A 9000/730 (ttys1)
login:

You can use nslookup to check this one, much as you do when you suspect a syntax error in resolv.conf:

% nslookup
Default Server:  terminator.movie.edu
Address:  192.249.249.3

> set all
Default Server:  terminator.movie.edu
Address:  192.249.249.3

Set options:
  nodebug         defname         search          recurse
  nod2            novc            noignoretc      port=53
  querytype=A     class=IN        timeout=5       retry=4
  root=ns.nic.ddn.mil.
  domain=
  srchlist=

Notice that neither the local domain nor the search list is set. You can also track this down by enabling debugging on the name server. (This, of course, requires access to the name server, which may not be running on the host the problem's affecting.) Here's how the debugging output might look after trying those telnet commands:

Debug turned ON, Level 1

datagram from [192.249.249.3].1057, fd 5, len 20
req: nlookup(br) id 27974 type=1 class=1
req: missed 'br' as '' (cname=0)
forw: forw -> [198.41.0.4].53 ds=7 nsid=61691 id=27974 0ms retry 4 sec

datagram from [198.41.0.4].53, fd 5, len 20
ncache: dname br, type 1, class 1
send_msg -> [192.249.249.3].1057 (UDP 5) id=27974

datagram from [192.249.249.3].1059, fd 5, len 23
req: nlookup(br.fx) id 27975 type=1 class=1
req: missed 'br.fx' as '' (cname=0)
forw: forw -> [128.9.0.107].53 ds=7 nsid=61692 id=27975 0ms retry 4 sec

datagram from [128.9.0.107].53, fd 5, len 23
ncache: dname br.fx, type 1, class 1
send_msg -> [192.249.249.3].1059 (UDP 5) id=27975

datagram from [192.249.249.3].1060, fd 5, len 33
req: nlookup(br.fx.movie.edu) id 27976 type=1 class=1
req: found 'br.fx.movie.edu' as 'br.fx.movie.edu' (cname=0)
req: nlookup(bladerunner.fx.movie.edu) id 27976 type=1 class=1
req: found 'bladerunner.fx.movie.edu' as 'bladerunner.fx.movie.edu'
     (cname=1)
ns_req: answer -> [192.249.249.3].1060 fd=5 id=27976 size=183 Local
Debug turned OFF

Contrast this with the debugging output produced by the application of the search list in Chapter 12. The only names looked up here are exactly what the user typed, with no domains appended at all. Clearly the search list isn't being applied.
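To make the difference concrete, here is a simplified model of how a resolver might pick the default domain and expand a typed name. It is a sketch under stated assumptions, not any particular resolver's code: the function names default_domain and candidate_names are hypothetical, and the ordering rule (names containing a dot are tried as typed first, single-label names last) is one common convention that varies between resolver versions.

```python
def default_domain(hostname, resolv_conf_domain=None):
    """An explicit resolv.conf setting wins; otherwise use whatever
    follows the first dot in the hostname (empty if there is no dot)."""
    if resolv_conf_domain:
        return resolv_conf_domain
    _, dot, rest = hostname.partition(".")
    return rest if dot else ""


def candidate_names(name, search_list):
    """List the names that would be looked up, in order (simplified)."""
    if name.endswith("."):
        return [name]  # dot-terminated names are never expanded
    expanded = [f"{name}.{domain}" for domain in search_list]
    if "." in name:
        return [name] + expanded  # assumed order: as typed, then search list
    return expanded + [name]      # assumed order: search list, then as typed


# With no default domain there is no search list, so only the typed name is tried.
domain = default_domain("terminator")
print(candidate_names("br.fx", [domain] if domain else []))

# With the hostname set to a fully qualified name, the domain is derived from it.
domain = default_domain("terminator.movie.edu")
print(candidate_names("br.fx", [domain] if domain else []))
```

With an empty search list, the model produces only the name as typed, which matches the three bare lookups (br, br.fx, and br.fx.movie.edu) in the debugging output above.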

13.3.13 13. Response from Unexpected Source

One problem we've seen increasingly often in the DNS newsgroups is the "response from unexpected source." This was once called a Martian response: it's a response that comes from an IP address other than the one your server sent a query to. When a BIND name server sends a query to a remote server, BIND conscientiously makes sure that answers come only from the IP addresses on that server. This helps minimize the possibility of accepting spoofed responses. BIND is equally demanding of itself: a BIND server makes every effort to reply via the same network interface that it received a query on.

Here's the error message you'd see upon receiving a possibly unsolicited response:

Mar  8 17:21:04 terminator named[235]: Response from unexpected source ([205.199.4.131].53)

This can mean one of two things: either someone is trying to spoof your name server, or - more likely - you sent a query to an older BIND server or a different make of name server that's not as assiduous about replying from the same interface it receives queries on.
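The check itself is easy to picture. The following is a minimal sketch of the idea, not BIND's actual implementation: the function name check_response_source is hypothetical, and it assumes the server keeps the set of addresses it sent the query to and compares each response's source address against that set.

```python
import ipaddress


def check_response_source(source_ip, source_port, queried_ips):
    """Return None if the response came from an address we queried,
    otherwise a log line modeled on the message shown above."""
    source = ipaddress.ip_address(source_ip)
    expected = {ipaddress.ip_address(ip) for ip in queried_ips}
    if source in expected:
        return None
    return f"Response from unexpected source ([{source}].{source_port})"


# Hypothetical scenario: the query went to one address of a multihomed
# server, but the reply came back from another of its interfaces.
print(check_response_source("205.199.4.131", 53, ["205.199.4.130"]))
```

A multihomed server that replies from a different interface than the one that received the query fails this comparison even though nothing malicious is going on, which is why the benign explanation is the more likely one.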

