Name servers are adept at retrieving data from the domain name space. They have to be, given the limited intelligence of some resolvers. Not only can they give you data about zones for which they're authoritative, they can also search through the domain name space to find data for which they're not authoritative. This process is called name resolution or simply resolution .
Because the name space is structured as an inverted tree, a name server needs only one piece of information to find its way to any point in the tree: the domain names and addresses of the root name servers (is that more than one piece?). A name server can issue a query to a root name server for any name in the domain name space, and the root name server will start the name server on its way.
The root name servers know where there are authoritative name servers for each of the top-level domains. (In fact, most of the root name servers are authoritative for the generic top-level domains.) Given a query about any domain name, the root name servers can at least provide the names and addresses of the name servers that are authoritative for the top-level domain that the domain name is in. And the top-level name servers can provide the list of name servers that are authoritative for the second-level domain that the domain name is in. Each name server queried gives the querier information about how to get "closer" to the answer it's seeking, or it provides the answer itself.
The root name servers are clearly important to resolution. Because they're so important, DNS provides mechanisms - such as caching, which we'll discuss a little later - to help offload the root name servers. But in the absence of other information, resolution has to start at the root name servers. This makes the root name servers crucial to the operation of DNS ; if all the Internet root name servers were unreachable for an extended period, all resolution on the Internet would fail. To protect against this, the Internet has thirteen root name servers (as of this writing) spread across different parts of the network. Two are on the MILNET , the U.S. military's portion of the Internet; one is on SPAN , NASA 's internet; two are in Europe; and one is in Japan.
Being the focal point for so many queries keeps the roots busy; even with thirteen, the traffic to each root name server is very high. A recent informal poll of root name server administrators showed some roots receiving thousands of queries per second.
Despite the load placed on root name servers, resolution on the Internet works quite well. Figure 2.12 shows the resolution process for the address of a real host in a real domain, including how the process corresponds to traversing the domain name space tree.
The local name server queries a root name server for the address of girigiri.gbrmpa.gov.au and is referred to the au name servers. The local name server asks an au name server the same question, and is referred to the gov.au name servers. The gov.au name server refers the local name server to the gbrmpa.gov.au name servers. Finally, the local name server asks a gbrmpa.gov.au name server for the address and gets the answer.
You may have noticed a big difference in the amount of work done by the name servers in the previous example. Four of the name servers simply returned the best answer they already had - mostly referrals to other name servers - to the queries they received. They didn't have to send their own queries to find the data requested. But one name server - the one queried by the resolver - had to follow successive referrals until it received an answer.
Why couldn't the local name server simply have referred the resolver to another name server? Because a stub resolver wouldn't have had the intelligence to follow a referral. And how did the name server know not to answer with a referral? Because the resolver issued a recursive query.
Queries come in two flavors, recursive and iterative , also called nonrecursive . Recursive queries place most of the burden of resolution on a single name server. Recursion , or recursive resolution , is just a name for the resolution process used by a name server when it receives recursive queries.
In recursion a resolver sends a recursive query to a name server for information about a particular domain name. The queried name server is then obliged to respond with the requested data or with an error stating that data of the requested type don't exist or that the domain name specified doesn't exist. The name server can't just refer the querier to a different name server, because the query was recursive.
If the queried name server isn't authoritative for the data requested, it will have to query other name servers to find the answer. It could send recursive queries to those name servers, thereby obliging them to find the answer and return it (and passing the buck). Or it could send iterative queries and possibly be referred to other name servers "closer" to the domain name it's looking for. Current implementations are polite and do the latter, following the referrals until an answer is found.
A name server that receives a recursive query that it can't answer itself will query the "closest known" name servers. The closest known name servers are the servers authoritative for the zone closest to the domain name being looked up. For example, if the name server receives a recursive query for the address of the domain name girigiri.gbrmpa.gov.au , it will first check whether it knows the name servers for girigiri.gbrmpa.gov.au . If it does, it will send the query to one of them. If not, it will check whether it knows the name servers for gbrmpa.gov.au , and after that gov.au , and then au . The default, where the check is guaranteed to stop, is the root zone, since every name server knows the domain names and addresses of the root name servers.
Using the closest known name servers ensures that the resolution process is as short as possible. A berkeley.edu name server receiving a recursive query for the address of waxwing.ce.berkeley.edu shouldn't have to consult the root name servers; it can simply follow delegation information directly to the ce.berkeley.edu name servers. Likewise, a name server that has just looked up a domain name in ce.berkeley.edu shouldn't have to start resolution at the roots to look up another ce.berkeley.edu (or berkeley.edu ) domain name; we'll show how this works in the upcoming section on caching.
The name server that receives the recursive query always sends the same query that the resolver sends it, for example, for the address of waxwing.ce.berkeley.edu . It never sends explicit queries for the name servers for ce.berkeley.edu or berkeley.edu , though this information is also stored in the name space. Sending explicit queries could cause problems: There may be no ce.berkeley.edu name servers (that is, ce.berkeley.edu may be part of the berkeley.edu zone). Also, it's always possible that an edu or berkeley.edu name server would know waxwing.ce.berkeley.edu 's address. An explicit query for the berkeley.edu or ce.berkeley.edu name servers would miss this information.
Iterative resolution, on the other hand, doesn't require nearly as much work on the part of the queried name server. In iterative resolution, a name server simply gives the best answer it already knows back to the querier. No additional querying is required. The queried name server consults its local data (including its cache, which we're about to talk about), looking for the data requested. If it doesn't find the data there, it makes its best attempt to give the querier data that will help it continue the resolution process. Usually these are the domain names and addresses of the closest known name servers.
What this amounts to is a resolution process that, taken as a whole, looks like Figure 2.13 .
A resolver queries a local name server, which then queries a number of other name servers in pursuit of an answer for the resolver. Each name server it queries refers it to another name server that is authoritative for a zone further down in the name space and closer to the domain name sought. Finally, the local name server queries the authoritative name server, which returns an answer.
One major piece of functionality missing from the resolution process as explained so far is how addresses get mapped back to names. Address-to-name mapping is used to produce output that is easier for humans to read and interpret (in log files, for instance). It's also used in some authorization checks. UNIX hosts map addresses to domain names to compare against entries in .rhosts and hosts.equiv files, for example. When using host tables, address-to-name mapping is trivial. It requires a straightforward sequential search through the host table for an address. The search returns the official host name listed. In DNS , however, address-to-name mapping isn't so simple. Data, including addresses, in the domain name space are indexed by name. Given a domain name, finding an address is relatively easy. But finding the domain name that maps to a given address would seem to require an exhaustive search of the data attached to every domain name in the tree.
Actually, there's a better solution that's both clever and effective. Because it's easy to find data once you're given the domain name that indexes that data, why not create a part of the domain name space that uses addresses as labels? In the Internet's domain name space, this portion of the name space is the in-addr.arpa domain.
Nodes in the in-addr.arpa domain are labelled after the numbers in the dotted-octet representation of IP addresses. (Dotted-octet representation refers to the common method of expressing 32-bit IP addresses as four numbers in the range 0 to 255, separated by dots.) The in-addr.arpa domain, for example, could have up to 256 subdomains, one corresponding to each possible value in the first octet of an IP address. Each of these subdomains could have up to 256 subdomains of its own, corresponding to the possible values of the second octet. Finally, at the fourth level down, there are resource records attached to the final octet giving the full domain name of the host or network at that IP address. That makes for an awfully big domain: in-addr.arpa , shown in Figure 2.14 , is roomy enough for every IP address on the Internet.
Note that when read in a domain name, the IP address appears backward because the name is read from leaf to root. For example, if winnie.corp.hp.com 's IP address is 188.8.131.52, the corresponding in-addr.arpa subdomain is 184.108.40.206.in-addr.arpa , which maps back to the domain name winnie.corp.hp.com .
IP addresses could have been represented the opposite way in the name space, with the first octet of the IP address at the bottom of the in-addr.arpa domain. That way, the IP address would have read correctly (forward) in the domain name.
IP addresses are hierarchical, however, just like domain names. Network numbers are doled out much as domain names are, and administrators can then subnet their address space and further delegate numbering. The difference is that IP addresses get more specific from left to right, while domain names get less specific from left to right. Figure 2.15 shows what we mean.
Making the first octets in the IP address appear highest in the tree gives administrators the ability to delegate authority for in-addr.arpa domains along network lines. For example, the 15.in-addr.arpa domain, which contains the reverse mapping information for all hosts whose IP addresses start with 15, can be delegated to the administrators of network 220.127.116.11. This would be impossible if the octets appeared in the opposite order. If the IP addresses were represented the other way around, 15.in-addr.arpa would consist of every host whose IP address ended with 15 - not a practical domain to try to delegate.
The in-addr.arpa name space is clearly only useful for IP address-to-domain name mapping. Searching for a domain name that indexes an arbitrary piece of data - something besides an address - in the domain name space would require another specialized name space like in-addr.arpa or an exhaustive search.
That exhaustive search is to some extent possible, and it's called an inverse query . An inverse query is a search for the domain name that indexes a given datum. It's processed solely by the name server receiving the query. That name server searches all of its local data for the item sought and returns the domain name that indexes it, if possible. If it can't find the data, it gives up. No attempt is made to forward the query to another name server.
Because any one name server only knows about part of the overall domain name space, an inverse query is never guaranteed to return an answer. For example, if a name server receives an inverse query for an IP address it knows nothing about, it can't return an answer, but it also doesn't know that the IP address doesn't exist, because it only holds part of the DNS database. What's more, the implementation of inverse queries is optional according to the DNS specification; BIND 4.9.7 still contains the code that implements inverse queries, but it's commented out by default. BIND 8 no longer includes that code at all, though it does recognize inverse queries and can make up fake responses to them. That's fine with us, because very little software (such as archaic versions of nslookup ) actually still uses inverse queries.