Resolution (DNS and BIND, 4th Edition)

2.6. Resolution

Name servers are adept at retrieving data from the domain name space. They have to be, given the limited intelligence of most resolvers. Not only can they give you data from zones for which they're authoritative, they can also search through the domain name space to find data for which they're not authoritative. This process is called name resolution or simply resolution.

Because the namespace is structured as an inverted tree, a name server needs only one piece of information to find its way to any point in the tree: the domain names and addresses of the root name servers (is that more than one piece?). A name server can issue a query to a root name server for any domain name in the domain name space, and the root name server starts the name server on its way.

2.6.1. Root Name Servers

The root name servers know where the authoritative name servers for each of the top-level zones are. (In fact, some of the root name servers are authoritative for the generic top-level zones.) Given a query about any domain name, the root name servers can provide at least the names and addresses of the name servers that are authoritative for the top-level zone that the domain name ends in. And the top-level name servers can provide the list of the authoritative name servers for the second-level zone that the domain name ends in. Each name server queried gives the querier information about how to get "closer" to the answer it's seeking, or it provides the answer itself.

The root name servers are clearly important to resolution. Because they're so important, DNS provides mechanisms -- such as caching, which we'll discuss a little later -- to help offload the root name servers. But in the absence of other information, resolution has to start at the root name servers. This makes the root name servers crucial to the operation of DNS; if all the Internet root name servers were unreachable for an extended period, all resolution on the Internet would fail. To protect against this, the Internet has 13 root name servers (as of this writing) spread across different parts of the network. For example, one is on PSINet, a commercial Internet backbone; one is on the NASA Science Internet; two are in Europe; and one is in Japan.

Being the focal point for so many queries keeps the roots busy; even with 13, the traffic to each root name server is very high. A recent informal poll of root name server administrators showed some roots receiving thousands of queries per second.

Despite the load placed on root name servers, resolution on the Internet works quite well. Figure 2-12 shows the resolution process for the address of a real host in a real domain, including how the process corresponds to traversing the domain name space tree.

Figure 2-12. Resolution of girigiri.gbrmpa.gov.au on the Internet

The local name server queries a root name server for the address of girigiri.gbrmpa.gov.au and is referred to the au name servers. The local name server asks an au name server the same question, and is referred to the gov.au name servers. The gov.au name server refers the local name server to the gbrmpa.gov.au name servers. Finally, the local name server asks a gbrmpa.gov.au name server for the address and gets the answer.

2.6.2. Recursion

You may have noticed a big difference in the amount of work done by the name servers in the previous example. Four of the name servers simply returned the best answer they already had -- mostly referrals to other name servers -- to the queries they received. They didn't have to send their own queries to find the data requested. But one name server -- the one queried by the resolver -- had to follow successive referrals until it received an answer.

Why couldn't the local name server simply have referred the resolver to another name server? Because a stub resolver wouldn't have had the intelligence to follow a referral. And how did the name server know not to answer with a referral? Because the resolver issued a recursive query.

Queries come in two flavors, recursive and iterative (or nonrecursive). Recursive queries place most of the burden of resolution on a single name server. Recursion, or recursive resolution, is just a name for the resolution process used by a name server when it receives recursive queries. As with recursive algorithms in programming, the name server repeats the same basic process (querying a remote name server and following any referrals) until it receives an answer. Iteration, or iterative resolution, described in the next section, refers to the resolution process used by a name server when it receives iterative queries.

In recursion, a resolver sends a recursive query to a name server for information about a particular domain name. The queried name server is then obliged to respond with the requested data or with an error stating that data of the requested type doesn't exist or that the domain name specified doesn't exist.[14] The name server can't just refer the querier to a different name server because the query was recursive.

[14]BIND 8 name servers can be configured to ignore or refuse recursive queries; see Chapter 11, "Security", for how and why you'd want to do this.

If the queried name server isn't authoritative for the data requested, it will have to query other name servers to find the answer. It could send recursive queries to those name servers, thereby obliging them to find the answer and return it (and passing the buck). Or it could send iterative queries and possibly be referred to other name servers "closer" to the domain name it's looking for. Current implementations are polite and do the latter, following the referrals until an answer is found.[15]

[15]The exception is a name server configured to forward all unresolved queries to a designated name server, called a forwarder. See Chapter 10, "Advanced Features", for more information on using forwarders.

A name server that receives a recursive query that it can't answer itself will query the "closest known" name servers. The closest known name servers are the servers authoritative for the zone closest to the domain name being looked up. For example, if the name server receives a recursive query for the address of the domain name girigiri.gbrmpa.gov.au, it will first check whether it knows which name servers are authoritative for girigiri.gbrmpa.gov.au. If it does, it will send the query to one of them. If not, it will check whether it knows the name servers for gbrmpa.gov.au, and after that gov.au, and then au. The default, where the check is guaranteed to stop, is the root zone, since every name server knows the domain names and addresses of the root name servers.

Using the closest known name servers ensures that the resolution process is as short as possible. A berkeley.edu name server receiving a recursive query for the address of waxwing.ce.berkeley.edu shouldn't have to consult the root name servers; it can simply follow delegation information directly to the ce.berkeley.edu name servers. Likewise, a name server that has just looked up a domain name in ce.berkeley.edu shouldn't have to start resolution at the roots to look up another ce.berkeley.edu (or berkeley.edu) domain name; we'll show how this works in Section 2.7, "Caching".

The name server that receives the recursive query always sends the same query that the resolver sends it, for example, for the address of waxwing.ce.berkeley.edu. It never sends explicit queries for the name servers for ce.berkeley.edu or berkeley.edu, though this information is also stored in the namespace. Sending explicit queries could cause problems: there may be no ce.berkeley.edu name servers (that is, ce.berkeley.edu may be part of the berkeley.edu zone). Also, it's always possible that an edu or berkeley.edu name server already knows waxwing.ce.berkeley.edu's address. An explicit query for the berkeley.edu or ce.berkeley.edu name servers would miss this information.

IP addresses could have been represented the opposite way in the namespace, with the first octet of the IP address at the bottom of the in-addr.arpa domain. That way, the IP address would have read correctly (forward) in the domain name.

IP addresses are hierarchical, however, just like domain names. Network numbers are doled out much as domain names are, and administrators can then subnet their address space and further delegate numbering. The difference is that IP addresses get more specific from left to right, while domain names get less specific from left to right. Figure 2-15 shows what we mean.