Convenience Methods: The Traversal and Range APIs (JavaScript: The Definitive Guide, 4th Edition)

17.5.1. The DOM Traversal API

At the beginning of this chapter, we saw techniques for traversing the document tree by recursively examining each node in turn. This is an important technique, but it is often overkill; we do not typically want to examine every node of a document. We instead might want to examine only the <img> elements in a document, or to traverse only the subtrees of <table> elements. The Traversal API provides advanced techniques for this kind of selective document traversal. As noted previously, the Traversal API is optional and, at the time of this writing, is not implemented in major sixth-generation browsers. You can test whether it is supported by a DOM-compliant browser with the following:

document.implementation.hasFeature("Traversal", "2.0")  // True if supported

17.5.1.1. NodeIterator and TreeWalker

The Traversal API consists of two key objects, each of which provides a different filtered view of a document. The NodeIterator object provides a "flattened" sequential view of the nodes in a document and supports filtering. You could define a NodeIterator that filters out all document content except <img> elements and presents those image elements to you as a list. The nextNode( ) and previousNode( ) methods of the NodeIterator object allow you to move forward and backward through the list. Note that NodeIterator allows you to traverse selected parts of a document without recursion; you can simply use a NodeIterator within a loop, calling nextNode( ) repeatedly until you find the node or nodes in which you are interested, or until it returns null, indicating that it has reached the end of the document.

The other key object in the Traversal API is TreeWalker. This object also provides a filtered view of a document and allows you to traverse the filtered document by calling nextNode( ) and previousNode( ), but it does not flatten the document tree. TreeWalker retains the tree structure of the document (although this tree structure may be dramatically modified by node filtering) and allows you to navigate the tree with the firstChild( ), lastChild( ), nextSibling( ), previousSibling( ), and parentNode( ) methods. You would use a TreeWalker instead of a NodeIterator when you want to traverse the filtered tree yourself, instead of simply calling nextNode( ) to iterate through it, or when you want to perform a more sophisticated traversal, skipping, for example, some subtrees.

The Document object defines createNodeIterator( ) and createTreeWalker( ) methods for creating NodeIterator and TreeWalker objects. A practical way to check whether a browser supports the Traversal API is to test for the existence of these methods:

if (document.createNodeIterator && document.createTreeWalker) {
    /* Safe to use Traversal API */
}

Both createNodeIterator( ) and createTreeWalker( ) are passed the same four arguments and differ only in the type of object they return. The first argument is the node at which the traversal is to begin. This should be the Document object if you want to traverse or iterate through the entire document, or any other node if you want to traverse only a subtree of the document. The second argument is a number that indicates the types of nodes NodeIterator or TreeWalker should return. This argument is formed by taking the sum of one or more of the SHOW_ constants defined by the NodeFilter object (discussed in the next section). The third argument to both methods is an optional function used to specify a more complex filter than simply including or rejecting nodes based on their type (again, see the next section). The final argument is a boolean value that specifies whether entity reference nodes in the document should be expanded during the traversal. This option can be useful when you're working with XML documents, but web programmers working with HTML documents can ignore it and pass false.

17.5.1.2. Filtering

One of the most important features of NodeIterator and TreeWalker is their selectivity, their ability to filter out nodes you don't care about. As described previously, you specify the nodes you are interested in with the second and (optionally) third arguments to createNodeIterator( ) and createTreeWalker( ). These arguments specify two levels of filtering. The first level simply accepts or rejects nodes based on their type. The NodeFilter object defines a numeric constant for each type of node, and you specify the types of nodes you are interested in by adding together (or by using the | bitwise OR operator on) the appropriate constants.

For example, if you are interested in only the Element and Text nodes of a document, you can use the following expression as the second argument:

NodeFilter.SHOW_ELEMENT + NodeFilter.SHOW_TEXT

If you are interested in only Element nodes, use:

NodeFilter.SHOW_ELEMENT

If you are interested in all nodes or do not want to reject any nodes simply on the basis of their types, use the special constant:

NodeFilter.SHOW_ALL

And if you are interested in all types of nodes except for comments, use:

~NodeFilter.SHOW_COMMENT

(See Chapter 5 if you've forgotten the meaning of the ~ operator.) Note that this first level of filtering applies to individual nodes but not to their children. If the second argument is NodeFilter.SHOW_TEXT, your NodeIterator or TreeWalker does not return element nodes to you, but it does not discard them entirely; it still traverses the subtree beneath the Element nodes to find the Text nodes you are interested in.
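To make the bit arithmetic concrete, here is a minimal sketch of how such a mask can be tested against a node type. The shows( ) function is a hypothetical helper, not part of the DOM. It assumes the bit layout described in the DOM Level 2 Traversal specification, in which the bit for a node type sits at position nodeType - 1, and it uses local copies of the NodeFilter constants so that it can run outside a browser.

```javascript
// Local copies of NodeFilter constants (values from the DOM Level 2
// Traversal specification), so the sketch also runs outside a browser.
var SHOW_ALL       = 0xFFFFFFFF;
var SHOW_ELEMENT   = 0x1;
var SHOW_ATTRIBUTE = 0x2;
var SHOW_TEXT      = 0x4;
var SHOW_COMMENT   = 0x80;

// Node type codes, as defined by the Node interface
var ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8;

// Hypothetical helper (not a DOM method): does the mask include this type?
// Assumes the bit for a node type is at position (nodeType - 1).
function shows(whatToShow, nodeType) {
    return (whatToShow & (1 << (nodeType - 1))) !== 0;
}

var elementsAndText = SHOW_ELEMENT | SHOW_TEXT;
var a = shows(elementsAndText, ELEMENT_NODE);   // true
var b = shows(elementsAndText, COMMENT_NODE);   // false
var c = shows(~SHOW_COMMENT, COMMENT_NODE);     // false: comments excluded
var d = shows(~SHOW_COMMENT, TEXT_NODE);        // true: everything else included
var e = shows(SHOW_ALL, COMMENT_NODE);          // true

// Adding the same constant twice silently selects a different node type;
// the | operator does not have this problem.
var oops = SHOW_ELEMENT + SHOW_ELEMENT;         // Equals SHOW_ATTRIBUTE
var fine = SHOW_ELEMENT | SHOW_ELEMENT;         // Still SHOW_ELEMENT
```

The last two lines suggest why | may be the safer habit when a mask is assembled from several places in a program.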

Any nodes that pass this type-based filtration may be put through a second level of filtering. This second filter is implemented by a function you define and can therefore perform arbitrarily complex filtering. If you do not need this kind of filtering, you can simply specify null as the value of the third argument to createNodeIterator( ) or createTreeWalker( ). But if you do want this kind of filtering, you must pass a function as the third argument.

The function should expect a single node argument, and it should evaluate the node and return a value that indicates whether the node should be filtered out. There are three possible return values, defined by three NodeFilter constants. If your filter function returns NodeFilter.FILTER_ACCEPT, the node is returned by the NodeIterator or TreeWalker. If your function returns NodeFilter.FILTER_SKIP, the node is filtered out and is not returned by the NodeIterator or TreeWalker. The children of the node are still traversed, however. If you are working with a TreeWalker, your filter function may also return the value NodeFilter.FILTER_REJECT, which specifies that the node should not be returned and that it should not even be traversed.
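As a minimal sketch of the three return values, the following filter function is intended for use with a TreeWalker. The function name imagesOutsideTables is hypothetical, and the code assumes an HTML document, in which element names are reported in uppercase. The fallback constants are the values given in the DOM Level 2 specification and are there only so that the sketch can run where no NodeFilter object exists.

```javascript
// Use the browser's NodeFilter object when there is one; otherwise fall
// back to the constant values from the DOM Level 2 specification.
var NF = (typeof NodeFilter != "undefined") ? NodeFilter
       : { FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3 };

// Hypothetical filter: return <img> elements, but ignore <table> elements
// and everything they contain. Assumes uppercase HTML element names.
function imagesOutsideTables(n) {
    if (n.nodeName == "TABLE") return NF.FILTER_REJECT;  // Not returned, subtree not traversed
    if (n.nodeName == "IMG") return NF.FILTER_ACCEPT;    // Returned by the TreeWalker
    return NF.FILTER_SKIP;                               // Not returned, children still traversed
}
```

A function like this would be passed as the third argument to createTreeWalker( ), with NodeFilter.SHOW_ELEMENT as the second argument so that only Element nodes ever reach it.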

Example 17-10 demonstrates the creation and use of a NodeIterator and should clarify the previous discussion. Note, however, that at the time of this writing none of the major web browsers support the Traversal API, so this example is untested!

Example 17-10. Creating and using a NodeIterator

// Define a NodeFilter function to accept only <img> elements
function imgfilter(n) {
    if (n.tagName == 'IMG') return NodeFilter.FILTER_ACCEPT;
    else return NodeFilter.FILTER_SKIP;
}

// Create a NodeIterator to find <img> tags
var images = document.createNodeIterator(document,  // Traverse entire document
    /* Look only at Element nodes */     NodeFilter.SHOW_ELEMENT,
    /* Filter out all but <img> */       imgfilter,
    /* Unused in HTML documents */       false);

// Use the iterator to loop through all images and do something with them
var image;
while((image = images.nextNode( )) != null) {
    image.style.visibility = "hidden";  // Process the image here
}
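
A TreeWalker could be used in a similar way when the structure of the filtered tree matters. The following is a minimal sketch rather than a tested implementation: outline( ) is a hypothetical helper that uses the firstChild( ), nextSibling( ), and parentNode( ) methods of a TreeWalker to build an indented list of the node names in the walker's filtered view. The usage code at the end is guarded so that it runs only where the Traversal API exists.

```javascript
// Hypothetical helper: collect an indented outline of the filtered tree
// below the walker's current node. Returns an array of strings.
function outline(walker, depth, lines) {
    depth = depth || 0;
    lines = lines || [];
    var child = walker.firstChild();        // Move to the first visible child, if any
    if (child == null) return lines;        // No visible children: walker has not moved
    while (child != null) {
        // Indent by two spaces per level of depth
        lines.push(new Array(depth + 1).join("  ") + child.nodeName);
        outline(walker, depth + 1, lines);  // Descend, then return to this child
        child = walker.nextSibling();       // Null when there are no more siblings
    }
    walker.parentNode();                    // Go back up to where this call started
    return lines;
}

// Usage: outline the elements of the current document, if possible
if (typeof document != "undefined" && document.createTreeWalker) {
    var walker = document.createTreeWalker(document.documentElement,
                                           NodeFilter.SHOW_ELEMENT, null, false);
    var outlineText = outline(walker).join("\n");
}
```

Unlike the NodeIterator loop, this traversal knows how deep each node is in the filtered tree, which is the kind of information that is lost in a flattened view.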

17.5.2. The DOM Range API

The DOM Range API consists of a single interface, Range. A Range object represents a contiguous range [61] of document content, contained between a specified start position and a specified end position. Many applications that display text and documents allow the user to select a portion of the document by dragging with the mouse. Such a selected portion of a document is conceptually equivalent to a range.[62] When a node of a document tree falls within a range, we often say that the node is "selected," even though the Range object may not have anything to do with a selection action initiated by the end user. When the start and end positions of a range are the same, we say that the range is "collapsed." In this case, the Range object represents a single position or insertion point within a document.

[61]That is, a logically contiguous range. In bidirectional languages such as Arabic and Hebrew, a logically contiguous range of a document may be visually discontiguous when displayed.

[62]Although web browsers typically allow the user to select document content, the current DOM Level 2 standard does not make the contents of those ranges available to JavaScript, so there is no standard way to obtain a Range object that corresponds to a user's desired selection.

The Range object provides methods for defining the start and end positions of a range, copying and deleting the contents of a range, and inserting nodes at the start position of a range. Support for the Range API is optional. At the time of this writing, it is supported by Netscape 6.1. IE 5 supports a proprietary API that is similar to, but not compatible with, the Range API. You can test for Range support with this code:

document.implementation.hasFeature("Range", "2.0");  // True if Range is supported

17.5.2.1. Start and end positions

The start and end positions of a range are each specified by two values. The first value is a document node, typically a Document, Element, or Text object. The second value is a number that represents a position within that node. When the node is a document or element, the number represents a position between the children of the document or the element. An offset of 0, for example, represents the position immediately before the first child of the node. An offset of 1 represents the position after the first child and before the second. When the specified node is a Text node (or another text-based node type, such as Comment), the number represents a position between the characters of text. An offset of 0 specifies the position before the first character of text, an offset of 1 specifies the position between the first and second characters, and so on. With start and end positions specified in this way, a range represents all nodes and/or characters between the start and end positions. The real power of the Range interface is that the start and end positions may fall within different nodes of the document, and therefore a range may span multiple (and fractional) Element and Text nodes.

To demonstrate the action of the various range-manipulation methods, I'm going to adapt the notation used in the DOM specification for illustrating the document content represented by a range. Document contents are shown in the form of HTML source code, with the contents of a range enclosed in square brackets. For example, the following line represents a range that begins at position 0 within the <body> node and continues to position 8 within the Text node contained within the <h1> node:

<body>[<h1>Document] Title</h1></body>

To create a Range object, call the createRange( ) method of the Document object:

var r = document.createRange( );

Newly created ranges have both start and end points initialized to position 0 within the Document object. Before you can do anything interesting with a range, you must set the start and end positions to specify the desired document range. There are several ways you can do this. The most general way is to call the setStart( ) and setEnd( ) methods to specify the start and end points. Each is passed a node and a position within the node.
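As an illustration, the range described earlier in this section, from position 0 within <body> to position 8 within the heading's Text node, might be set up as follows. This is a sketch: selectHeadingStart( ) is a hypothetical helper, and it assumes that the heading is the first child of its parent and that the heading's first child is a Text node containing at least the requested number of characters.

```javascript
// Hypothetical helper: create a range that runs from the start of the
// heading's parent to "count" characters into the heading's text.
// Assumes heading.firstChild is a Text node with at least count characters.
function selectHeadingStart(doc, heading, count) {
    var r = doc.createRange();
    r.setStart(heading.parentNode, 0);    // Before the first child of the parent
    r.setEnd(heading.firstChild, count);  // Between characters of the Text node
    return r;
}

// Usage, guarded so that it runs only in a browser with Range support
if (typeof document != "undefined" && document.createRange) {
    var h1 = document.getElementsByTagName("h1")[0];
    if (h1 && h1.firstChild && h1.firstChild.nodeType == 3 &&
        h1.firstChild.data.length >= 8) {
        var headingRange = selectHeadingStart(document, h1, 8);
    }
}
```

The checks in the usage code matter because setEnd( ) is expected to throw an exception when the offset is larger than the length of the node.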

A higher-level technique for setting a start and/or end position is to call setStartBefore( ), setStartAfter( ), setEndBefore( ), or setEndAfter( ). These methods each take a single node as their argument. They set the start or end position of the Range to the position before or after the specified node within the parent of that node.

Finally, if you want to define a Range that represents a single Node or subtree of a document, you can use the selectNode( ) or selectNodeContents( ) method. Both methods take a single node argument. selectNode( ) sets the start and end positions before and after the specified node within its parent, defining a range that includes the node and all of its children. selectNodeContents( ) sets the start of the range to the position before the first child of the node and sets the end of the range to the position after the last child of the node. The resulting range contains all the children of the specified node, but not the node itself.
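The relationship between these convenience methods and setStart( ) and setEnd( ) can be spelled out with a short sketch. The functions below are hypothetical helpers, not DOM methods. They compute the boundary points that selecting a node, or selecting its contents, would be expected to produce for an element node, and they work on any object that has parentNode and childNodes properties.

```javascript
// Hypothetical helper: position of a node among its parent's children
function indexInParent(node) {
    var siblings = node.parentNode.childNodes;
    for (var i = 0; i < siblings.length; i++) {
        if (siblings[i] === node) return i;
    }
    return -1;  // Not found; should not happen for a node that is in a tree
}

// Boundary points for selecting the node itself: the positions just before
// and just after the node, expressed as offsets within its parent.
function boundsOfNode(node) {
    var i = indexInParent(node);
    return { startContainer: node.parentNode, startOffset: i,
             endContainer: node.parentNode, endOffset: i + 1 };
}

// Boundary points for selecting the contents of an element: the positions
// before its first child and after its last child.
function boundsOfContents(node) {
    return { startContainer: node, startOffset: 0,
             endContainer: node, endOffset: node.childNodes.length };
}
```

Passing these values to setStart( ) and setEnd( ) should define the same ranges as the one-argument methods; note that the two kinds of range use different container nodes.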

17.5.2.2. Manipulating ranges

Once you've defined a range, there are a number of interesting things you can do with it. To delete the document content within a range, simply call the deleteContents( ) method of the Range object. When a range includes partially selected Text nodes, the deletion operation is a little tricky. Consider the following range, whose contents are enclosed in square brackets:

<p>This[ is <i>on]ly</i> a test

After a call to deleteContents( ), the affected portion of the document looks like this:

<p>This<i>ly</i> a test

Even though the <i> element was included (partially) in the Range, that element remains (with modified content) in the document tree after the deletion.

If you want to remove the content of a range from a document but also want to save the extracted content (for reinsertion as part of a paste operation, perhaps), you should use extractContents( ) instead of deleteContents( ). This method removes nodes from the document tree and inserts them into a DocumentFragment (introduced earlier in this chapter), which it returns. When a range includes a partially selected node, that node remains in the document tree and has its content modified as needed. A clone of the node (see Node.cloneNode( )) is made (and modified) to insert into the DocumentFragment. Consider the previous example again. If extractContents( ) is called instead of deleteContents( ), the effect on the document is the same as shown previously, and the returned DocumentFragment contains:

is <i>on</i>

extractContents( ) works when you want to perform the equivalent of a cut operation on the document. If instead you want to do a copy operation and extract content without deleting it from the document, use cloneContents( ) instead of extractContents( ).[63]

[63]Implementing word processor-style cut, copy, and paste operations is actually more complex than this. Simple range operations on a complex document tree do not always produce the desired cut-and-paste behavior in the linear view of the document.

In addition to specifying the boundaries of text to be deleted or cloned, the start position of a range can be used to indicate an insertion point within a document. The insertNode( ) method of a range inserts the specified node (and all of its children) into the document at the start position of the range. If the specified node is already part of the document tree, it is moved from its current location and reinserted at the position specified by the range. If the specified node is a DocumentFragment, all the children of the node are inserted instead of the node itself.
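The extraction and insertion methods can be combined into a rough equivalent of cut, copy, and paste. The following is only a sketch: moveContents( ) and copyContents( ) are hypothetical helpers, both arguments are assumed to be Range objects from the same document, and the target range is assumed to lie outside the source range. As the footnote above warns, real editing operations need more care than this.

```javascript
// Hypothetical helper: move the content of the source range to the start
// position of the target range ("cut" followed by "paste").
function moveContents(source, target) {
    var fragment = source.extractContents();  // Removes the content from the document
    target.insertNode(fragment);              // Inserts the children of the fragment
    return target;
}

// Hypothetical helper: like moveContents(), but leaves the source content
// in place ("copy" followed by "paste").
function copyContents(source, target) {
    target.insertNode(source.cloneContents());
    return target;
}
```

Because extractContents( ) and cloneContents( ) both return a DocumentFragment, insertNode( ) inserts the extracted or copied nodes themselves rather than a wrapper node.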

Another useful method of the Range object is surroundContents( ). This method reparents the contents of a range to the specified node and inserts that node into the document tree at the position of the range. For example, by passing a newly created <i> node to surroundContents( ), you could transform this range (shown in square brackets):

This is [only] a test

into:

This is <i>only</i> a test

Note that because opening and closing tags must be properly nested in HTML files, surroundContents( ) cannot be used (and will throw an exception) for ranges that partially select any nodes other than Text nodes. The range used earlier to illustrate the deleteContents( ) method could not be used with surroundContents( ), for example.
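Since surroundContents( ) can throw an exception, code that calls it on an arbitrary range may want to be defensive. Here is a minimal sketch; wrapRange( ) is a hypothetical helper, and its doc and range arguments are assumed to be a Document and one of its Range objects.

```javascript
// Hypothetical helper: wrap the contents of a range in a newly created
// element. Returns the new element, or null if the range could not be
// surrounded (for example, because it partially selects a non-Text node).
function wrapRange(doc, range, tagName) {
    var wrapper = doc.createElement(tagName);
    try {
        range.surroundContents(wrapper);
        return wrapper;
    } catch (e) {
        return null;   // The document is expected to be left unchanged
    }
}
```

A call such as wrapRange(document, r, "i") would then either italicize the contents of the range r or report failure by returning null.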

The Range object has various other features as well. You can compare the boundaries of two different ranges with compareBoundaryPoints( ), clone a range with cloneRange( ), and extract a plain-text copy of the content of a range (not including any markup) with toString( ). The start and end positions of a range are accessible through the read-only properties startContainer, startOffset, endContainer, and endOffset. The start and end points of all valid ranges share a common ancestor somewhere in the document tree, even if it is the Document object at the root of the tree. You can find out what this common ancestor is with the commonAncestorContainer property of the range.
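
As one example of boundary comparison, the following sketch tests whether one range contains another. The rangeContains( ) function is a hypothetical helper. It assumes that both ranges belong to the same document, and it uses local copies of two Range comparison constants, with the values given in the DOM Level 2 specification, so that it does not depend on a global Range object.

```javascript
// Values of Range.START_TO_START and Range.END_TO_END from the DOM Level 2
// specification, copied locally for this sketch.
var START_TO_START = 0;
var END_TO_END = 2;

// Hypothetical helper: true if range "outer" contains all of range "inner".
// Both ranges are assumed to belong to the same document.
function rangeContains(outer, inner) {
    // compareBoundaryPoints() returns -1, 0, or 1 depending on whether the
    // boundary point of "outer" is before, equal to, or after the
    // corresponding boundary point of "inner".
    return outer.compareBoundaryPoints(START_TO_START, inner) <= 0 &&
           outer.compareBoundaryPoints(END_TO_END, inner) >= 0;
}
```

With these two comparisons, a range is considered to contain itself, as well as any range that shares one or both of its boundary points.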