14.17 Walking the Document Node Tree

NN 6, IE 5

14.17.1 Problem

You want to iterate through the entire document node tree in search of nodes meeting desired criteria.

14.17.2 Solution

The following getLikeElements( ) function returns a collection of elements that share the same tag name, attribute name, and attribute value (specified as arguments):

function getLikeElements(tagName, attrName, attrValue) {
    var startSet;
    var endSet = new Array( );
    // Start with every element of the given tag, or every element in the document
    if (tagName) {
        startSet = document.getElementsByTagName(tagName);
    } else {
        startSet = (document.all) ? document.all :
            document.getElementsByTagName("*");
    }
    if (attrName) {
        for (var i = 0; i < startSet.length; i++) {
            // Keep only elements that carry the attribute
            if (startSet[i].getAttribute(attrName)) {
                if (attrValue) {
                    // Value supplied: the attribute must also match it
                    if (startSet[i].getAttribute(attrName) == attrValue) {
                        endSet[endSet.length] = startSet[i];
                    }
                } else {
                    endSet[endSet.length] = startSet[i];
                }
            }
        }
    } else {
        // No attribute filter: return the starting collection unchanged
        endSet = startSet;
    }
    return endSet;
}

14.17.3 Discussion

You can omit one or more arguments of the getLikeElements( ) function in specific combinations. For example, if you omit all three arguments, you receive a collection of all elements in the document. Specify only the first argument (the tag name) to retrieve all elements with the same tag name. If you supply the tag name and attribute name only, the returned collection contains elements that have the same tag name and have the same attribute specified, regardless of attribute value. If you specify an attribute value, you must also pass an attribute name. For empty arguments, pass either an empty string or null when they precede nonempty arguments. The following invocations of getLikeElements( ) are all valid:

var collection = getLikeElements( );
var collection = getLikeElements("td");
var collection = getLikeElements("", "class");
var collection = getLikeElements("", "class", "highlight");
var collection = getLikeElements("td", "align", "center");

Use caution, however, when retrieving input elements that have value attributes. Netscape returns only those elements with explicitly set value attributes, while IE returns all input elements because the browser automatically assigns a value attribute to input elements such as radio and checkbox buttons.
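The argument rules described above reduce to a single test applied to each element. The following is a minimal sketch of that test as a standalone function; matchesAttribute( ) and the fakeCell stand-in object are hypothetical names used for illustration and are not part of the recipe. The stand-in supplies only a getAttribute( ) method, so the test can be tried outside a browser.

```javascript
// Hypothetical helper (not part of the recipe): decides whether one element
// passes the attribute tests that getLikeElements() applies inside its loop.
function matchesAttribute(elem, attrName, attrValue) {
    if (!attrName) {
        return true;             // no attribute filter requested
    }
    var actual = elem.getAttribute(attrName);
    if (!actual) {
        return false;            // attribute missing or empty, as in the recipe
    }
    if (!attrValue) {
        return true;             // attribute present; any value qualifies
    }
    return actual == attrValue;  // attribute present and value must match
}

// Stand-in for an element reference (an assumption for illustration only).
// A real element obtained from the document works the same way.
var fakeCell = {
    attrs: {align: "center"},
    getAttribute: function (name) {
        return (name in this.attrs) ? this.attrs[name] : null;
    }
};

var hasAlign = matchesAttribute(fakeCell, "align");
var isCentered = matchesAttribute(fakeCell, "align", "center");
var isLeft = matchesAttribute(fakeCell, "align", "left");
```

Like the recipe, this sketch treats an attribute whose value is an empty string as if the attribute were absent.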

Another variation on the notion of walking a document tree is to use a script to diagram the document to reveal its nested node structure. Object model facilities for retrieving all elements in a document completely flatten the node hierarchy. To preserve the hierarchy and track it, you can use a routine like the following walkChildNodes( ) function, which accumulates a string that reveals the node structure of any object passed as the first parameter of the function. The function invokes itself recursively as it dives into nested hierarchies, and internally passes the second argument to help the function keep track of which nested level it is currently processing.

function walkChildNodes(objRef, n) {
    var obj;
    // Accept an element ID string or an object reference; default to the html element
    if (objRef) {
        if (typeof objRef == "string") {
            obj = document.getElementById(objRef);
        } else {
            obj = objRef;
        }
    } else {
        obj = (document.body.parentElement) ?
            document.body.parentElement : document.body.parentNode;
    }
    var output = "";
    var indent = "";
    var i, group, txt;
    if (n) {
        // Nested call: build one "+---" marker per level of depth
        for (i = 0; i < n; i++) {
            indent += "+---";
        }
    } else {
        // Top-level call: start the report with a header line
        n = 0;
        output += "Child Nodes of <" + obj.tagName.toLowerCase( );
        output += ">\n=====================\n";
    }
    group = obj.childNodes;
    for (i = 0; i < group.length; i++) {
        output += indent;
        switch (group[i].nodeType) {
            case 1:
                output += "<" + group[i].tagName.toLowerCase( );
                output += (group[i].id) ? " ID=" + group[i].id : "";
                output += (group[i].name) ? " NAME=" + group[i].name : "";
                output += ">\n";
                break;
            case 3:
                txt = group[i].nodeValue.substr(0,15);
                output += "[Text:\"" + txt.replace(/[\r\n]/g,"<cr>");
                if (group[i].nodeValue.length > 15) {
                    output += "...";
                }
                output += "\"]\n";
                break;
            case 8:
                output += "[!COMMENT!]\n";
                break;
            default:
                output += "[Node Type = " + group[i].nodeType + "]\n";
        }
        if (group[i].childNodes.length > 0) {
            output += walkChildNodes(group[i], n+1);
        }
    }
    return output;
}

To invoke the walkChildNodes( ) function to capture the node structure of a document's body element, the call looks like the following:

walkChildNodes(document.body);

Output from walkChildNodes( ) displays the tag of each element node (with its ID and name, if assigned) and a sample of each text node to help you identify them. The following trace shows the body of a document containing the Recipe 14.1 script plus a portion of the table from the discussion of Recipe 14.15:

Child Nodes of <body>
=====================
<h1>
+---[Text:"Welcome to Gian..."]
<h2>
+---[Text:"We Love"]
+---<script>
+---[Text:" Windows "]
+---<noscript>
+---[Text:"Users!"]
<hr>
<form>
+---<table>
+---+---<tbody ID=myTBody>
+---+---+---<tr>
+---+---+---+---<td>
+---+---+---+---+---<input>
+---+---+---+---<td>
+---+---+---+---+---[Text:"Item 1"]
+---+---+---<tr>
+---+---+---+---<td>
+---+---+---+---+---<input>
+---+---+---+---<td>
+---+---+---+---+---[Text:"Item 2"]
+---+---+---<tr>
+---+---+---+---<td>
+---+---+---+---+---<input>
+---+---+---+---<td>

You can use the walkChildNodes( ) function as a diagnostic tool, particularly for dynamically created HTML content. If you add the function to the document along with a temporary textarea element, your content creation function can end with a call to walkChildNodes( ) that writes the result to the textarea. You can then inspect the output closely and compare it against what you think the node hierarchy should be.
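One way to wire up that diagnostic step is sketched below. The dumpTreeTo( ) wrapper and the debugOutput ID are hypothetical names chosen for illustration; the wrapper assumes only that the output object has a value property, as a textarea does, and that the walking function returns a string.

```javascript
// Hypothetical wrapper: runs a tree-walking function on a starting node and
// stores the resulting string in the value property of the output element.
function dumpTreeTo(outputElem, walkerFunc, startNode) {
    outputElem.value = walkerFunc(startNode);
    return outputElem.value;
}

// In a page that contains <textarea id="debugOutput"></textarea> (an assumed
// ID), the last statement of a content creation function might read:
//
//     dumpTreeTo(document.getElementById("debugOutput"),
//         walkChildNodes, document.body);
```

Passing the walking function as an argument keeps the wrapper independent of walkChildNodes( ), so the same call pattern works with any other routine that returns a string.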

One last technique to be aware of is the W3C DOM TreeWalker object, which is available in Netscape 7 and later (but not in IE as of Version 6). The TreeWalker object is a live, hierarchical list of nodes that meet criteria defined by the document.createTreeWalker( ) method. The list assumes the same parent-descendant hierarchy for its items as the nodes to which its items point. The createTreeWalker( ) method describes the node where the list begins and which nodes (or classes of nodes) are exempt from the list by way of filtering.

The TreeWalker object maintains a kind of pointer inside the list (so that your scripts don't have to). Methods of this object let scripts access the next or previous node (or sibling, child, or parent node) in the list, while moving the pointer in the direction indicated by the method you chose. If scripts modify the document tree after the TreeWalker is created, changes to the document tree are automatically reflected in the sequence of nodes in the TreeWalker.

While fully usable in an HTML document, the TreeWalker can be even more valuable when working with an XML data document. For example, the W3C DOM does not provide a quick way to access all elements that have a particular attribute name (something that the XPath standard can do easily on the server). But you can define a TreeWalker to point only to nodes that have the desired attribute and quickly access those nodes sequentially (i.e., without having to script more laborious looping through all nodes in search of the desired elements). For example, the following filter function allows only those nodes that have the author attribute to become members of a TreeWalker object:

function authorAttrFilter(node) {
    if (node.hasAttribute("author")) {
        return NodeFilter.FILTER_ACCEPT;        
    }
    return NodeFilter.FILTER_SKIP;
}

A reference to this function becomes one of the parameters to a createTreeWalker( ) method that also limits the list to element nodes:

var authorsOnly = document.createTreeWalker(document, NodeFilter.SHOW_ELEMENT, 
                  authorAttrFilter, false);
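The filter function shown earlier hard-codes the attribute name. If you need filters for several attributes, one possible approach is a small factory that builds a filter for any name. This is only a sketch: makeAttrFilter( ) is a hypothetical name, and the two constants are written as numbers (the values the W3C DOM assigns to NodeFilter.FILTER_ACCEPT and NodeFilter.FILTER_SKIP) so that the code can also be tried where no NodeFilter object exists.

```javascript
var FILTER_ACCEPT = 1;   // same value as NodeFilter.FILTER_ACCEPT
var FILTER_SKIP = 3;     // same value as NodeFilter.FILTER_SKIP

// Hypothetical factory: returns a filter function that accepts only
// nodes carrying the named attribute and skips all others.
function makeAttrFilter(attrName) {
    return function (node) {
        return node.hasAttribute(attrName) ? FILTER_ACCEPT : FILTER_SKIP;
    };
}

// The returned function can stand in wherever a filter function reference
// is expected, for example:
//
//     var editorsOnly = document.createTreeWalker(document,
//         NodeFilter.SHOW_ELEMENT, makeAttrFilter("editor"), false);
```

Because the TreeWalker is limited to element nodes by NodeFilter.SHOW_ELEMENT, the filter is called only with nodes that support hasAttribute( ).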

You can then invoke TreeWalker object methods to obtain a reference to one of the nodes in the list. When you invoke the method, the TreeWalker object applies the filter to candidates relative to the current position of the internal pointer in the direction indicated by the method. The next document tree node to meet the method and filter criteria is returned. Once you have that node reference, you can access any DOM node property or method to work with the node, independent of the items in the TreeWalker list.
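As an illustration of moving the pointer, the sketch below gathers every node a TreeWalker exposes by calling its nextNode( ) method until the method returns null, which signals that no further node meets the criteria. The collectWalkerNodes( ) name is hypothetical, and the function assumes only that the object passed to it has a nextNode( ) method.

```javascript
// Hypothetical helper: collects, in document order, each node that the
// TreeWalker returns from nextNode(). The loop ends when nextNode()
// returns null, leaving the walker's pointer on the last node returned.
function collectWalkerNodes(walker) {
    var nodes = [];
    var node;
    while ((node = walker.nextNode())) {
        nodes[nodes.length] = node;
    }
    return nodes;
}

// With the TreeWalker created above, a call might look like this:
//
//     var authorElems = collectWalkerNodes(authorsOnly);
//     for (var i = 0; i < authorElems.length; i++) {
//         // work with authorElems[i].getAttribute("author")
//     }
```

The array is a snapshot taken at the time of the call. Unlike the TreeWalker itself, it does not reflect later changes to the document tree.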

14.17.4 See Also

Recipe 1.1 for concatenating string segments to build long strings.