Element Grammar (HTML & XHTML: The Definitive Guide, 4th Edition)

15.4.3. XML Element Grammar

The rules for defining the contents of an element match the grammar rules we just discussed. You may use sequences, choices, groups, and repetition to define the allowable contents of an element. The nonterminals in rules must be names of other elements defined in your DTD.

A few examples show how this works. Consider the declaration of the <html> tag, taken from the HTML DTD:

<!ELEMENT html (head, body)>

This defines the element named html whose content is a head element followed by a body element. Notice that you do not enclose the element names in angle brackets within the DTD; that notation is used only when the elements are actually used in a document.
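If it helps, you can picture a content model as a regular expression over the names of an element's children. The Python sketch below illustrates that idea for the html declaration. It is not how a validating XML parser is actually built, and the name matches_html_model is made up for this example.

```python
import re

# The content model (head, body) written as a regular expression over
# child element names. Each name is followed by a single space so that
# adjacent names cannot run together.
HTML_MODEL = re.compile(r"head body ")

def matches_html_model(children):
    """Return True if the list of child element names satisfies (head, body)."""
    names = "".join(name + " " for name in children)
    return HTML_MODEL.fullmatch(names) is not None

print(matches_html_model(["head", "body"]))  # True
print(matches_html_model(["body", "head"]))  # False: wrong order
print(matches_html_model(["head"]))          # False: body is missing
```

The comma in the DTD becomes simple concatenation in the pattern, which is why the order of the children matters.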

Within the HTML DTD, you can find the declaration of the <head> tag:

<!ELEMENT head (%head.misc;,
     ((title, %head.misc;, (base, %head.misc;)?) |
      (base, %head.misc;, (title, %head.misc;))))>

Gulp. What on earth does this mean? First, notice that there is a parameter entity named head.misc used several times in this declaration. Let's go get it:

<!ENTITY % head.misc "(script|style|meta|link|object)*">

Now things are starting to make sense: head.misc defines a group of elements, from which you may choose one. However, the trailing asterisk indicates that you may repeat that choice zero or more times. The net result is that anywhere %head.misc; appears, you can include zero or more script, style, meta, link, or object elements, in any order. Sound familiar?
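To see what the processor does with a parameter entity reference, it can help to perform the substitution by hand. The following Python sketch treats expansion as plain text replacement. The ENTITIES table and the expand function are hypothetical names for this illustration, and the sketch ignores details of real entity processing, such as the whitespace padding an XML processor adds around the replacement text.

```python
import re

# Parameter entities declared in the DTD, keyed by name.
# (A hypothetical lookup table for this sketch.)
ENTITIES = {"head.misc": "(script|style|meta|link|object)*"}

def expand(declaration, entities=ENTITIES):
    """Replace each %name; reference with the entity's replacement text."""
    return re.sub(r"%([\w.]+);", lambda m: entities[m.group(1)], declaration)

# The first alternative from the head declaration.
print(expand("(title, %head.misc;, (base, %head.misc;)?)"))
# (title, (script|style|meta|link|object)*, (base, (script|style|meta|link|object)*)?)
```

Once the references are expanded, the rule contains nothing but element names and the grammar operators you have already seen.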

Returning to the head declaration, we see that we are allowed to begin with any number of the head miscellaneous elements. We must then make a choice: either a group consisting of a title element, optional miscellaneous items, and an optional base element that is itself followed by optional miscellaneous items; or a group consisting of a base element, optional miscellaneous items, a title element, and some more optional miscellaneous items.

Why such a convoluted rule for the <head> tag? Why not just write:

<!ELEMENT head (script|style|meta|link|object|base|title)*>

which allows any number of the head elements to appear, or none at all? Because the HTML standard requires that every <head> tag contain exactly one <title> tag. It also allows for only one <base> tag, if any. Otherwise, the standard does allow any number of the other head elements, in any order.
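A quick experiment shows why the simple rule is too permissive. The Python sketch below models the simple rule as a regular expression over child element names; this is only an approximation for illustration, and loose_head is a made-up name.

```python
import re

# The simple rule (script|style|meta|link|object|base|title)* as a regular
# expression over child names, each name followed by a single space.
LOOSE_MODEL = re.compile(r"(?:(?:script|style|meta|link|object|base|title) )*")

def loose_head(children):
    """Return True if the child element names satisfy the simple rule."""
    names = "".join(name + " " for name in children)
    return LOOSE_MODEL.fullmatch(names) is not None

print(loose_head(["meta", "title"]))          # True, as intended
print(loose_head([]))                         # True, although title is missing
print(loose_head(["title", "title"]))         # True, although title is repeated
print(loose_head(["base", "title", "base"]))  # True, although base is repeated
```

The simple rule accepts every legal head element, but it also accepts the three illegal ones shown last.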

Put simply, the head element declaration, while initially confusing, forces the XML processor to ensure that exactly one title element appears in the head element, and that if specified, just one base element appears as well. It then allows for any of the other head elements, in any order.
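You can check this reading of the declaration by translating it into a regular expression over child element names. The Python sketch below does that under the same simplifying assumption as before, namely that a content model behaves like a regular expression; valid_head is a made-up name, and a real validating parser works differently.

```python
import re

# %head.misc; : zero or more of the miscellaneous head elements.
MISC = r"(?:(?:script|style|meta|link|object) )*"

# The head content model, alternative by alternative.
HEAD_MODEL = re.compile(
    MISC
    + r"(?:"
    + r"title " + MISC + r"(?:base " + MISC + r")?"  # title first, base optional
    + r"|"
    + r"base " + MISC + r"title " + MISC             # base first, title required
    + r")"
)

def valid_head(children):
    """Return True if the child element names satisfy the head content model."""
    names = "".join(name + " " for name in children)
    return HEAD_MODEL.fullmatch(names) is not None

print(valid_head(["meta", "title", "link"]))   # True
print(valid_head(["base", "meta", "title"]))   # True
print(valid_head(["meta", "link"]))            # False: no title
print(valid_head(["title", "title"]))          # False: two titles
print(valid_head(["title", "base", "base"]))   # False: two bases
```

Because title appears exactly once in each alternative and base at most once, no sequence of children can slip through with a missing title or a repeated base.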

This one example demonstrates a lot of the power of XML: the ability to define commonly used elements using parameter entities and the use of grammar rules to dictate document syntax. If you can work through the head element declaration and understand it, you are well on your way to reading any XML DTD.
