

XML in a Nutshell

3.2. Element Declarations

Every element used in a valid document must be declared in the document's DTD with an element declaration. Element declarations have this basic form:

<!ELEMENT element_name content_specification>

The name of the element can be any legal XML name. The content specification specifies what children the element may or must have in what order. Content specifications can be quite complex. They can say, for example, that an element must have three child elements of a given type, or two children of one type followed by another element of a second type, or any elements chosen from seven different types interspersed with text.
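To show where such declarations live, here is a minimal sketch of a self-contained document whose DTD sits in the internal subset of its document type declaration. The person, name, and profession elements and their contents are illustrative assumptions, not declarations taken from this section.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!-- person must contain exactly one name followed by exactly one profession -->
  <!ELEMENT person (name, profession)>
  <!-- both children may contain only parsed character data -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT profession (#PCDATA)>
]>
<person>
  <name>Alan Turing</name>
  <profession>mathematician</profession>
</person>
```

A validating parser should accept this document as written. Swapping the two children, or omitting either one, should make it invalid, because the content specification fixes both the order and the number of children.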

3.2.4. The Number of Children

As the previous examples indicate, not all instances of a given element necessarily have exactly the same children. You can affix one of three suffixes to an element name in a content specification to indicate how many of that element are expected at that position. These suffixes are:

?
Zero or one of the element is allowed.

*
Zero or more of the element are allowed.

+
One or more of the element is required.

For example, this declaration says that a name element must contain a first_name, may or may not contain a middle_name, and may or may not contain a last_name, in that order:

<!ELEMENT name (first_name, middle_name?, last_name?)>

Given this declaration, all these name elements are valid:

<name>
  <first_name>Madonna</first_name>
  <last_name>Ciccone</last_name>
</name>
<name>
  <first_name>Madonna</first_name>
  <middle_name>Louise</middle_name>
  <last_name>Ciccone</last_name>
</name>
<name>
  <first_name>Madonna</first_name>
</name>

However, these are not valid:

<name>
  <first_name>George</first_name>
  <!-- only one middle name is allowed -->
  <middle_name>Herbert</middle_name>
  <middle_name>Walker</middle_name>
  <last_name>Bush</last_name>
</name>
<name>
  <!-- first name must precede last name -->
  <last_name>Ciccone</last_name>
  <first_name>Madonna</first_name>
</name>

You can allow for multiple middle names by placing an asterisk after the middle_name:

<!ELEMENT name (first_name, middle_name*, last_name?)>

If you wanted to require a middle_name to be included, but still allow for multiple middle names, you'd use a plus sign instead, like this:

<!ELEMENT name (first_name, middle_name+, last_name?)>
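The effect of the suffixes can be checked with a small self-contained document. The sketch below uses the asterisk form of the declaration. The names wrapper element and the #PCDATA declarations for the three children are assumptions added only so that the document is complete enough to validate.

```xml
<?xml version="1.0"?>
<!DOCTYPE names [
  <!-- names is a hypothetical wrapper so that several cases fit in one document -->
  <!ELEMENT names (name+)>
  <!ELEMENT name (first_name, middle_name*, last_name?)>
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT middle_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
]>
<names>
  <!-- no middle name and no last name: allowed by * and ? -->
  <name>
    <first_name>Madonna</first_name>
  </name>
  <!-- two middle names: allowed by *, but not by ? -->
  <name>
    <first_name>George</first_name>
    <middle_name>Herbert</middle_name>
    <middle_name>Walker</middle_name>
    <last_name>Bush</last_name>
  </name>
</names>
```

Changing middle_name* to middle_name+ in this sketch should make the first name element invalid, since it has no middle_name child. Changing it to middle_name? should make the second one invalid instead.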

3.2.6. Parentheses

Individually, choices, sequences, and suffixes are fairly limited. However, they can be combined in arbitrarily complex fashions to describe most reasonable content models. Either a choice or a sequence can be enclosed in parentheses. When so enclosed, the choice or sequence can be suffixed with a ?, *, or +. Furthermore, the parenthesized item can be nested inside other choices or sequences.

For example, let's suppose you want to say that a circle element contains a center element and either a radius or a diameter element, but not both. This declaration does that:

<!ELEMENT circle (center, (radius | diameter))>

To continue with a geometry example, suppose a center element can either be defined in terms of Cartesian or polar coordinates. Then each center contains either an x and a y or an r and a θ. We would declare this using two small sequences, each of which is parenthesized and combined in a choice:

<!ELEMENT center ((x, y) | (r, θ))>

Suppose you don't really care whether the x element comes before the y element or vice versa, nor do you care whether r comes before θ. Then you can expand the choice to cover all four possibilities:

<!ELEMENT center ((x, y) | (y, x) | (r, θ) | (θ, r))>

As the number of elements in the sequence grows, the number of permutations grows factorially, which is faster than exponentially: n distinct elements can be ordered in n! ways. Thus, this technique really isn't practical past two or three child elements. DTDs are not very good at saying you want n instances of A and m instances of B, but you don't really care which order they come in.
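The circle and center declarations can be combined into one self-contained document, sketched below with the angle element written as θ. The #PCDATA declarations for the leaf elements and the numeric values are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE circle [
  <!-- a center, then exactly one of radius or diameter -->
  <!ELEMENT circle (center, (radius | diameter))>
  <!-- Cartesian or polar coordinates, with either order inside each pair -->
  <!ELEMENT center ((x, y) | (y, x) | (r, θ) | (θ, r))>
  <!ELEMENT x (#PCDATA)>
  <!ELEMENT y (#PCDATA)>
  <!ELEMENT r (#PCDATA)>
  <!ELEMENT θ (#PCDATA)>
  <!ELEMENT radius (#PCDATA)>
  <!ELEMENT diameter (#PCDATA)>
]>
<circle>
  <center>
    <!-- matches the (θ, r) branch of the choice -->
    <θ>90</θ>
    <r>5</r>
  </center>
  <diameter>10</diameter>
</circle>
```

A center that mixes the two systems, such as an x followed by an r, should be rejected, because no branch of the choice begins with x and continues with r.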

Suffixes can be applied to parenthesized elements too. For instance, let's suppose that a polygon is defined by individual coordinates for each vertex, given in order. For example, this is a right triangle:

[Figure 3.2.6: listing of a polygon element describing a right triangle; not reproduced here]

What we want to say is that a polygon is composed of three or more pairs of x-y or r-θ coordinates. An x is always followed by a y, and an r is always followed by a θ. This declaration does that:

[Figure 3.2.6: listing of the polygon element declaration; not reproduced here]

The plus sign is applied to ((x, y) | (r, θ)).
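The two listings for the polygon example are not reproduced in this copy. As a sketch reconstructed only from the description (three or more coordinate pairs, with the plus sign applied to a final parenthesized choice), one declaration and one instance that fit are shown here. The coordinate values and the #PCDATA declarations are assumptions, and the original listings may differ.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE polygon [
  <!-- two required coordinate pairs, then one or more further pairs -->
  <!ELEMENT polygon (((x, y) | (r, θ)), ((x, y) | (r, θ)), ((x, y) | (r, θ))+)>
  <!ELEMENT x (#PCDATA)>
  <!ELEMENT y (#PCDATA)>
  <!ELEMENT r (#PCDATA)>
  <!ELEMENT θ (#PCDATA)>
]>
<!-- a right triangle with its right angle at the origin -->
<polygon>
  <x>0</x>  <y>0</y>
  <x>0</x>  <y>10</y>
  <x>10</x> <y>0</y>
</polygon>
```

Under this declaration a polygon with only two pairs should be invalid, as should one in which an x is followed by anything other than a y.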

To return to the name example, suppose you want to say that a name can contain just a first name, just a last name, or a first name and a last name with an indefinite number of middle names. This declaration achieves that:

<!ELEMENT name (last_name
               | (first_name, ( (middle_name+, last_name) | (last_name?) ))
               ) >
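One way to check this content model is to try each permitted shape in a single document. The sketch below uses the declaration with its parentheses balanced. The names wrapper element and the #PCDATA declarations are assumptions added so that the document can be validated on its own.

```xml
<?xml version="1.0"?>
<!DOCTYPE names [
  <!-- names is a hypothetical wrapper element -->
  <!ELEMENT names (name+)>
  <!ELEMENT name (last_name
                 | (first_name, ( (middle_name+, last_name) | (last_name?) ))
                 ) >
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT middle_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
]>
<names>
  <!-- just a last name -->
  <name><last_name>Bush</last_name></name>
  <!-- just a first name -->
  <name><first_name>Madonna</first_name></name>
  <!-- first name, several middle names, and a last name -->
  <name>
    <first_name>George</first_name>
    <middle_name>Herbert</middle_name>
    <middle_name>Walker</middle_name>
    <last_name>Bush</last_name>
  </name>
</names>
```

Note what the model excludes: a first_name followed by middle_name children but no last_name should be invalid, because the middle_name+ branch requires a last_name after it.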

3.2.7. Mixed Content

In narrative documents it's common for a single element to contain both child elements and non-whitespace character data that is not marked up. For example, recall this definition element from Chapter 2:

<definition>The <term>Turing Machine</term> is an abstract finite 
state automaton with infinite memory that can be proven equivalent 
to any other finite state automaton with arbitrarily large memory.
Thus what is true for a Turing machine is true for all equivalent 
machines no matter how implemented.
</definition>

The definition element contains some nonwhitespace text and a term child. This is called mixed content. An element that contains mixed content is declared like this:

<!ELEMENT definition (#PCDATA | term)*>

This says that a definition element may contain parsed character data and term children. It does not specify in which order they appear, nor how many instances of each appear. This declaration allows a definition to have one term child, no term children, or twenty-three term children.

You can add any number of other child elements to the list of mixed content, though #PCDATA must always be the first child in the list. For example, this declaration says that a paragraph element may contain any number of name, profession, footnote, emphasize, and date elements in any order, interspersed with parsed character data:

<!ELEMENT paragraph
  (#PCDATA | name | profession | footnote | emphasize | date )*
>

This is the only way to indicate that an element contains mixed content. You cannot say, for example, that there must be exactly one term child of the definition element, as well as parsed character data. You cannot say that the parsed character data must all come after the term child. You cannot use parentheses around a mixed-content declaration to make it part of a larger grouping. You can only say that the element contains any number of any elements from a particular list in any order, as well as undifferentiated parsed character data.
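As a sketch of mixed content in a complete document, the following declares definition as shown above and adds an assumed #PCDATA declaration for term. The sentence inside the element is illustrative filler rather than text from the book.

```xml
<?xml version="1.0"?>
<!DOCTYPE definition [
  <!-- #PCDATA comes first; the whole group is followed by * -->
  <!ELEMENT definition (#PCDATA | term)*>
  <!ELEMENT term (#PCDATA)>
]>
<definition>A <term>Turing machine</term> is often compared with a
<term>finite state automaton</term>; term children may appear anywhere
in the text, in any order and any number of times.</definition>
```

Removing every term child, or adding twenty more, should leave the document valid. By contrast, a declaration that tries to nest the mixed group inside a sequence, such as a hypothetical (title, (#PCDATA | term)*), is not a legal content specification.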




Copyright © 2002 O'Reilly & Associates. All rights reserved.