Document Type Definitions (Webmaster in a Nutshell, 3rd Edition)

10.4. Document Type Definitions

A DTD specifies how elements inside an XML document should relate to each other. It also provides grammar rules for the document and each of its elements. A document adhering to the XML specifications and the rules outlined by its DTD is considered to be valid. (Don't confuse this with a well-formed document, which adheres only to the XML syntax rules outlined earlier.)

10.4.1. Element Declarations

You must declare each of the elements that appear inside your XML document within your DTD. You can do so with the <!ELEMENT> declaration, which uses this format:

<!ELEMENT elementname rule>

This declares an XML element and an associated rule called a content model, which relates the element logically to the XML document. The element name is written without the surrounding <> characters. An element name must start with a letter or an underscore. After that, it can contain any number of letters, digits, hyphens, periods, or underscores. Element names may not start with the string xml in any variation of upper- or lowercase. Use a colon in element names only if you use namespaces; the colon is reserved for that purpose.
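As a rough illustration of these naming rules, the declarations below use names that should be accepted. The names themselves (book, _note, and so on) are made up for this sketch, and the content models are only placeholders.

```xml
<!-- Hypothetical names chosen only to illustrate the naming rules -->
<!ELEMENT book (title)>                <!-- starts with a letter -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT _note (#PCDATA)>             <!-- starts with an underscore -->
<!ELEMENT chapter-1.draft (#PCDATA)>   <!-- hyphen, digit, and period after the first character -->
```

By contrast, names such as 1chapter (leading digit), -note (leading hyphen), or xmlData (reserved xml prefix) would not follow the rules above.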

Operator	Description
`?`	Must appear once or not at all (zero or one times)
`+`	Must appear at least once (one or more times)
`*`	May appear any number of times or not at all (zero or more times)

<!ELEMENT reviews (rating, synopsis?, comments+)*>
<!ELEMENT rating ((tutorial|reference)*, overall)>
<!ELEMENT synopsis (#PCDATA)>
<!ELEMENT comments (#PCDATA)>
<!ELEMENT tutorial (#PCDATA)>
<!ELEMENT reference (#PCDATA)>
<!ELEMENT overall (#PCDATA)>
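To see how the occurrence operators in these declarations play out, here is a small instance document that should validate against them. The text content is invented for illustration.

```xml
<reviews>
  <!-- first (rating, synopsis?, comments+) group -->
  <rating>
    <tutorial>4</tutorial>      <!-- (tutorial|reference)* allows any mix, in any order -->
    <reference>5</reference>
    <overall>5</overall>        <!-- overall is required and comes last -->
  </rating>
  <synopsis>A compact desktop reference.</synopsis>
  <comments>Clear examples.</comments>
  <comments>Good index.</comments>
  <!-- second group: synopsis omitted, one comments element -->
  <rating>
    <overall>3</overall>
  </rating>
  <comments>Too brief on DTDs.</comments>
</reviews>
```

An empty <reviews/> element would also be valid, because the outer group carries the * operator, while a <rating> without an <overall> child would not.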

10.4.4. Attribute Declarations in the DTD

Attributes for various XML elements must be specified in the DTD. You can specify each of the attributes with the <!ATTLIST> declaration, which uses the following form:

<!ATTLIST target_element attr_name attr_type default>

The <!ATTLIST> declaration consists of the target element name, the name of the attribute, its datatype, and any default value you want to give it.

Here are some examples of legal <!ATTLIST> declarations:

<!ATTLIST box length CDATA "0">
<!ATTLIST box width CDATA "0">
<!ATTLIST frame visible (true|false) "true">
<!ATTLIST person marital
     (single | married | divorced | widowed) #IMPLIED>

In these examples, the first keyword after ATTLIST declares the name of the target element (i.e., <box>, <frame>, <person>). This is followed by the name of the attribute (i.e., length, width, visible, marital). This, in turn, is generally followed by the datatype of the attribute and its default value.
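The following sketch shows how two of these declarations might look alongside a document that uses them. The drawing root element and the EMPTY content models are assumptions added so that the example is complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE drawing [
  <!-- drawing is a hypothetical root; box and frame are assumed to be EMPTY -->
  <!ELEMENT drawing (box*, frame*)>
  <!ELEMENT box EMPTY>
  <!ELEMENT frame EMPTY>
  <!ATTLIST box length CDATA "0">
  <!ATTLIST box width CDATA "0">
  <!ATTLIST frame visible (true|false) "true">
]>
<drawing>
  <box length="12" width="8"/>
  <!-- no attributes given: length and width both default to 0 -->
  <box/>
  <!-- visible must be one of the enumerated values -->
  <frame visible="false"/>
</drawing>
```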

10.4.4.1. Attribute modifiers

Let's look at the default value first. You can specify any default value allowed by the specified datatype. This value must appear as a quoted string. If a default value is not appropriate, you can specify one of the modifiers listed in the following table in its place.

Modifier	Description
`#REQUIRED`	The attribute value must be specified with the element.
`#IMPLIED`	The attribute value is unspecified, to be determined by the application.
`#FIXED "value"`	The attribute value is fixed and cannot be changed by the user.
`"value"`	The default value of the attribute.

With the #IMPLIED keyword, the value can be omitted from the XML document. The XML parser must notify the application, which can take whatever action it deems appropriate at that point. With the #FIXED keyword, you must specify the default value immediately afterwards:

<!ATTLIST date year CDATA #FIXED "2001">
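A minimal sketch of the three keyword modifiers working together follows. The log root element and the month and note attributes are hypothetical additions; only the year declaration comes from the example above.

```xml
<?xml version="1.0"?>
<!DOCTYPE log [
  <!-- log, month, and note are hypothetical names added for this sketch -->
  <!ELEMENT log (date+)>
  <!ELEMENT date (#PCDATA)>
  <!ATTLIST date year  CDATA #FIXED "2001">
  <!ATTLIST date month CDATA #REQUIRED>
  <!ATTLIST date note  CDATA #IMPLIED>
]>
<log>
  <!-- year omitted: the parser supplies the fixed value 2001 -->
  <date month="03">Kickoff</date>
  <!-- year given explicitly: allowed only because it matches the fixed value -->
  <date year="2001" month="04" note="tentative">Review</date>
</log>
```

A <date> element with no month attribute, or one with year="2002", would fail validation against these declarations.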

10.4.4.2. Datatypes

The following table lists legal datatypes to use in a DTD.

Type	Description
`CDATA`	Character data
enumerated	A series of values from which only one can be chosen
`ENTITY`	An entity declared in the DTD
`ENTITIES`	Multiple whitespace-separated entities declared in the DTD
`ID`	A unique element identifier
`IDREF`	The value of a unique ID type attribute
`IDREFS`	Multiple whitespace-separated IDREFs of elements
`NMTOKEN`	An XML name token
`NMTOKENS`	Multiple whitespace-separated XML name tokens
`NOTATION`	A notation declared in the DTD

The CDATA keyword simply declares that any character data can appear, although it must adhere to the same rules as #PCDATA content (for example, < and & must be escaped). Here are some examples of attribute declarations that use CDATA:

<!ATTLIST person name CDATA #REQUIRED>
<!ATTLIST person email CDATA #REQUIRED>
<!ATTLIST person company CDATA #FIXED "OReilly">

Here are two examples of enumerated datatypes where no keywords are specified. Instead, the possible values are simply listed:

<!ATTLIST person marital
   (single | married | divorced | widowed) #IMPLIED>
<!ATTLIST person sex (male | female) #REQUIRED>
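Putting the CDATA and enumerated declarations for <person> together, a document might look like the sketch below. The staff root element, the EMPTY content model, and the sample names and addresses are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!-- staff is a hypothetical root element; person is assumed to be EMPTY -->
  <!ELEMENT staff (person+)>
  <!ELEMENT person EMPTY>
  <!ATTLIST person name CDATA #REQUIRED>
  <!ATTLIST person email CDATA #REQUIRED>
  <!ATTLIST person company CDATA #FIXED "OReilly">
  <!ATTLIST person marital
     (single | married | divorced | widowed) #IMPLIED>
  <!ATTLIST person sex (male | female) #REQUIRED>
]>
<staff>
  <!-- CDATA values take free text, but the ampersand must still be escaped -->
  <person name="Pat Lee &amp; Co." email="pat@example.com"
          sex="female" marital="married"/>
  <!-- marital is #IMPLIED, so it can be left out -->
  <person name="Sam Ortiz" email="sam@example.com" sex="male"/>
</staff>
```

A value outside the list, such as marital="engaged", would be rejected by a validating parser. Enumerated values are also case-sensitive.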

The ID, IDREF, and IDREFS datatypes allow you to define attributes as IDs and ID references. An ID is simply an attribute whose value distinguishes the current element from all others in the current XML document. IDs are useful for applications to link to various sections of a document that contain an element with a uniquely tagged ID. IDREFs are attributes that reference other IDs. Consider the following XML document:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE sector SYSTEM "sector.dtd">
<sector>
   <employee empid="e1013">Jack Russell</employee>
   <employee empid="e1014">Samuel Tessen</employee>
   <employee empid="e1015" boss="e1013">
      Terri White</employee>
   <employee empid="e1016" boss="e1014">
      Steve McAlister</employee>
</sector>

and its DTD:

<!ELEMENT sector (employee*)>
<!ELEMENT employee (#PCDATA)>
<!ATTLIST employee empid ID #REQUIRED>
<!ATTLIST employee boss IDREF #IMPLIED>

Here, all employees have their own identification numbers (e1013, e1014, etc.), which we define in the DTD with the ID keyword using the empid attribute. This attribute then forms an ID for each <employee> element; no two <employee> elements can have the same ID.

Attributes that only reference other elements use the IDREF datatype. In this case, the boss attribute is an IDREF because it uses only the values of other ID attributes as its values. IDs will come into play when we discuss XLink and XPointer.
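It can help to see what a validating parser should reject under these declarations. The following variant of the <sector> document is well-formed but deliberately invalid; each comment names the rule that the next element breaks.

```xml
<sector>
  <employee empid="e1013">Jack Russell</employee>
  <!-- invalid: e1013 is already used as an ID above -->
  <employee empid="e1013">Samuel Tessen</employee>
  <!-- invalid: boss refers to e9999, but no element carries that ID -->
  <employee empid="e1015" boss="e9999">Terri White</employee>
  <!-- invalid: an ID value must be an XML name, so it cannot start with a digit -->
  <employee empid="1016">Steve McAlister</employee>
</sector>
```

The last case is why the identification numbers in the example carry a leading letter.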

The IDREFS datatype is used if you want the attribute to refer to more than one ID in its value. The IDs must be separated by whitespace. For example, adding this to the DTD:

<!ATTLIST employee managers IDREFS #REQUIRED>

allows you to legally use the XML:

<employee empid="e1016" boss="e1014"
          managers="e1014 e1013">
    Steve McAlister
</employee>

The NMTOKEN and NMTOKENS datatypes declare XML name tokens. An XML name token is a string made up of the characters allowed in XML names: letters, digits, underscores, hyphens, and periods. It can contain a colon if it is part of a namespace. It may not contain whitespace. Unlike an XML name, a name token can begin with any of these characters (e.g., .profile is a legal XML name token, but not a legal XML name). These datatypes are useful if you enumerate tokens of languages or other keyword sets that match these restrictions in the DTD.
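As a small sketch of name tokens in use, the document below declares one NMTOKEN and one NMTOKENS attribute. All element and attribute names here are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE files [
  <!-- files, file, name, and flags are hypothetical names for this sketch -->
  <!ELEMENT files (file+)>
  <!ELEMENT file EMPTY>
  <!ATTLIST file name  NMTOKEN  #REQUIRED>
  <!ATTLIST file flags NMTOKENS #IMPLIED>
]>
<files>
  <!-- .profile starts with a period: fine for a name token, not for an XML name -->
  <file name=".profile" flags="hidden read-only"/>
  <!-- a name token may also start with a digit -->
  <file name="2001-report.txt"/>
</files>
```

A value such as name="my file" would be invalid, because a single name token cannot contain whitespace.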

The attribute types ENTITY and ENTITIES allow you to exploit an entity declared in the DTD. This includes unparsed entities. For example, you can link to an image as follows:

<!ELEMENT image EMPTY>
<!ATTLIST image src ENTITY #REQUIRED>
<!ENTITY chapterimage SYSTEM "chapimage.jpg" NDATA jpg>

Once the jpg notation has also been declared with a <!NOTATION> declaration, you can use the image as follows:

<image src="chapterimage"/>

The ENTITIES datatype allows multiple whitespace-separated references to entities, much like IDREFS and NMTOKENS allow multiple references to their datatypes.
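A possible use of ENTITIES is sketched below. The gallery element, the images attribute, the entity names, and the notation's system identifier are all assumptions made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE gallery [
  <!-- gallery, images, fig1, fig2, and the notation identifier are hypothetical -->
  <!NOTATION jpg SYSTEM "image/jpeg">
  <!ENTITY fig1 SYSTEM "fig1.jpg" NDATA jpg>
  <!ENTITY fig2 SYSTEM "fig2.jpg" NDATA jpg>
  <!ELEMENT gallery EMPTY>
  <!ATTLIST gallery images ENTITIES #REQUIRED>
]>
<gallery images="fig1 fig2"/>
```

Note that the attribute value holds the bare entity names, separated by whitespace, rather than entity references written with & and ;.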

The NOTATION keyword simply expects a notation that appears in the DTD with a <!NOTATION> declaration. Here, the player attribute of the <media> element can be either mpeg or jpeg:

<!NOTATION mpeg SYSTEM "mpegplay.exe">
<!NOTATION jpeg SYSTEM "netscape.exe">
<!ATTLIST media player
      NOTATION (mpeg | jpeg) #REQUIRED>

Note that you must enumerate each of the notations allowed in the attribute. For example, to dictate the possible values of the player attribute of the <media> element, use the following:

<!NOTATION mpeg SYSTEM "mpegplay.exe">
<!NOTATION jpeg SYSTEM "netscape.exe">
<!NOTATION mov SYSTEM "mplayer.exe">
<!NOTATION avi SYSTEM "mplayer.exe">
<!ATTLIST media player
      NOTATION (mpeg | jpeg | mov) #REQUIRED>

Note that according to the rules of this DTD, the <media> element is not allowed to play AVI files. The NOTATION keyword is rarely used.
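For a sense of how such a declaration constrains a document, here is a minimal sketch. The playlist root element, the (#PCDATA) content model for <media>, and the file names are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE playlist [
  <!-- playlist is a hypothetical root element; media is assumed to hold text -->
  <!ELEMENT playlist (media+)>
  <!ELEMENT media (#PCDATA)>
  <!NOTATION mpeg SYSTEM "mpegplay.exe">
  <!NOTATION jpeg SYSTEM "netscape.exe">
  <!NOTATION mov SYSTEM "mplayer.exe">
  <!NOTATION avi SYSTEM "mplayer.exe">
  <!ATTLIST media player
        NOTATION (mpeg | jpeg | mov) #REQUIRED>
]>
<playlist>
  <media player="mov">intro.mov</media>
  <media player="mpeg">demo.mpg</media>
  <!-- player="avi" would be invalid: avi is a declared notation,
       but it is not in the attribute's enumerated list -->
</playlist>
```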

Finally, you can place all the ATTLIST entries for an element inside a single ATTLIST declaration, as long as you follow the rules of each datatype:

<!ATTLIST person
          name CDATA #REQUIRED
          number IDREF #REQUIRED
          company CDATA #FIXED "OReilly">

<?xml version="1.0" encoding="iso-8859-1"?>
<![%book;[
<!ELEMENT text (chapter+)>
]]>
<![%article;[
<!ELEMENT text (section+)>
]]>
<!ELEMENT chapter (section+)>
<!ELEMENT section (p+)>
<!ELEMENT p (#PCDATA)>
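The conditional sections in this DTD depend on the parameter entities book and article, each of which has to resolve to INCLUDE or IGNORE. One way to set them, sketched here on the assumption that the DTD is saved in a file named text.dtd (a hypothetical name), is from the internal subset of the document that uses it:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- text.dtd is a hypothetical file name for the DTD shown above -->
<!DOCTYPE text SYSTEM "text.dtd" [
  <!ENTITY % book "INCLUDE">     <!-- keep the (chapter+) declaration -->
  <!ENTITY % article "IGNORE">   <!-- skip the (section+) declaration -->
]>
<text>
  <chapter>
    <section>
      <p>Conditional sections select one content model.</p>
    </section>
  </chapter>
</text>
```

Swapping the two values should switch <text> to the (section+) content model instead. This works because the internal subset is read before the external DTD.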
