Understanding XML DTDs (HTML & XHTML: The Definitive Guide, 4th Edition)

To use a markup language defined with XML, you should be able to read and understand the elements and entities found in its XML DTD. But don't be put off: while XML DTDs are verbose, filled with obscure punctuation, and designed primarily for computer consumption, they are actually easy to understand once you get past all the syntactic sugar. Remember, your brain is better at languages than any computer is.

15.3.3. Entity Declarations

Let's define an entity with the <!ENTITY> tag in an XML DTD. Inside the tag, first supply the entity name and value, and then indicate whether it is a general or parameter entity:

<!ENTITY name value>
<!ENTITY % name value>

The first version creates a general entity; the second, because of the percent sign, creates a parameter entity.

For both entity types, the name is simply a sequence of characters beginning with a letter, colon, or underscore, and followed by any combination of letters, numbers, periods, hyphens, underscores, or colons. The only restriction is that names may not begin with the sequence "xml" (either upper or lowercase).

The entity value is either a character string within quotes (unlike HTML markup, you must use quotes even if it is a string of contiguous letters) or a reference to another document containing the value of the entity. For these external entity values, you'll find either the keyword SYSTEM, followed by the URL of the document containing the entity value, or the keyword PUBLIC, followed by the formal name of the document and its URL.

A few examples will make this clear. Here is a simple general entity declaration:

<!ENTITY fruit "kumquat or other similar citrus fruit">

In this declaration, the entity "&fruit;" within the document will be replaced with the phrase "kumquat or other similar citrus fruit" wherever it appears.

Similarly, here is a parameter entity declaration:

<!ENTITY % ContentType "CDATA">

Anywhere the reference %ContentType; appears in your DTD, it will be replaced with the word "CDATA". This is the typical way to use parameter entities: to create a more descriptive term for a generic parameter that will be used many times in a DTD.

Here is an external general entity declaration:

<!ENTITY boilerplate SYSTEM "http://server.com/boilerplate.txt">

It tells the XML processor to retrieve the contents of the file boilerplate.txt from server.com and use it as the value of the boilerplate entity. Anywhere you use &boilerplate; in your document, the contents of the file will be inserted as part of your document content.

Here is an external parameter entity declaration, lifted from the HTML DTD, that references a public external document:

<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">

It defines an entity named HTMLlat1 whose contents are to be taken from the public document identified as "-//W3C//ENTITIES Latin 1 for XHTML//EN". If the processor does not have a copy of this document available, it can use the URL "xhtml-lat1.ent " to find it. This particular public document is actually quite lengthy, containing all of the general entity declarations for the Latin 1 character encodings for HTML.[79]

[79]You can enjoy this document for yourself at http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent.

Accordingly, simply writing in the HTML DTD:

%HTMLlat1;

causes all of those general entities to be defined as part of the language.

A DTD author can use the PUBLIC and SYSTEM external values with general and parameter entity declarations. You should structure your external definitions to make your DTDs and documents easy to read and understand.

You'll recall that we began the section on entities with a mention of unparsed entities whose only purpose is to be used as values to certain attributes. You declare an unparsed entity by appending the keyword NDATA to an external general entity declaration, followed by the name of the unparsed entity. If we wanted to convert our general boilerplate entity to an unparsed general entity for use as an attribute value, we could say:

<!ENTITY boilerplate SYSTEM "http://server.com/boilerplate.txt" NDATA text>

With this declaration, attributes defined as type ENTITY (as described in Section 15.5.1, "Attribute Values") could use boilerplate as one of their values.

15.3. Understanding XML DTDs

15.3.1. Comments

15.3.2. Entities

15.3.3. Entity Declarations

15.3.4. Elements