Schema Basics (XML in a Nutshell, 2nd Edition)

Since the document is valid, DOMWriter will simply echo the input document to standard output. An invalid document will cause the parser to generate an error message. For instance, adding b elements to the contents of the fullName element violates the schema rules:

If this document were validated with DOMWriter, the following validity errors would be detected by Xerces:

16.2.1. Document Organization

Now that there is a basic schema and a valid document from which to work, it is time to examine the structure of a schema document and its contents. Every schema document consists of a single root xs:schema element. This element contains declarations for all elements and attributes that may appear in a valid instance document.

TIP: The XML elements that make up an XML schema must belong to the XML Schema namespace (http://www.w3.org/2001/XMLSchema), which is frequently associated with the xs: prefix. For the remainder of this chapter, all schema elements will be written using the xs: prefix to indicate that they belong to the Schema namespace.
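Only the namespace URI matters, so the prefix is purely a convention. The sketch below is the fullName schema written with the equally common xsd: prefix. A schema processor should treat it exactly like a schema that uses xs:.

```xml
<!-- Equivalent to the xs:-prefixed schema: the processor matches
     elements by namespace URI, not by the prefix string -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The type attribute holds a QName, so its prefix must also be
       bound to the Schema namespace -->
  <xsd:element name="fullName" type="xsd:string"/>
</xsd:schema>
```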

Instance elements declared by top-level elements in the schema (immediate child elements of the xs:schema element) are considered global elements. For example, the simple schema in Example 16-2 globally declares one element: fullName. According to the rules of schema construction, any element that is declared globally may appear as the root element of an instance document.

In this case, since only one element has been declared, that shouldn't be a problem. But when building more complex schemas, this side effect must be taken into consideration. If more than one element is declared globally, a schema-valid document may not contain the root element you expect.
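Here is a sketch of that side effect: a schema with a second global declaration. The nickname element is hypothetical and exists only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Both declarations are immediate children of xs:schema,
       so both are global and either may be the root element -->
  <xs:element name="fullName" type="xs:string"/>
  <xs:element name="nickname" type="xs:string"/>
</xs:schema>
```

Against this schema, a document consisting of `<fullName>Jane Doe</fullName>` and one consisting of `<nickname>Janie</nickname>` are both schema-valid. Nothing in the schema says which root element was intended.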

Naming conflicts are another potential problem with multiple global declarations. When writing schema declarations, it is an error to declare two components of the same kind with the same name in the same scope. For instance, trying to declare two global elements called fullName would generate an error. But declaring an element and an attribute with the same name would not create a conflict, because element declarations and attribute declarations occupy separate symbol spaces.
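The following sketch shows both cases. The global attribute named fullName is legal because element and attribute names are tracked separately. The commented-out second element declaration would be rejected as a duplicate if it were enabled.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fullName" type="xs:string"/>

  <!-- Legal: an attribute may share a name with an element -->
  <xs:attribute name="fullName" type="xs:string"/>

  <!-- Illegal if uncommented: a second global element named fullName
  <xs:element name="fullName" type="xs:token"/>
  -->
</xs:schema>
```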

16.2.2. Annotations

Now that there is a working schema, it's good practice to include some documentary material about who authored it, what it is for, any copyright restrictions, etc. Since an XML schema document is an XML document in its own right, one simple option would be to use XML comments to include documentary information.

The major drawback to using XML comments is that parsers are not obliged to pass comments along to applications. Also, because a comment's content is unstructured text, applications have to do extra work to extract information from it. This increases the likelihood that, at some point, important documentation will be lost during an otherwise harmless transformation or editing procedure. Encoding documentation as markup inline with the element and type declarations it describes, by contrast, makes automatic documentation generation far more practical.

To accommodate this extra information, most schema elements may contain an optional xs:annotation element as their first child element. The annotation element may then, in turn, contain any combination of xs:documentation and xs:appinfo elements, which are provided to contain extra human-readable and machine-readable information, respectively.

16.2.2.1. The xs:documentation element

As a concrete example, let's add some authorship and copyright information to the simple schema document, as shown in Example 16-4.

Example 16-4. address-schema.xsd with annotation

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
 <xs:annotation>
  <xs:documentation xml:lang="en-us">
    Simple schema example from O'Reilly's
    <a href="/catalog/xmlnut">XML in a Nutshell.</a>
    Copyright 2002 O'Reilly &amp; Associates
  </xs:documentation>
 </xs:annotation>
  
 <xs:element name="fullName" type="xs:string"/>
  
</xs:schema>

The xs:documentation element permits an xml:lang attribute to identify the language of the brief message. This attribute can also be applied to the xs:schema element to set the default language for the entire document. For more information about using the xml:lang attribute, see Chapter 5 and Chapter 20.
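Here is a minimal sketch of both uses. It assumes English as the schema-wide default, and the French note is included purely for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xml:lang="en">
  <xs:annotation>
    <!-- No xml:lang here: the language "en" is inherited from xs:schema -->
    <xs:documentation>Simple schema example.</xs:documentation>
    <!-- An explicit xml:lang overrides the inherited default -->
    <xs:documentation xml:lang="fr">Exemple de schéma simple.</xs:documentation>
  </xs:annotation>
  <xs:element name="fullName" type="xs:string"/>
</xs:schema>
```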

Also, notice that the documentation element contains additional markup: an a element (à la HTML). The xs:documentation element is allowed to contain any well-formed XML, not just schema elements. Section 16.8 later in this chapter explains how this can be done in your own documents.

16.2.2.2. The xs:appinfo element

In reality, there is little difference between the xs:documentation element and the xs:appinfo element. Either one can contain any combination of character data or markup the schema author wants to include. But the developers of the schema specification intended the xs:documentation element to contain human-readable content, while the xs:appinfo element would contain application-specific extension information related to a particular schema element.

For example, let's say that it is necessary to encode context-sensitive help text with each of the elements declared in a schema. This text might be used to generate tool-tips in a GUI or system prompts in a voicemail system. Either way, it would be very convenient to associate this information directly with the particular element in question using the xs:appinfo element, like this:

. . .
<xs:element name="fullName" type="xs:string">
  <xs:annotation>
    <xs:appinfo>
      <help-text>Enter the person's full name.</help-text>
    </xs:appinfo>
  </xs:annotation>
</xs:element>
. . .

Although schemas allow very sophisticated and powerful rules to be expressed, they cannot encompass every need a schema developer might face. When they fall short, remember that the xs:appinfo element lets you include your own application-specific information directly within the schema declarations.

TIP: Schematron is especially well-suited to use in annotations and is capable of checking a wide variety of conditions well beyond the bounds of XML Schema. For more information about Schematron, see http://www.ascc.net/xml/resource/schematron/schematron.html.
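Here is a sketch of the idea: a Schematron rule embedded in xs:appinfo. It assumes the ISO Schematron namespace. The older Schematron 1.5 release linked above uses a different namespace URI. The rule itself is hypothetical. A W3C XML Schema validator skips this content, so a separate Schematron tool must extract and run it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <xs:element name="fullName" type="xs:string">
    <xs:annotation>
      <xs:appinfo>
        <!-- Ignored by the schema validator; read by a Schematron tool -->
        <sch:pattern>
          <sch:rule context="fullName">
            <!-- Hypothetical rule: after whitespace normalization, the
                 name must contain a space, i.e. at least two words -->
            <sch:assert test="contains(normalize-space(.), ' ')">
              fullName should contain at least two words.
            </sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```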

16.2.3. Element Declarations

XML documents are composed primarily of nested elements, and the xs:element element is one of the most frequently used declarations in a typical schema. The simple example schema already includes a single global element declaration that tells the schema processor that instance documents must consist of a single element called fullName:

<xs:element name="fullName" type="xs:string"/>

This declaration uses two attributes to describe the element that can appear in the instance document: name and type. The name attribute is self-explanatory, but the type attribute requires some additional explanation.

16.2.4. Simple Types

Schemas support two different kinds of types: simple and complex. Simple types correspond to the basic data types found in most modern programming languages (strings, integers, dates, times, etc.). By definition, a simple type cannot contain nested element content.

In the previous example, the type="xs:string" attribute/value pair tells the schema processor that this element can only contain simple content of the built-in type xs:string. Table 16-1 lists a representative sample of the built-in simple types that are defined by the schema specification. See Chapter 21 for a complete listing.

Table 16-1. Built-in simple schema types

Type	Description
anyURI	A Uniform Resource Identifier
base64Binary	Base64 content-encoded binary data
boolean	May contain either true or false, 0 or 1
byte	A signed byte quantity >= -128 and <= 127
dateTime	An absolute date and time value combination
duration	A relative amount of time, expressed in units of years, months, days, hours, etc.
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS	Same values as defined in the attribute declaration section of the XML 1.0 recommendation
integer	Any whole number: positive, negative, or zero
language	May contain the same values as the xml:lang attribute from the XML 1.0 recommendation
Name	An XML name
normalizedString	String with newline, tab, and carriage-return characters normalized to spaces
string	Unicode string
token	Same as normalizedString, with runs of spaces collapsed to a single space and leading and trailing spaces removed
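The sketch below puts a few of these types to use. All of the element names are hypothetical, and the comments show sample lexical values.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical elements, one per built-in type -->
  <xs:element name="active"   type="xs:boolean"/>  <!-- true, false, 1, 0 -->
  <xs:element name="offset"   type="xs:byte"/>     <!-- -128 through 127 -->
  <xs:element name="modified" type="xs:dateTime"/> <!-- 2002-06-01T09:30:00 -->
  <xs:element name="term"     type="xs:duration"/> <!-- P1Y2M3DT4H -->
  <xs:element name="code"     type="xs:token"/>
</xs:schema>
```

These declarations would reject `<active>yes</active>` and `<offset>128</offset>`. The instance `<code>  AB   12 </code>` is accepted, and its value is AB 12, because xs:token collapses whitespace before the value is checked. Because all five elements are declared globally, any of them could also be the root element of an instance document.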

Since attribute values cannot contain elements, attributes must always be declared with simple types. Also, an element that is declared to have a simple type cannot have any attributes. This means that if an attribute must be added to the fullName element, some fairly significant changes to the element declaration are required.
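One common way to make that change is sketched below. It gives fullName an anonymous complex type with simple content, which extends xs:string with an attribute. The language attribute name is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fullName">
    <xs:complexType>
      <xs:simpleContent>
        <!-- Text content is still validated as xs:string;
             the extension adds an optional attribute -->
        <xs:extension base="xs:string">
          <xs:attribute name="language" type="xs:language"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this declaration, an instance such as `<fullName language="en">Jane Doe</fullName>` validates. The element still cannot contain child elements.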

16.2. Schema Basics

Example 16-1. addressdoc.xml

Example 16-2. address-schema.xsd

Example 16-3. addressdoc.xml with schema reference