16.2. Schema Basics
This section will construct, step by step, a simple schema document
representing a typical address book entry, introducing different
features of the XML Schema language as needed. Example 16-1 shows a
very simple well-formed XML document.
Example 16-1. addressdoc.xml
<?xml version="1.0"?>
<fullName>Scott Means</fullName>
Assuming that the fullName element can only
contain a simple string value, the schema for this document would
look like Example 16-2.
Example 16-2. address-schema.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="fullName" type="xs:string"/>
</xs:schema>
It is also common to associate the sample instance document
explicitly with the schema document. Since the
fullName element is not in any namespace, the
xsi:noNamespaceSchemaLocation attribute is used as
shown in Example 16-3.
Example 16-3. addressdoc.xml with schema reference
<?xml version="1.0"?>
<fullName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="address-schema.xsd">Scott Means</fullName>
Validating the simple document against its schema requires a
validating XML parser that supports schemas, such as the open source
Xerces parser from the Apache XML Project
(http://xml.apache.org/xerces-j/). Xerces is written in Java and
includes a command-line program called dom.DOMWriter that can be used
to validate addressdoc.xml like this:
% java dom.DOMWriter -V -S addressdoc.xml
Since the document is valid, DOMWriter will simply
echo the input document to standard output. An invalid document will
cause the parser to generate an error message. For instance, adding
b elements to the contents of the
fullName element violates the schema rules:
<?xml version="1.0"?>
<fullName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="address-schema.xsd"
>Scott <b>Means</b></fullName>
If this document were validated with DOMWriter,
the following validity errors would be detected by Xerces:
[Error] addressdoc.xml:4:13: Element type "b" must be declared.
[Error] addressdoc.xml:4:31: Datatype error: In element 'fullName' : Can not
have element children within a simple type content.
16.2.1. Document Organization
Now that there is a basic schema and a valid document from which to
work, it is time to examine the structure of a schema document and its
contents. Every schema document consists of a single root xs:schema
element. This element contains declarations for all elements and
attributes that may appear in a valid instance document.
TIP:
The XML elements that make up an XML schema must belong to the XML
Schema namespace (http://www.w3.org/2001/XMLSchema), which is
frequently associated with the xs: prefix. For the
remainder of this chapter, all schema elements will be written using
the xs: prefix to indicate that they belong to the
Schema namespace.
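Because a prefix is only a local alias for a namespace URI, the choice
of xs: is a convention rather than a requirement. As a minimal sketch,
the schema from Example 16-2 could equally be written with another
prefix (xsd: is an arbitrary choice made here for illustration):

```xml
<?xml version="1.0"?>
<!-- Same declarations as Example 16-2; only the prefix bound to the
     XML Schema namespace differs. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="fullName" type="xsd:string"/>
</xsd:schema>
```

What matters to a schema processor is the namespace URI, not the
prefix used to refer to it.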
Elements declared by top-level xs:element elements in the schema
(immediate child elements of the xs:schema element) are considered
global elements.
For example, the simple schema in Example 16-2
globally declares one element: fullName. According
to the rules of schema construction, any element that is declared
globally may appear as the root element of an instance document.
In this case, since only one element has been declared,
that isn't a problem. But when building more
complex schemas, this side effect must be taken into consideration.
If more than one element is declared globally, a schema-valid
document might have a root element other than the one you expect.
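To illustrate the side effect, here is a sketch of Example 16-2
extended with a second global declaration. The nickName element and
the sample value are hypothetical, introduced only for this
illustration:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fullName" type="xs:string"/>
  <!-- nickName is a hypothetical second global element -->
  <xs:element name="nickName" type="xs:string"/>
</xs:schema>
```

Under this schema, a document whose root element is nickName is just
as valid as one whose root element is fullName:

```xml
<?xml version="1.0"?>
<!-- Schema-valid, even though the schema author may have intended
     fullName to be the only possible root element. -->
<nickName>Scotty</nickName>
```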
Naming conflicts are another potential problem with multiple
global declarations. When writing schema declarations, it is an error
to declare two components of the same kind with the same name at the
same scope. For instance, trying to declare two global elements called
fullName would generate an error. But declaring an
element and an attribute with the same name would not create a
conflict, because the two names are not used in the same way.
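As a minimal sketch of this rule, the following schema declares a
global element and a global attribute that share a name. The global
fullName attribute is hypothetical and exists only to show that the
two declarations do not collide:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A global element and a global attribute may share a name. -->
  <xs:element name="fullName" type="xs:string"/>
  <xs:attribute name="fullName" type="xs:string"/>

  <!-- Uncommenting the next line would be an error: it would declare
       a second global element with the same name.
  <xs:element name="fullName" type="xs:string"/>
  -->
</xs:schema>
```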
16.2.2. Annotations
Now that there is a working schema,
it's good practice to include some documentary
material about who authored it, what it is for, any copyright
restrictions, etc. Since an XML schema document is an XML document in
its own right, one simple option would be to use XML comments to
include documentary information.
The major drawback to using XML comments is that parsers are not
obliged to keep comments intact when parsing XML documents, and
applications have to do a lot of work to negotiate their internal
structures. This increases the likelihood that, at some point,
important documentation will be lost during an otherwise harmless
transformation or editing procedure. Encoding documentation as markup,
inline with the element and type declarations it refers to, also opens
up many possibilities for automatic documentation generation.
To accommodate this extra information, most schema elements may
contain an optional xs:annotation element as their first child element.
The annotation element may then, in turn, contain any combination of
xs:documentation and xs:appinfo
elements, which are provided to contain extra human-readable and
machine-readable information, respectively.
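A sketch of how annotations might be added to the schema from
Example 16-2 follows. The documentation text is illustrative, and the
docTool element and its namespace URI are hypothetical stand-ins for
whatever machine-readable data an application might define:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Annotation for the schema as a whole -->
  <xs:annotation>
    <xs:documentation xml:lang="en">
      Simple schema example for an address book entry.
    </xs:documentation>
    <xs:appinfo>
      <!-- Hypothetical hint for a documentation tool -->
      <docTool xmlns="http://example.com/hypothetical-doc-tool"
               section="address"/>
    </xs:appinfo>
  </xs:annotation>

  <!-- Annotation as the first child of an element declaration -->
  <xs:element name="fullName" type="xs:string">
    <xs:annotation>
      <xs:documentation>The person's complete name.</xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

Unlike comments, these annotations are ordinary elements, so they
survive any processing step that preserves the document's markup.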
16.2.4. Simple Types
Schemas support two different types of content:
simple and complex. Simple content equates with basic data types that
are found in most modern programming languages (strings, integers,
dates, times, etc.). Simple types cannot, by definition, contain
nested element content.
In Example 16-2, the type="xs:string"
attribute/value pair tells the schema processor that the fullName
element can only contain simple content of the built-in type
xs:string. Table 16-1 lists a
representative sample of the built-in simple types that are defined
by the schema specification. See Chapter 21 for a
complete listing.
Table 16-1. Built-in simple schema types
Type | Description
anyURI | A Uniform Resource Identifier
base64Binary | Base64 content-encoded binary data
boolean | May contain either true or false, 0 or 1
byte | A signed byte quantity >= -128 and <= 127
dateTime | An absolute date and time value combination
duration | A relative amount of time, expressed in units of years, months, days, hours, etc.
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS | Same values as defined in the attribute declaration section of the XML 1.0 recommendation
integer | Any positive or negative whole number, or zero
language | May contain same values as xml:lang attribute from XML 1.0 recommendation
Name | An XML name
normalizedString | String with newline, tab, and carriage-return characters normalized to spaces
string | Unicode string
token | Same as normalizedString with multiple spaces collapsed and leading and trailing spaces removed
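As a sketch of how some of these built-in types might be used, the
following schema declares a few elements with different simple types.
All of the element names and sample values are hypothetical and are
not part of the address book example developed so far:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- e.g. <homePage>http://www.example.com/</homePage> -->
  <xs:element name="homePage" type="xs:anyURI"/>

  <!-- e.g. <lastUpdated>2002-06-01T12:00:00</lastUpdated> -->
  <xs:element name="lastUpdated" type="xs:dateTime"/>

  <!-- e.g. <private>true</private> or <private>1</private> -->
  <xs:element name="private" type="xs:boolean"/>

  <!-- e.g. <callCount>42</callCount> -->
  <xs:element name="callCount" type="xs:integer"/>
</xs:schema>
```

A value that does not fit the declared type, such as the text
"yesterday" in a lastUpdated element, would be reported as a validity
error.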
Since attribute values cannot contain elements, attributes must
always be declared with simple types. Also, an element that is
declared to have a simple type cannot have any attributes. This means
that if an attribute must be added to the fullName
element, some fairly significant changes to the element declaration
are required.
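To give a rough idea of the scale of those changes, one possible
approach is sketched below. It relies on schema constructs that have
not been introduced yet (a complex type with simple content), and the
language attribute is a hypothetical addition chosen for illustration:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The type="xs:string" shortcut is replaced by a nested type
       definition that keeps string content and adds an attribute. -->
  <xs:element name="fullName">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <!-- Hypothetical attribute; it is itself declared with a
               simple type, as all attributes must be. -->
          <xs:attribute name="language" type="xs:language"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```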
Copyright © 2002 O'Reilly & Associates. All rights reserved.