Constraints (Java & XML, 2nd Edition)

2.2.2. XML Schema

XML Schema is a newly finalized candidate recommendation from the W3C. It seeks to improve upon DTDs by adding more typing and quite a few more constructs than DTDs, as well as following an XML format. I'm going to spend relatively little time here talking about schemas, because they are a "behind-the-scenes" detail for Java and XML. In the chapters where you'll be working with schemas (Chapter 14, "Content Syndication", for instance), I'll address specific points you need to be aware of. However, the specification for XML Schema is so enormous that it would take up an entire book of explanation on its own (see the book XML Schema on this CD-ROM). Example 2-4 shows the XML Schema constraining Example 2-1.

Example 2-4. XML Schema constraining Example 2-1

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           xmlns="http://www.oreilly.com/javaxml2" 
           xmlns:ora="http://www.oreilly.com"
           targetNamespace="http://www.oreilly.com/javaxml2"
           elementFormDefault="qualified"
>
  <xs:import namespace="http://www.oreilly.com" 
             schemaLocation="contents-ora.xsd" />

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="title" />
        <xs:element ref="contents" />
        <xs:element ref="ora:copyright" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="title">
    <xs:complexType>
      <xs:simpleContent>
        <xs:restriction base="xs:string">
          <xs:attribute ref="ora:series" use="required" />
        </xs:restriction>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <xs:element name="contents">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="chapter" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="topic" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="name" 
                                type="xs:string" 
                                use="required" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="title" type="xs:string" use="required"/>
            <xs:attribute name="number" type="xs:byte" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

In addition, you'll need the schema in Example 2-5, for reasons you will soon understand.

Example 2-5. Additional XML Schema for Example 2-1

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns="http://www.oreilly.com" 
           xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           targetNamespace="http://www.oreilly.com"
           attributeFormDefault="qualified" 
           elementFormDefault="qualified"
>
  <xs:attribute name="series" type="xs:string"/>
  <xs:element name="copyright" type="xs:string" />
</xs:schema>

Before diving into the specifics of these schemas, notice that various namespace declarations are made. First, the XML Schema namespace itself is attached to the xs prefix, allowing separation of XML Schema constructs from the elements and attributes being constrained. Next, the default namespace is attached to the namespace of the elements being defined; in Example 2-4 this is the Java and XML namespace, and in Example 2-5 it's the O'Reilly namespace. I've also assigned the targetNamespace attribute this same value. This attribute specifies to the schema the namespace of the elements and attributes being constrained. This is easy to forget, and can wreak a lot of havoc, so be careful to include it. At this point, namespaces are defined for the elements being constrained (the default namespace) and the constructs being used (the XML Schema namespace).

Last, I've specified the value of attributeFormDefault and elementFormDefault as "qualified." This indicates that I'll use fully qualified names for the elements and attributes, rather than just local names. I won't go into detail about this, but I highly recommend you use qualified names at all times. Trying to deal with multiple namespaces and unqualified names at the same time is a mess I wouldn't want to wander into.

2.2.2.1. Elements and attributes

Elements are defined with the element construct. You'll generally need to define your own data types by nesting a complexType tag within the element element, which defines the name of the element (through the name attribute). Take a look at this fragment of Example 2-4:

<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="title" />
      <xs:element ref="contents" />
      <xs:element ref="ora:copyright" />
    </xs:sequence>
  </xs:complexType>
</xs:element>

Here, I've specified that the book element has complex content. Within it there should be three elements: title, contents, and ora:copyright. By using the sequence construct, I've ensured that they appear in the specified order; and with no modifiers, an element must appear once and only once. For each of these other elements, I've used the ref keyword to reference another element definition. This points to the definitions for each of these elements in another part of the schema, and keeps things organized and easy to follow.

Later in the file, the title element is defined:

<xs:element name="title">
  <xs:complexType>
    <xs:simpleContent>
      <xs:restriction base="xs:string">
        <xs:attribute ref="ora:series" use="required" />
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

This element is really just a simple XML Schema string type; however, I've added an attribute to it, so I must define a complexType. Since I'm extending an existing type, I use the simpleContent and restriction keywords (as nested elements) to define this type. simpleContent informs the schema that this is a basic type, and restriction, with the base of "xs:string", lets the schema know I want to allow just what the XML Schema string type allows, plus the additional attribute defined here (with the attribute keyword). For the attribute itself, I reference the type defined elsewhere, and specify that it must appear for this element (through use="required"). I realize that this paragraph is a mouthful, and not completely obvious; however, take your time and you'll get it all.

One other thing you'll notice is the use of minOccurs and maxOccurs attributes on the element element; these attributes allow an element to appear a specified number of times other than the default, which is once and only once. For example, specifying minOccurs="0" and maxOccurs="1" allows an element to appear once, or not at all. To allow an element to appear an unlimited number of times, you can use the value of "unbounded" for the maxOccurs attribute, as in Example 2-4.

2.2.2.2. Multiple namespaces

You'll notice that I defined two schemas, though, which may have you puzzled. For each namespace in a document, one schema must be defined. Additionally, you can't use the same external schema for both namespaces, and simply point both at that external schema. As a result, using the ora prefix and namespace requires an additional schema, which I called contents-ora.xsd. You'll also need to use the schemaLocation attribute I talked about earlier to reference this schema; however, don't add another attribute. Instead, you can append another namespace and schema-location pair to the end of the value of the attribute, as shown here:

<book xmlns="http://www.oreilly.com/javaxml2"
      xmlns:ora="http://www.oreilly.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.oreilly.com/javaxml2 XSD/contents.xsd 
                          http://www.oreilly.com XSD/contents-ora.xsd"
>

This essentially says for the namespace http://www.oreilly.com/javaxml2, look up definitions in the schema called contents.xsd in the XSD/ directory. For the http://www.oreilly.com namespace, use the contents-ora.xsd schema in the same directory. You'll then need to define the two schemas I showed you in Example 2-5 and Example 2-5. Finally, import the O'Reilly schema into the Java and XML one, since elements in the Java and XML schema refer to attributes in the O'Reilly one:

<xs:import namespace="http://www.oreilly.com" 
           schemaLocation="contents-ora.xsd" />

This import is fairly self-explanatory, so I won't dwell on it. You should realize that dealing with multiple namespaces is about the most complex thing you can do in schemas, and can easily trip you up. (It tripped me up, until Eric van der Vlist saved the day.) I also recommend a good XML Schema-capable editor. While I'm generally slow to recommend commercial products, in this case XMLSpy 4.0 (http://www.xmlspy.com) turned out to be wonderfully helpful.

I've barely scratched the surface of either DTDs or XML Schema, and there are even other constraint models not covered at all! For example, Relax (and Relax NG, which includes what used to be TREX) is gaining a lot of steam, as it's considered a lot easier and more lightweight than XML Schema. You can check out the activity online at http://www.oasis-open.org/committees/relax-ng/. No matter what technology you choose, though, you should be able to find something that helps you constrain your XML documents. With these constraints in place, validation and interoperability become a snap. Consider yourself educated on XML constraints, and get ready to move on to the next topic in this whirlwind tour: XML transformations.

2.2. Constraints

2.2.1. DTDs

Example 2-3. DTD for Example 2-1

2.2.1.1. Elements

Table 2-1. DTD recurrence modifiers

2.2.1.2. Attributes

2.2.1.3. Entities

2.2.2. XML Schema

Example 2-4. XML Schema constraining Example 2-1

Example 2-5. Additional XML Schema for Example 2-1

2.2.2.1. Elements and attributes

2.2.2.2. Multiple namespaces