Complex Types (XML in a Nutshell, 2nd Edition)

16.4. Complex Types

A schema assigns a type to each element and attribute it declares. In Example 16-5, the fullName element has a complex type. Elements with complex types may contain nested elements and have attributes. Only elements can have complex types. Attributes always have simple types.

Since the type is declared using an xs:complexType element embedded directly in the element declaration, it is also an anonymous type rather than a named type.

New types are defined using xs:complexType or xs:simpleType elements. If a new type is declared globally with a top-level element, it needs to be given a name so that it can be referenced from element and attribute declarations within the schema. If a type is declared inline (inside an element or attribute declaration), it does not need to be named. But since it has no name, it cannot be referenced by other element or attribute declarations. When building large and complex schemas, data types will need to be shared among multiple different elements. To facilitate this reuse, it is necessary to create named types.

To show how named types and complex content interact, let's expand the example schema. A new address element will contain the fullName element, and the person's name will be divided into a first- and last-name component. A typical instance document would look like Example 16-6.

Example 16-6. addressdoc.xml after adding address, first, and last elements

<?xml version="1.0"?>
<addr:address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://namespaces.oreilly.com/xmlnut/address 
      address-schema.xsd"
    xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
    addr:language="en">
  <addr:fullName>
    <addr:first>Scott</addr:first>
    <addr:last>Means</addr:last>
  </addr:fullName>
</addr:address>

To accommodate this new format, fairly substantial structural changes to the schema are required, as shown in Example 16-7.

Example 16-7. address-schema.xsd to support address element

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://namespaces.oreilly.com/xmlnut/address"
  xmlns:addr="http://namespaces.oreilly.com/xmlnut/address"
  attributeFormDefault="qualified" elementFormDefault="qualified">
<xs:element name="address">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="fullName">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="first" type="addr:nameComponent"/>
            <xs:element name="last" type="addr:nameComponent"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  <xs:attributeGroup ref="addr:nationality"/>
  </xs:complexType>
 </xs:element>
 
 <xs:complexType name="nameComponent">
  <xs:simpleContent>
    <xs:extension base="xs:string"/>
  </xs:simpleContent>
 </xs:complexType>
</xs:schema>

The first major difference between this schema and the previous version is that the root element name has been changed from fullName to address. The same result could have been accomplished by creating a new top-level element declaration for the new address element, but that would have opened a loophole allowing a valid instance document to contain only a fullName element and nothing else.

Within the address element declaration, a new anonymous complex type is declared. Unlike the old declaration, this complex type is declared to contain complex content using the xs:sequence element. The sequence element tells the schema processor that the contained list of elements must appear in the target document in the exact order they are given. In this case, the sequence contains only one element declaration.

The nested element declaration is for the fullName element, which then repeats the xs:complexType and xs:sequence declaration process. Within this nested sequence, two element declarations appear for the first and last elements.

These two element declarations, unlike all prior element declarations, explicitly reference a new complex type that's declared in the schema, the addr:nameComponent type. It is fully qualified to differentiate it from possible conflicts with built-in schema data types.

The nameComponent type is declared by the xs:complexType element immediately following the address element declaration. It is identified as a named type by the presence of the name attribute, but in every other way it is constructed the same way it would have been as an anonymous type.

16.4.1. Occurrence Constraints

One feature of schemas that should be welcome to DTD developers is the ability to explicitly set the minimum and maximum number of times an element may occur at a particular point in a document using minOccurs and maxOccurs attributes of the xs:element element. For example, this declaration adds an optional middle name to the fullName element:

<xs:element name="fullName">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="first" type="addr:nameComponent"/>
      <xs:element name="middle" type="addr:nameComponent"
          minOccurs="0"/>
      <xs:element name="last" type="addr:nameComponent"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Notice that the element declaration for the middle element has a minOccurs value of 0. The default value for both minOccurs and maxOccurs is 1, if they are not provided explicitly. Therefore, setting minOccurs to 0 means that the middle element may appear 0 to 1 times. This is equivalent to using the ? operator in a DTD declaration. Another possible value for the maxOccurs attribute is unbounded, which indicates that the element in question may appear an unlimited number of times. This value is used to produce the same effect as the * and + operators in a DTD declaration.

16.4.2. Types of Element Content

So far you have seen elements that contain only character data and elements that contain only other elements. The next several sections cover each of the possible types of element content individually, from most restrictive to least restrictive:

Empty

Simple content

Mixed content
Any type