12.1. Introduction
Recently, XML has gained popularity as a data-exchange and message-passing
format. As web services become more widespread, XML plays an even
more important role in a developer's life. With the
help of a few extensions, PHP lets you read and write XML for every
occasion.
XML provides developers with a structured way to mark up data with tags
arranged in a tree-like hierarchy. One perspective on XML is to treat
it as CSV on steroids. You can use XML to store records broken into a
series of fields. But, instead of merely separating each field with a
comma, you can include a field name, type, and attributes alongside
the data.
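To make the comparison concrete, here is a minimal sketch of the same record held as a CSV line and as XML. It assumes PHP 5 or later with the bundled SimpleXML extension, which is newer than the PHP 4.3 tools this chapter documents, and the lang attribute is invented purely for illustration.

```php
<?php
// Assumption: PHP 5 or later with the bundled SimpleXML extension.
// The lang attribute is made up for illustration.

// As CSV, a field's meaning depends entirely on its position.
$csv = 'PHP Cookbook,PHP';
list($csvTitle, $csvSubject) = explode(',', $csv);

// As XML, each field carries its own name and can carry attributes.
$xml = '<book><title lang="en">PHP Cookbook</title><subject>PHP</subject></book>';
$book = simplexml_load_string($xml);

$xmlTitle   = (string) $book->title;
$xmlSubject = (string) $book->subject;
$titleLang  = (string) $book->title['lang'];

echo "CSV: $csvTitle / $csvSubject\n";
echo "XML: $xmlTitle / $xmlSubject (title language: $titleLang)\n";
```

In the CSV line, nothing tells you which field is the title and which is the subject; in the XML version, the tag names do.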
Another view of XML is as a document representation language. For
instance, the PHP Cookbook was written using
XML. The book is divided into chapters; each chapter into recipes;
and each recipe into Problem, Solution, and Discussion sections.
Within any individual section, we further subdivide the text into
paragraphs, tables, figures, and examples. An article on a web page
can similarly be divided into the page title and headline, the
authors of the piece, the story itself, and any sidebars, related
links, and additional content.
XML text looks similar to HTML. Both use tags bracketed by < and > for
marking up text. But XML is both stricter and looser than HTML.
It's stricter because all container tags must be properly closed: no
opening tag is allowed without a corresponding closing tag. It's looser
because you're not forced to use a set list of tags, such as <a>, <img>,
and <h1>. Instead, you have the freedom to choose a series of tag names
that best describe your data.
Other key differences between XML and HTML are case-sensitivity,
attribute quoting, and whitespace. In HTML,
<B> and <b> are the
same bold tag; in XML, they're two different tags.
In HTML, you can often omit quotation marks around attribute values; XML,
however, requires them. So, you must always write:
<element attribute="value">
Additionally, HTML parsers generally collapse whitespace, so a run of 20
consecutive spaces is treated the same as one space. XML parsers preserve
whitespace, unless explicitly instructed otherwise. Because all elements
must be closed, an empty element written as a single tag must end with />.
For instance, in HTML, the line break is <br>, while in XML, it's written
as <br />.[9]
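One way to see these rules in action is to hand a few fragments to an XML parser and check which ones it accepts. The following is a sketch only: it assumes PHP 5 or later with the DOM extension rather than the PHP 4.3 DOM XML extension covered in this chapter, and isWellFormed() is a hypothetical helper, not a built-in function.

```php
<?php
// Assumption: PHP 5 or later with the DOM extension (built on libxml2).
// isWellFormed() is a hypothetical helper written for this example.
function isWellFormed($xml)
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$samples = array(
    'closed and quoted'  => '<p class="intro">Hello<br /></p>',
    'unclosed <br>'      => '<p class="intro">Hello<br></p>',
    'unquoted attribute' => '<p class=intro>Hello</p>',
    'mismatched case'    => '<B>Hello</b>',
);

foreach ($samples as $label => $xml) {
    echo $label, ': ', isWellFormed($xml) ? 'well-formed' : 'rejected', "\n";
}

// Whitespace inside an element is preserved by default.
$spaced = 'PHP     Cookbook';
$doc = new DOMDocument();
$doc->loadXML('<title>' . $spaced . '</title>');
$text = $doc->documentElement->textContent;
```

Only the first sample should parse; the other three each break one of the rules described above.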
There is another restriction on XML documents. Since XML documents
can be parsed into a tree of elements, the outermost element is known
as the root element. Just as a tree has only one trunk,
an XML document must have exactly one root element. In the previous
book example, this means chapters must be bundled inside a book tag.
If you want to place multiple books inside a document, you need to
package them inside a bookcase or another container. This limitation
applies only to the document root. Again, just as trees can have
multiple branches off the trunk, it's legal to
store multiple books inside a bookcase.
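The single-root rule can be checked the same way. This sketch again assumes PHP 5 or later with the DOM extension; the bookcase and book element names follow the example in the text, and the titles are books mentioned in this chapter.

```php
<?php
// Assumption: PHP 5 or later with the DOM extension (built on libxml2).
$books = '<book><title>PHP Cookbook</title></book>'
       . '<book><title>Learning XML</title></book>';

// Keep parse errors from being printed as PHP warnings.
libxml_use_internal_errors(true);

// Two <book> elements side by side: two roots, so parsing fails.
$bare = new DOMDocument();
$twoRootsOk = $bare->loadXML($books);

// The same books wrapped in one container: a single root, so parsing succeeds.
$wrapped = new DOMDocument();
$oneRootOk = $wrapped->loadXML('<bookcase>' . $books . '</bookcase>');
$bookCount = $wrapped->getElementsByTagName('book')->length;

libxml_clear_errors();

echo 'Two roots: ', $twoRootsOk ? 'accepted' : 'rejected', "\n";
echo 'One root:  ', $oneRootOk ? 'accepted' : 'rejected',
     " ($bookCount books inside)\n";
```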
This chapter doesn't aim to teach you XML; for an
introduction to XML, see Learning XML, by Erik
T. Ray. A solid nuts-and-bolts guide to all aspects of XML is
XML in a Nutshell, by Elliotte Rusty Harold and
W. Scott Means. Both books are published by O'Reilly
& Associates.
Now that we've covered the rules,
here's an example: if you are a librarian and want
to convert your card catalog to XML, start with this basic set of XML
tags:
<book>
<title>PHP Cookbook</title>
<author>Sklar, David and Trachtenberg, Adam</author>
<subject>PHP</subject>
</book>
From there, you can add new elements or modify existing ones. For
example, <author> can be divided into first
and last name, or you can allow multiple <author> elements so two authors
aren't placed in one field.
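As a sketch of that refinement, the record below gives each author a separate element split into first and last name, then reads the names back. It assumes PHP 5 or later with SimpleXML, and the firstname and lastname element names are our own choice for illustration, not part of any standard.

```php
<?php
// Assumption: PHP 5 or later with the bundled SimpleXML extension.
// <firstname> and <lastname> are hypothetical element names.
$xml = <<<XML
<book>
  <title>PHP Cookbook</title>
  <author>
    <firstname>David</firstname>
    <lastname>Sklar</lastname>
  </author>
  <author>
    <firstname>Adam</firstname>
    <lastname>Trachtenberg</lastname>
  </author>
  <subject>PHP</subject>
</book>
XML;

$book = simplexml_load_string($xml);

// Each <author> is now its own element, so the names can be handled one at a time.
$names = array();
foreach ($book->author as $author) {
    $names[] = (string) $author->lastname . ', ' . (string) $author->firstname;
}

echo implode('; ', $names), "\n";
```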
The first three recipes in this chapter cover writing and reading
XML. Recipe 12.2 shows how to write XML
without additional tools. To use the DOM XML extension to write XML
in a standardized fashion, see Recipe 12.3.
Reading XML using DOM is the topic of Recipe 12.4.
But XML isn't an end in itself. Once
you've gathered all your XML, the real question is
"What do you do with it?" With an
event-based parser, as described in Recipe 12.5, you can make element tags trigger actions,
such as storing data into easily manipulated structures or
reformatting the text.
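As a rough preview of the event-based approach, the sketch below uses PHP's XML parser functions to copy each field of a book record into an array as the parser reaches it. The variable names and the decision to skip the enclosing book tag are assumptions made for this example, and the anonymous functions require PHP 5.3 or later; the recipe itself covers the technique properly.

```php
<?php
// Assumption: PHP 5.3 or later (for anonymous functions) with the XML
// parser extension. $record and $current are names chosen for this sketch.
$xml = '<book><title>PHP Cookbook</title><subject>PHP</subject></book>';

$record  = array(); // field name => text
$current = null;    // name of the field being read, if any

$parser = xml_parser_create();
// Keep tag names as written instead of folding them to uppercase.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Called for every opening tag.
    function ($parser, $name, $attrs) use (&$record, &$current) {
        if ($name !== 'book') {
            $current = $name;
            $record[$name] = '';
        }
    },
    // Called for every closing tag.
    function ($parser, $name) use (&$current) {
        $current = null;
    }
);

// Called for text between tags; text may arrive in several pieces.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$record, &$current) {
        if ($current !== null) {
            $record[$current] .= $data;
        }
    }
);

$parsedOk = xml_parse($parser, $xml, true);

print_r($record);
```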
With XSLT, you can take an XSL stylesheet and turn XML
into viewable output. By separating content from presentation, you
can make one stylesheet for web browsers, another for PDAs, and a
third for cell phones, all without changing the content itself. This
is the subject of Recipe 12.6.
You can use a protocol such as
XML-RPC or SOAP to exchange XML
messages between yourself and a server, or to act as a server
yourself. You can thus put your card catalog on the Internet and
allow other programmers to query the catalog and retrieve book
records in a format that's easy for them to parse
and display in their applications. Another use would be to set up an
RSS feed that gets updated whenever the library gets a new book in
stock. XML-RPC clients and servers are the subjects of Recipe 12.7 and Recipe 12.8,
respectively. Recipe 12.9 and Recipe 12.10 cover SOAP clients and servers. WDDX, a data
exchange format that originated with the ColdFusion language, is the
topic of Recipe 12.11. Reading feeds in RSS, a
popular XML-based headline syndication format, is covered in Recipe 12.12.
As with many bleeding-edge technologies, some of
PHP's XML tools are not yet feature-complete or
bug-free. However, XML is an area of active development in the PHP
community; new features are added and bugs are fixed on a regular
basis. As a result, many XML functions documented here are still
experimental. Sometimes, all that means is that the function is 99%
complete, but there may be a few small bugs lying around. Other
times, it means that the name or the behavior of the function could
be completely changed. If a function is in a highly unstable state,
we mention it in the recipe.
We've documented the functions as
they're currently planned to work in PHP 4.3.
Because XML is such an important area, it made no sense to omit these
recipes from the book. Also, we wanted to make sure that the latest
functions are used in our examples. This can, however, lead to small
problems if the function names and prototypes change. If you find
that a recipe isn't working as
you'd expect it to, please check the online PHP
manual or the errata section of the catalog page for the
PHP Cookbook at http://www.oreilly.com/catalog/phpckbk.