
Chapter 12. XML

12.1. Introduction

Recently, XML has gained popularity as a data-exchange and message-passing format. As web services become more widespread, XML plays an even more important role in a developer's life. With the help of a few extensions, PHP lets you read and write XML for every occasion.

XML provides developers with a structured way to mark up data with tags arranged in a tree-like hierarchy. One perspective on XML is to treat it as CSV on steroids. You can use XML to store records broken into a series of fields. But, instead of merely separating each field with a comma, you can include a field name, type, and attributes alongside the data.

Another view of XML is as a document representation language. For instance, the PHP Cookbook was written using XML. The book is divided into chapters; each chapter into recipes; and each recipe into Problem, Solution, and Discussion sections. Within any individual section, we further subdivide the text into paragraphs, tables, figures, and examples. An article on a web page can similarly be divided into the page title and headline, the authors of the piece, the story itself, and any sidebars, related links, and additional content.

XML text looks similar to HTML. Both use tags bracketed by < and > for marking up text. But XML is both stricter and looser than HTML. It's stricter because all container tags must be properly closed: no opening tag is allowed without a corresponding closing tag. It's looser because you're not forced to use a set list of tags, such as <a>, <img>, and <h1>. Instead, you have the freedom to choose a series of tag names that best describe your data.

Other key differences between XML and HTML are case-sensitivity, attribute quoting, and whitespace. In HTML, <B> and <b> are the same bold tag; in XML, they're two different tags. In HTML, you can often omit quotation marks around attributes; XML, however, requires them. So, you must always write:

<element attribute="value">

Additionally, HTML parsers generally collapse whitespace, so a run of 20 consecutive spaces is treated the same as one space. XML parsers preserve whitespace unless explicitly instructed otherwise. Finally, because all elements must be closed, an empty element must either have its own closing tag or end with />. For instance, in HTML, the line break is <br>, while in XML, it's written as <br />.[9]

[9] This is why nl2br( ) outputs <br />; its output is XML-compatible.
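
To see these rules enforced, you can hand a few fragments to a parser. The following is a minimal sketch rather than one of this chapter's recipes: it assumes PHP 7 or later with the SimpleXML and libxml extensions, which postdate the PHP 4.3 tools covered here, and isWellFormed( ) is a helper name made up for this illustration.

```php
<?php
// Assumption: SimpleXML and libxml are available (PHP 7 or later).
// isWellFormed() is a hypothetical helper, not a built-in function.
function isWellFormed(string $xml): bool
{
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $result = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare against false explicitly: an empty element is a valid
    // result that still evaluates to false in a boolean context.
    return $result !== false;
}

$samples = [
    '<b>bold</b>',                  // matching tags
    '<B>bold</b>',                  // tag names differ only by case
    '<a href=index.html>home</a>',  // unquoted attribute value
    '<p>one<br>two</p>',            // empty element left unclosed
    '<p>one<br />two</p>',          // empty element closed with />
];

foreach ($samples as $xml) {
    printf("%-30s %s\n", $xml, isWellFormed($xml) ? 'well-formed' : 'rejected');
}
```

Only the first and last fragments should be accepted; the other three each break one of the rules described above.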

There is another restriction on XML documents. Because an XML document is parsed into a tree of elements, it has an outermost element, known as the root element. Just as a tree has only one trunk, an XML document must have exactly one root element. In the previous book example, this means chapters must be bundled inside a book tag. If you want to place multiple books inside a document, you need to package them inside a bookcase or another container. This limitation applies only to the document root. Again, just as a tree can have multiple branches off its trunk, it's legal to store multiple books inside a bookcase.
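
The single-root rule is easy to check for yourself. This sketch assumes the DOM extension that ships with PHP 5 and later (the DOMDocument class), not the PHP 4 DOM XML extension used in this chapter's recipes; the second book title is only sample data.

```php
<?php
// Assumption: the DOMDocument class from the PHP 5+ DOM extension.
libxml_use_internal_errors(true); // keep parse errors out of the output

// Two sibling <book> elements with nothing around them: two roots.
$twoRoots = '<book><title>PHP Cookbook</title></book>'
          . '<book><title>Learning XML</title></book>';

// The same books packaged inside a single <bookcase> container.
$oneRoot = '<bookcase>' . $twoRoots . '</bookcase>';

$doc = new DOMDocument();
$twoRootsOk = $doc->loadXML($twoRoots); // the parser rejects the second root

$doc = new DOMDocument();
$oneRootOk = $doc->loadXML($oneRoot);   // parses: one root, two branches

$rootName  = $doc->documentElement->nodeName;
$bookCount = $doc->getElementsByTagName('book')->length;
libxml_clear_errors();

var_dump($twoRootsOk, $oneRootOk);
printf("root: %s, books: %d\n", $rootName, $bookCount);
```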

This chapter doesn't aim to teach you XML; for an introduction to XML, see Learning XML, by Erik T. Ray. A solid nuts-and-bolts guide to all aspects of XML is XML in a Nutshell, by Elliotte Rusty Harold and W. Scott Means. Both books are published by O'Reilly & Associates.

Now that we've covered the rules, here's an example: if you are a librarian and want to convert your card catalog to XML, start with this basic set of XML tags:

<book>
    <title>PHP Cookbook</title>
    <author>Sklar, David and Trachtenberg, Adam</author>
    <subject>PHP</subject>
</book>

From there, you can add new elements or modify existing ones. For example, <author> can be divided into first and last name, or you can allow multiple <author> elements so that two authors aren't placed in one field.
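
As a rough illustration of that refinement, the sketch below builds a record with one <author> element per person, each split into first and last name. It assumes the DOMDocument class of PHP 7.1 or later rather than the DOM XML extension of Recipe 12.3, and the function name buildBookRecord( ) and the element names <firstname> and <lastname> are invented for this example.

```php
<?php
// Assumptions: PHP 7.1+ DOM extension; buildBookRecord(), <firstname>,
// and <lastname> are hypothetical names chosen for this sketch.
function buildBookRecord(string $title, array $authors, string $subject): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    // Append <$name>$text</$name> to $parent. Going through a text node
    // lets the DOM escape characters such as & and <.
    $addText = function (DOMNode $parent, string $name, string $text) use ($doc): void {
        $child = $parent->appendChild($doc->createElement($name));
        $child->appendChild($doc->createTextNode($text));
    };

    $book = $doc->appendChild($doc->createElement('book'));
    $addText($book, 'title', $title);
    foreach ($authors as $author) {
        // One <author> element per person instead of one combined field.
        $node = $book->appendChild($doc->createElement('author'));
        $addText($node, 'firstname', $author['first']);
        $addText($node, 'lastname', $author['last']);
    }
    $addText($book, 'subject', $subject);

    // Serialize only the <book> element, without the XML declaration.
    return $doc->saveXML($book);
}

echo buildBookRecord('PHP Cookbook', [
    ['first' => 'David', 'last' => 'Sklar'],
    ['first' => 'Adam', 'last' => 'Trachtenberg'],
], 'PHP'), "\n";
```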

The first three recipes in this chapter cover writing and reading XML. Recipe 12.2 shows how to write XML without additional tools. To use the DOM XML extension to write XML in a standardized fashion, see Recipe 12.3. Reading XML using DOM is the topic of Recipe 12.4.

But XML isn't an end in itself. Once you've gathered all your XML, the real question is "What do you do with it?" With an event-based parser, as described in Recipe 12.5, you can make element tags trigger actions, such as storing data into easily manipulated structures or reformatting the text.
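
To give a feel for the event-based style, here is a small sketch that stores the fields of the card-catalog record in an array. It uses PHP's xml_parser_create( ) family of functions, but written with closures and type declarations, so it assumes PHP 7.4 or later; it is not necessarily how Recipe 12.5 structures its code, and parseBookRecord( ) is a name made up for this example.

```php
<?php
// Assumptions: PHP 7.4+ with the XML (expat-style) parser extension;
// parseBookRecord() is a hypothetical helper for this sketch.
function parseBookRecord(string $xml): array
{
    $record  = [];
    $current = null; // name of the element most recently opened, if still open

    $parser = xml_parser_create();
    // Keep tag names as written; by default the parser upper-cases them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start-tag event: remember which element we are inside.
        function ($parser, $name, $attributes) use (&$current) {
            $current = $name;
        },
        // End-tag event: we are no longer inside a leaf element.
        function ($parser, $name) use (&$current) {
            $current = null;
        }
    );
    xml_set_character_data_handler(
        $parser,
        // Text event: may fire several times per element, so append.
        function ($parser, $data) use (&$record, &$current) {
            if ($current !== null) {
                $record[$current] = ($record[$current] ?? '') . $data;
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException((string) xml_error_string(xml_get_error_code($parser)));
    }

    // Trim each value and drop containers that held only whitespace.
    return array_filter(array_map('trim', $record), fn ($text) => $text !== '');
}

$xml = <<<XML
<book>
    <title>PHP Cookbook</title>
    <author>Sklar, David and Trachtenberg, Adam</author>
    <subject>PHP</subject>
</book>
XML;

print_r(parseBookRecord($xml));
```

The parser never builds a tree; the handlers see one tag or run of text at a time and decide what to keep, which is why this approach suits large documents or simple extraction tasks.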

With XSLT, you can take an XSL stylesheet and turn XML into viewable output. By separating content from presentation, you can make one stylesheet for web browsers, another for PDAs, and a third for cell phones, all without changing the content itself. This is the subject of Recipe 12.6.

You can use a protocol such as XML-RPC or SOAP to exchange XML messages between yourself and a server, or to act as a server yourself. You can thus put your card catalog on the Internet and allow other programmers to query the catalog and retrieve book records in a format that's easy for them to parse and display in their applications. Another use would be to set up an RSS feed that gets updated whenever the library gets a new book in stock. XML-RPC clients and servers are the subjects of Recipe 12.7 and Recipe 12.8, respectively. Recipe 12.9 and Recipe 12.10 cover SOAP clients and servers. WDDX, a data exchange format that originated with the ColdFusion language, is the topic of Recipe 12.11. Reading feeds in RSS, a popular XML-based headline syndication format, is covered in Recipe 12.12.

As with many bleeding-edge technologies, some of PHP's XML tools are neither feature-complete nor bug-free. However, XML is an area of active development in the PHP community; new features are added and bugs are fixed on a regular basis. As a result, many XML functions documented here are still experimental. Sometimes, all that means is that the function is 99% complete, but there may be a few small bugs lying around. Other times, it means that the name or the behavior of the function could be completely changed. If a function is in a highly unstable state, we mention it in the recipe.

We've documented the functions as they're currently planned to work in PHP 4.3. Because XML is such an important area, it made no sense to omit these recipes from the book. Also, we wanted to make sure that the latest functions are used in our examples. This can, however, lead to small problems if the function names and prototypes change. If you find that a recipe isn't working as you'd expect it to, please check the online PHP manual or the errata section of the catalog page for the PHP Cookbook, http://www.oreilly.com/catalog/phpckbk.




Copyright © 2003 O'Reilly & Associates. All rights reserved.