Chapter 23. Structured Text: XML
XML, the eXtensible Markup Language,
has taken the programming world by storm over the last few years.
Like SGML, XML is a metalanguage, a language to describe markup
languages. On top of the XML 1.0 specification, the XML community (in
good part inside the World Wide Web Consortium, W3C) has standardized
other technologies, such as various schema languages, Namespaces,
XPath, XLink, XPointer, and XSLT.
Industry consortia in many fields have defined industry-specific
markup languages on top of XML, to facilitate data exchange among
applications in the various fields. Such industry standards let
applications exchange data even if the applications are coded in
different languages and deployed on different platforms by different
firms. XML, related technologies, and XML-based markup languages are
the basis of interapplication, cross-language, cross-platform data
interchange in modern applications.
Python has excellent support for XML. The standard Python library
supplies the xml package, which lets you use
fundamental XML technology quite simply. The third-party package
PyXML (available at http://pyxml.sf.net) extends the standard
library's xml with validating
parsers, richer DOM implementations, and advanced technologies such
as XPath and XSLT. Installing PyXML also upgrades Python's own
xml package, so it can be worth installing even if you don't use
PyXML-specific features.
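As a rough illustration of how little code the standard library's xml package needs for a basic task, here is a minimal sketch that uses xml.dom.minidom to pull element text and attributes out of a small document. The sample document, its element names, and the helper book_titles are hypothetical and exist only for this example.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for illustration.
SAMPLE = (
    "<catalog>"
    "<book id='b1'>Python in a Nutshell</book>"
    "<book id='b2'>Python &amp; XML</book>"
    "</catalog>"
)

def book_titles(xml_text):
    """Return a list of (id, title) pairs, one per <book> element."""
    doc = minidom.parseString(xml_text)
    try:
        pairs = []
        for node in doc.getElementsByTagName("book"):
            # Join all text children, in case the parser splits the text.
            title = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            pairs.append((node.getAttribute("id"), title))
        return pairs
    finally:
        # Break internal reference cycles so the tree can be freed promptly.
        doc.unlink()

if __name__ == "__main__":
    for book_id, title in book_titles(SAMPLE):
        print(book_id, title)
```

The sketch builds the whole document in memory, which is fine for small inputs; for large documents an event-driven parser is usually a better fit.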
On top of PyXML, you can choose to install yet another freely
available third-party package, 4Suite (available at http://4suite.org).
4Suite provides yet more
XML parsers for special niches, advanced technologies such as XLink
and XPointer, and code supporting standards built on top of XML, such
as the Resource Description Framework (RDF).
As an alternative to
Python's built-in XML support, PyXML, and 4Suite,
you can try ReportLab's new pyRXP, a fast validating
XML parser based on Tobin's RXP. pyRXP is DOM-like
in that it constructs an in-memory representation of the whole XML
document you're parsing. However, pyRXP does not
construct a DOM-compliant tree, but rather a lightweight tree of
Python tuples to save memory and enhance speed. For more information
on pyRXP, see http://www.reportlab.com/xml/pyrxp.html.
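To make the idea of a lightweight tuple tree concrete, the following sketch builds one with the standard library's xml.parsers.expat. It does not use pyRXP and does not claim to reproduce pyRXP's API or its exact tuple layout; the (tag, attributes, children) shape and the function name parse_to_tuples are assumptions chosen for illustration.

```python
import xml.parsers.expat

def parse_to_tuples(xml_text):
    """Parse xml_text into nested (tag, attrs, children) tuples.

    attrs is a dict, or None when the element has no attributes.
    children is a list mixing text strings and nested element tuples.
    This layout is an illustrative assumption, not pyRXP's format.
    """
    top = []          # will end up holding the single root element
    stack = [top]     # stack of "children" lists for the open elements

    def start(name, attrs):
        children = []
        stack[-1].append((name, attrs or None, children))
        stack.append(children)

    def end(name):
        stack.pop()

    def chars(data):
        stack[-1].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True   # deliver contiguous text as one chunk
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return top[0]

if __name__ == "__main__":
    print(parse_to_tuples("<a x='1'><b>hi</b>tail</a>"))
```

Plain tuples, dicts, and lists carry none of the per-node methods and back-references of a DOM tree, which is where the memory and speed savings of this style of representation come from.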
For coverage of all aspects of XML and of how you can process XML
with Python, I recommend Python & XML, by
Christopher Jones and Fred Drake (O'Reilly). In this
chapter, I cover only the essentials of the standard
library's xml package, taking
some elementary knowledge of XML itself for granted.