The Essentials (Java & XML, 2nd Edition)

Now you're ready to learn how to use Java and XML to their best. What do you need? I will address that subject, give you some basics, and then let you get after it.

1.3.2. A Parser

You will need an XML parser. One of the most important layers of any XML-aware application is the XML parser. This component takes a raw XML document as input and makes sense of it: it ensures that the document is well-formed and, if a DTD or schema is referenced, it may also be able to ensure that the document is valid. The result of parsing an XML document is typically a data structure that can be manipulated and handled by other XML tools or Java APIs. I'm going to leave the detailed discussions of these APIs for later chapters. For now, just be aware that the parser is one of the core building blocks for working with XML data.
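To make the idea of "parse, check well-formedness, get back a data structure" a little more concrete, here is a minimal sketch. It assumes the standard JAXP and DOM classes that ship with current JDKs (`DocumentBuilderFactory`, `DocumentBuilder`, `Document`); the class name `WellFormedCheck`, the method `rootElementName`, and the sample documents are made up purely for illustration. Treat it as one possible way to drive a parser, not as the approach the later chapters settle on.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
public class WellFormedCheck {

    /**
     * Parses the given XML text into a DOM tree and returns the name of
     * its root element, or null if the text is not well-formed.
     */
    public static String rootElementName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // The Document is the in-memory data structure produced by the parser.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (SAXException e) {
            // A well-formedness violation surfaces as a SAXException.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not be set up or read input", e);
        }
    }

    public static void main(String[] args) {
        // Well-formed: every start tag has a matching end tag.
        System.out.println(rootElementName("<book><title>Java and XML</title></book>"));
        // Not well-formed: the title element is never closed.
        System.out.println(rootElementName("<book><title>Java and XML</book>"));
    }
}
```

The first call hands back the root element of the parsed tree, while the second is rejected by the parser before any tree is produced.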

Selecting an XML parser is not an easy task. There are no hard and fast rules, but two main criteria are typically used. The first is the speed of the parser. As XML documents are used more often and their complexity grows, the speed of an XML parser becomes extremely important to the overall performance of an application. The second is conformity to the XML specification. Because performance is often a higher priority than some of the more obscure features of XML, some parsers skip finer points of the XML specification in order to squeeze out additional speed. You must decide on the proper balance between these factors based on your application's needs. In addition, most XML parsers are validating, which means they offer the option to validate your XML against a DTD or XML Schema; some, however, are non-validating. Make sure you use a validating parser if your applications need that capability.
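As a rough illustration of what "validating" means in practice, the sketch below asks a JAXP parser to validate a document against a DTD carried in its internal subset. It assumes the parser behind `DocumentBuilderFactory` supports DTD validation (the one bundled with current JDKs does); the class `ValidationCheck`, the method `isValid`, and the two sample documents are hypothetical names invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
public class ValidationCheck {

    // A tiny made-up DTD: a book contains exactly one title.
    public static final String DTD =
        "<!DOCTYPE book [\n"
        + "  <!ELEMENT book (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n";

    public static final String VALID =
        DTD + "<book><title>Java and XML</title></book>";

    // Well-formed, but the author element is not allowed by the DTD.
    public static final String INVALID =
        DTD + "<book><author>Unknown</author></book>";

    /** Returns true only if the text is well-formed and valid against its DTD. */
    public static boolean isValid(String xml) {
        final boolean[] valid = { true };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity violations are reported as recoverable errors.
                    valid[0] = false;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Fatal errors (not well-formed) also count as a failure here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser could not be set up or read input", e);
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println("valid document:   " + isValid(VALID));
        System.out.println("invalid document: " + isValid(INVALID));
    }
}
```

Note that both sample documents are well-formed; only validation distinguishes them, which is exactly the capability a non-validating parser would not give you.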

Here's a list of the most commonly used XML parsers. The list does not show whether a parser validates or not, as there are current efforts to add validation to several of the parsers that do not yet offer it. No overall ranking is suggested here, but there is a wealth of information on the web pages for each parser:

Apache Xerces: http://xml.apache.org
IBM XML4J: http://alphaworks.ibm.com/tech/xml4j
James Clark's XP: http://www.jclark.com/xml/xp
Oracle XML Parser: http://technet.oracle.com/tech/xml
Sun Microsystems Crimson: http://xml.apache.org/crimson
Tim Bray's Lark and Larval: http://www.textuality.com/Lark
The Mind Electric's Electric XML: http://www.themindelectric.com/products/xml/xml.html
Microsoft's MSXML Parser: http://msdn.microsoft.com/xml/default.asp

WARNING: I've included Microsoft's MSXML parser in this list in deference to their efforts to address numerous compliance issues in their latest versions. However, the parser still tends to do its own thing, so it is not guaranteed to work with the examples in this book. Use it if you need to, but be prepared to do a little extra work if you make this decision.

Throughout this book, I tend to use Apache Xerces because it is open source. This is a huge plus to me, so I'd recommend you try out Xerces if you don't already have a parser selected.
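If you are not sure which parser your environment is actually picking up, a quick check like the following can help. This is only a sketch, assuming you obtain the parser through JAXP's `DocumentBuilderFactory`; the class name `ParserInfo` and its method are hypothetical. With Xerces on the classpath the reported class is typically `org.apache.xerces.jaxp.DocumentBuilderFactoryImpl`, while a JDK with no extra parser installed typically reports its own bundled implementation.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, used only for this illustration.
public class ParserInfo {

    /** Returns the class name of the DOM parser factory that JAXP selected. */
    public static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM parser factory in use: " + factoryClassName());
    }
}
```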
