Chapter 3. SAX
When dealing with XML programmatically, one of the first things you have to do is take an XML document and parse it. As the document is parsed, the data in the document becomes available to the application using the parser, and suddenly you are within an XML-aware application! If this sounds a little too simple to be true, it almost is. This chapter describes how an XML document is parsed, focusing on the events that occur within this process. These events are important, as they are all points where application-specific code can be inserted and data manipulation can occur.
As a vehicle for this chapter, I'm going to introduce the Simple API for XML (SAX). SAX is what makes insertion of this application-specific code into events possible. The interfaces provided in the SAX package will become an important part of any programmer's toolkit for handling XML. Even though the SAX classes are small and few in number, they provide a critical framework for Java and XML to operate within. Solid understanding of how they help in accessing XML data is critical to effectively leveraging XML in your Java programs. In later chapters, we'll add to this toolkit other Java and XML APIs like DOM, JDOM, JAXP, and data binding. But, enough fluff; it's time to talk SAX.
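To make the idea of parsing events concrete before going further, here is a minimal sketch rather than the approach the rest of this chapter builds up. It assumes the JAXP SAXParserFactory bundled with current JDKs instead of a separately downloaded Xerces jar, and the class name EventSketch and the method parseEvents are hypothetical names chosen for illustration. Each callback is one of the points where application-specific code can be inserted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: records each SAX event in the order the parser reports it.
class EventSketch {

    static List<String> parseEvents(String xml) {
        final List<String> events = new ArrayList<>();

        // DefaultHandler supplies empty callbacks; override only the events of interest.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("startElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may report text in several chunks; skip whitespace-only ones.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("characters:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };

        try {
            // Assumption: the JDK's bundled JAXP parser stands in for Xerces here.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : parseEvents("<book><title>Java and XML</title></book>")) {
            System.out.println(event);
        }
    }
}
```

Running the sketch prints one line per event, in document order, which is a quick way to see where your own code could hook into the parsing process.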
3.1. Getting Prepared
There are a few items that you must have before beginning to code: an XML parser, the SAX classes, and an XML document.
First, you must obtain an XML parser. Writing a parser for XML is a serious task, and there are several efforts going on to provide excellent XML parsers, especially in the open source arena. I am not going to detail the process of actually writing an XML parser here; rather, I will discuss the applications that wrap this parsing behavior, focusing on using existing tools to manipulate XML data. This results in better and faster programs, as neither you nor I spend time trying to reinvent what is already available. After selecting a parser, you must ensure that a copy of the SAX classes is on hand. These are easy to locate, and are key to Java code's ability to process XML. Finally, you need an XML document to parse. Then, on to the code!
3.1.1. Obtaining a Parser
The first step to coding Java that uses XML is locating and obtaining the parser you want to use. I briefly talked about this process in Chapter 1, "Introduction", and listed various XML parsers that could be used. To ensure that your parser works with all the examples in the book, you should verify your parser's compliance with the XML specification. Because of the variety of parsers available and the rapid pace of change within the XML community, all of the details about which parsers have what compliance levels are beyond the scope of this book. Consult the parser's vendor and visit the web sites previously given for this information.
In the spirit of the open source community, all of the examples in this book use the Apache Xerces parser. Freely available in binary and source form at http://xml.apache.org, this C- and Java-based parser is already one of the most widely contributed-to parsers available (not that hardcore Java developers like us care about C, though, right?). In addition, using an open source parser such as Xerces allows you to send questions or bug reports to the parser's authors, resulting in a better product, as well as helping you use the software quickly and correctly. To subscribe to the general list and request help on the Xerces parser, send a blank email to firstname.lastname@example.org. The members of this list can help if you have questions or problems with a parser not specifically covered in this book. Of course, the examples in this book all run normally on any parser that uses the SAX implementation covered here.
Once you have selected and downloaded an XML parser, make sure that your Java environment, whether it be an IDE (Integrated Development Environment) or a command line, has the XML parser classes in its classpath. This will be a basic requirement for all further examples.
NOTE: If you don't know how to deal with CLASSPATH issues, you may be in a bit over your head. However, assuming you are comfortable with your system CLASSPATH, set it to include your parser's jar file, as shown here:
    c:\> set CLASSPATH=.;c:\javaxml2\lib\xerces.jar;%CLASSPATH%
    c:\> echo %CLASSPATH%
    .;c:\javaxml2\lib\xerces.jar;c:\java\jdk1.3\lib\tools.jar
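If you would rather confirm the classpath from inside Java, a small check along the following lines may help. It is only a sketch: the class name ClasspathCheck and the helper isLoadable are hypothetical, and it assumes that your Xerces jar provides the class org.apache.xerces.parsers.SAXParser. Substitute your own parser's class name if you use something else.

```java
// Hypothetical helper: reports whether a class can be found on the current classpath.
class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // 'false' asks for the class to be located without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The classpath as the running JVM actually sees it.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        // Assumption: this is the SAX parser class shipped in the Xerces jar.
        String parserClass = "org.apache.xerces.parsers.SAXParser";
        System.out.println(parserClass
            + (isLoadable(parserClass) ? " was found" : " was NOT found"));
    }
}
```

If the parser class is reported as not found, the jar file is most likely missing from the classpath or misspelled in it.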
3.1.2. Getting the SAX Classes and Interfaces
Once you have your parser, you need to locate the SAX classes. These classes are almost always included with a parser when downloaded, and Xerces is no exception. If this is the case with your parser, you should be sure not to download the SAX classes explicitly, as your parser is probably packaged with the latest version of SAX that is supported by the parser. At this time, SAX 2.0 has long been final, so expect the examples detailed here (which are all using SAX 2) to work as shown, with no modifications.
If you are not sure whether you have the SAX classes, look at the jar file or class structure used by your parser. The SAX classes are packaged in the org.xml.sax structure. Ensure, at a minimum, that you see the interface org.xml.sax.XMLReader. This will indicate that you are (almost certainly) using a parser with SAX 2 support, as the XMLReader interface is core to SAX 2.
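The same check can be made programmatically. The following is a rough sketch, with Sax2Check and hasSax2 as hypothetical names. Note one assumption in particular: JDKs from 1.4 onward bundle the org.xml.sax packages themselves, so on such a JDK this check succeeds even with no separate parser jar on the classpath, and it says nothing about which parser implementation is present.

```java
// Hypothetical check for SAX 2 support on the current classpath.
class Sax2Check {

    static boolean hasSax2() {
        try {
            // XMLReader was introduced in SAX 2; SAX 1 only had org.xml.sax.Parser.
            Class<?> reader = Class.forName("org.xml.sax.XMLReader");
            return reader.isInterface();
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasSax2()
            ? "org.xml.sax.XMLReader found: SAX 2 is available"
            : "org.xml.sax.XMLReader not found: SAX 2 classes are missing");
    }
}
```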
Finally, many parsers include documentation with a download, and this documentation may have the SAX API documentation packaged with it (Xerces being an example of this case).
3.1.3. Have an XML Document on Hand
You should also make sure that you have an XML document to parse. The output shown in the examples is based on parsing the XML document discussed in Chapter 2, "Nuts and Bolts". Save this file as contents.xml somewhere on your local hard drive. I highly recommend that you follow what I'm demonstrating by using this document; it contains various XML constructs for demonstration purposes. You can simply type the file in from the book, or you may download the XML file from the book's web site, http://www.newInstance.com.
Copyright © 2002 O'Reilly & Associates. All rights reserved.