home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  

Book HomeJava and XML, 2nd EditionSearch this book

Chapter 10. Web Publishing Frameworks

This chapter begins our examination of specific Java and XML topics. I have covered the basics of using XML from Java, looking at the SAX, DOM, JDOM, and JAXP APIs to manipulate XML, and the fundamentals of using and creating XML itself. Now that you have a grasp on using XML from your code, I want to spend time on specific applications. The next six chapters cover the most significant applications of XML, and, in particular, how those applications are implemented in the Java space. While there are literally thousands of important applications of XML, the topics in these chapters are those continually in the spotlight, with the potential to significantly change the way traditional development processes occur.

The More Things Change, the More They Stay the Same

Readers of the first edition will find that much of the Cocoon discussion in this chapter is the same. Although I promised Cocoon 2 would be out by now and expected to be writing a chapter on it, things haven't progressed as quickly as expected. Stefano Mazzochi, the driving force behind Cocoon, finally got around to finishing school (good choice, Stefano!), and development on Cocoon 2 slowed as a result. Cocoon 1.x is still the current development path, so stick with it for now. I've updated the section on Cocoon 2 to reflect what is coming. Keep an eye out for more Cocoon-related books from O'Reilly in the months to come.

The first hot topic I look at is the XML application that has generated the most excitement in the XML and Java communities: the web publishing framework. Although I have continually emphasized that generating presentation from content is perhaps overhyped compared to the value of the portable data that XML provides, using XML for presentation styling is still very important. This importance increases when looking at web-based applications.

Virtually every major application I can find is either completely web-based or at least has a web frontend. At the same time, users are demanding more functionality, and marketing departments are demanding more flexibility in look and feel. The result has been the rise of the web artist; this new role is different from the webmaster in that little to no Perl, ASP, JavaScript, or other scripting language coding is part of the job description. The web artist's entire day is comprised of HTML and WML creation, modification, and development.[16] The rapid changes in business and market strategy can require a complete application or site overhaul as often as once a week, often forcing the web artist to spend days changing hundreds of HTML pages. While Cascading Style Sheets (CSS) have helped, the difficulty of maintaining consistency across these pages requires a huge amount of time. Even if this less-than-ideal situation were acceptable, no computer developer wants to spend his or her life making markup language changes to web pages.

[16]"HTML and WML" includes the tangential technologies used with the markup language. These complementary technologies, like Flash and Shockwave, are not trivial, so I'm by no means belittling these content authors.

With the advent of server-side Java, the problem has only grown. Servlet developers find themselves spending long hours modifying their out.println( ) statements to output HTML, and often glance hatefully at the marketing department when changes to a site's look require modifications to their code. The entire Java Server Pages (JSP) specification arguably stemmed from this situation; however, JSP is not a solution, as it only shifts the frustration to the content author, who constantly has to avoid making incidental changes to embedded Java code. In addition, JSP does not provide the clean separation between content and presentation it promises. A means to generate pure data content was called for, as well as a means to have that content uniformly styled either at predetermined times (static content generation) or dynamically at runtime (dynamic content generation).

Of course, you may be nodding your head at this familiar problem if you have ever done any web development, and hopefully your mind is wandering into the XSL and XSLT technology space. The problem is that an engine must exist to handle content generation, particularly in the dynamic sense. Having hundreds of XML documents on a site does no good if there is no mechanism to apply transformations on request. Add the need for servlets and other server-side components to output XML that should be consistently styled, and you have defined a small set of requirements for the web publishing framework. In this chapter, I take a look at this framework, how it allows you to avoid long hours of HTML coding, and how it helps you convert all of those "web artists" into XML and XSL gurus, allowing applications to change look and feel as often as desired.

A web publishing framework attempts to address these complicated issues. Just as a web server is responsible for responding to a URL request for a file, a web publishing framework is responsible for responding to a similar request; however, instead of responding with a file, it often will respond with a published version of a file. In this case, a published file refers to a file that may have been transformed with XSLT, massaged at an application level, or converted into another format such as a PDF. The requestor does not see the raw data that may underlie the published result, but also does not have to explicitly request that publication occur. Often, a URI base (such as http://yourHost.com/publish) signifies that a publishing engine that sits on top of the web server should handle requests. As you may suspect, the concept is much simpler than the actual implementation of a framework like this, and finding the correct framework for your needs is not a trivial task.

10.1. Selecting a Framework

You might expect to find a list of hundreds of possible solutions. As you've seen, the Java language offers an easy interface into XML through several APIs. Additionally, Java servlets offer a simple means of handling web requests and responses. However, the list of frameworks is small, and the list of good, stable ones is even smaller. One of the best resources for seeing what products are currently available is XML Software's list at http://xmlsoftware.com/publishing/. This list changes so frequently that it is not worth repeating here. Still, some important criteria for determining what framework is right for you are worth mentioning.

10.1.1. Stability

Don't be surprised if you (still!) have a hard time finding a product whose version tag is greater than 2.x. In fact, you may have to search diligently to even find a second-generation framework. While a higher version number is not a guarantee of stability, it often reflects the amount of time, effort, and review that a framework has undergone. The XML publishing system is such a new beast that the market has been flooded with 1.0 and 1.1 products that simply are not stable enough for practical use.

You can often ascertain the stability of a product by investigating other products from the same vendor. Often a vendor releases an entire suite of tools; if their other tools do not offer SAX 2.0 and DOM Level 2 support, or are all also 1.0 and 1.1 products, you might be wise to pass on the framework until it has matured and conformed to newer XML standards. Try to steer away from platform-specific technologies. If the framework is tied to a platform (such as Windows, or even a specific flavor of Unix), you aren't dealing with a pure Java solution. Remember that a publishing framework must serve clients on any platform; why use a product that can't run on any platform?

10.1.2. Integration with Other XML Tools and APIs

Once you know your framework is stable enough for your needs, make sure it supports a variety of XML parsers and processors. If your framework is tied to a specific parser or processor, you will be limited to one specific implementation of a technology. This is a bad thing. Although frameworks often integrate well with a particular parser vendor, determine if parsers can be interchanged. If you have a favorite processor (or one left to you from previous projects), make sure it can still be used.

Support for SAX and DOM is a must, and many frameworks now support JDOM and JAXP as well. Even if you have a favorite API, the more options you have, the better! Also, try to find a framework whose developers are monitoring the specifications of XML Schema, XLink, XPointer, and other XML vocabularies. This will indicate if you can expect to see revisions of the framework that add support for these XML specifications, an important indication of the framework's longevity. Don't be afraid to ask questions about how quickly new specifications are expected to be integrated into the product, and insist on a firm answer.

10.1.3. Production Presence

The last and perhaps most important question to answer when looking for a web publishing framework is whether it is used in production applications. If you aren't supplied with at least a few reference applications or sites that are using the framework, don't be surprised if there aren't any. Vendors (and developers, in the open source realm) should be happy and proud to let you know where to check out their frameworks in action. Hesitance in this area is a sign that you may be more of a pioneer with a product than you wish to be. For example, Apache Cocoon provides just such a list online, at http://xml.apache.org/cocoon/livesites.html.

10.1.4. Making the Decision

Once you have evaluated these criteria, you will probably have a clear choice. Very few frameworks can positively answer all the questions raised here, not to mention your application-specific concerns. In fact, as of July 2001, less than ten publishing frameworks exist that support the latest versions of SAX (Version 2.0), DOM (Level 2), and JAXP (Version 1.1) are in production use at even one application site, and have at least three significant revisions of code under their belt. These are not listed here because, honestly, in six months they may not exist, or may be radically changed. The world of web publishing frameworks is in such flux that trying to recommend four or five options and assuming they will be in existence months from now has a greater chance of misleading you than helping you.

However, one publishing framework has been consistently successful within the Java and XML community. When considering the open source community in particular, this framework is often the choice of Java developers. The Apache Cocoon project, founded by Stefano Mazzocchi, has been a solid framework since its inception. Developed while most of us were still trying to figure out what XML was, Cocoon is now entering its second generation as an XML publishing framework based completely in Java. It also is part of the Apache XML project, and has default support for Apache Xerces and Apache Xalan. It allows any conformant XML parser to be used, and is based on the immensely popular Java servlet architecture. In addition, there are several production sites using Apache Cocoon (in its 1.x form) that push the boundaries of traditional web application development yet still perform extremely well. For this reason, and again in keeping with the spirit of open source software, I use Apache Cocoon as the framework of choice in this chapter.

In previous chapters, the choice of XML parser and processor was fairly open; in other words, examples would work on different vendor implementations with only small modifications to code. However, the web publishing framework is not standardized, and each framework implements wildly different features and conventions. For this reason, the examples in this chapter using Apache Cocoon are not portable; however, the popularity of the concepts and design patterns used within Cocoon do merit an entire chapter. If you do not choose Cocoon, at least look over the examples. The concepts in web publishing are usable across any vendor implementation, even if the specifics of the code are not.

Library Navigation Links

Copyright © 2002 O'Reilly & Associates. All rights reserved.