

Java and XML, 2nd Edition

14.3. Push Versus Pull

So far, I have built applications on the assumption that clients always pull data and content. In other words, a user had to type a URL into a browser (in the case of the mytechbooks.com new book listings), or an application like the mytechbooks.com servlet had to make an HTTP request for XML data (in the case of the Foobar Public Library). While this is not a problem, it is not always the best way for a company like mytechbooks.com to sell books. Clients pulling data have to remember to visit the sites they buy from, and often don't revisit those sites for days, weeks, or even months. Those clients may purchase a large number of goods and services when they do remember, but on average those occasional large purchases bring in less revenue than smaller purchases made more frequently.

Realizing this trend, mytechbooks.com wants to be able to push data to its clients. Pushing data involves letting the client know (without any client action) that new items are available or that specials are being run. This in turn allows the client to make more frequent purchases without having to remember to visit a web page. However, pushing data to clients is difficult in a web medium, because a web browser does not behave like a thick client: it is harder to send pop-up messages or generate alerts for users. What mytechbooks.com has discovered, though, is the popularity of personalized "start pages" like Netscape's My Netscape and Yahoo's My Yahoo pages. In talking with Netscape, mytechbooks.com has been hearing about a technology called Rich Site Summary (RSS), and thinks that may be the answer to its need to push data out to clients.

14.3.1. Rich Site Summary

Rich Site Summary (RSS) is a particular flavor of XML. It has its own DTD, and defines what is called a channel. A channel is a way to represent data about a specific subject, and provides for a title and description of the channel, an image or logo, and then several items within the channel. Each item, then, is something of particular interest about the channel, or a product or service available. Because the allowed elements of an item are fairly generic (title, description, hyperlink), almost anything can be represented as an item of a channel. An RSS channel is not intended to provide a complete site's content, but rather a short blurb about a company or service, suitable for display in a portal-style framework, or as a sidebar on a web site. In fact, the different "widgets" at Netscape's Netcenter are all RSS channels, and Netscape allows the creation of new RSS channels that can be registered with Netcenter. Netscape also has a built-in system for displaying RSS channels in an HTML format, which of course fits into its Netcenter start pages.

At this point, you may be a little concerned that RSS is to Netscape as Microsoft's XML parser is to Microsoft: difficult to integrate with other tools or vendors. Although originally developed by Netscape specifically for Netcenter, the XML structure of RSS has made it usable by any application that can read a DTD. In fact, many portal-style web sites and applications are beginning to use RSS, such as the Apache Jetspeed project (http://jakarta.apache.org/jetspeed), an open source Enterprise Information Portal system. Jetspeed takes the same RSS format that Netscape uses, and renders it in a completely different manner. Because of the concise grammar of RSS, this is easily done.

As many users have start pages, or homepages, or similar places on the Web that they frequent, mytechbooks.com would like to create an RSS channel that provides new book listings, and then allows interested clients to jump straight to buying an item that catches their eye. This is an effective means to push data, as products like Netcenter will automatically update RSS channel content as often as the user desires.

14.3.2. Creating an XML RSS Document

The first thing you need to do to use RSS is create an RSS file. This is almost too simple to be believed: other than declaring the correct namespaces and following the structure that the RSS specification defines, there is nothing at all complicated about creating an RSS document. Example 14-6 shows a sample RSS file that mytechbooks.com has modeled.

Example 14-6. Sample RSS document for mytechbooks.com

<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
>
 <channel>
  <title>mytechbooks.com New Listings</title>
  <link>http://www.newInstance.com/javaxml2/techbooks</link>
  <description>
   Your online source for technical material, computers, 
   and computing books!
  </description>

  <image rdf:resource="http://newInstance.com/javaxml2/logo.gif" />

  <items>
   <rdf:Seq>
    <rdf:li resource="http://www.newInstance.com/javaxml2/techbooks" />
   </rdf:Seq>
  </items>
 </channel>

  <image rdf:about="http://newInstance.com/javaxml2/logo.gif">
   <title>mytechbooks.com</title>
   <url>http://newInstance.com/javaxml2/logo.gif</url>
   <link>http://newInstance.com/javaxml2/techbooks</link>
  </image>

  <item rdf:about="http://www.newInstance.com/javaxml2/techbooks">
   <title>Java Servlet Programming</title>
   <link>
    http://newInstance.com/javaxml2/techbooks/buy.xsp?isbn=156592391X
   </link>
   <description>
    This book is a superb introduction to Java servlets
    and their various communications mechanisms.
   </description>
  </item>
</rdf:RDF>

The root element must be RDF, in the RDF namespace, as shown in the example. Within the root element, a single channel element must appear. The channel has elements that describe it (title, link, and description), an optional image that can be associated with the channel (as well as information about that image), and then as many as 15 item elements,[23] each detailing one item related to the channel. Each item has a title, link, and description element, all of which are self-explanatory. An optional text box and button for submitting information (the textinput element) can be added as well, although it is not included in the example. For complete details of allowed elements and attributes, visit the RSS 1.0 specification online at http://groups.yahoo.com/group/rss-dev/files/specification.html.

[23] This isn't a limit set by RSS 1.0, but is used for backwards compatibility with RSS 0.9 and 0.91.

NOTE: As in previous examples, actual RSS channel documents should avoid having whitespace within the link and url elements, but rather have all information on a single line. Again, the formatting in the example does not reflect this due to printing and sizing constraints.

There is one somewhat tricky thing to watch out for, though. You'll notice that the item element (or elements) is actually not nested within the channel element at all. To create a link between items in the document and the channel, you'll want to use some constructs from RDF (the Resource Description Framework, on which RSS 1.0 is built):

  <items>
   <rdf:Seq>
    <rdf:li resource="http://www.newInstance.com/javaxml2/techbooks" />
   </rdf:Seq>
  </items>

Here, the items element is nested within the channel element. Then, the li construct, in the RDF-defined namespace, is assigned a URI through the resource attribute. In each item you want associated with this channel, supply the about attribute (again in the RDF namespace) and assign it the same URI you used in the channel's resource descriptor:

  <item rdf:about="http://www.newInstance.com/javaxml2/techbooks">
    <!-- Item content -->
  </item>

For each item with this URI, an association can be made between that item and the channel with the same URI. In other words, you've just built a link between the channel in the RSS file and the items. The same approach applies for linking a channel to an image; you use the image element in the channel element, specifying the image URL as the value of the rdf:resource attribute. You should then define an image element, not within the channel element, supplying a title, URL, and link. Finally, use the rdf:about attribute (as in the item element) to specify the same URL as provided in the channel's image element. Did you follow all of that? This is all quite a bit different from RSS 0.9 and 0.91 (covered in the first edition of this book), so you'll need to be careful not to get things mixed up between the older specification and this newer one.
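Because a mistyped URI silently breaks this association, it can be worth checking the link mechanically. The following is a minimal sketch, not part of the mytechbooks.com application: it uses the DOM parser bundled with the JDK, and the class name RssLinkCheck and the method unlinkedItems() are hypothetical names I've made up for illustration. It reports every item whose rdf:about value is not referenced by an rdf:li in the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of the servlet in this chapter.
class RssLinkCheck {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_NS = "http://purl.org/rss/1.0/";

    /** Returns the rdf:about URIs of items that no rdf:li refers to. */
    static List<String> unlinkedItems(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(rssXml)));

        // Collect every URI listed in an rdf:li element
        Set<String> listed = new HashSet<>();
        NodeList lis = doc.getElementsByTagNameNS(RDF_NS, "li");
        for (int i = 0; i < lis.getLength(); i++) {
            Element li = (Element) lis.item(i);
            // Example 14-6 writes the attribute unprefixed (resource="...");
            // fall back to the prefixed form (rdf:resource="...") if absent
            String uri = li.getAttribute("resource");
            if (uri.isEmpty()) {
                uri = li.getAttributeNS(RDF_NS, "resource");
            }
            listed.add(uri);
        }

        // Report each item whose rdf:about is not in that set
        List<String> unlinked = new ArrayList<>();
        NodeList items = doc.getElementsByTagNameNS(RSS_NS, "item");
        for (int i = 0; i < items.getLength(); i++) {
            String about = ((Element) items.item(i)).getAttributeNS(RDF_NS, "about");
            if (!listed.contains(about)) {
                unlinked.add(about);
            }
        }
        return unlinked;
    }

    public static void main(String[] args) throws Exception {
        String rss =
            "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns=\"" + RSS_NS + "\">"
          + "<channel><items><rdf:Seq>"
          + "<rdf:li resource=\"http://www.newInstance.com/javaxml2/techbooks\" />"
          + "</rdf:Seq></items></channel>"
          + "<item rdf:about=\"http://www.newInstance.com/javaxml2/techbooks\">"
          + "<title>Java Servlet Programming</title></item>"
          + "</rdf:RDF>";
        System.out.println("Unlinked items: " + unlinkedItems(rss));
    }
}
```

The parser has to be namespace-aware for this to work, since both the li element and the about attribute are identified by the RDF namespace rather than by their prefixes.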

It is simple enough to create RSS files programmatically; the procedure is similar to how you generated the HTML for the mytechbooks.com web site. Half of the RSS file (the information about the channel as well as the image information) is static content; only the item elements must be generated dynamically. However, just as you were getting ready to open up vi and start creating another XSL stylesheet, another requirement was dropped into your lap: the machine that will house the RSS channel is a different server than that used in our last example, and has only very outdated versions of the Apache Xalan libraries available. Because of some of the high-availability applications that also run on that machine, such as the billing system, mytechbooks.com does not want to update those libraries until change control can be stepped through, a weeklong process. However, mytechbooks.com does have newer versions of the Xerces libraries available (as XML parsing is used in the billing system), so Java APIs for handling XML are available.[24] In this example, I use JDOM to convert the XML from the Foobar Public Library into an RSS channel format. Example 14-7 does just this.

[24] Yes, this is a bit of a silly case, and perhaps not so likely to really occur. However, it does afford me the opportunity to look at another alternative for creating XML programmatically. Don't sneer too much at the absurdity of the example; all of the examples in this book, including the silly ones, stem from actual experiences consulting for real-world companies. Laughing at this scenario might mean your next project has the same silly requirements!

Example 14-7. Java servlet to convert new book listings into an RSS channel document

package com.techbooks;

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import java.util.Iterator;
import java.util.List;
import javax.servlet.*;
import javax.servlet.http.*;

// JDOM
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class GetRSSChannelServlet extends HttpServlet {

    /** Host to connect to for books list */
    private static final String hostname = "newInstance.com";
    /** Port number to connect to for books list */
    private static final int portNumber = 80;
    /** File to request (URI path) for books list */
    private static final String file = "/cgi/supplyBooks.pl";

    public void service(HttpServletRequest req, HttpServletResponse res) 
        throws ServletException, IOException {            
            
        res.setContentType("text/plain");
        PrintWriter out = res.getWriter();
        
        // Connect and get XML listing of books
        URL getBooksURL = new URL("http", hostname, portNumber, file);
        InputStream in = getBooksURL.openStream();

        try {
            // Request SAX Implementation and use default parser
            SAXBuilder builder = new SAXBuilder();

            // Create the document
            Document doc = builder.build(in);
            
            // Output XML
            out.println(generateRSSContent(doc));
            
        } catch (JDOMException e) {        
            out.println("Error: " + e.getMessage());
        } finally {
            out.close();
        }
    }   
    
    /**
     * <p>
     * This will generate an RSS XML document using the supplied 
     *   JDOM <code>Document</code>.
     * </p>
     *
     * @param doc <code>Document</code> to use for input.
     * @return <code>String</code> - RSS file to output.
     * @throws <code>JDOMException</code> when errors occur.
     */
    private String generateRSSContent(Document doc) throws JDOMException {
        StringBuffer rss = new StringBuffer();
        
        rss.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
           .append("<rdf:RDF ")
           .append("xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n")
           .append("         xmlns=\"http://purl.org/rss/1.0/\"\n")
           .append(">\n")
           .append(" <channel>\n")
           .append("  <title>mytechbooks.com New Listings</title>\n")
           .append("  <link>http://www.newInstance.com/javaxml2/techbooks")
           .append("</link>\n")
           .append("  <description>\n")
           .append("   Your online source for technical material, computers, \n")
           .append("   and computing books!\n")
           .append("  </description>\n\n")
           .append("  <image ")
           .append("rdf:resource=\"http://newInstance.com/javaxml2/logo.gif\"")
           .append(" />\n\n")
           .append("  <items>\n")
           .append("   <rdf:Seq>\n")
           .append("    <rdf:li ")
           .append("resource=\"http://www.newInstance.com/javaxml2/techbooks\"")
           .append(" />\n")
           .append("   </rdf:Seq>\n")
           .append("  </items>\n")
           .append(" </channel>\n\n")
           .append("  <image ")
           .append("rdf:about=\"http://newInstance.com/javaxml2/logo.gif\">\n")
           .append("   <title>mytechbooks.com</title>\n")
           .append("   <url>http://newInstance.com/javaxml2/logo.gif</url>\n")
           .append("   <link>http://newInstance.com/javaxml2/techbooks</link>\n")
           .append("  </image>\n\n");
           
        // Add an item for each new title with Computers as subject
        List books = doc.getRootElement().getChildren("book");
        for (Iterator i = books.iterator(); i.hasNext(); ) {
            Element book = (Element)i.next();
            // Null-safe comparison: books without a subject attribute are skipped
            if ("Computers".equals(book.getAttributeValue("subject"))) {
                // Output an item
                rss.append("<item rdf:about=\"http://www.newInstance.com/")
                   .append("javaxml2/techbooks\">\n")
                    // Add title (getText() yields the element's text as a String)
                   .append(" <title>")
                   .append(book.getChild("title").getText())
                   .append("</title>\n")
                    // Add link to buy book
                   .append(" <link>")
                   .append("http://newInstance.com/javaxml2")
                   .append("/techbooks/buy.xsp?isbn=")
                   .append(book.getChild("saleDetails")
                               .getChild("isbn")
                               .getText())
                   .append("</link>\n")
                   .append(" <description>")
                    // Add description
                   .append(book.getChild("description").getText())
                   .append("</description>\n")
                   .append("</item>\n");
                        
            }
        }          
         
        rss.append("</rdf:RDF>");
        
        return rss.toString();
    }
}

By this time, nothing in this code should be the least bit surprising to you; I've imported the JDOM and I/O classes needed, and accessed the Foobar Public Library application as in the ListBooksServlet. The resulting InputStream is used to create a JDOM Document, with the default parser (Apache Xerces) and the JDOM builder based on SAX doing the work.

Then, the JDOM Document is handed off to the generateRSSContent() method, which first appends all of the static content for the RSS channel to a StringBuffer. This method then obtains the book elements within the XML from the library and iterates through them, ignoring those without a subject attribute equal to "Computers".

NOTE: Again, I've done some rather different things simply for illustrative purposes. For example, this code directly outputs XML; you could just as easily create a JDOM tree and output it using XMLOutputter. Of course, you could also use DOM for the entire servlet. All these are viable and perfectly legitimate options.
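To give a feel for the tree-based alternative, here is a minimal sketch that builds a single item with the DOM and JAXP classes bundled with the JDK and then serializes it. It is not the servlet's code: the class name RssItemBuilder and the method buildFeed() are hypothetical, the channel and image elements are left out for brevity, and the description passed in main() is made-up sample text. One practical difference from string concatenation is that the serializer escapes characters such as & and < in text content for you.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds an RDF root holding one RSS item.
class RssItemBuilder {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_NS = "http://purl.org/rss/1.0/";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    static String buildFeed(String about, String title,
                            String link, String description) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        // Root element with explicit namespace declarations
        Element root = doc.createElementNS(RDF_NS, "rdf:RDF");
        root.setAttributeNS(XMLNS_NS, "xmlns:rdf", RDF_NS);
        root.setAttributeNS(XMLNS_NS, "xmlns", RSS_NS);
        doc.appendChild(root);

        // The item and its three children live in the RSS namespace
        Element item = doc.createElementNS(RSS_NS, "item");
        item.setAttributeNS(RDF_NS, "rdf:about", about);
        root.appendChild(item);
        addChild(doc, item, "title", title);
        addChild(doc, item, "link", link);
        addChild(doc, item, "description", description);

        // Serialize the tree; text content is escaped by the serializer
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void addChild(Document doc, Element parent,
                                 String name, String text) {
        Element child = doc.createElementNS(RSS_NS, name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildFeed(
            "http://www.newInstance.com/javaxml2/techbooks",
            "Java Servlet Programming",
            "http://newInstance.com/javaxml2/techbooks/buy.xsp?isbn=156592391X",
            "Servlets & their communications mechanisms")); // made-up sample text
    }
}
```

Whether you prefer this over appending strings is largely a matter of taste for a document this small; the tree-based version costs more lines but cannot produce unbalanced tags or unescaped text.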

Finally, each element that makes it through the comparison is added to the RSS channel. Nothing very exciting here, right? Figure 14-5 shows a sample output from accessing this servlet, saved as GetRSSChannelServlet.java, through a web browser.


Figure 14-5. RSS channel generated by the GetRSSChannelServlet

With this RSS channel ready for use, mytechbooks.com has made its content available to any service provider that supports RSS! To get the ball rolling on allowing clients to use its channel, mytechbooks.com would like to ensure its RSS document is valid, and see a sample HTML rendering of it (as would you, I imagine).

14.3.3. Taking a Test Drive

At this point, let's see this thing in action. Point your browser at http://www.redland.opensource.ac.uk/rss. This site has a nice online test tool called the Redland RSS viewer that will take your RSS channel and validate it, as well as render it in HTML. You'll need to ensure that the RSS feed is available online somewhere, such as through the servlet just discussed. Enter the URL for the servlet, or RSS feed, and then select "Yes" for the "Format output as a box" option. This instructs the viewer to render your channel as an HTML box, much like it could be seen on Netscape's Netcenter or on http://www.oreilly.com, which has several RSS feeds running. The output from the feed we just created is shown in Figure 14-6.


Figure 14-6. RSS formatted in HTML

You can also select several other RSS feeds from the viewer, and see how they would look formatted in HTML as well. The Meerkat channel is particularly interesting, as it contains almost all of the RSS options that are currently available for use. Additionally, if you have any errors in your RSS, this viewer will let you know what they are, which is helpful in debugging problems before putting your RSS channel into production.

I'm not including code that parses and formats the RSS in this chapter; in addition to it being a piece of cake for you by now, each site will need to provide different formatting for RSS feeds. For some fairly diverse views of RSS channels, you should check out http://www.servlets.com (down at the bottom right), http://www.oreilly.com, and http://www.xml.com, all of which have some pretty different formatting going on. When reading an RSS channel, you'll probably want to treat it like XML, and use SAX, DOM, or JDOM to read the data in and format it however you need. In other words, there's nothing that requires you to treat an RSS feed any differently from any other XML document; you just know what the formatting will look like ahead of time. With that knowledge, you're ready to use RSS feeds in your own web sites.
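If you'd like a starting point anyway, here is a rough sketch of one way to read a channel's items with DOM and turn them into an HTML list. Treat it as an illustration under assumptions rather than a recommended design: the class name RssToHtml, the method render(), and the choice of an unordered list are all mine, and a real site would supply its own markup. The link text is trimmed because feeds sometimes carry whitespace inside the link element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: renders the items of an RSS 1.0 channel as an HTML list.
class RssToHtml {

    static final String RSS_NS = "http://purl.org/rss/1.0/";

    static String render(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(rssXml)));

        StringBuilder html = new StringBuilder("<ul>\n");
        NodeList items = doc.getElementsByTagNameNS(RSS_NS, "item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            html.append("  <li><a href=\"")
                .append(escape(childText(item, "link")))
                .append("\">")
                .append(escape(childText(item, "title")))
                .append("</a></li>\n");
        }
        return html.append("</ul>").toString();
    }

    /** Trimmed text of the first child element with the given name, or "". */
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagNameNS(RSS_NS, name);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    /** Escapes the characters that are special in HTML text and attributes. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws Exception {
        String rss =
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
          + " xmlns=\"" + RSS_NS + "\">"
          + "<item rdf:about=\"http://www.newInstance.com/javaxml2/techbooks\">"
          + "<title>Java Servlet Programming</title>"
          + "<link>\n http://newInstance.com/javaxml2/techbooks/buy.xsp?isbn=156592391X\n</link>"
          + "</item></rdf:RDF>";
        System.out.println(render(rss));
    }
}
```

The same loop could just as easily be written with SAX or JDOM; DOM is used here only because it ships with the JDK.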




Copyright © 2002 O'Reilly & Associates. All rights reserved.