Performance Techniques (Java and XSLT)

9.3. Performance Techniques

The actual XSLT transformation is not always the root of performance problems. XML parsers have a significant impact on performance, along with many other factors such as database access time, time spent processing business logic, and network latency.

Obsessing over performance can be a dangerous trap to fall into. Focusing too heavily on optimization techniques often results in code that is difficult or impossible to understand and maintain. From a strictly technical viewpoint, the fastest technology sounds great. From a business viewpoint, time to market and maintainability are often far more important than runtime performance metrics. An application that meets performance requirements and is easy to maintain over the years makes better business sense than a highly tuned, cryptic application that runs fast but cannot be modified because the original author quit the company and nobody can figure out the code.

Servlet container	XSLT processor	Database	Threads / Average response time (ms)
Tomcat 3.2.1	Xalan 2.0	Access 2000	130, 320, 760, 1600
Tomcat 4.0	SAXON 6.2.2	MySQL	150, 320, 610

C:\>java -Xrunhprof:help
Hprof usage: -Xrunhprof[:help]|[<option>=<value>, ...]

Option Name and Value   Description                Default
---------------------   -----------                -------
heap=dump|sites|all     heap profiling             all
cpu=samples|times|old   CPU usage                  off
monitor=y|n             monitor contention         n
format=a|b              ascii or binary output     a
file=<file>             write data to file         java.hprof(.txt for ascii)
net=<host>:<port>       send data over a socket    write to file
depth=<size>            stack trace depth          4
cutoff=<value>          output cutoff point        0.0001
lineno=y|n              line number in traces?     y
thread=y|n              thread in traces?          n
doe=y|n                 dump on exit?              y

Example: java -Xrunhprof:cpu=samples,file=log.txt,depth=3 FooClass

rank   self  accum   count trace method
   1 13.70% 13.70%      20    31 java.lang.ClassLoader.defineClass0
   2  7.53% 21.23%      11    19 java.util.zip.ZipFile.getEntry
   3  5.48% 26.71%       8    35 java.io.Win32FileSystem.getBooleanAttributes
   4  4.11% 30.82%       6    26 java.util.zip.ZipFile.read
   5  3.42% 34.25%       5    92 java.util.zip.Inflater.inflateBytes
   6  3.42% 37.67%       5     6 java.lang.ClassLoader.findBootstrapClass
   7  2.74% 40.41%       4    22 java.util.zip.ZipFile.getEntry
   8  2.74% 43.15%       4   143 org.apache.xalan.templates.StylesheetRoot.newTransformer
   9  2.74% 45.89%       4    14 java.util.zip.ZipFile.open
  10  1.37% 47.26%       2     4 java.net.URLClassLoader.defineClass

TRACE 31:
    java.lang.ClassLoader.defineClass0(ClassLoader.java:Native method)
    java.lang.ClassLoader.defineClass(ClassLoader.java:486)
    java.security.SecureClassLoader.defineClass(SecureClassLoader.java:111)
    java.net.URLClassLoader.defineClass(URLClassLoader.java:248)
    java.net.URLClassLoader.access$100(URLClassLoader.java:56)
    java.net.URLClassLoader$1.run(URLClassLoader.java:195)

9.3.3. Using XSLT Processors Effectively

Measuring performance is the first step towards making Java and XSLT applications faster. Once the bottlenecks have been located, it is time to fix the problems.

9.3.3.1. Stylesheet caching

As mentioned several times in this book, caching XSLT stylesheets is an essential performance technique. JAXP includes the Templates interface for this purpose, and we already saw the implementation of a stylesheet cache in Chapter 5, "XSLT Processing with Java". Table 9-4 illustrates the performance gains seen when using the Templates interface to transform a small XML file repeatedly. For this test, the same transformation is performed 100 times using various programming techniques.

Table 9-4. Benefits of caching

Processor	No templates	Templates	Templates and cached XML
Xalan 2.0	71.8ms	45.9ms	39.2ms
SAXON 6.2.2	52.7ms	37.3ms	34.2ms

In the "No templates" column, the Templates interface was not used, so the stylesheet had to be parsed from a file with each transformation. This was the slowest approach. In the next column, a Templates instance was created once and reused for each transformation, which cut the time from 71.8ms to 45.9ms for Xalan and from 52.7ms to 37.3ms for SAXON.

In the final column of the table, the XML data was read into memory and cached as a DOM Document. Instead of reparsing the XML file with each request, the same DOM tree was cached and reused for each of the transformations. This yielded a slight performance gain because the XML file did not have to be read from the file system with each transformation.
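The two faster columns of Table 9-4 can be sketched as follows. This is a minimal illustration and not the benchmark code behind the table: the class name TemplatesReuseSketch, its helper methods, and the inline XML and XSLT strings are all hypothetical, chosen so that the example runs without external files. The stylesheet is compiled once into a Templates object, the XML is parsed once into a DOM Document, and both are reused for every transformation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TemplatesReuseSketch {
    // Hypothetical inline inputs; a real application would read files.
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, "
        + "<xsl:value-of select='/person/name'/></xsl:template>"
        + "</xsl:stylesheet>";
    static final String XML = "<person><name>Eric</name></person>";

    // Parse and compile the stylesheet once; Templates is safe to share.
    static Templates compile(String xslt) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(xslt)));
    }

    // Parse the XML once into a DOM tree that can be cached in memory.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder()
                  .parse(new InputSource(new StringReader(xml)));
    }

    // Each transformation needs its own lightweight Transformer instance.
    static String transform(Templates templates, Document doc) throws Exception {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile(XSLT);   // done once
        Document doc = parse(XML);             // done once
        String result = null;
        for (int i = 0; i < 100; i++) {        // reused 100 times
            result = transform(templates, doc);
        }
        System.out.println(result);
    }
}
```

Only the call to templates.newTransformer( ) and the transformation itself are repeated inside the loop. A Transformer instance should not be shared between threads, which is why the sketch creates a new one for each transformation rather than caching it alongside the Templates object.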

Although these results seem to imply that SAXON is faster than Xalan, this may be a faulty assumption. Performance can vary greatly depending on how large the input files are and which features of XSLT are used. It is wise to test performance with your application before choosing one set of tools over another.

9.3.3.2. Result caching

When the XML is highly dynamic and changes with each request, stylesheet caching may be the best one can hope for. But when the same data is requested repeatedly, such as on the home page for your company, it makes sense to cache the result of the transformation rather than just the XSLT stylesheet. This way, the transformation is performed only when the XML or XSLT actually changes.

Example 9-15 presents a utility class that caches the results of XSLT transformations. In this implementation, both the XML data and XSLT stylesheet must come from static files. If the timestamp of either file changes, the transformation is performed again. Otherwise, a cached copy of the transformation result is returned to the caller.

Example 9-15. ResultCache.java

package com.oreilly.javaxslt.util;

import java.io.*;
import java.util.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

/**
 * A utility class that caches XSLT transformation results in memory.
 *
 * @author Eric M. Burke
 */
public class ResultCache {
    private static Map cache = new HashMap( );

    /**
     * Flush all results from memory, emptying the cache.
     */
    public static synchronized void flushAll( ) {
        cache.clear( );
    }

    /**
     * Perform a single XSLT transformation.
     */
    public static synchronized String transform(String xmlFileName,
            String xsltFileName) throws TransformerException {

        MapKey key = new MapKey(xmlFileName, xsltFileName);

        File xmlFile = new File(xmlFileName);
        File xsltFile = new File(xsltFileName);

        MapValue value = (MapValue) cache.get(key);
        if (value == null || value.isDirty(xmlFile, xsltFile)) {
            // this step performs the transformation
            value = new MapValue(xmlFile, xsltFile);
            cache.put(key, value);
        }

        return value.result;
    }

    // prevent instantiation of this class
    private ResultCache( ) {
    }

    /////////////////////////////////////////////////////////////////////
    // a helper class that represents a key in the cache map
    /////////////////////////////////////////////////////////////////////
    static class MapKey {
        String xmlFileName;
        String xsltFileName;

        MapKey(String xmlFileName, String xsltFileName) {
            this.xmlFileName = xmlFileName;
            this.xsltFileName = xsltFileName;
        }

        public boolean equals(Object obj) {
            if (obj instanceof MapKey) {
                MapKey rhs = (MapKey) obj;
                return this.xmlFileName.equals(rhs.xmlFileName)
                        && this.xsltFileName.equals(rhs.xsltFileName);
            }
            return false;
        }

        public int hashCode( ) {
            return this.xmlFileName.hashCode( ) ^ this.xsltFileName.hashCode( );
        }
    }

    /////////////////////////////////////////////////////////////////////
    // a helper class that represents a value in the cache map
    /////////////////////////////////////////////////////////////////////
    static class MapValue {
        long xmlLastModified;  // when the XML file was modified
        long xsltLastModified;  // when the XSLT file was modified
        String result;

        MapValue(File xmlFile, File xsltFile) throws TransformerException {
            this.xmlLastModified = xmlFile.lastModified( );
            this.xsltLastModified = xsltFile.lastModified( );

            TransformerFactory transFact = TransformerFactory.newInstance( );
            Transformer trans = transFact.newTransformer(
                    new StreamSource(xsltFile));

            StringWriter sw = new StringWriter( );
            trans.transform(new StreamSource(xmlFile), new StreamResult(sw));

            this.result = sw.toString( );
        }

        /**
         * @return true if either the XML or XSLT file has been
         * modified more recently than this cache entry.
         */
        boolean isDirty(File xmlFile, File xsltFile) {
            return this.xmlLastModified < xmlFile.lastModified( )
                    || this.xsltLastModified < xsltFile.lastModified( );
        }
    }
}

The key to this class is its transform( ) method. This method takes the filenames of an XML file and an XSLT stylesheet as arguments and returns the transformation result as a String. If any error occurs, a TransformerException is thrown:

public static synchronized String transform(String xmlFileName,
        String xsltFileName) throws TransformerException {

The cache is implemented using a java.util.Map data structure, which requires key/value pairs of data. The MapKey helper class is used as the key:

MapKey key = new MapKey(xmlFileName, xsltFileName);

File xmlFile = new File(xmlFileName);
File xsltFile = new File(xsltFileName);

Next, the value is retrieved from the cache. Another helper class, MapValue, keeps track of the transformation result and when each file was last modified. If this is the first request, the value will be null. Otherwise, the isDirty( ) method determines if either file has been updated:

    MapValue value = (MapValue) cache.get(key);
    if (value == null || value.isDirty(xmlFile, xsltFile)) {
        // this step performs the transformation
        value = new MapValue(xmlFile, xsltFile);
        cache.put(key, value);
    }

    return value.result;
}

As the comment indicates, constructing a new MapValue causes the XSLT transformation to occur. Unless exceptions are thrown, the result of the transformation is returned to the caller.

When compared to the results shown earlier in Table 9-4, this approach to caching is much faster. In fact, the average response time is less than a millisecond once the initial transformation has been performed.

This approach is quite easy to implement for applications based on a collection of static files but is significantly more difficult for database-driven applications. Since more dynamic applications may generate new XML with each invocation, a generic utility class cannot simply cache the result of the transformation. Stale data is the biggest problem with dynamic caching. When the result of an XSLT transformation is stored in memory and the underlying database changes, the cache must be refreshed for users to see the correct data.

Let's suppose that we want to add result caching to the discussion forum application presented in Chapter 7, "Discussion Forum". Since messages cannot be modified once they have been posted, this should be fairly easy to implement for the View Message page. One easy approach is to keep a cache of a fixed number of messages. Whenever a user views a message, the generated web page is added to the cache. If the cache exceeds a specified number of messages, the oldest entries can be flushed.
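A fixed-size cache of this kind can be sketched with java.util.LinkedHashMap, which calls removeEldestEntry( ) after each insertion. The sketch assumes a Java version that provides LinkedHashMap and generics. The class name MessagePageCache, the use of a long message ID as the key, and the String page representation are assumptions made for illustration and are not taken from the discussion forum code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A bounded cache of generated View Message pages. When the limit is
 * exceeded, the entry that was added first is discarded.
 */
class MessagePageCache {
    private final Map<Long, String> pages;

    MessagePageCache(final int maxEntries) {
        // Default (insertion-order) LinkedHashMap: the "eldest" entry
        // is the one that was put into the map earliest.
        this.pages = new LinkedHashMap<Long, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** @return the cached page, or null if the message is not cached. */
    synchronized String get(long messageId) {
        return pages.get(messageId);
    }

    /** Store a generated page, evicting the oldest entry if necessary. */
    synchronized void put(long messageId, String html) {
        pages.put(messageId, html);
    }

    synchronized int size() {
        return pages.size();
    }
}
```

A servlet would call get( ) first and perform the XSLT transformation only when null is returned, storing the new page with put( ). Because messages never change after posting, no dirty check is needed for this page.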

For more dynamic pages, such as the Month View page, the database must be queried to determine when the most recent message was posted for that particular message board. If the most recently posted message is newer than the cached web page, the transformation must be performed again using the updated data. As you might guess, this sort of caching must be done on a case-by-case basis, because it is very tightly coupled to the database design.
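The freshness check described above might look something like the following sketch. It assumes the caller has already asked the database for the timestamp of the most recent message on the board; how that query is written depends entirely on the schema, so it is left out. MonthViewCache, its get( ) method, and the pageKey format are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Caches generated pages and regenerates them when newer messages exist. */
class MonthViewCache {
    private static final class Entry {
        final long newestPostTime;  // newest message included in the page
        final String html;

        Entry(long newestPostTime, String html) {
            this.newestPostTime = newestPostTime;
            this.html = html;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /**
     * @param pageKey identifies the page, for example a board and month
     * @param newestPostTime timestamp of the most recently posted message,
     *        as reported by the database
     * @param renderer performs the XSLT transformation; invoked only
     *        when the cached page is missing or stale
     */
    synchronized String get(String pageKey, long newestPostTime,
            Supplier<String> renderer) {
        Entry entry = entries.get(pageKey);
        if (entry == null || entry.newestPostTime < newestPostTime) {
            entry = new Entry(newestPostTime, renderer.get());
            entries.put(pageKey, entry);
        }
        return entry.html;
    }
}
```

The database is still queried on every request, but only for a single timestamp. The more expensive work of fetching all messages for the month, generating XML, and running the transformation is skipped whenever the cached page is current.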

WARNING: Web applications relying on URL rewriting for session tracking may not be able to cache transformation results. This is because, as outlined in Chapter 8, "Additional Techniques", every URL must be dynamically encoded with the jsessionid when cookies are disabled.

As with any other type of optimization, the benefits of caching must be carefully weighed against the costs of added complexity. The best approach is to analyze log files to see which pages are requested most often and to focus optimization efforts there.
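As a rough illustration of that kind of log analysis, the following sketch counts how often each path is requested. It assumes access log lines in which the request appears in quotes, as in the Common Log Format ("GET /path HTTP/1.0"); the TopPages class and its method are hypothetical, and a real log may need a different pattern.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopPages {
    // Captures the requested path from a quoted request line.
    private static final Pattern REQUEST =
            Pattern.compile("\"(?:GET|POST|HEAD) (\\S+) HTTP/[0-9.]+\"");

    /** @return up to 'limit' paths, most frequently requested first. */
    static List<Map.Entry<String, Integer>> topPages(
            List<String> logLines, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted =
                new ArrayList<>(counts.entrySet());
        // Sort by descending request count.
        sorted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(
                sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

Pages near the top of the resulting list are the first candidates for result caching. Pages that are rarely requested are unlikely to justify the added complexity.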

// somewhere in a servlet...
if (employee.isMarried( )) {
    // request the Spouse, which will make a call to the EJB tier
    // unless the spouse was requested earlier and is cached
    Person spouse = employee.getSpouse( );
    // generate XML for the spouse...
} else {
    // simply generate XML for the employee; do not call the EJB tier
}