Java-Based Web Technologies (Java and XSLT)

In a perfect world, a single web development technology would be inexpensive, easy to maintain, offer rapid response time, and be highly scalable. It would also be portable to any operating system or hardware platform and would adapt well to future requirement changes. It would support access from wireless devices, standalone client applications, and web browsers, all with minimal changes to code.

No perfect solution exists, nor is one likely to exist anytime soon. If it did, many of us would be out of work. A big part of software engineering is recognizing that tradeoffs are inevitable and knowing when to sacrifice one set of goals in order to deliver the maximum value to your customer or business. For example, far too many programmers focus on raw performance metrics without any consideration for ease of development or maintainability by nonexperts. These decisions are hard and are often subjective, based on individual experience and preferences.

The goal of this chapter is to look at the highlights of several popular technologies for web application development using Java and see how each measures up to an XSLT-based approach. The focus is on architecture, which implies a high-level viewpoint without emphasis on specific implementation details. Although XSLT offers a good balance between performance, maintainability, and flexibility, it is not the right solution for all applications. It is hoped that the comparisons made here will help you decide if XSLT is the right choice for your web applications.

4.1. Traditional Approaches

Before delving into more sophisticated options, let's step back and look at a few basic approaches to web development using Java. For small web applications or moderately dynamic web sites, these approaches may be sufficient. As you might suspect, however, none of these approaches hold up as well as XML and XSLT when your sites get more complex.

4.1.1. CGI

Common Gateway Interface (CGI) is a protocol for interfacing external applications, which can be written in just about any language, with web servers. The most common language choices for CGI are C and Perl. The web server passes request data to the program in different ways, depending on the type of request. For example, parameters associated with an HTTP GET request are passed to the CGI script via the QUERY_STRING environment variable. HTTP POST data, on the other hand, is piped to the standard input stream of the CGI script. A CGI program always sends its results back to the web server via standard output.
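As a rough illustration of the GET case, the sketch below reads QUERY_STRING from the environment, decodes it, and writes a response to standard output. It is written in Java only to match the language used in the rest of this chapter. The class name CgiEcho and its parse helper are hypothetical, and repeated parameter names simply keep the last value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical CGI-style program; the names are illustrative, not a standard API.
class CgiEcho {

    // Decodes one URL-encoded token ("+" becomes a space, %XX becomes a character).
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Splits "a=1&b=two+words" into decoded name/value pairs (last value wins).
    static Map<String, String> parse(String queryString) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (queryString == null || queryString.isEmpty()) {
            return params;
        }
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        // The web server passes GET parameters through this environment variable.
        Map<String, String> params = parse(System.getenv("QUERY_STRING"));

        // A CGI response goes to standard output: headers, a blank line, then the body.
        System.out.print("Content-Type: text/plain\r\n\r\n");
        for (Map.Entry<String, String> e : params.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```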

Ordinary CGI programs are invoked from the web server as external programs, which is the most notable difference when compared with servlets. With each request from the browser, the web server spawns a new process to run the CGI program. Aside from the obvious performance penalty, this also makes it difficult to maintain state information between requests. A web-based shopping cart is a perfect example of state information that must be preserved between requests. Figure 4-1 illustrates the CGI process.

Figure 4-1. CGI process

NOTE: FastCGI is an alternative to CGI with two notable differences. First, FastCGI processes do not exit with each request/response cycle. Second, the environment variable and pipe I/O mechanism of CGI has been eschewed in favor of TCP connections, allowing FastCGI programs to be distributed to different servers. The net result is that FastCGI eliminates the most vexing problems of CGI while making it easy to salvage existing CGI programs.

Although technically possible, using Java for CGI programming is not generally a good idea. In fact, it is an awful idea! The Java Virtual Machine (JVM) would have to be launched with each and every request, which would be painfully slow. Any Java programmer knows that application startup time has never been one of the strengths of Java. Servlets had to address this issue first. What was needed was a new approach in which the JVM was loaded a single time and left running even when no requests came in. The term servlet engine referred to the JVM that hosted the servlets, often serving a dual role as an HTTP web server.

4.1.2. Servlets as CGI Replacements

Sun's Java servlet API was originally released way back in 1997 when Java was mostly a client-side development language. Servlets were originally marketed and used as replacements for CGI programs. Developers were quick to adopt servlets because of their advantages over CGI.

Since the servlet engine can run for as long as the web server runs, servlets can be loaded into memory once and kept around for subsequent requests. This is easy to accomplish in Java because servlets are really nothing more than Java classes. The JVM simply loads the servlet objects into memory, hanging on to the references for as long as the web application runs.

The persistent nature of servlets results in two additional benefits, both of which push servlets well beyond the capabilities of basic CGI. First, state information can be preserved in memory for long periods of time. Even though the browser loses its connection to the web server after each request/response cycle, servlets can store objects in memory until the browser reconnects for the next page. Second, since Java has built-in threading capability, it is possible for numerous clients to share the same servlet instance. Creating additional threads is far more efficient than spawning additional external processes, making servlets very good performers.
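Sharing one instance among many threads has a consequence worth sketching: any state kept in fields must be thread-safe. The following is a container-free simulation, not servlet code. HitCounter and its handle method are hypothetical stand-ins for a servlet and its doGet( ) method, and the thread pool plays the role of the servlet engine.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a servlet: one instance, many request threads.
class HitCounter {
    // Field state is shared by every thread, so it uses an atomic type.
    private final AtomicInteger hits = new AtomicInteger();

    // Plays the role of doGet(): called concurrently on the same instance.
    String handle(String page) {
        int count = hits.incrementAndGet();
        return page + " served, hit #" + count;
    }

    int totalHits() {
        return hits.get();
    }

    // Simulates an engine dispatching requests from a thread pool to one instance.
    static int simulate(int requests, int threads) {
        final HitCounter shared = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> shared.handle("/schedule"));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared.totalHits();
    }

    public static void main(String[] args) {
        System.out.println("Total hits: " + simulate(1000, 8));
    }
}
```

Replacing the AtomicInteger with a plain int field would allow updates to be lost under load, which is the kind of bug that a shared instance makes possible.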

Early versions of the Java servlet API did not specify the mechanism for deployment (i.e., installation) onto servers. Although the servlet API was consistent, deployment onto different servlet engines was completely vendor specific. With Version 2.2 of the servlet API, however, proprietary servlet engines were dropped in favor of a generic servlet container specification. The idea of a container is to formalize the relationship between a servlet and the environment in which it resides. This made it possible to deploy the same servlet on any vendor's container without any changes.

Along with the servlet container came the concept of a web application. A web application consists of a collection of servlets, static web pages, images, or any other resources that may be needed. The standard unit of deployment for web applications is the Web Application Archive (WAR) file, which is actually just a Java Archive (JAR) file that uses a standard directory structure and has a .war file extension. In fact, you use the jar command to create WAR files. Along with the WAR file comes a deployment descriptor, which is an XML configuration file that specifies all configuration aspects of a web application. The important details of WAR files and deployment descriptors will be outlined in Chapter 6, "Servlet Basics and XSLT".

Servlets are simple to implement, portable, can be deployed to any servlet container in a consistent way, and offer high performance. Because of these advantages, servlets are the underlying technology for every other approach discussed in this chapter. When used in isolation, however, servlets do have limitations. These limitations manifest themselves as web applications grow increasingly complex and web pages become more sophisticated.

Figure 4-2 shows a simple web page that lists television shows for the current day. This first implementation uses a servlet; a JavaServer Pages (JSP) implementation follows later in this chapter.

Figure 4-2. ScheduleServlet output

The Schedule Java class has a method called getTodaysShows( ) that returns an array of Show objects. The array is already sorted, which reduces the amount of work that the servlet has to do to generate this page. The Schedule and Show classes are used for all of the remaining examples in this chapter. Ideally, this will help demonstrate that no matter which approach you take, keeping business logic and database access code out of the servlet makes it easier to move to new technologies without rewriting all of your code. The code for ScheduleServlet.java is shown in Example 4-1. This is typical of a first-generation servlet, generating its output using a series of println( ) statements.

Example 4-1. ScheduleServlet.java

package chap4;

import java.io.*;
import java.text.SimpleDateFormat;
import javax.servlet.*;
import javax.servlet.http.*;

public class ScheduleServlet extends HttpServlet {
    public void doGet(HttpServletRequest request,
            HttpServletResponse response) throws IOException,
            ServletException {

        SimpleDateFormat dateFmt = new SimpleDateFormat("hh:mm a");

        Show[] shows = Schedule.getInstance().getTodaysShows( );

        response.setContentType("text/html");
        PrintWriter pw = response.getWriter( );
        pw.println("<html><head><title>Today's Shows</title></head><body>");
        pw.println("<h1>Today's Shows</h1>");
        pw.println("<table border=\"1\" cellpadding=\"3\"");
        pw.println(" cellspacing=\"0\">");

        pw.println("<tr><th>Channel</th><th>From</th>");
        pw.println("<th>To</th><th>Title</th></tr>");

        for (int i=0; i<shows.length; i++) {
            pw.println("<tr>");
            pw.print("<td>");
            pw.print(shows[i].getChannel( ));
            pw.println("</td>");
            pw.print("<td>");
            pw.print(dateFmt.format(shows[i].getStartTime( )));
            pw.println("</td>");
            pw.print("<td>");
            pw.print(dateFmt.format(shows[i].getEndTime( )));
            pw.println("</td>");
            pw.print("<td>");
            pw.print(shows[i].getTitle( ));
            pw.println("</td>");
            pw.println("</tr>");
        }
        pw.println("</table>");
        pw.println("</body>");
        pw.println("</html>");
    }
}

If you are interested in the details of servlet coding, be sure to read Chapter 6, "Servlet Basics and XSLT". For now, focus on how the HTML is generated. All of those println( ) statements look innocuous enough in this short example, but a "real" web page will have thousands of println( ) statements, resulting in code that is quite difficult to maintain over the years. Generally, you will want to factor that code out into a series of methods or objects that generate fragments of the HTML. However, this approach is still tedious and error prone.
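The factoring suggested above might look like the following sketch, which assumes a hypothetical HtmlFragments helper that is not part of the servlet API or of this chapter's examples. It also escapes cell text, something the println( ) version in Example 4-1 does not do.

```java
// Hypothetical helper that builds HTML fragments; names are illustrative only.
class HtmlFragments {

    // Escapes the characters that are significant in HTML text and attribute values.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds one <tr> containing a cell (for example "td" or "th") per value.
    static String row(String cellTag, String... cells) {
        StringBuilder sb = new StringBuilder("<tr>");
        for (String cell : cells) {
            sb.append('<').append(cellTag).append('>')
              .append(escape(cell))
              .append("</").append(cellTag).append('>');
        }
        return sb.append("</tr>").toString();
    }

    public static void main(String[] args) {
        System.out.println(row("th", "Channel", "From", "To", "Title"));
        System.out.println(row("td", "2", "06:00 AM", "06:30 AM", "Tom & Jerry"));
    }
}
```

Inside the servlet's loop, each group of print( ) calls would collapse into a single call to row. The code gets shorter, but the HTML still lives in Java source, so the maintenance concerns remain.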

The main problems are development scalability and future maintainability. The code becomes increasingly difficult to write as your pages get more complex, and it becomes very difficult to make changes to the HTML when new requirements arrive. Web content authors and graphic designers are all but locked out of the process since it takes a programmer to create and modify the code. Each minor change requires your programming staff to recompile, test, and deploy to the servlet container.

Beyond the tedious nature of HTML generation, first-generation servlets tend to do too much. It is not clear where error handling, form processing, business logic, and HTML generation are supposed to reside. Although we are able to leverage two helper classes to generate the list of shows, a more rigorous approach will be required for complex web applications. All of the remaining technologies presented in this chapter are designed to address one or more of these issues, which become increasingly important as web applications get more sophisticated.

4.1.3. JSP

You have no doubt heard about JSP. This is a hot area in web development right now with some pretty hefty claims about productivity improvements. The argument is simple: instead of embedding HTML code into Java servlets, which requires a Java programmer, why not start out with static HTML? Then add special tags to this HTML that are dynamically expanded by the JSP engine, thus producing a dynamic web page. Example 4-2 contains a very simple example of JSP that produces exactly the same output as ScheduleServlet.

Example 4-2. schedule.jsp

<%@ page import="chap4.*,java.text.*" %>
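<%-- Created per request in a scriptlet rather than as a shared field: SimpleDateFormat is not thread-safe. --%>
<% SimpleDateFormat dateFmt = new SimpleDateFormat("hh:mm a"); %>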
<html>
  <head>
    <title>Today's Shows</title>
  </head>
<body>
<h1>Today's Shows</h1>
<% Show[] shows = Schedule.getInstance().getTodaysShows( ); %>
<table border="1" cellpadding="3" cellspacing="0">
  <tr><th>Channel</th><th>From</th><th>To</th><th>Title</th></tr>

  <% for (int i=0; i<shows.length; i++) { %>
  <tr>
    <td><%= shows[i].getChannel( ) %></td>
    <td><%= dateFmt.format(shows[i].getStartTime( )) %></td>
    <td><%= dateFmt.format(shows[i].getEndTime( )) %></td>
    <td><%= shows[i].getTitle( ) %></td>
  </tr>
  <% } %>
</table>
</body>
</html>

As schedule.jsp shows, most of the JSP is static HTML with dynamic content sprinkled in here and there using special JSP tags. When a client first requests a JSP, the entire page is translated into source code for a servlet. This generated servlet code is then compiled and loaded into memory for use by subsequent requests. During translation, the JSP tags are converted into Java code that produces the dynamic content on each request, so the end user sees only HTML output, as if the entire page were static.

Runtime performance of JSP is comparable to hand-coded servlets because the static content in the JSP is generally replaced with a series of println( ) statements in the generated servlet code. The only major performance hit occurs for the first person to visit the JSP, because it will have to be translated and compiled. Most JSP containers provide options to precompile the JSP, so even this hit can be avoided.

Debugging in JSP can be somewhat challenging. Since JSP pages are machine translated into Java classes, method signatures and class names are not always intuitive. When a programming error occurs, you are often faced with ugly stack traces that show up directly in the browser. You do have the option of specifying an error page to be displayed whenever an unexpected condition occurs. This gives the end user a more friendly error message, but does little to help you diagnose the problem.

Here is a portion of what Apache's Tomcat shows in the web browser when the closing curly brace (}) is accidentally omitted from the loop shown in the JSP example:

A Servlet Exception Has Occurred
org.apache.jasper.JasperException: Unable to compile class for
JSP..\work\localhost\chap4\_0002fschedule_0002ejspschedule_jsp_2.java:104:
'catch' without 'try'.
        } catch (Throwable t) {
          ^
..\work\localhost\chap4\_0002fschedule_0002ejspschedule_jsp_2.java:112:
'try' without 'catch' or 'finally'.
}
^
..\work\localhost\chap4\_0002fschedule_0002ejspschedule_jsp_2.java:112:
'}' expected.
}
 ^
3 errors

at org.apache.jasper.compiler.Compiler.compile(Compiler.java:294)
at org.apache.jasper.servlet.JspServlet.doLoadJSP(JspServlet.java:478)
...remainder of stack trace omitted

The remainder of the stack trace is not very helpful because it simply lists methods that are internal to Tomcat. _0002fschedule_0002ejspschedule_jsp_2 is the name of the Java servlet class that was generated. The line numbers refer to positions in this generated code, rather than in the JSP itself.

Embedding HTML directly into servlets is not appealing because it requires a programmer to maintain. With JSP, you often embed Java code into HTML. Although the embedding is reversed, you still have not cleanly separated HTML generation and programming logic. Think about the problems you encounter when the validation logic in a JSP goes beyond a simple one-page example. Do you really want hundreds of lines of Java code sprinkled throughout your HTML, surrounded by those pretty <% %> tags? Unfortunately, far too many JSP pages have a substantial amount of Java code embedded directly in the HTML.

The first few iterations of JSP did not offer bulletproof approaches for separating Java code from the HTML. Although JavaBeans tags were offered in an attempt to remove some Java code, the level of sophistication was quite limited. These tags allow JSPs to interact with helper classes written according to Sun's JavaBeans API (http://java.sun.com/products/javabeans). Recent trends in the JSP specification have made substantial improvements. The big push right now is for custom tags,[14] which finally allow you to remove the Java code from your pages. A web page with custom tags may look like Example 4-3.

[14] Technically, programmers create custom actions, which are invoked using custom JSP tags.

Example 4-3. JSP with custom tags

<%@ taglib uri="/my_taglib" prefix="abc" %>
<html>
<head>
<title>JSP Tag Library Demonstration</title>
</head>
<body>
  <abc:standardHeader/>
  <abc:companyLogo/>
  
  <h1>Recent Announcements</h1>
  <abc:announcements filter="recent"/>
  
  <h1>Job Openings</h1>
  <abc:jobOpenings department="hr"/>
  <abc:standardFooter/>
</body>
</html>

As you can see, custom tags look like normal XML tags with a namespace prefix. Namespace prefixes are used to give XML tags unique names. Because you select the prefix for each tag library, you can use libraries from many different vendors without fear of naming conflicts. These tags are mapped to Java classes called tag handlers that are responsible for the actual work. In fact, the JSP specification does not limit the underlying implementation to Java, so other languages can be used if the JSP container supports it. Using the custom tag approach, programmers in your company can produce a set of approved tags for creating corporate logos, search boxes, navigation bars, and page footers. Nonprogrammers can focus on HTML layout, oblivious to the underlying tag handler code. The main drawback to this approach is the current lack of standard tags. Although several open source projects are underway to develop custom tag libraries, it is unlikely that you will be able to find an existing custom tag for every requirement.

One persistent problem with a pure JSP approach is that of complex validation. Although JSP with custom tags can be an ideal approach for displaying pages, the approach falls apart when a JSP is used to validate the input from a complex HTML form. In this situation, it is almost inevitable that Java code -- perhaps a lot of it -- will creep into the page. This is where a hybrid approach (JSP and servlets), which will be covered in the next section, is desirable.

Compared with an XML/XSLT approach, JSP requires a lot more effort to cleanly separate presentation from the underlying data and programming logic. For web sites that are mostly static, JSP can be easy for nonprogrammers to create, since they work directly in HTML. When dynamic content becomes more prevalent, your options are to embed lots of Java code into the JSP, create custom tags, or perhaps write JavaBeans components that output fragments of HTML. Embedding code into the JSP is not desirable because of the ugly syntax and maintenance difficulties. The other approaches do hide code from the JSP author, but some part of your web application (to be consistent) is still cranking out HTML from Java code, either in custom tags or JavaBeans components. This still raises serious questions about the ability to make quick changes to your HTML without recompiling and deploying your Java code.

Another weakness of JSPs in comparison with XML and XSLT becomes obvious when you try to test your web application. With JSP, it is virtually impossible to test your code outside the bounds of a web browser and servlet container. In order to write a simple automated unit test against a JSP, you have to start a web server and invoke your JSPs via HTTP requests. With XML and XSLT, on the other hand, you can programmatically generate the XML data without a web browser or server. This XML can then be validated against a DTD or schema. You can also test the XSLT stylesheets using command-line tools without deploying to a servlet container or starting a web server. The result of the transformation can even be validated again with a DTD if you use XHTML instead of HTML.
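To make the contrast concrete, here is a minimal sketch of running a transformation from plain Java with the JAXP classes in javax.xml.transform, with no servlet container or web server involved. The XML vocabulary (schedule, show), the stylesheet, and the StylesheetCheck class are invented for this illustration and are not derived from the Schedule and Show classes used elsewhere in this chapter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness for exercising a stylesheet outside any container.
class StylesheetCheck {
    // Sample data; a real test would generate this from the application's own classes.
    static final String XML =
        "<schedule><show channel='2'><title>Morning News</title></show></schedule>";

    // Illustrative stylesheet that renders each show as a table row.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<table><xsl:for-each select='schedule/show'>"
      + "<tr><td><xsl:value-of select='@channel'/></td>"
      + "<td><xsl:value-of select='title'/></td></tr>"
      + "</xsl:for-each></table>"
      + "</xsl:template></xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT));
    }
}
```

Because transform returns a string, an ordinary unit test can make assertions about the generated markup directly.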

4.1.4. Template Engines

Before moving on, let's discuss template engines. A quick search on the Internet reveals that template engines are abundant, each claiming to be better than JSP for various reasons. For the most part, template engines have a lot in common with JSP, particularly if you restrict yourself to custom tags. There are some differences, however:

Template engines typically forbid you from embedding Java code into pages. Although JSP allows Java code along with HTML, it is not considered good form.

Most template engines are not compiled, so they do not have the same problems that JSP has with error messages. They also start up faster on the first invocation, which can make development easier. The effect on end users is minimal. From a deployment perspective, you do not need a Java compiler on the web server as you do with JSP.

Template engines come with an existing library of tags or simple scripting languages. JSP does not provide any standard tags, although numerous libraries are available from other vendors and open source projects. The JSP API is open, so you can create your own custom tags with a fair amount of effort. Template engines have their own unique mechanisms for integrating with underlying Java code.

JSP has the backing of Sun and is pretty much available out of the box on any servlet container. The main benefit of a "standard" is the wide availability of documentation, knowledgeable people, and examples. There are many implementations of JSP to choose from.

4.1.5. The Hybrid Approach

Since JSP now has custom tags, you can remove (hide, actually) all of the Java code when "rendering," or generating a page to send to the browser. When a complex HTML form is posted to the JSP, however, you still have problems. You must verify that all fields are present, verify that the data is within bounds, and clean up the data by checking for null values and trimming all strings. Validation is not particularly difficult, but it can be tedious and requires a lot of custom code. You do not want to embed that code directly into a JSP because of the debugging and maintenance issues.
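The checks listed above can be sketched as a plain Java helper that lives outside the JSP. The field names (title, channel), the channel bounds of 1 to 999, and the ShowFormValidator class are all assumptions made for this illustration. The map has the same shape as the one returned by ServletRequest.getParameterMap( ).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator: field names and limits are illustrative assumptions.
class ShowFormValidator {

    // Returns the first value for a field, trimmed, or "" if it is missing or null.
    static String clean(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        if (values == null || values.length == 0 || values[0] == null) {
            return "";
        }
        return values[0].trim();
    }

    // Returns a list of error messages; an empty list means the form is valid.
    static List<String> validate(Map<String, String[]> params) {
        List<String> errors = new ArrayList<String>();

        // Presence check.
        if (clean(params, "title").isEmpty()) {
            errors.add("Title is required.");
        }

        // Type and bounds check.
        String channel = clean(params, "channel");
        try {
            int n = Integer.parseInt(channel);
            if (n < 1 || n > 999) {
                errors.add("Channel must be between 1 and 999.");
            }
        } catch (NumberFormatException e) {
            errors.add("Channel must be a number.");
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String[]> form = new HashMap<String, String[]>();
        form.put("title", new String[] {"  Evening News "});
        form.put("channel", new String[] {"1200"});
        System.out.println(validate(form));
    }
}
```

Even for two fields the code is tedious, which is why it belongs in a compiled class that can be tested on its own rather than in scriptlets.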

The solution is a hybrid approach, in which a servlet works in conjunction with a JSP. The servlet API has a nice class called RequestDispatcher that allows server-side forwarding and including. This is the normal mechanism for interaction between the servlet and JSP. Figure 4-3 illustrates this design at a high level.

Figure 4-3. Hybrid JSP/servlet approach

This approach combines the best features of servlets with the best features of JSPs. The arrows indicate the flow of control whenever the browser issues a request. The job of the servlet is to intercept the request, validate that the form data is correct, and delegate control to an appropriate JSP. Delegation occurs via javax.servlet.RequestDispatcher, which is a standard part of the servlet API. The JSP simply renders the page, ideally using custom tags and no Java code mixed with the HTML.

The main issue with this approach becomes evident when your web site begins to grow beyond a few pages. You must make a decision between one large servlet that intercepts all requests, a separate servlet per page, or helper classes responsible for processing individual pages. This is not a difficult technological challenge, but rather a problem of organization and consistency. This is where web frameworks can lend a helping hand.
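One of those options, a single entry point that delegates to per-page helper classes, might be organized along the following lines. This is only a container-free sketch: PageHandler, FrontController, and the JSP paths are hypothetical names, and the request is reduced to a map of parameters so that the example does not depend on the servlet API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-page helper: inspects the request data and picks a view to render.
interface PageHandler {
    String handle(Map<String, String> params);
}

// Hypothetical single entry point that maps request paths to helpers.
class FrontController {
    private final Map<String, PageHandler> handlers = new HashMap<String, PageHandler>();

    void register(String path, PageHandler handler) {
        handlers.put(path, handler);
    }

    // Returns the JSP to display. In a real servlet this name would be passed to
    // request.getRequestDispatcher(view).forward(request, response).
    String dispatch(String path, Map<String, String> params) {
        PageHandler handler = handlers.get(path);
        return handler == null ? "/notFound.jsp" : handler.handle(params);
    }

    public static void main(String[] args) {
        FrontController controller = new FrontController();
        controller.register("/search", params ->
                params.containsKey("title") ? "/results.jsp" : "/searchForm.jsp");

        Map<String, String> params = new HashMap<String, String>();
        params.put("title", "News");
        System.out.println(controller.dispatch("/search", params));
    }
}
```

The registry keeps the page-to-handler mapping in one place, which addresses the organization and consistency problem more than any technical one.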
