home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book Home Java Servlet Programming Search this book

2.5. Servlet Chaining and Filters

Now you've seen how an individual servlet can create content by generating a full page or by being used in a server-side include. Servlets can also cooperate to create content in a process called servlet chaining .

In many servers that support servlets, a request can be handled by a sequence of servlets. The request from the client browser is sent to the first servlet in the chain. The response from the last servlet in the chain is returned to the browser. In between, the output from each servlet is passed (piped) as input to the next servlet, so each servlet in the chain has the option to change or extend the content, as shown in Figure 2-9.[6]

[6] A web server could implement servlet chaining differently than described here. There is no reason the initial content must come from a servlet. It could come from a static file fetched with built-in server code or even from a CGI script. The Java Web Server does not have to make this distinction because all its requests are handled by servlets.

There are two common ways to trigger a chain of servlets for an incoming request. First, you can tell the server that certain URLs should be handled by an explicitly specified chain. Or, you can tell the server to send all output of a particular content type through a specified servlet before it is returned to the client, effectively creating a chain on the fly. When a servlet converts one type of content into another, the technique is called filtering .

figure

Figure 2-9. Servlet chaining

Servlet chaining can change the way you think about web content creation. Here are some of the things you can do with it:

Quickly change the appearance of a page, a group of pages, or a type of content.

For example, you can improve your site by suppressing all <BLINK> tags from the pages of your server, as shown in the next example. You can speak to those who don't understand English by dynamically translating the text from your pages to the language read by the client. You can suppress certain words that you don't want everyone to read, be they the seven dirty words or words not everyone knows already, like the unreleased name of your secret project. You could also suppress entire pages in which these words appear. You can enhance certain words on your site, so that an online news magazine could have a servlet detect the name of any Fortune 1000 companies and automatically make each company name a link to its home page.

Take a kernel of content and display it in special formats.

For example, you can embed custom tags in your page and have a servlet replace them with HTML content. Imagine an <SQL> tag whose query contents are executed against a database and whose results are placed in an HTML table. This is, in fact, similar to how the Java Web Server supports the <SERVLET> tag.

Support esoteric data types.

For example, you can serve unsupported image types with a filter that converts nonstandard image types to GIF or JPEG.

You may be asking yourself, why you would want to use a servlet chain when you could instead write a script that edits the files in place--especially when there is an additional amount of overhead for each servlet involved in handling a request? The answer is that servlet chains have a threefold advantage:

  • They can easily be undone, so when users riot against your tyranny of removing their <BLINK> freedom, you can quickly reverse the change and appease the masses.

  • They handle dynamically created content, so you can trust that your restrictions are maintained, your special tags are replaced, and your dynamically converted PostScript images are properly displayed, even in the output of a servlet (or a CGI script).

  • They handle the content of the future, so you don't have to run your script every time new content is added.

2.5.1. Creating a Servlet Chain

Our first servlet chain example removes <BLINK> tags from HTML pages. If you're not familiar with the <BLINK> tag, be thankful. It is a tag recognized by many browsers in which any text between the <BLINK> and </BLINK> tags becomes a flashing distraction. Sure, it's a useful feature when used sparingly. The problem is that many page authors use it far too often. It has become the joke of HTML.

Example 2-5 shows a servlet that can be used in a servlet chain to remove the <BLINK> tag from all of our server's static HTML pages, all its dynamically created HTML pages, and all the pages added to it in the future. This servlet introduces the getReader() and getContentType() methods.

Example 2-5. A servlet that removes the <BLINK> tag from HTML pages

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Deblink extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res) 
                               throws ServletException, IOException {

    String contentType = req.getContentType();  // get the incoming type
    if (contentType == null) return;  // nothing incoming, nothing to do
    res.setContentType(contentType);  // set outgoing type to be incoming type

    PrintWriter out = res.getWriter();

    BufferedReader in = req.getReader();

    String line = null;
    while ((line = in.readLine()) != null) {
      line = replace(line, "<BLINK>", "");
      line = replace(line, "</BLINK>", "");
      out.println(line);
    }
  }

  public void doPost(HttpServletRequest req, HttpServletResponse res)
                                throws ServletException, IOException {
    doGet(req, res);
  }

  private String replace(String line, String oldString, String newString) {
    int index = 0;
    while ((index = line.indexOf(oldString, index)) >= 0) {
      // Replace the old string with the new string (inefficiently)
      line = line.substring(0, index) +
             newString +
             line.substring(index + oldString.length());
      index += newString.length();
    }
    return line;
  }
}

This servlet overrides both the doGet() and doPost() methods. This allows it to work in chains that handle either type of request. The doGet() method contains the core logic, while doPost() simply dispatches to doGet(), using the same technique as the Hello example.

Inside doGet(), the servlet first fetches its print writer. Next, the servlet calls req.getContentType() to find out the content type of the data it is receiving. It sets its output type to match, or if getContentType() returned null, it realizes there is no incoming data to deblink and simply returns. To read the incoming data, the servlet fetches a BufferedReader with a call to req.getReader(). The reader contains the HTML output from the previous servlet in the chain. As the servlet reads each line, it removes any instance of <BLINK> or </BLINK> with a call to replace() and then returns the line to the client (or perhaps to another servlet in the chain). Note that the replacement is case-sensitive and inefficient; a solution to this problem that uses regular expressions is included in Chapter 13, "Odds and Ends".

A more robust version of this servlet would retrieve the incoming HTTP headers and pass on the appropriate headers to the client (or to the next servlet in the chain). Chapter 4, "Retrieving Information" and Chapter 5, "Sending HTML Information" explain the handling and use of HTTP headers. There's no need to worry about it now, as the headers aren't useful for simple tasks like the one we are doing here.

2.5.2. Running Deblink

If you're using the Java Web Server, before running Deblink you have to first tell the web server you want servlet chains enabled. Go to managing the Web Service, go to the Setup section, select Site, and then select Options. Here you can turn servlet chaining on. By default it's turned off to improve performance.

As we said before, there are two ways to trigger a servlet chain. A chain can be explicitly specified for certain requests, or it can be created on the fly when one servlet returns a content type that another servlet is registered to handle. We'll use both techniques to run Deblink.

First, we'll explicitly specify that all files with a name matching the wildcard pattern *.html should be handled by the file servlet followed by the Deblink servlet. The file servlet is a core Java Web Server servlet used to retrieve files. Normally it is the only servlet invoked to return an HTML file. But here, we're going to pass its output to Deblink before returning the HTML to the client. Go back to managing the Web Service, go to the Setup section, and select Servlet Aliases. Here you will see which servlets are invoked for different kinds of URLs, as shown in Figure 2-10.

figure

Figure 2-10. Standard servlet aliases

These mappings provide some insight into how the Java Web Server uses its core servlets. You can see / invokes file, *.shtml invokes ssinclude , and /servlet invokes invoker . The most specific wildcard pattern is used, which is why /servlet uses the invoker servlet to launch a servlet instead of using the file servlet to return a file. You can change the default aliases or add new aliases. For example, changing the /servlet prefix would change the URL used to access servlets. Right now, we're interested in adding another alias. You should add an alias that specifies that *.html invokes file,Deblink. After making this change, any file ending in .html is retrieved by the file servlet and passed to Deblink.

Try it yourself. Create a blinky.html file in server_root/public_html that is sprinkled with a few blink tags and try surfing to http://server:8080/blinky.html. If everything's set up right, all evidence of the blink tags is removed.

2.5.3. The Loophole

This technique has one large loophole: not all HTML comes from files with the .html extension. For example, HTML can come from a file with the .htm extension or from some dynamically created HTML. We can work around multiple file extensions with more aliases. This, however, still doesn't catch dynamic content. We need our second technique for creating a servlet chain to plug that hole.

We really want to specify that all text/html content should pass through the Deblink servlet. The JavaServer Administration Tool does not yet include a graphical way to do this. Instead, we can make the change with a simple edit of a properties file. The properties file can be found at server_root/properties/server/javawebserver/webpageservice/mimeservlets.properties. It contains directives like this:

java-internal/parsed-html=ssinclude

This directive indicates that all responses with a Content-Type header of java-internal/parsed-html should be passed to the ssinclude (server-side include) servlet. Why is this necessary? Without it, the ssinclude servlet would handle only static files with the .shtml extension. It would suffer from the same loophole: dynamically created pages containing the <SERVLET> tag would be ignored. With this directive, any servlet can set its content type to java-internal/parsed-html, which causes the ssinclude servlet to handle its output.

To specify that all text/html content is passed through Deblink, we need to add our own directive:

text/html=Deblink

You need to restart your server before this change can take effect.

After making this change, all HTML content served by the server has its <BLINK> tags removed.[7] Try it yourself! Change your HelloWorld servlet to <BLINK> its message and watch the Deblink servlet silently remove all evidence of the deed.

[7] Unfortunately, some servers (including the Java Web Server 1.1.1) have a bug where they are too smart for their own good. They literally feed all text/html content to the Deblink servlet--even the text/html content being output by the Deblink servlet itself! In other words, every HTML page is deblinked forever (or until the client stops the request, whichever comes first).



Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.