Receiving Multilingual Input (Java Servlet Programming)

We need to discuss one more aspect of internationalization: receiving multilingual input. It's actually quite simple for a servlet to receive multilingual character data, because the ServletRequest.getReader() method handles the task automatically. It returns a BufferedReader built to decode the input data using that data's character encoding. For example, if the Content-Type of the servlet's input is "text/html; charset=Shift_JIS", getReader() returns a BufferedReader that reads Shift_JIS characters.
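The decoding step that getReader() performs can be illustrated outside any servlet container. The sketch below wraps a raw byte stream in an InputStreamReader configured with a charset, which is assumed here to approximate what a container does inside getReader(); real implementations differ in detail. The class and method names (CharsetAwareReader, readerFor, decode) are hypothetical, and UTF-8 and ISO-8859-1 stand in for Shift_JIS so the example depends only on charsets every Java platform provides.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareReader {

    // Wraps raw bytes in a BufferedReader that decodes them with the given
    // charset (assumed to approximate what a container does in getReader()).
    static BufferedReader readerFor(InputStream in, Charset charset) {
        return new BufferedReader(new InputStreamReader(in, charset));
    }

    // Reads the whole body as text using the given charset.
    static String decode(byte[] body, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = readerFor(new ByteArrayInputStream(body), charset)) {
            char[] buf = new char[1024];
            int len;
            while ((len = reader.read(buf, 0, buf.length)) != -1) {
                text.append(buf, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // "caf" plus U+00E9 encodes to 5 UTF-8 bytes: 63 61 66 C3 A9
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Right charset: the 4 original characters come back
        String right = decode(body, StandardCharsets.UTF_8);
        // Wrong charset: each of the 5 bytes becomes its own character
        String wrong = decode(body, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("caf\u00e9") + " " + right.length()); // true 4
        System.out.println(wrong.equals("caf\u00e9") + " " + wrong.length()); // false 5
    }
}
```

The same five bytes yield four characters or five depending on the charset used to read them, which is why a reader that honors the declared charset matters.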

Because getReader() works automatically, our Deblink servlet and the other chained servlets found throughout the book are already multilingual-friendly. No matter what charset is used for the content they receive, they always read the input characters correctly using getReader().

Example 12-13 shows another servlet that uses getReader(). This servlet is designed to be the last servlet in a chain. It uses getReader() to read its input as character data, then outputs the characters using the UTF-8 encoding.

Example 12-13. UTF-8 encoder

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class UTF8 extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    try {
      // Get a reader to read the incoming data
      BufferedReader reader = req.getReader();

      // Get a writer to write the data in UTF-8
      res.setContentType("text/html; charset=UTF-8");
      PrintWriter out = res.getWriter();

      // Read and write 4K chars at a time
      // (Far more efficient than reading and writing a line at a time)
      char[] buf = new char[4 * 1024];  // 4K-char buffer
      int len;
      while ((len = reader.read(buf, 0, buf.length)) != -1) {
        out.write(buf, 0, len);
      }
    }
    catch (Exception e) {
      // Record the failure and its stack trace in the servlet log
      log("Problem filtering page to UTF-8", e);
    }
  }

  public void doPost(HttpServletRequest req, HttpServletResponse res)
                         throws ServletException, IOException {
    doGet(req, res);
  }
}
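The heart of Example 12-13 is its 4K-char copy loop. Pulled out of the servlet, the loop can be exercised without a container: the sketch below feeds a String through the same loop into an OutputStreamWriter fixed to UTF-8, which stands in for the response's PrintWriter after setContentType("text/html; charset=UTF-8"). The class and method names (Utf8Copy, copy, toUtf8) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8Copy {

    // Copies every character from in to out in 4K-char chunks and returns
    // the number of characters copied; the same loop as Example 12-13.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buf = new char[4 * 1024];
        long total = 0;
        int len;
        while ((len = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, len);
            total += len;
        }
        out.flush();  // push any buffered chars through the encoder
        return total;
    }

    // Re-encodes text as UTF-8 bytes by running it through copy().
    static byte[] toUtf8(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            copy(new StringReader(text), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";  // 4 characters
        byte[] utf8 = toUtf8(text);
        System.out.println(text.length() + " chars -> " + utf8.length + " UTF-8 bytes");
    }
}
```

Running main prints "4 chars -> 5 UTF-8 bytes". The loop itself never names an encoding; the Writer it writes to does the encoding, just as the response's PrintWriter does in the servlet.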

Sometimes it's useful for a servlet to determine the charset of its input. For this you can use the getCharacterEncoding() method of ServletRequest, introduced in the Servlet API 2.0; it returns the name of the charset, or null if the request doesn't specify one. Note that this method does not exist in the Java Web Server 1.1.x implementation of ServletRequest, because the method was added between the release of the Java Web Server 1.1 and the official Servlet API 2.0 release with JSDK 2.0. For maximum portability you can do what getReader() does: fetch the request's content type using getContentType() and look for the charset there. Any charset information appears in the content type as the value of the "charset=" parameter.
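Parsing the charset out of getContentType() yourself might look like the following sketch. charsetOf is a hypothetical helper, not part of the Servlet API; in a servlet you would pass it req.getContentType(). It matches the parameter name case-insensitively, strips optional surrounding quotes, and returns null when no charset is present, in which case a servlet would typically fall back to ISO-8859-1, the default the servlet specification prescribes for request bodies.

```java
public class ContentTypeCharset {

    // Hypothetical helper: returns the value of the charset parameter in a
    // Content-Type header such as "text/html; charset=Shift_JIS", or null
    // if there is none.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by semicolons
        // (naive split: assumes no quoted value contains a semicolon)
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;  // malformed parameter, skip it
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes: charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=Shift_JIS"));  // Shift_JIS
        System.out.println(charsetOf("text/plain"));                    // null
    }
}
```

Splitting on semicolons keeps the sketch short; a production parser would also need to handle quoted strings that contain semicolons.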
