home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Book Home Java Servlet Programming Search this book

12.5. Dynamic Language Negotiation

Now let's push the envelope yet a little farther (perhaps off the edge of the table) with a servlet that tailors its output to match the language preferences of the client. This allows the same URL to serve its content to readers across the globe in their native tongues.

12.5.1. Language Preferences

There are two ways a servlet can know the language preferences of the client. First, the browser can send the information as part of its request. Newer browsers, such as Netscape Navigator 4 and Microsoft Internet Explorer 4, allow users to specify their preferred languages. With Netscape Navigator 4, this is done under Edit | Preferences | Navigator | Languages. With Microsoft Internet Explorer 4, it's done under View | Internet Options | General | Languages.

A browser sends the user's language preferences to the server using the Accept-Language HTTP header. The value of this header specifies the language or languages that the client prefers to receive. Note that the HTTP specification allows this preference to be ignored. An Accept-Language header value looks something like the following:

en, es, de, ja, zh-TW

This indicates the client user reads English, Spanish, German, Japanese, and Chinese appropriate for Taiwan. By convention, languages are listed in order of preference. Each language may also include a q-value that indicates, on a scale from 0.0 to 1.0, an estimate of the user's preference for that language. The default q-value is 1.0 (maximum preference). An Accept-Language header value including q-values looks like this:

en, es;q=0.8, de;q=0.7, ja;q=0.3, zh-TW;q=0.1

This header value means essentially the same thing as the previous example.

The second way a servlet can know the language preferences of the client is by asking. For example, a servlet might generate a form that asks which language the client prefers. Thereafter, it can remember and use the answer, perhaps using the session tracking techniques discussed in Chapter 7, "Session Tracking".

12.5.2. Charset Preferences

In addition to an Accept-Language HTTP header, a browser may send an Accept-Charset header that tells the server which charsets it understands. An Accept-Charset header value may look something like this:

iso-8859-1, utf-8

This indicates the browser understands ISO-8859-1 and UTF-8. If the Accept-Charset isn't sent or if its value contains an asterisk (*), it can be assumed the client accepts all charsets. Note that the current usefulness of this header is limited: few browsers yet send the header, and those browsers that do tend to send a value that contains an asterisk.

12.5.3. Resource Bundles

Using Accept-Language (and, in some cases, Accept-Charset), a servlet can determine the language in which it will speak to each client. But how can a servlet efficiently manage several localized versions of a page? One answer is to use Java's built-in support for resource bundles.

A resource bundle holds a set of localized resources appropriate for a given locale. For example, a resource bundle for the French locale might contain a French translation of all the phrases output by a servlet. Then, when the servlet determines it wants to speak French, it can load that resource bundle and use the stored phrases. All resource bundles extend java.util.ResourceBundle. A servlet can load a resource bundle using the static method ResourceBundle.getBundle() :

public static final 
  ResourceBundle ResourceBundle.getBundle(String bundleName, Locale locale)

A servlet can pull phrases from a resource bundle using the getString() method of ResourceBundle :

public final String ResourceBundle.getString(String key)

A resource bundle can be created in several ways. For servlets, the most useful technique is to put a special properties file in the server's classpath that contains the translated phrases. The file should be specially named according to the pattern bundlename_language.properties or bundlename_language_country.properties. For example, use Messages_fr.properties for a French bundle or Messages_zh_TW.properties for a Chinese/Taiwan bundle. The file should contain US-ASCII characters in the following format:

name1=value1
name2=value2
...

Each line may also contain whitespace and Unicode escapes. The information in this file can be loaded automatically by the getBundle() method.

12.5.4. Writing To Each His Own

Example 12-8 demonstrates the use of Accept-Language, Accept-Charset, and resource bundles with a servlet that says "Hello World" to each client in that client's own preferred language. Here's a sample resource bundle properties file for English, which you would store in HelloBabel_en.properties somewhere in the server's classpath (such as server_root/classes):

greeting=Hello world

And here's a resource bundle for Japanese, to be stored in HelloBabel_ja.properties:

greeting=\u4eca\u65e5\u306f\u4e16\u754c

This HelloBabel servlet uses the com.oreilly.servlet.LocaleNegotiator class that contains the black box logic to determine which Locale, charset, and ResourceBundle should be used. Its code is shown in the next section.

Example 12-8. A servlet version of the Tower of Babel

import java.io.*;
import java.util.*;
import java.text.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.oreilly.servlet.LocaleNegotiator;
import com.oreilly.servlet.ServletUtils;

public class HelloBabel extends HttpServlet {

  public void doGet(HttpServletRequest req, HttpServletResponse res)
                               throws ServletException, IOException {
    try {
      String bundleName = "HelloBabel";
      String acceptLanguage = req.getHeader("Accept-Language");
      String acceptCharset = req.getHeader("Accept-Charset");

      LocaleNegotiator negotiator =
        new LocaleNegotiator(bundleName, acceptLanguage, acceptCharset);

      Locale locale = negotiator.getLocale();
      String charset = negotiator.getCharset();
      ResourceBundle bundle = negotiator.getBundle();  // may be null

      res.setContentType("text/plain; charset=" + charset);
      res.setHeader("Content-Language", locale.getLanguage());
      res.setHeader("Vary", "Accept-Language");

      PrintWriter out = res.getWriter();

      DateFormat fmt = DateFormat.getDateTimeInstance(DateFormat.LONG,
                                                      DateFormat.LONG,
                                                      locale);
      if (bundle != null) {
        out.println("In " + locale.getDisplayLanguage() + ":");
        out.println(bundle.getString("greeting"));
        out.println(fmt.format(new Date()));
      }
      else {
        out.println("Bundle could not be found.");
      }
    }
    catch (Exception e) {
      log(ServletUtils.getStackTraceAsString(e));
    }
  }
}

This servlet begins by setting the name of the bundle it wants to use, and then it retrieves its Accept-Language and Accept-Charset headers. It creates a LocaleNegotiator, passing in this information, and quickly asks the negotiator which Locale, charset, and ResourceBundle it is to use. Note that a servlet may ignore the returned charset in favor of the UTF-8 encoding. Just remember, UTF-8 is not as widely supported as the charsets normally returned by LocaleNegotiator. Next, the servlet sets its headers: its Content-Type header specifies the charset, Content-Language specifies the locale's language, and the Vary header indicates to the client (if by some chance it should care) that this servlet can vary its output based on the client's Accept-Language header.

Once the headers are set, the servlet generates its output. It first gets a PrintWriter to match the charset. Then it says--in the default language, usually English--which language the greeting is to be in. Next, it retrieves and outputs the appropriate greeting from the resource bundle. And lastly, it prints the date and time appropriate to the client's locale. If the resource bundle is null, as happens when there are no resource bundles to match the client's preferences, the servlet simply reports that no bundle could be found.

12.5.5. The LocaleNegotiator Class

The code for LocaleNegotiator is shown in Example 12-9. Its helper class, LocaleToCharsetMap , is shown in Example 12-10. If you are happy to treat the locale negotiator as a black box, feel free to skip this section.

LocaleNegotiator works by scanning through the client's language preferences looking for any language for which there is a corresponding resource bundle. Once it finds a correspondence, it uses LocaleToCharsetMap to determine the charset. If there's any problem, it tries to fall back to U.S. English. The logic ignores the client's charset preferences.

The most complicated aspect of the LocaleNegotiator code is having to deal with the unfortunate behavior of ResourceBundle.getBundle() . The getBundle() method attempts to act intelligently. If it can't find a resource bundle that is an exact match to the specified locale, it tries to find a close match. The problem, for our purposes, is that getBundle() considers the resource bundle for the default locale to be a close match. Thus, as we loop through client languages, it's difficult to determine when we have an exact resource bundle match and when we don't. The workaround is to first fetch the ultimate fallback resource bundle, then use that reference later to determine when there is an exact match. This logic is encapsulated in the getBundleNoFallback() method.

Example 12-9. The LocaleNegotiator class

package com.oreilly.servlet;

import java.io.*;
import java.util.*;

import com.oreilly.servlet.LocaleToCharsetMap;

public class LocaleNegotiator {

  private ResourceBundle chosenBundle; 
  private Locale chosenLocale; 
  private String chosenCharset; 

  public LocaleNegotiator(String bundleName,
                          String languages,
                          String charsets) {

    // Specify default values:
    //   English language, ISO-8859-1 (Latin-1) charset, English bundle
    Locale defaultLocale = new Locale("en", "US");
    String defaultCharset = "ISO-8859-1";
    ResourceBundle defaultBundle = null;
    try {
      defaultBundle = ResourceBundle.getBundle(bundleName, defaultLocale);
    }
    catch (MissingResourceException e) {
      // No default bundle was found. Flying without a net.
    }

    // If the client didn't specify acceptable languages, we can keep
    // the defaults.
    if (languages == null) {
      chosenLocale = defaultLocale;
      chosenCharset = defaultCharset;
      chosenBundle = defaultBundle;
      return;  // quick exit
    }

    // Use a tokenizer to separate acceptable languages
    StringTokenizer tokenizer = new StringTokenizer(languages, ",");

    while (tokenizer.hasMoreTokens()) {
      // Get the next acceptable language.
      // (The language can look something like "en; qvalue=0.91")
      String lang = tokenizer.nextToken();

      // Get the locale for that language
      Locale loc = getLocaleForLanguage(lang);

      // Get the bundle for this locale. Don't let the search fallback 
      // to match other languages!
      ResourceBundle bundle = getBundleNoFallback(bundleName, loc);

      // The returned bundle is null if there's no match. In that case
      // we can't use this language since the servlet can't speak it.
      if (bundle == null) continue;  // on to the next language

      // Find a charset we can use to display that locale's language.
      String charset = getCharsetForLocale(loc, charsets);

      // The returned charset is null if there's no match. In that case
      // we can't use this language since the servlet can't encode it.
      if (charset == null) continue;  // on to the next language

      // If we get here, there are no problems with this language.
      chosenLocale = loc;
      chosenBundle = bundle;
      chosenCharset = charset;
      return;  // we're done
    }

    // No matches, so we let the defaults stand
    chosenLocale = defaultLocale;
    chosenCharset = defaultCharset;
    chosenBundle = defaultBundle;
  }

  public ResourceBundle getBundle() {
    return chosenBundle;
  }

  public Locale getLocale() {
    return chosenLocale;
  }

  public String getCharset() {
    return chosenCharset;
  }

  private Locale getLocaleForLanguage(String lang) {
    Locale loc;
    int semi, dash;

    // Cut off any q-value that might come after a semi-colon
    if ((semi = lang.indexOf(';')) != -1) {
      lang = lang.substring(0, semi);
    }

    // Trim any whitespace
    lang = lang.trim();

    // Create a Locale from the language. A dash may separate the
    // language from the country.
    if ((dash = lang.indexOf('-')) == -1) {
      loc = new Locale(lang, "");  // No dash, no country
    }
    else {
      loc = new Locale(lang.substring(0, dash), lang.substring(dash+1));
    }

    return loc;
  }

  private ResourceBundle getBundleNoFallback(String bundleName, Locale loc) {

    // First get the fallback bundle -- the bundle that will be selected 
    // if getBundle() can't find a direct match. This bundle can be
    // compared to the bundles returned by later calls to getBundle() in
    // order to detect when getBundle() finds a direct match.
    ResourceBundle fallback = null;
    try {
      fallback =
        ResourceBundle.getBundle(bundleName, new Locale("bogus", ""));
    }
    catch (MissingResourceException e) {
      // No fallback bundle was found.
    }

    try {
      // Get the bundle for the specified locale
      ResourceBundle bundle = ResourceBundle.getBundle(bundleName, loc);

      // Is the bundle different than our fallback bundle?
      if (bundle != fallback) {
        // We have a real match!
        return bundle;
      }
      // So the bundle is the same as our fallback bundle.
      // We can still have a match, but only if our locale's language 
      // matches the default locale's language.
      else if (bundle == fallback &&
            loc.getLanguage().equals(Locale.getDefault().getLanguage())) {
        // Another way to match
        return bundle;
      }
      else {
        // No match, keep looking
      }
    }
    catch (MissingResourceException e) {
      // No bundle available for this locale
    }

    return null;  // no match
  }

  protected String getCharsetForLocale(Locale loc, String charsets) {
    // Note: This method ignores the client-specified charsets
    return LocaleToCharsetMap.getCharset(loc);
  }
}

Example 12-10. The LocaleToCharsetMap class

package com.oreilly.servlet;

import java.util.*;

public class LocaleToCharsetMap {

  private static Hashtable map;

  static {
    map = new Hashtable();

    map.put("ar", "ISO-8859-6");
    map.put("be", "ISO-8859-5");
    map.put("bg", "ISO-8859-5");
    map.put("ca", "ISO-8859-1");
    map.put("cs", "ISO-8859-2");
    map.put("da", "ISO-8859-1");
    map.put("de", "ISO-8859-1");
    map.put("el", "ISO-8859-7");
    map.put("en", "ISO-8859-1");
    map.put("es", "ISO-8859-1");
    map.put("et", "ISO-8859-1");
    map.put("fi", "ISO-8859-1");
    map.put("fr", "ISO-8859-1");
    map.put("he", "ISO-8859-8");
    map.put("hr", "ISO-8859-2");
    map.put("hu", "ISO-8859-2");
    map.put("is", "ISO-8859-1");
    map.put("it", "ISO-8859-1");
    map.put("iw", "ISO-8859-8");
    map.put("ja", "Shift_JIS");
    map.put("ko", "EUC-KR");     // Requires JDK 1.1.6
    map.put("lt", "ISO-8859-2");
    map.put("lv", "ISO-8859-2");
    map.put("mk", "ISO-8859-5");
    map.put("nl", "ISO-8859-1");
    map.put("no", "ISO-8859-1");
    map.put("pl", "ISO-8859-2");
    map.put("pt", "ISO-8859-1");
    map.put("ro", "ISO-8859-2");
    map.put("ru", "ISO-8859-5");
    map.put("sh", "ISO-8859-5");
    map.put("sk", "ISO-8859-2");
    map.put("sl", "ISO-8859-2");
    map.put("sq", "ISO-8859-2");
    map.put("sr", "ISO-8859-5");
    map.put("sv", "ISO-8859-1");
    map.put("tr", "ISO-8859-9");
    map.put("uk", "ISO-8859-5");
    map.put("zh", "GB2312");
    map.put("zh_TW", "Big5");
  }

  public static String getCharset(Locale loc) {
    String charset;

    // Try for a full name match (may include country)
    charset = (String) map.get(loc.toString());
    if (charset != null) return charset;

    // If a full name didn't match, try just the language
    charset = (String) map.get(loc.getLanguage());
    return charset;  // may be null
  }
} 

12.5.6. Future Directions

In the future, you can expect to see improved internationalization support in the Servlet API and in Java itself. Some likely areas for improvement are these:

  • Support for additional charsets, especially those charsets that are commonly used on the Web.

  • New classes that help an application support multiple languages at the same time. These classes will make it easier for servlets to present information to the user using one language, while using another language for administrative tasks such as logging.

  • New classes that support language negotiation using a list of multiple locales. These classes will act in a similar fashion to LocaleNegotiator.



Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.