

Published on the O'Reilly Network (http://www.oreillynet.com/)
http://www.onlamp.com/pub/a/php/2002/11/28/php_i18n.html


Internationalization and Localization with PHP

by Adam Trachtenberg, coauthor of PHP Cookbook
11/28/2002

While everyone who programs in PHP has to learn some English eventually to get a handle on its function names and language constructs, PHP can create applications in just about any human language. Some applications need to be used by speakers of many different languages. PHP's internationalization and localization support makes it easier to make an application written for French speakers useful for German speakers.

Internationalization (often abbreviated I18N--there are 18 letters between the first "i" and the last "n") is the process of taking an application designed for just one locale and restructuring it so that it can be used in many different locales. Localization (often abbreviated L10N--there are 10 letters between the first "l" and the "n") is the process of adding support for a new locale to an internationalized application.
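Both abbreviations are plain arithmetic on word length. The sketch below uses a hypothetical numeronym() helper to rebuild them, in lowercase.

```php
<?php
// numeronym() is a hypothetical helper for illustration: it keeps the
// first and last letters and replaces everything between them with
// the count of letters it replaced.
function numeronym($word) {
    $len = strlen($word);
    if ($len < 4) {
        return $word; // too short to abbreviate usefully
    }
    return $word[0] . ($len - 2) . $word[$len - 1];
}

print numeronym('internationalization') . "\n"; // i18n
print numeronym('localization') . "\n";         // l10n
```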

Localizing different kinds of content requires different techniques. This article covers an object-oriented method for localizing plain text messages and images. The PHP Cookbook contains additional recipes for dates, times, and currency. There are also recipes on using GNU gettext and other I18N and L10N topics.

Locales

A locale is a group of settings that describe text formatting and language customs in a particular area of the world. A locale name generally has three components. The first, an abbreviation that indicates a language, is mandatory. For example, "en" stands for English and "pt" for Portuguese. An optional country specifier comes next, after an underscore, to distinguish between versions of the same language spoken in different countries. For example, "en_US" and "en_GB" specify U.S. and British English respectively, while "pt_BR" and "pt_PT" identify Brazilian and European Portuguese. Finally, after a period, comes an optional character-set specifier. For example, the locale name for Taiwanese Chinese using the Big5 character set is "zh_TW.Big5". Note that while most locale names follow these conventions, some don't.
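The following sketch shows the naming convention in code. The hypothetical parse_locale() function splits a name of the form language[_COUNTRY][.charset] into its parts. It assumes that form only. It does not handle modifiers such as "@euro", and names that don't follow the convention come back as a bare language value.

```php
<?php
// parse_locale() is a hypothetical helper. It assumes the common
// language[_COUNTRY][.charset] form and leaves absent parts as null.
function parse_locale($name) {
    $parts = array('language' => null, 'country' => null, 'charset' => null);

    // Peel off the optional character set after the period.
    $dot = strpos($name, '.');
    if ($dot !== false) {
        $parts['charset'] = substr($name, $dot + 1);
        $name = substr($name, 0, $dot);
    }

    // Peel off the optional country after the underscore.
    $underscore = strpos($name, '_');
    if ($underscore !== false) {
        $parts['country'] = substr($name, $underscore + 1);
        $name = substr($name, 0, $underscore);
    }

    // Whatever remains is the mandatory language code.
    $parts['language'] = $name;
    return $parts;
}

print_r(parse_locale('zh_TW.Big5'));
```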

Message Catalog

To incorporate I18N support into your program, maintain a message catalog of words and phrases and retrieve the appropriate string from the message catalog before printing it. Here's a simple message catalog with foods in American and British English and a function to retrieve words from the catalog:

<?php
$messages = array (
    'en_US'=> array(
       'My favorite foods are' =>
           'My favorite foods are',
       'french fries' => 'french fries',
       'biscuit' => 'biscuit',
       'candy' => 'candy',
       'potato chips' => 'potato chips',
       'cookie' => 'cookie',
       'corn' => 'corn',
       'eggplant' => 'eggplant'
    ),

    'en_GB'=> array(
        'My favorite foods are' =>
            'My favourite foods are',
        'french fries' => 'chips',
        'biscuit' => 'scone',
        'candy' => 'sweets',
        'potato chips' => 'crisps',
        'cookie' => 'biscuit',
        'corn' => 'maize',
        'eggplant' => 'aubergine'
    )
);

function msg($s) {
    global $LANG;
    global $messages;
    
    if (isset($messages[$LANG][$s])) {
        return $messages[$LANG][$s];
    } else {
        error_log("l10n error:LANG:" .
            "$LANG,message:'$s'");
    }
}
?>

This short program uses the message catalog to print out a list of foods:

<?php
$LANG ='en_GB';

print msg('My favorite foods are').":\n";
print msg('french fries')."\n";
print msg('potato chips')."\n";
print msg('corn')."\n";
print msg('candy')."\n";
?>

My favourite foods are:
chips
crisps
maize
sweets

To have the program output in American English instead of British English, just set $LANG to en_US.
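When a key is missing, msg() logs the problem but returns nothing, so print outputs an empty string. One possible variation, sketched here with the hypothetical name msg_or_key() and a trimmed catalog, returns the untranslated key instead. That way the reader at least sees the American English text.

```php
<?php
// msg_or_key() is a hypothetical variation on msg(): on a miss it
// logs the gap and returns the key itself instead of nothing.
$LANG = 'en_GB';
$messages = array(
    'en_GB' => array(
        'french fries' => 'chips',
        'potato chips' => 'crisps',
    ),
);

function msg_or_key($s) {
    global $LANG, $messages;

    if (isset($messages[$LANG][$s])) {
        return $messages[$LANG][$s];
    }
    // Log the missing translation, then fall back to the key.
    error_log("l10n error:LANG:$LANG,message:'$s'");
    return $s;
}

print msg_or_key('french fries') . "\n"; // chips
print msg_or_key('eggplant') . "\n";     // eggplant (untranslated, logged)
```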

Variable Phrases

You can combine the msg() message retrieval function with printf() to store phrases that require values to be substituted into them. Consider the English sentence "I am 12 years old." In Spanish, the corresponding phrase is "Tengo 12 años." The Spanish phrase can be built by stitching together translations of "I am," the numeral 12, and "years old." It's easier, though, to store them in the message catalogs as printf()-style format strings:

<?php
$messages = array(
    'en_US' => array(
        'I am X years old.' =>
            'I am %d years old.'),
    'es_US' => array(
        'I am X years old.' => 
            'Tengo %d años.')
);
?>

You can then pass the results of msg() to printf() as a format string:

<?php
$LANG ='es_US';

printf(msg('I am X years old.'), 12);
?>

Tengo 12 años.  
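The following self-contained sketch combines the pieces above. It condenses the catalog and msg() and formats the same phrase in both locales. It uses sprintf(), which returns the formatted string rather than printing it. The loop over locales is only there for demonstration.

```php
<?php
// Condensed catalog: one printf()-style format string per locale.
$messages = array(
    'en_US' => array('I am X years old.' => 'I am %d years old.'),
    'es_US' => array('I am X years old.' => 'Tengo %d años.'),
);

// Same lookup as the article's msg(), reading the global $LANG.
function msg($s) {
    global $LANG, $messages;
    if (isset($messages[$LANG][$s])) {
        return $messages[$LANG][$s];
    }
    error_log("l10n error:LANG:$LANG,message:'$s'");
}

foreach (array('en_US', 'es_US') as $LANG) {
    // sprintf() returns the formatted string instead of printing it.
    print sprintf(msg('I am X years old.'), 12) . "\n";
}
```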

For phrases that require the substituted values to be in a different order in different languages, printf() supports changing the order of the arguments:

<?php
$messages = array(
    'en_US' => array(
        'I am X years and Y months old.' =>
        'I am %d years and %d months old.'),
    'es_US' => array(
        'I am X years and Y months old.'=>
        'Tengo %2$d meses y %1$d años.')
);
?>

With either language, call printf() with the same order of arguments (i.e., first years, then months):

<?php
$LANG ='es_US';

printf(msg('I am X years and Y months old.'),12,7);
?>

Tengo 7 meses y 12 años.  

In the format string, %2$ tells printf() to use the second argument, and %1$ tells it to use the first.
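You can try the positional specifiers on their own. In this sketch the variable names are illustrative, and both formats receive years first and months second. The formats use single quotes because, inside a double-quoted PHP string, $d would be read as a variable.

```php
<?php
// %1$d is the first argument after the format, %2$d the second,
// wherever they appear in the string.
$years = 12;
$months = 7;

$en = 'I am %1$d years and %2$d months old.';
$es = 'Tengo %2$d meses y %1$d años.';

// Same argument order for both formats.
print sprintf($en, $years, $months) . "\n"; // I am 12 years and 7 months old.
print sprintf($es, $years, $months) . "\n"; // Tengo 7 meses y 12 años.
```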

 

Message Objects

These phrases can also be stored as function return values instead of strings in an array. Storing the phrases as functions removes the need to use printf(). Functions that return a sentence look like this:

<?php
// English version
function i_am_X_years_old($age){
    return "I am $age years old.";
}

// Spanish version
function i_am_X_years_old($age){
    return "Tengo $age años.";
}
?>
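Both versions share the name i_am_X_years_old(), so only one of them can be declared in a given script. The sketch below shows an alternative to the object approach described next. It assumes PHP 5.3 or later for anonymous functions and keys the closures by locale. Because each phrase is code, it can also branch. Here it chooses a singular noun when the age is 1.

```php
<?php
// One closure per locale; both can coexist in the same script.
$phrases = array(
    'en_US' => function ($age) {
        return $age == 1 ? "I am 1 year old." : "I am $age years old.";
    },
    'es_US' => function ($age) {
        return $age == 1 ? "Tengo 1 año." : "Tengo $age años.";
    },
);

$LANG = 'es_US';
$say_age = $phrases[$LANG];     // pick the closure for the locale
print $say_age(12) . "\n";      // Tengo 12 años.
print $say_age(1) . "\n";       // Tengo 1 año.
```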

If some parts of the message catalog belong in an array, and some parts belong in functions, an object is a helpful container for a language's message catalog. A base object and two simple message catalogs look like this:

<?php
class pc_MC_Base {
    var $messages;
    var $lang;

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        } else {
            error_log("l10n error:LANG:" .
                "$this->lang,message:'$s'");
        }
    }
}

class pc_MC_es_US extends pc_MC_Base {
    // PHP 5+ constructor; PHP 8 no longer treats a method named
    // after the class (pc_MC_es_US()) as its constructor
    function __construct() {
        $this->lang = 'es_US';
        $this->messages = array(
            'chicken' => 'pollo',
            'cow' => 'vaca',
            'horse' => 'caballo'
        );
    }

    function i_am_X_years_old($age){
        return "Tengo $age años.";
    }
}

class pc_MC_en_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'en_US';
        $this->messages = array(
            'chicken' => 'chicken',
            'cow' => 'cow',
            'horse' => 'horse'
        );
    }

    function i_am_X_years_old($age) {
        return "I am $age years old.";
    }
}
?>

Each message catalog object extends the pc_MC_Base class to get the msg() method, and then defines its own messages (in its constructor) and its own functions that return phrases. Here's how to print text in Spanish:

<?php
$MC = new pc_MC_es_US;
print $MC->msg('cow');
print $MC->i_am_X_years_old(15);
?>

To print the same text in English, $MC just needs to be instantiated as a pc_MC_en_US object instead of a pc_MC_es_US object. The rest of the code remains unchanged.
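To pick the class at run time, you can build the class name from the locale string, because PHP's new operator accepts a variable that holds a class name. The sketch below condenses the classes above, with shorter message arrays and __construct() constructors. It adds a hypothetical make_catalog() helper that validates the locale name and falls back to pc_MC_en_US.

```php
<?php
class pc_MC_Base {
    var $messages;
    var $lang;

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        }
        error_log("l10n error:LANG:$this->lang,message:'$s'");
    }
}

class pc_MC_es_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'es_US';
        $this->messages = array('cow' => 'vaca');
    }
    function i_am_X_years_old($age) {
        return "Tengo $age años.";
    }
}

class pc_MC_en_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'en_US';
        $this->messages = array('cow' => 'cow');
    }
    function i_am_X_years_old($age) {
        return "I am $age years old.";
    }
}

// make_catalog() is a hypothetical factory. It only accepts names
// shaped like "es_US" (so input such as "Base" can't select
// pc_MC_Base) and defaults to pc_MC_en_US when no class matches.
function make_catalog($lang) {
    $class = 'pc_MC_en_US';
    if (preg_match('/^[a-z]{2,3}_[A-Z]{2}$/', $lang)
        && class_exists("pc_MC_$lang")) {
        $class = "pc_MC_$lang";
    }
    return new $class;
}

$MC = make_catalog('es_US');
print $MC->msg('cow') . ' ' . $MC->i_am_X_years_old(15) . "\n"; // vaca Tengo 15 años.
```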

Localizing Images

Images that contain text need to be localized, so that the text in each image appears in the language of the current locale.

Make an image directory for each locale you want to support, as well as a global image directory for images that have no locale-specific information. Create copies of each locale-specific image in the appropriate directories. Make sure that these images have the same filenames. Instead of printing image URLs directly, use a wrapper method similar to the msg() method demonstrated earlier.

The img() wrapper method looks for a locale-specific version of an image first, then a global one. If neither is present, it logs an error message. With img() added, the pc_MC_Base class looks like this:

<?php
class pc_MC_Base {
    var $messages;
    var $images;
    var $lang;

    var $image_base_path = '/usr/local/www/images';
    var $image_base_url = '/images';

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        } else {
            error_log("l10n error:LANG:" . 
                "$this->lang,message:'$s'");
        }
    }

    function img($f) {
        // Return (not print) the URL so callers can embed it in a tag
        if (is_readable("$this->image_base_path/" .
            "$this->lang/$f")) {
            return "$this->image_base_url/$this->lang/$f";
        } elseif (is_readable("$this->image_base_path/" .
            "global/$f")) {
            return "$this->image_base_url/global/$f";
        } else {
            error_log("l10n error:LANG:" .
                      "$this->lang,image:'$f'");
        }
    }
}
?>

The img() method needs to know both the path to the image file in the filesystem ($image_base_path) and the path to the image from the base URL of your site ($image_base_url). It uses the first to test if the file can be read and the second to construct an appropriate URL for the image.
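You can test the lookup order on its own. The hypothetical localized_image_url() function below takes the same two bases as parameters. It returns the URL, or logs and returns null if neither copy exists. The temporary directory tree is there only so the sketch runs anywhere.

```php
<?php
// localized_image_url() is a hypothetical helper: locale directory
// first, then "global", else log and return null.
function localized_image_url($base_path, $base_url, $lang, $f) {
    foreach (array($lang, 'global') as $dir) {
        if (is_readable("$base_path/$dir/$f")) {
            return "$base_url/$dir/$f";
        }
    }
    error_log("l10n error:LANG:$lang,image:'$f'");
    return null;
}

// Throwaway directory tree standing in for /usr/local/www/images.
$base = sys_get_temp_dir() . '/l10n_demo_' . getmypid();
@mkdir("$base/es_US", 0777, true);
@mkdir("$base/global", 0777, true);
touch("$base/es_US/new.gif");
touch("$base/global/logo.gif");

print localized_image_url($base, '/images', 'es_US', 'new.gif') . "\n";  // /images/es_US/new.gif
print localized_image_url($base, '/images', 'es_US', 'logo.gif') . "\n"; // /images/global/logo.gif
```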

A localized image must have the same filename in each localization directory. For example, an image that says "New!" on a yellow starburst should be called new.gif in both the images/en_US directory and the images/es_US directory, even though the file images/es_US/new.gif is a picture of a yellow starburst with the word "¡Nuevo!" on it. Don't forget that the alt text you display in your image tags also needs to be localized. A complete localized <img> tag looks like this:

<?php
$MC = new pc_MC_es_US;

printf('<img src="%s" alt="%s">',
    $MC->img('cancel.png'), $MC->msg('Cancel'));
?>

If the localized versions of a particular image have varied dimensions, store image height and width in the message catalog as well:

<?php
printf('<img src="%s" alt="%s" ' .
    'height="%d" width="%d">',
    $MC->img('cancel.png'), $MC->msg('Cancel'),
    $MC->msg('img-cancel-height'),
    $MC->msg('img-cancel-width'));
?>

The localized messages for img-cancel-height and img-cancel-width are not text strings, but integers that describe the dimensions of the cancel.png image in each locale.
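If you would rather not maintain these numbers by hand, a different technique is to read the dimensions from the image file with PHP's getimagesize(). Its result holds the width at index 0 and the height at index 1. The sketch below wraps it in a hypothetical img_dimensions() helper and writes a 1x1 PNG to a temporary file, so it runs without real images. The trade-off is a filesystem read on each request, which storing the values in the catalog avoids.

```php
<?php
// img_dimensions() is a hypothetical helper: width and height read
// from the file itself, or null if getimagesize() can't read it.
function img_dimensions($path) {
    $info = @getimagesize($path);
    if ($info === false) {
        return null;
    }
    return array('width' => $info[0], 'height' => $info[1]);
}

// A 1x1 PNG written to a temp file so the example runs anywhere.
$png = base64_decode('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg==');
$file = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($file, $png);

$d = img_dimensions($file);
printf('<img src="cancel.png" alt="Cancel" height="%d" width="%d">' . "\n",
    $d['height'], $d['width']);
```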

If you use a consistent naming convention for your message keys and image filenames, you can create an imgsrc() method to simplify matters:

<?php
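// A method to add to the pc_MC_Base class
function imgsrc($img) {
    $src = $this->img("$img.png");
    $alt = $this->msg(ucfirst($img));
    // Look up dimensions by image name (e.g. img-cancel-height),
    // not by the URL held in $src
    $height = $this->msg("img-$img-height");
    $width = $this->msg("img-$img-width");
    return sprintf('<img src="%s" alt="%s" ' .
                   'height="%d" width="%d">',
                   $src, $alt, $height, $width);
}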
?>

To get the same results as the Cancel button example before, call it like this:

<?php
$MC = new pc_MC_es_US;

print $MC->imgsrc('cancel');
?>
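For reference, here is a condensed, self-contained sketch of msg(), img(), and imgsrc() working together. Its class name, pc_MC_Demo, and the catalog values, including the 20x72 dimensions, are made up. img() skips the is_readable() checks so that no images directory is needed. The alt text is passed through htmlspecialchars(), which the article's version does not do.

```php
<?php
// pc_MC_Demo is a hypothetical, trimmed-down catalog for illustration.
class pc_MC_Demo {
    var $lang = 'es_US';
    var $image_base_url = '/images';
    var $messages = array(
        'Cancel'            => 'Cancelar',
        'img-cancel-height' => 20,   // made-up dimensions
        'img-cancel-width'  => 72,
    );

    function msg($s) {
        return isset($this->messages[$s]) ? $this->messages[$s] : null;
    }

    // Simplified: always returns the locale-specific URL.
    function img($f) {
        return "$this->image_base_url/$this->lang/$f";
    }

    function imgsrc($img) {
        $src    = $this->img("$img.png");
        $alt    = htmlspecialchars((string) $this->msg(ucfirst($img)));
        $height = $this->msg("img-$img-height");
        $width  = $this->msg("img-$img-width");
        return sprintf('<img src="%s" alt="%s" height="%d" width="%d">',
                       $src, $alt, $height, $width);
    }
}

$MC = new pc_MC_Demo;
print $MC->imgsrc('cancel') . "\n";
// <img src="/images/es_US/cancel.png" alt="Cancelar" height="20" width="72">
```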

Conclusion

With the help of the msg() and img() methods, you can quickly create message objects that allow you to localize your Web site using 100 percent pure PHP. Because it's an all-PHP solution, you can reuse all your existing code, and you don't need to install any new extensions. However, if you need to share message catalogs among many applications, PHP also supports GNU gettext. See Joao Prado Maia's article for more details on using gettext with PHP.

As you can see, internationalizing your PHP applications is not a labor of Hercules. When you organize your localizations within an object hierarchy, it's easy to extend your classes to support new countries and regions.

Adam Trachtenberg is a student at Columbia Business School and a coauthor of O'Reilly's PHP Cookbook.


oreillynet.com Copyright © 2003 O'Reilly & Associates, Inc.