Published on the O'Reilly Network
(http://www.oreillynet.com/)
http://www.onlamp.com/pub/a/php/2002/11/28/php_i18n.html
Internationalization and Localization with PHP
by Adam Trachtenberg, coauthor of PHP Cookbook
11/28/2002
While everyone who programs in PHP has to learn some English eventually
to get a handle on its function names and language constructs, PHP can
create applications in just about any human language. Some applications
need to be used by speakers of many different languages. PHP's internationalization
and localization support makes it easier to make an application written
for French speakers useful for German speakers.
Internationalization (often abbreviated I18N--there are 18 letters
between the first "i" and the last "n") is the process of taking an application
designed for just one locale and restructuring it so that it can be used in
many different locales. Localization (often abbreviated L10N--there are 10
letters between the first "l" and the last "n") is the process of adding support
for a new locale to an internationalized application.
Localizing different kinds of content requires different techniques. This
article covers an object-oriented method for localizing plain text messages and
images. The PHP Cookbook
contains additional recipes for dates, times, and currency. There are also
recipes on using GNU gettext and other I18N and L10N topics.
Locales
A locale is a group of settings that describe text formatting and language
customs in a particular area of the world. A locale name generally has three
components. The first, an abbreviation that indicates a language, is mandatory.
For example, "en" stands for English and "pt" for Portuguese. An optional
country specifier comes next, after an underscore, to distinguish between
different versions of the same language spoken in different countries. For
example, "en_US" and "en_GB" specify U.S. and British English respectively,
while "pt_BR" and "pt_PT" identify Brazilian Portuguese and the Portuguese of Portugal.
Finally, after a period, comes an optional character-set specifier. For example,
Chinese as used in Taiwan with the Big5 character set is "zh_TW.Big5". Note that
while most locale names follow these conventions, some don't.
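The naming pattern described above is regular enough to split apart in code. The sketch below assumes PHP 7.1 or later. `parse_locale()` is a hypothetical helper written for illustration and is not a built-in PHP function. Its regular expression recognizes only the pattern of a language, an optional _COUNTRY, and an optional .charset, so names outside that pattern, such as "C" or "POSIX", come back as null.

```php
<?php
// Hypothetical helper (not part of PHP): split a conventional locale
// name such as "zh_TW.Big5" into its language, country, and charset.
function parse_locale(string $name): ?array {
    // language: 2-3 lowercase letters (mandatory)
    // country:  "_" plus 2 uppercase letters (optional)
    // charset:  "." plus a character-set name (optional)
    $pattern = '/^([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.([A-Za-z0-9-]+))?$/';
    if (!preg_match($pattern, $name, $m)) {
        return null; // the name does not follow the usual convention
    }
    // Optional groups that did not match are missing or empty in $m
    return [
        'language' => $m[1],
        'country'  => ($m[2] ?? '') !== '' ? $m[2] : null,
        'charset'  => ($m[3] ?? '') !== '' ? $m[3] : null,
    ];
}
```

For example, parse_locale('zh_TW.Big5') returns the language "zh", the country "TW", and the charset "Big5". parse_locale('pt_BR') returns a null charset.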
Message Catalog
To incorporate I18N support into your program, maintain a message catalog
of words and phrases and retrieve the appropriate string from the message
catalog before printing it. Here's a simple message catalog with foods in
American and British English and a function to retrieve words from the catalog:
<?php
$messages = array (
    'en_US' => array(
        'My favorite foods are' =>
            'My favorite foods are',
        'french fries' => 'french fries',
        'biscuit'      => 'biscuit',
        'candy'        => 'candy',
        'potato chips' => 'potato chips',
        'cookie'       => 'cookie',
        'corn'         => 'corn',
        'eggplant'     => 'eggplant'
    ),
    'en_GB' => array(
        'My favorite foods are' =>
            'My favourite foods are',
        'french fries' => 'chips',
        'biscuit'      => 'scone',
        'candy'        => 'sweets',
        'potato chips' => 'crisps',
        'cookie'       => 'biscuit',
        'corn'         => 'maize',
        'eggplant'     => 'aubergine'
    )
);

// Look up $s in the catalog for the current locale ($LANG)
function msg($s) {
    global $LANG;
    global $messages;

    if (isset($messages[$LANG][$s])) {
        return $messages[$LANG][$s];
    } else {
        // Log the missing translation. Variable names are
        // case-sensitive, so this must be $LANG, not $lang
        error_log("l10n error:LANG:" .
                  "$LANG,message:'$s'");
    }
}
?>
This short program uses the message catalog to print out a list of foods:
<?php
$LANG ='en_GB';
print msg('My favorite foods are').":\n";
print msg('french fries')."\n";
print msg('potato chips')."\n";
print msg('corn')."\n";
print msg('candy')."\n";
?>
My favourite foods are:
chips
crisps
maize
sweets
To have the program output in American English instead of British English,
just set $LANG to 'en_US'.
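The msg() function above reads both the catalog and the locale from globals and returns nothing when a string is missing. A minimal variation, assuming PHP 7.0 or later, passes both in explicitly and falls back to the lookup key. The name `msg_lookup()` is hypothetical. Because the en_US keys in the catalog above map to themselves, falling back to the key amounts to showing the American English text when a translation is missing.

```php
<?php
// Hypothetical variation on msg(): the catalog and locale are passed
// in explicitly, and the key itself is returned if no translation exists.
function msg_lookup(array $messages, string $lang, string $s): string {
    if (isset($messages[$lang][$s])) {
        return $messages[$lang][$s];
    }
    // Log the miss in the same format msg() uses, then fall back
    error_log("l10n error:LANG:$lang,message:'$s'");
    return $s;
}

// A small excerpt of the food catalog
$catalog = [
    'en_US' => ['cookie' => 'cookie'],
    'en_GB' => ['cookie' => 'biscuit'],
];
```

For example, msg_lookup($catalog, 'en_GB', 'cookie') returns "biscuit". msg_lookup($catalog, 'en_GB', 'eggplant') logs an error and returns "eggplant".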
Variable Phrases
You can combine the msg() message retrieval function with
printf() to store phrases that require values to be substituted
into them. Consider the English sentence "I am 12 years old." In Spanish, the
corresponding phrase is "Tengo 12 años." The Spanish phrase can be built by
stitching together translations of "I am," the numeral 12, and "years old."
It's easier, though, to store them in the message catalogs as
printf()-style format strings:
<?php
$messages = array(
    'en_US' => array(
        'I am X years old.' =>
            'I am %d years old.'),
    'es_US' => array(
        'I am X years old.' =>
            'Tengo %d años.')
);
?>
You can then pass the results of msg() to
printf() as a format string:
<?php
$LANG ='es_US';
printf(msg('I am X years old.'), 12);
?>
Tengo 12 años.
For phrases that require the substituted values to be in a different order
in different languages, printf() lets a format string refer to its arguments
by position, so each translation can use them in its own order:
<?php
$messages = array(
    'en_US' => array(
        'I am X years and Y months old.' =>
            'I am %d years and %d months old.'),
    'es_US' => array(
        'I am X years and Y months old.' =>
            'Tengo %2$d meses y %1$d años.')
);
?>
With either language, call printf() with the same order of
arguments (i.e., first years, then months):
<?php
$LANG ='es_US';
printf(msg('I am X years and Y months old.'),12,7);
?>
Tengo 7 meses y 12 años.
In the format string, %2$ tells printf() to use
the second argument, and %1$ tells it to use the first.
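The same positional specifiers work with vsprintf(). That function takes its arguments as an array and returns the formatted string instead of printing it. The sketch below assumes PHP 7.0 or later and uses a hypothetical `msg_format()` helper. It passes years first and months second for every locale and lets each translation decide the order.

```php
<?php
// Hypothetical helper: look up a format string and fill it in with
// vsprintf(). Numbered specifiers such as %2$d pick arguments by position.
function msg_format(array $messages, string $lang, string $key, array $args): string {
    // Fall back to the key if the locale or message is missing
    $format = $messages[$lang][$key] ?? $key;
    return vsprintf($format, $args);
}

$messages = [
    'en_US' => ['I am X years and Y months old.' => 'I am %d years and %d months old.'],
    'es_US' => ['I am X years and Y months old.' => 'Tengo %2$d meses y %1$d años.'],
];
```

Calling msg_format($messages, 'es_US', 'I am X years and Y months old.', [12, 7]) returns "Tengo 7 meses y 12 años.". The en_US entry returns "I am 12 years and 7 months old." for the same arguments.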
Message Objects
These phrases can also be stored as function return values instead of
strings in an array. Storing the phrases as functions removes the need to use
printf(). Functions that return a sentence look like this:
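<?php
// English version (for example, in an en_US include file)
function i_am_X_years_old($age) {
    return "I am $age years old.";
}

// Spanish version (in a separate es_US include file; PHP cannot
// declare two functions with the same name in one script)
function i_am_X_years_old($age) {
    return "Tengo $age años.";
}
?>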
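The two i_am_X_years_old() functions cannot both be declared in one script. They would have to live in separate, locale-specific include files. One alternative is to store each locale's phrase functions as closures keyed by locale. The sketch below assumes PHP 7.4 or later, which arrow functions require. The `$phrases` array and the `phrase()` helper are hypothetical names, and the sketch does no error handling for a missing locale or phrase.

```php
<?php
// Each locale gets its own closure under the same key, so the English
// and Spanish versions no longer collide as global function names.
$phrases = [
    'en_US' => [
        'i_am_X_years_old' => fn(int $age): string => "I am $age years old.",
    ],
    'es_US' => [
        'i_am_X_years_old' => fn(int $age): string => "Tengo $age años.",
    ],
];

// Hypothetical helper: call the named phrase for a locale with the
// remaining arguments
function phrase(array $phrases, string $lang, string $name, ...$args): string {
    return ($phrases[$lang][$name])(...$args);
}
```

For example, phrase($phrases, 'es_US', 'i_am_X_years_old', 15) returns "Tengo 15 años.".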
If some parts of the message catalog belong in an array, and some parts
belong in functions, an object is a helpful container for a language's message
catalog. A base object and two simple message catalogs look like this:
<?php
class pc_MC_Base {
    var $messages;
    var $lang;

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        } else {
            error_log("l10n error:LANG:" .
                      "$this->lang,message:'$s'");
        }
    }
}

class pc_MC_es_US extends pc_MC_Base {
    // __construct() works in PHP 5 and later; PHP 8 no longer treats a
    // method named after its class (PHP 4 style) as a constructor
    function __construct() {
        $this->lang = 'es_US';
        $this->messages = array(
            'chicken' => 'pollo',
            'cow'     => 'vaca',
            'horse'   => 'caballo'
        );
    }

    function i_am_X_years_old($age) {
        return "Tengo $age años.";
    }
}

class pc_MC_en_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'en_US';
        $this->messages = array(
            'chicken' => 'chicken',
            'cow'     => 'cow',
            'horse'   => 'horse'
        );
    }

    function i_am_X_years_old($age) {
        return "I am $age years old.";
    }
}
?>
Each message catalog object extends the pc_MC_Base class to get the
msg() method, and then defines its own messages (in its
constructor) and its own functions that return phrases. Here's how to print
text in Spanish:
<?php
$MC = new pc_MC_es_US;
print $MC->msg('cow') . "\n";
print $MC->i_am_X_years_old(15) . "\n";
?>
To print the same text in English, $MC just needs to be
instantiated as a pc_MC_en_US object instead of a pc_MC_es_US object. The rest
of the code remains unchanged.
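Each class name ends in its locale name, so the catalog can also be chosen at run time. The sketch below assumes PHP 7.0 or later. `pc_MC_for_locale()` is a hypothetical factory. The classes are trimmed copies of the ones above, using __construct(), so that the block runs on its own. The factory checks the locale against a simple pattern before using it to build a class name. This assumes the locale might come from user input, such as a cookie.

```php
<?php
// Trimmed, self-contained copies of the message catalog classes
class pc_MC_Base {
    var $messages = array();
    var $lang;
    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        }
        error_log("l10n error:LANG:$this->lang,message:'$s'");
    }
}
class pc_MC_en_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'en_US';
        $this->messages = array('cow' => 'cow');
    }
}
class pc_MC_es_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'es_US';
        $this->messages = array('cow' => 'vaca');
    }
}

// Hypothetical factory: derive the class name from the locale, and use
// the default catalog if the locale looks odd or has no class.
function pc_MC_for_locale($lang, $default = 'en_US') {
    $class = "pc_MC_$lang";
    if (!preg_match('/^[A-Za-z_]+$/', $lang) || !class_exists($class)) {
        $class = "pc_MC_$default";
    }
    return new $class();
}
```

With this in place, pc_MC_for_locale('es_US')->msg('cow') returns "vaca". An unsupported locale such as 'fr_FR' gets the en_US catalog.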
Localizing Images
Images that contain text need to be localized so that the text appears in
the appropriate language for each locale.
Make an image directory for each locale you want to support, as well as a
global image directory for images that have no locale-specific information.
Create copies of each locale-specific image in the appropriate directories.
Make sure that these images have the same filenames. Instead of printing image
URLs directly, use a wrapper method similar to the msg() method
demonstrated earlier.
The img() wrapper method looks for a locale-specific version
of an image first, then a global one. If neither is present, it logs an error
message. Building upon the pc_MC_Base class, the new class looks like this:
<?php
class pc_MC_Base {
    var $messages;
    var $images;
    var $lang;
    var $image_base_path = '/usr/local/www/images';
    var $image_base_url  = '/images';

    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        } else {
            error_log("l10n error:LANG:" .
                      "$this->lang,message:'$s'");
        }
    }

    // Return (rather than print) the URL so that callers can pass
    // the result to printf() or sprintf()
    function img($f) {
        if (is_readable("$this->image_base_path/" .
                        "$this->lang/$f")) {
            return "$this->image_base_url/$this->lang/$f";
        } elseif (is_readable("$this->image_base_path/" .
                              "global/$f")) {
            return "$this->image_base_url/global/$f";
        } else {
            error_log("l10n error:LANG:" .
                      "$this->lang,image:'$f'");
        }
    }
}
?>
The img() method needs to know both the path to the image file
in the filesystem ($image_base_path) and the path to the image
from the base URL of your site ($image_base_url). It uses the
first to test if the file can be read and the second to construct an
appropriate URL for the image.
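The lookup order img() follows can be isolated in a standalone function that is easy to test against a throwaway directory tree. The order is the locale directory first, then the global directory. This is a sketch assuming PHP 7.0 or later. `localized_image_url()` is a hypothetical name. Unlike the method above, it explicitly returns null when neither file is readable.

```php
<?php
// Hypothetical standalone version of img()'s lookup: try the locale's
// directory, then "global"; return the matching URL or null.
function localized_image_url($base_path, $base_url, $lang, $file) {
    foreach (array($lang, 'global') as $dir) {
        // Test the filesystem path, but return the URL path
        if (is_readable("$base_path/$dir/$file")) {
            return "$base_url/$dir/$file";
        }
    }
    error_log("l10n error:LANG:$lang,image:'$file'");
    return null;
}
```

With the settings above, localized_image_url('/usr/local/www/images', '/images', 'es_US', 'new.gif') would return '/images/es_US/new.gif' if that file exists and is readable.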
A localized image must have the same filename in each localization
directory. For example, an image that says "New!" on a yellow starburst should
be called new.gif in both the images/en_US directory
and the images/es_US directory, even though the file
images/es_US/new.gif is a picture of a yellow starburst with the
word "¡Nuevo!" on it. Don't forget that the alt text you display in your image
tags also needs to be localized. A complete localized <img> tag
looks like this:
<?php
$MC = new pc_MC_es_US;
printf('<img src="%s" alt="%s">',
$MC->img('cancel.png'), $MC->msg('Cancel'));
?>
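Translated alt text can contain characters that are special in HTML, such as a double quote or an ampersand. The printf() call above inserts that text into the attribute unchanged. The minimal sketch below escapes both attributes with PHP's htmlspecialchars(). `img_tag()` is a hypothetical helper, and the UTF-8 charset argument assumes your message catalogs are stored in UTF-8.

```php
<?php
// Hypothetical helper: build an <img> tag with both attribute values
// escaped, so quotes or ampersands in a translation stay inside alt="..."
function img_tag($src, $alt) {
    return sprintf('<img src="%s" alt="%s">',
        htmlspecialchars($src, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($alt, ENT_QUOTES, 'UTF-8'));
}
```

For example, img_tag('/images/es_US/cancel.png', 'Cancelar') returns `<img src="/images/es_US/cancel.png" alt="Cancelar">`. In the alt text "Fish & Chips", the ampersand is written out as `&amp;`.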
If the localized versions of a particular image have varied dimensions,
store image height and width in the message catalog as well:
<?php
printf('<img src="%s" alt="%s" ' .
'height="%d" width="%d">',
$MC->img('cancel.png'), $MC->msg('Cancel'),
$MC->msg('img-cancel-height'),
$MC->msg('img-cancel-width'));
?>
The localized messages for img-cancel-height and img-cancel-width are not
text strings, but integers that describe the dimensions of the
cancel.png image in each locale.
If you use a consistent naming convention for your message keys and image filenames,
you can add an imgsrc() method to pc_MC_Base to simplify matters:
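<?php
// A method to add to the pc_MC_Base class
function imgsrc($img) {
    $src    = $this->img("$img.png");
    $alt    = $this->msg(ucfirst($img));
    // Build the dimension keys from the image name (e.g. "cancel"),
    // not from $src, which holds the full URL
    $height = $this->msg("img-$img-height");
    $width  = $this->msg("img-$img-width");
    return sprintf('<img src="%s" alt="%s" ' .
                   'height="%d" width="%d">',
                   $src, $alt, $height, $width);
}
?>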
To get the same results as the Cancel button example before, call it like
this:
<?php
$MC = new pc_MC_es_US;
print $MC->imgsrc('cancel');
?>
Conclusion
With the help of the msg() and img() methods, you can
quickly create message objects that allow you to localize your Web site using
100 percent pure PHP. Because it's an all-PHP solution, you can reuse all your
existing code, and you don't need to install any new extensions. However, if you
need to share message catalogs among many applications, PHP supports gettext.
See Joao Prado
Maia's article for more details on using gettext with PHP.
As you can see, internationalizing your PHP applications is not a labor of
Hercules. When you organize your localizations within an object hierarchy, it's
easy to extend your classes to support new countries and regions.
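As an illustration of that kind of extension, the sketch below derives a British English catalog from the American one and overrides only the entries that differ. The values come from the food catalog earlier in this article. It assumes PHP 7.0 or later. The classes are trimmed, self-contained stand-ins for the ones shown earlier, and pc_MC_en_GB is a hypothetical class name that follows the article's naming scheme.

```php
<?php
// Trimmed, self-contained stand-ins for the article's classes
class pc_MC_Base {
    var $messages = array();
    var $lang;
    function msg($s) {
        if (isset($this->messages[$s])) {
            return $this->messages[$s];
        }
        error_log("l10n error:LANG:$this->lang,message:'$s'");
    }
}

class pc_MC_en_US extends pc_MC_Base {
    function __construct() {
        $this->lang = 'en_US';
        $this->messages = array(
            'My favorite foods are' => 'My favorite foods are',
            'french fries'          => 'french fries',
            'corn'                  => 'corn',
            'horse'                 => 'horse',
        );
    }
}

// British English inherits the en_US catalog; array_merge() lets the
// later (British) values replace only the keys they share.
class pc_MC_en_GB extends pc_MC_en_US {
    function __construct() {
        parent::__construct();
        $this->lang = 'en_GB';
        $this->messages = array_merge($this->messages, array(
            'My favorite foods are' => 'My favourite foods are',
            'french fries'          => 'chips',
            'corn'                  => 'maize',
        ));
    }
}
```

A new en_GB object translates "french fries" as "chips" and still inherits "horse" unchanged from en_US.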
Adam Trachtenberg
is a student at Columbia Business School and a coauthor of O'Reilly's PHP Cookbook.
oreillynet.com Copyright © 2003 O'Reilly & Associates, Inc.