NAME

OpenInteract2::Manual::I18N - Internationalization in OpenInteract2

SYNOPSIS

This part of the manual describes the i18n support in OpenInteract2, how to create message bundles to distribute with your application, and how you can customize the process.

CAVEATS

I'm a newbie at i18n/l10n efforts. The main purpose is to find the path I think most web applications will tread and make that as simple as possible to navigate. The hooks in the framework to enable localization should be sufficiently unobtrusive so as not to preclude other efforts you may have in this area.

So if you have ideas about how things can be done better or more flexibly, please join the openinteract-dev mailing list and chime in. (See "SEE ALSO" for more info on the mailing list.)

WRITING LOCALIZED APPLICATIONS

100% localization is hard

Localizing every aspect of your application is extremely difficult. There are the easy things like translating words on the screen, date/time formats and money. Then there are the tough things: what does this shade of yellow mean in China versus Saudi Arabia? What happens if someone reads this sequence of graphics from right-to-left instead of left-to-right? And on and on for many more items you haven't even thought of yet.

OpenInteract won't presume to take care of all these for you. Instead we try to make the most common operations as simple as possible. Hopefully that will be sufficient for your needs.

IDENTIFYING LANGUAGE TO USE

We have ways of learning about you...

Ordered from most to least important, here's how we identify the language to use for the current request. First match wins.

  • User logged in? Look in 'lang' user property

  • Language set in session?

  • Language in GET/POST params? ('oi_language')

  • Language passed by browser? (also used as a fallback)

  • Customized identifiers (register in server.ini)
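The ordering above can be sketched as a first-match-wins lookup. This is only an illustration under simplifying assumptions: the function name pick_language() and the shape of its input are hypothetical, and it returns a single language where OI itself tracks a list.

```perl
use strict;
use warnings;

# Hypothetical sketch, not OI's implementation: check each source in
# priority order and return the first language found.
sub pick_language {
    my ( %source ) = @_;
    for my $name ( qw( user session param browser custom ) ) {
        my $lang = $source{ $name };
        return $lang if defined $lang && length $lang;
    }
    return;    # nothing identified
}

# The session outranks the browser, so this prints 'es'
print pick_language( session => 'es', browser => 'en' ), "\n";
```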

Custom language identifier

OI has more hooks than your favorite rock band, and this area is no exception. During the request initialization process we identify all the languages available for this request. Normally this means all the languages for a particular user, but you can override it with GET/POST parameters or a setting in the session.

We also provide the means for you to step in and implement your own -- you could parse it from the URL, use Geo::IP, whatever. Just declare your class in the server configuration key 'language.custom_language_id_class':

 [language]
 ...
 custom_language_id_class = MyApp::I18N::LanguageId

And implement the class method 'identify_languages()', which receives the list of languages identified so far. Here's a naive example:

 package MyApp::I18N::LanguageId;
 
 use strict;
 use Geo::IP;
 use OpenInteract2::Context qw( CTX );
 
 my $gi = Geo::IP->new( GEOIP_STANDARD );
 
 sub identify_languages {
     my ( $class, @oi_langs ) = @_;
     my $country = $gi->country_code_by_addr( CTX->request->remote_host );
     # _some_nifty_method() stands in for your own country-to-language lookup
     my @langs_from_country = $class->_some_nifty_method( $country );
     # add to the languages OI already found rather than replacing them
     push @oi_langs, @langs_from_country;
     return @oi_langs;
 }
 
 1;

Note that if you return a list with entries it replaces what OI has so far identified. We took care of this above by first copying all the languages previously identified then adding to them with the push.

SETTING UP LOCALIZATION IN YOUR PACKAGE

Type #1: Message replacement

This is the fairly simple technique of using keys to represent blocks of text. The key gets replaced by the text for whatever language the current user is associated with. Here's an example: you set up your music library search form like this:

 Artist: _____________
 
 Title:  _____________
 
 Year:   _____________
 
                 <Search>

And you'd like to localize this. Like all other problems dealing with programming you just add a layer of abstraction, associating each piece of text with a key, then associating text to that key for each language (represented here by braces but that's not how they'd look in the template):

 {search.artist}: _____________
 
 {search.title}:  _____________
 
 {search.year}:   _____________
 
                 <{search.button}>

Now you just have sets of data for each language:

 en:
 search.artist = Artist
 search.title  = Title
 search.year   = Year
 search.button = Search
 
 es:
 search.artist = Artista
 search.title  = Título
 search.year   = Año
 search.button = Buscar
 ...

When the page is rendered these keys get replaced by the associated text. Fortunately Perl comes with libraries to make this happen fairly painlessly. And a nice side-effect is that the message files are in a sufficiently simple format that you can ship them off to someone else and just plug them in your application when they're ready.
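The key-to-text substitution can be sketched in a few lines of plain Perl. Everything here is hypothetical and for illustration only: the %MESSAGES hash, the fill_keys() function and the brace markers are not how OI stores or renders messages.

```perl
use strict;
use warnings;

# Hypothetical in-memory message sets: language => { key => text }
my %MESSAGES = (
    en => { 'search.artist' => 'Artist', 'search.title' => 'Title' },
    es => { 'search.artist' => 'Artista' },
);

# Replace each {key} marker with the text for $lang. Keys without a
# translation are left as-is so gaps are easy to spot.
sub fill_keys {
    my ( $lang, $text ) = @_;
    $text =~ s!\{([\w.]+)\}!
        defined $MESSAGES{ $lang }{ $1 }
            ? $MESSAGES{ $lang }{ $1 }
            : '{' . $1 . '}'
    !gex;
    return $text;
}

print fill_keys( 'en', '{search.artist}: / {search.title}:' ), "\n";
print fill_keys( 'es', '{search.artist}: / {search.title}:' ), "\n";
```

Leaving an untranslated key visible is a choice made for this sketch; a real application might fall back to a default language instead.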

There's more about the messages and the file format below.

Type #2: Template negotiation

A second type of localization is template negotiation. Hopefully you won't need to use it as often because it can require more maintenance. Instead of replacing text in the template you replace the entire template wholesale.

It works in much the same way, except instead of placing text in the various language files you place template names under a particular key. (The name is in the normal 'package::template' syntax.) And just like invoking a template from your action you can do this in two ways:

  1. specify the template in your action

  2. specify the template in your action configuration

Here's a quick example of the first, passing the message key in your action's generate_content() call:

 sub mytask {
     my ( $self ) = @_;
     my %params = ( ... );
     ...
     return $self->generate_content(
                     \%params, { message_key => 'mytask.template' } );
 }

And an example of the second, passing the message key in the action configuration (action.ini):

 [foo template_source]
 mytask = msg:mytask.template

In your message files you'd have:

 messages_en.msg:
 mytask.template = mypackage::mytemplatename_english

 messages_es.msg:
 mytask.template = mypackage::mytemplatename_spanish

The templates get the exact same data under the exact same variable names, but you can control the layout and text per language.

See OpenInteract2::Manual::Templates and OpenInteract2::Action for more information.

Significance of Message Filenames

The names of the message files we process are fairly flexible, but one aspect is not. The language must be the last distinct set of characters before the file extension. So the following are ok:

  myapp-en.msg         # lang is 'en'
  myotherapp-es-MX.dat #      ...'es-MX'
  messages_en-HK.msg   #      ...'en-HK'

The following are not:

 english-messages.msg
 messages-en-part2.msg
 messagesen.msg

If you create a message filename that does not conform to this specification, it not only won't be processed but will halt the entire localization reading process altogether.
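A rough way to check a filename against this rule before shipping it is sketched below. The helper language_from_filename() is hypothetical, and its pattern assumes a '-' or '_' separator followed by a two-letter lowercase language with an optional two-letter uppercase region; the actual OI reader may accept other shapes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the language tag that sits just before the
# file extension, or undef if the name does not follow the convention.
sub language_from_filename {
    my ( $filename ) = @_;
    my ( $lang ) = $filename =~ /(?:^|[-_])([a-z]{2}(?:-[A-Z]{2})?)\.\w+$/;
    return $lang;
}

for my $file ( qw( myapp-en.msg messages_en-HK.msg messagesen.msg ) ) {
    my $lang = language_from_filename( $file );
    printf "%-20s %s\n", $file, defined $lang ? $lang : '(no language found)';
}
```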

Message File Format

The message file format is fairly simple:

skip comments and blanks

Unless we're in the middle of a continued value, we'll skip all commented lines (those beginning with a '#') and blank lines.

key/value pairs

A message key is unique per language and has a single value that is its associated message for that language. It is separated from the message by an '='.

continued values

A message value may span multiple lines using the standard '\' notation at the end of a line. (Examples below.)

runtime replacements

A message value may have one or more runtime replacements which match up with parameters passed in. These replacement declarations can get relatively sophisticated -- we discuss them briefly below but for true enlightenment read the documentation for Locale::Maketext.

So here is a simple declaration for two message keys without continued values or runtime replacements:

 company.title=Welcome to MyCompany!
 company.phone   =   Call 412-555-1212 for more information.

Two things to note:

  1. The keys ('company.title' and 'company.phone') are abstract and semi-hierarchical. There's a FAQ below about why we chose opaque message IDs for the core OI packages, but you don't have to do so. The only tricky part is ensuring you don't stomp on someone else's namespace. One way to avoid this is using your package/application name as the first part of the hierarchy.

  2. The message reader will trim any whitespace around the '='.

Continued Message Values

Here's a declaration of two keys, one of which has a continued value:

 company.intro = You have decided to learn about MyCompany, a leader \
     in the maintenance of the status quo around the world. Ensure your \
     status is the one that's in quo!
 company.title = Welcome to MyCompany!

The main things here are:

  1. The '\' must be at the end of the line or the remainder of your message will get lost. (You may have whitespace between the '\' and the end of line, but that may not be the case forever.)

  2. You can have multiple continuations for a single value. Leading space from successive lines will be lopped off.

  3. The value returned will not have any embedded newlines. (TODO: This may change, speak up if you have strong feelings about it.)
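The format rules described so far (skipping comments and blanks, trimming around the '=', joining continued lines) can be sketched as a small reader. This is a hypothetical illustration, not the reader OI ships; the name parse_messages() and the decision to die on an unparseable line are assumptions.

```perl
use strict;
use warnings;

# Hypothetical reader: turn message-file text into a key => value hashref.
sub parse_messages {
    my ( $text ) = @_;
    my ( %messages, $key, $continued );
    for my $line ( split /\n/, $text ) {
        if ( $continued ) {
            $line =~ s/^\s+//;                  # lop off leading space
        }
        else {
            next if $line =~ /^\s*(?:#|$)/;     # skip comments and blanks
            my ( $k, $v ) = $line =~ /^\s*([^=\s]+)\s*=\s*(.*)$/
                or die "Cannot parse line: $line\n";
            ( $key, $line ) = ( $k, $v );
            $messages{ $key } = '';
        }
        # A trailing '\' (optionally followed by whitespace) continues the value
        $continued = ( $line =~ s/\\\s*$// );
        $messages{ $key } .= $line;
    }
    return \%messages;
}

my $messages = parse_messages( <<'END' );
# company messages
company.intro = You have decided to learn about MyCompany, a leader \
    in the maintenance of the status quo around the world.
company.title=Welcome to MyCompany!
END
print "$_ => $messages->{$_}\n" for sort keys %$messages;
```

Because a commented line is only skipped outside a continuation, a '#' line in the middle of a continued value becomes part of that value in this sketch.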

Runtime replacements

Since we just use Locale::Maketext behind the scenes you can do anything in your message values that it allows. Here is a quick summary of the most common options.

First, you often need to embed one or more values in a message. Position is important: the translation of your message may shift around the order of the values so you cannot treat it like a sprintf. For instance, you might have:

 db.error.process = While processing the statement [_1] the database \
 returned an error [_2]

In another language this might be something like the following nonsense:

 db.error.process = La base de datos volvio un error [_2] mientras \
 que procesaba la declaracion [_1] 

When we ask for the message we need to pass in two values which will get plugged into the message at runtime:

 my ( $sth );
 eval {
     $sth = $dbh->prepare( $sql );
     $sth->execute();
 };
 if ( $@ ) {
   my $error_msg = $lh->maketext( 'db.error.process', $sql, $@ );
   # ...
 }

Since they're ordered there's no ambiguity.

Second, you often need to plug in values that, depending on what they are, change the words around them. For instance:

 cart.numitems = You have [_1] items in your shopping cart.

Easy enough, but what happens when the number is 1? Or 0?

 You have 1 items in your shopping cart.
 You have 0 items in your shopping cart.

It's understandable, but not user-friendly. Fortunately Locale::Maketext does this for us:

 cart.numitems = You have [quant,_1,item,items,no items] in your shopping cart.

With a '1' this will generate:

 You have 1 item in your shopping cart.

And with a '0':

 You have no items in your shopping cart.

Nifty!
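To try the bracket notation outside OI, here is a minimal standalone sketch using Locale::Maketext directly. The MyDemo::I18N classes and the hand-written %Lexicon are hypothetical stand-ins for what OI builds from your message files.

```perl
use strict;
use warnings;

# Hypothetical project class; Locale::Maketext expects one base class
# plus one subclass per language.
package MyDemo::I18N;
use Locale::Maketext;
our @ISA = ( 'Locale::Maketext' );

package MyDemo::I18N::en;
our @ISA = ( 'MyDemo::I18N' );
our %Lexicon = (
    'cart.numitems'    =>
        'You have [quant,_1,item,items,no items] in your shopping cart.',
    'db.error.process' =>
        'While processing the statement [_1] the database returned an error [_2]',
);

package main;

my $lh = MyDemo::I18N->get_handle( 'en' )
    or die "Cannot get a language handle";

# quant picks the singular, plural or zero form based on the number
print $lh->maketext( 'cart.numitems', $_ ), "\n" for ( 0, 1, 3 );

# Positional placeholders are filled from the arguments in order
print $lh->maketext( 'db.error.process', 'SELECT 1', 'connection lost' ), "\n";
```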

USAGE

In Template Toolkit templates

Since the Template Toolkit is the preferred content generation system for OI it has the best support for fetching and displaying messages. Every template has the function 'MSG' in its namespace. This function takes a message key as the first argument and optional parameters as successive arguments. Each of these gets filled in the message in order. So you might have:

 [% MSG( 'search.results.count', results.size ) %]

Which refers to a message key:

 search.results.count = You found [_1] types of candy

The argument results.size will replace the [_1] placeholder when the message is interpreted.

Additionally, many of the OI template widgets take message keys as arguments in place of labels. For instance, instead of:

 [% INCLUDE header_row( labels = [ 'foo', 'bar', 'baz' ] ) %]

You can use:

 [% INCLUDE header_row( label_keys = [ 'label.foo', 'label.bar', 'label.baz' ] ) %]

Most of the time if the original argument was 'foo' the keyed argument will be 'foo_key', so:

 Old (and still works):
 [% INCLUDE label_form_text_row( label = 'Phone Number',
                                 name  = 'phone_number',
                                 size  = 20 ) %]
 [% INCLUDE form_button( value = 'Click Me!' ) %]
 
 New:
 [% INCLUDE label_form_text_row( label_key = 'myform.phone',
                                 name      = 'phone_number',
                                 size      = 20 ) %]
 [% INCLUDE form_button( value_key = 'global.button.click' ) %]

In code during a request

You can always grab a language handle from the OpenInteract2::Request object:

 my $lh = CTX->request->language_handle;

When first called during the request's lifetime this will determine what language the user is using and get a suitable Locale::Maketext handle. Successive calls during the request will return the same handle.

You can then call 'maketext' on the object and get a translation:

 my $lh = CTX->request->language_handle;
 my ( $sth );
 eval {
     $sth = $dbh->prepare( $sql );
     $sth->execute();
 };
 if ( $@ ) {
   my $error_msg = $lh->maketext( 'db.error.process', $sql, $@ );
   ...

OpenInteract2::Action subclasses have a shortcut with the _msg method. (The underscore is an indication that it's reserved for subclasses, a.k.a. 'protected' in other languages. The method itself does not enforce this.) So if the above were in an action it might look like this:

 sub do_something {
     my ( $self ) = @_;
     my ( $sth );
     eval {
         $sth = $dbh->prepare( $sql );
         $sth->execute();
     };
     if ( $@ ) {
         $self->param_add(
             error_msg => $self->_msg( 'db.error.process', $sql, $@ ) );
     ...

In code outside a request

Assuming that you started up the OpenInteract2::Context object in the normal fashion, you can follow the usual Locale::Maketext idiom:

 my $lh = OpenInteract2::I18N->get_handle( 'lang', 'lang', 'lang'... );
 die $lh->maketext( 'db.error.process', $sql, $@ );

where each 'lang' is a user (or system) language in order of preference.

FAQ

Why did you use opaque IDs for the message keys?

In the Locale::Maketext docs Sean Burke recommends using keys based on the base language -- that is, not using opaque message keys. His suggestion makes for very readable translation documents but I think in practice it would be extremely brittle -- if you change the key in the base language even for punctuation you'll need to change all of them. Feh. (Then again, Mr. Burke is a bona-fide superhero, so we'll see how that shakes out...)

Additionally a lot of this was inspired by the message (or 'resource') bundle technology built in to the Java 2 platform. (See "SEE ALSO" for more on this.) Message bundles shipped with applications built on Struts or Spring typically use the hierarchical message syntax, with different levels separated by a dot. So you might have 'myapp.search.label.firstname' which gets more specific as you traverse the key from left to right. How specific you want to get is up to you.

That said, there's nothing stopping you from using your own standard for declaring keys in your application. Use ID numbers, letters, days of the week, whatever. Just make sure your package's keys don't tread on another's.

SEE ALSO

OpenInteract2::I18N

OpenInteract2::I18N::Initializer

Locale::Maketext

openinteract-dev mailing list:

http://lists.sourceforge.net/lists/listinfo/openinteract-dev

Article published in TPJ 13 by Sean Burke about Locale::Maketext:

http://search.cpan.org/~sburke/Locale-Maketext-1.06/lib/Locale/Maketext/TPJ13.pod

Web Localization in Perl by Autrijus Tang

http://www.autrijus.org/webl10n/TABLE_OF_CONTENTS.html

Java Internationalization: Localization with ResourceBundles

http://developer.java.sun.com/developer/technicalArticles/Intl/ResourceBundles/

COPYRIGHT

Copyright (c) 2003-2004 Chris Winters. All rights reserved.

AUTHORS

Chris Winters <chris@cwinters.com>