The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Locale::MakePhrase::RuleManager - Language rule sort and evaluation object.

SYNOPSIS

The Locale::MakePhrase module uses this plugin module, to implement the evaluation and sorting phases of text selection. It explains the rule expression syntax and evaluation procedure.

It sorts the language rules into a suitable order so that we can figure out which rule to select, ie. the aim is to sort the rules into an order so that we can select the first rule.

It evaluates the program arguments against the expressions in the rule.

PHRASE SYNTAX

To allow an argument to be placed within the middle of a string, we use square brackets as the notation, plus an underscore, then the argument number, as in:

  "Please select [_1] files"

where [_1] refers to the first program variable supplied as an argument with the text to be translated.

To display square brackets within the text string, you will need to escape the square bracket by using the ~ (tilde) character, as in:

  "This is ~[ bracketed text ~]"

this will print:

  This is [ bracketed text ]

Of course, if you need to display the ~ character, you will need to use two of them, as in:

  "Tilde needs escaping as in ~~"

which ends up printing:

  Tilde needs escaping as in ~

LINGUISTIC RULES

We have coined the term linguistic rules as a means to describe the technique which decides which piece of text is displayed, for a given input text phrase and any/all program arguments.

To understand why we need to generate linguistic rules, consider the 'singular vs. plural' example shown in the "REQUIREMENTS" in Locale::MakePhrase section.

In this example, we needed four different text strings, for the trivial case of what to display for a given program value.

For other examples, the URL's mentioned in that section describe why there is a need for applying rules on a per-language basis (they also describe why most current language translation systems fail).

What is a linguistic rule?

A linguistic rule is the evaluation of the context of a phrase by using program arguments, for a given program string. The arguments are evaluated left-to-right and top-to-bottom. The first rule to succeed has its corresponding translated text applied in-place of the input text.

Note that if a program string takes no arguments, the rule becomes rather simplistic (in that no arguments need to be evaluated).

Rules can be tested in a number of ways. The 'Operators' and 'Functions' sections list the available rule expression conjigates available for use within each rule expression.

Previously we mentioned that the language translation system used syntax with the form [_1]. You will notice that we use an underscore in the placeholder. This may appear to be meaningless, but as we will see, we use this rule property to help understand how rules are evaluated.

Numeric evaluation

Let's show an example of a simple expression:

  _1 == 0

The use of the underscore signifies that this value is to be classified as an argument number and is not to be treated literally. This expression says, 'Does the first argument have a value equal to zero?'

[Note that we use double-equals; the double-equals operator will use a numeric context in the equality test.]

String evaluation

Since an argument can also be a string, we could define an expression to be:

  _1 eq "some text"

Notice that we use a different operator depending on whether the argument is numeric or a string. This is because we need to be able to figure out what context the argument needs to be evaluated in.

[In this case we use 'eq' as the text context equality operator.]

Alternate argument representation

In some cases we need to be able to specify the translated string, based on an alternate representation of the argument. This is handled by using a function. For example, you may use the term 'houses', which is the main keyword within your application.

To handle alternations of the word 'houses' (such as 'house') we can define an expression of:

  left(_1,5) eq 'house'

However, in some cases we will use the terms 'apartments' or 'flats'. In these cases, we only care if the value is in the plural or singular case:

  right(_1,1) eq "s"

Thus, we are provided with a set of functions which allow some manipulation of the argument value, prior to evaluation.

Multiple arguments

In many cases, more than one argument is supplied (as well as the text to translate) to Locale::MakePhrase. In those cases, an expression can be created which tests each argument, as in:

  _1 == 0 && _2 != 1

As we can see here, by using &&, we combine multiple expression evaluations, into a single rule expression. In this case the expression is effectively saying "if argument one is equal to zero AND argument two is not equal to one".

We support an unlimited number of arguments within the expression evaluation capability.

Multiple rule expressions

Consider the following exresssions:

  _1 > 0   produces the output "Lots of files"
  _1 == 0  produces "No files"
  _1 == 1  produces "One file"

Each expression is a valid, but if we evaluate this set of expressions in the wrong order, we will never be able to produce the text "One file" as the _1 > 0 expression would evaluate to true, before we try to evaluate the _1 == 1 expression.

To counter this problem, whenever we define a rule expression (including when there is no rule expression as would be the case when no arguments are supplied), we must also define a rule priority (where a larger number gives higher priority).

Knowing this, let's re-consider the previous set of expressions, this time adding a suitable priority of evaluation for each expression:

  expression:  priority:
    _1 > 1        1
    _1 == 0       2
    _1 == 1       2

Now that we have a rule priority, we can see that the _1 == 0 expression and the _1 == 1 expression will get evaluated before the _1 > 1 expression.

You will notice that two rules have the same priority (i.e. we can have any number of rules having the same priority); in this case, the rules are evaluated in a non-deterministic (first found, first evaluated) manner. Thus it is important to make sure that a given rule expression, has a valid rule priority, for the rule set.

Rule Syntax

Now that we know what a linguistic rule is, we need to explain some minor but important points.

Each symbol in a rule expression needs to be separated with a space, i.e. this works:

  _1 > 4
  left(_2,1) eq "f"

this doesn't:

  _1>4
  left(_2,1)eq"f"

Whenever we are using a string operator, we must enquote the value that we are testing, i.e. this works:

  _1 eq "fred"

this doesn't:

  _1 eq fred

We support single and double quote characters, including mixed quoting (for simplistic cases), i.e. these work:

  _1 eq "some text"
  _1 eq 'some text'
  _1 eq "someone's dog"
  _1 eq '"john spoke"'

this doesn't (i.e. there is no quote 'escape' capability):

  _1 eq "\"something\""

Note that expressions are not unary, as in (this checks if the first argument has any length):

  length(_1)

rather, they should look like:

  length(_1) > 0

APPLYING RULES TO LANGUAGES

CAVEAT:

The following description of rule evaluation is correct at the time of writing. However, as this module evolves, we may alter the implementation as we get feedback. If you have used this module and found that the rule evaluation order is not what you expect, please contact the maintainer.

So far we have discussed the concept that, a translation exists for a language/dialect combination. However, the application may not be translated into the specific language requested by the user. In these cases, Locale::MakePhrase tries to use fallback languages as the source language for this translation request. This allows languages derived from other base languages (eg Spanish and Mexican share common words) and dialect specific variations of languages (such as variations of English), to use the parent language as a source for possible translations.

Thus whenever a phrase cannot be translated directly into the requested localisation, Locale::MakePhrase will use a fallback mechanisn for the input phrase.

Also, to support variations in output text which can exist in locale-specific translations, non-expression rules should be evaluated after rules which have an expression.

The implementation of which rule to select, has been abstracted into a seperate module so that you can implement your own process of which rule is selected based on the available rules. The default implementation is defined in Locale::MakePhrase::RuleManager. It contains a description of the current implementation.

Overview of steps to rule evaluation

  1. Locale::MakePhrase generates a list of rules which are applicable for required languages (plus any fallback languages).

  2. The rules are sorted by the Locale::MakePhrase::RuleManager module.

  3. Each expression from the sorted rules are evaluated. If the rule succeeds, the corresponding text is returned. If not, the next rule is evaluated.

  4. If finally no match is found, the input string is used as the output string.

  5. Any arguments are then applied to the output string.

Example rule definitions

Shown below are examples of various rules; some rules have no expressions and/or arguments; all rules must have at least a priority of zero or more.

Rule 1:
 Language:    en_US
 Input text:  Please select some colours.
 Expression:  (none)
 Priority:    0
 Output text: Please select some colors.
Rule 2:
 Language:    en
 Input text:  Please select some colours.
 Expression:  (none)
 Priority:    0
 Output text: Please select colours.
Rule 3:
 Language:    en_AU
 Input text:  Please select [_1] colours.
 Expression:  (none)
 Priority:    0
 Output text: Please select [_1] colours.
Rule 4:
 Language:    en
 Input text:  Please select [_1] colours.
 Expression:  _1 > 0
 Priority:    0
 Output text: Select [_1] colours.
An example:

Given that the preferred language is 'en_US', if you compare rule 1 vs rule 2, the linguistic rule evaluation mechanism will be applied to rule 1 before being applied to rule 2, as it has a higher language-order.

A further example:

Compare rule 3 vs rule 4. Given that there is no expression associated with rule 3, but that the 'en' version does have an expression, rule 4 will be evaluated (and found to be true in some cases) before example 3 is evaluated.

These examples show that it is important to consider the interactions of the linguistic rules, as they are applied to the current localisation.

APPLYING ARGUMENTS TO TEXT

With any text translation system, there comes a time when it is necessary to apply the values of the arguments 'in situ', replacing the square-bracket argument number, with the corresponding argument value, so that the output will say something useful. This happens after all rules have been applied (if there were any), and after the output text string has been chosen.

For example:

  Input text:  "Selected [_2] files, [_1] directories"
  Arguments:   3  21

Apply rules...

  Rule text:   "Selected [_2] files, [_1] directories"
  Output text: "Selected 21 files, 3 directories"

OPERATORS

This is a list of all operators:

 Operator  Context      Meaning                Example
 ----------------------------------------------------------------------
   ==      Numeric  Equal to                   _1 == 4
   !=      Numeric  Not equal to               _1 != 2
   >       Numeric  Greater than               _2 > 1
   <       Numeric  Less than                  _1 < 7
   >=      Numeric  Less than or equal to      _4 >= 21
   <=      Numeric  Greater than or equal to   _3 <= 12
   eq      String   Equal to                   _1 eq "some text"
   ne      String   Not equal to               _2 ne "something else"

FUNCTIONS

This is a list of available functions:

  Function     Context              Meaning                  Example
 ----------------------------------------------------------------------
 defined(x)       -      Is the argument defined/not-null,  defined(_1)
                         returns 0 or 1
 length(x)        -      Length of value of the argument,   length(_1)
                         returns an integer >= 0
 abs(n)         Number   Numerical absolute of argument     abs(_3)
 lc(s)          String   Lowercase version                  lc(_1)
 uc(s)          String   Uppercase version                  uc(_2)
 left(s,n)      String   LHS of argument from start         left(_3,4)
 right(s,n)     String   RHS of argument from end           right(_1,2)
 substr(s,n)    String   RHS of argument from start         substr(_2,7)
 substr(s,n,l)  String   Sub-part of argument from 'n',     substr(_2,7,4)
                         up to 'l' characters

API

The following functions are used by the Locale::MakePhrase class. By sub-classing this module, then overloading these functions, Locale::MakePhrase can use yor custom RuleManager module.

new()

Construct a new instance of Locale::MakePhrase::RuleManager object; arguments are passed to the init() method.

$self init([..])

Allow sub-class a chance to control construction of the object. You must return a reference to $self, to 'allow' the construction to complete (should you decide to derive from it).

boolean evaluate($expression,@arguments)

This is the expression evaluation engine. It takes an expression as described above (for example _1 == 4 && _2 eq 'fred'). It then takes any program arguments, applying them in-place of the _X place holders. Finally returning true / false, based on the result of the evaluation of the expression.

\@rules sort(\@rule_objects,\@languages)

The guts of the sorter; by subclassing this module, you can implement your own sorting routine.

This module implements the following rules for deciding the sorted order of the rules. The aim is to return a list which can be evaluated in-order.

  1. Rules are sorted histest to lowest priority, for the primary language, for rules which have expressions.

  2. The next available fallback language is chosen as the language to use; step 1 is repeated.

  3. This process continues until no further fallback languages are available.

  4. The non-expression rules are then evaluated according to the preferred language.

  5. If that fails, the fallback languages are tried. This continues for each fallback language.

$string apply_arguments($makephrase,$translation,@arguments)

This applies any/all arguments, to the outgoing text phrase; if the argument is text, it (optionally) undergoes the translation process; if the argument is numeric, it is formatted by the Locale::MakePhrase format_number method.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 758:

=cut found outside a pod block. Skipping to next block.