Locale::Country::Multilingual::Unicode - Recommended Usage with Unicode
version 0.25
    use utf8;
    use Encode::StdIO;
    use Locale::Country::Multilingual {use_io_layer => 1};

    my $lcm = Locale::Country::Multilingual->new;
    $lcm->set_lang('de');
    print $lcm->code2country('gb'), "\n";
You are on a modern computer system that uses utf-8 encoding by default. Locale::Country::Multilingual uses language data that is in utf-8 too. Everything is fine... Really?
Try this in your favorite terminal:
    > perl -le 'print "bäh!"'
    bäh!
Uppercase it:
    > LANG=en_US perl -Mlocale -le 'print uc "bäh!"'
    BäH!
Wrong! It should have been BÄH!, though on latin1 systems it works. The same goes for Österreich, the German (and native) name for Austria: if you run lc() on it, it won't change.
What happened is that you write files (and code) in utf-8, a multi-byte encoding, but Perl by default expects latin1 (iso-8859-1), a single-byte encoding. Provided you combine use locale; with an appropriate locale (here en_US) in your Perl program, a lowercase latin1 ä (0xe4) is turned into an uppercase Ä (0xc4), but only if your input comes as latin1.
A utf-8 ä is encoded as 0xc3, 0xa4. Therefore uc() does not detect the two-byte ä as a letter that could be uppercased.
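The byte values quoted above can be checked with a small sketch. It is an illustration only, not part of Locale::Country::Multilingual: it relies on the core Encode module, writes ä as an escape so the file stays pure ASCII, and the helper name hex_bytes is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "bäh!" with U+00E4 written as an escape, so this file is pure ASCII.
my $chars = "b\x{E4}h!";

# The same text as raw bytes in each encoding.
my $latin1_bytes = encode('ISO-8859-1', $chars);
my $utf8_bytes   = encode('UTF-8',      $chars);

# Hypothetical helper: hex dump of every byte/character in a string.
sub hex_bytes { return join ' ', map { sprintf '%02x', ord($_) } split //, $_[0] }

print hex_bytes($latin1_bytes), "\n";   # 62 e4 68 21
print hex_bytes($utf8_bytes),   "\n";   # 62 c3 a4 68 21

# uc() on the undecoded utf-8 bytes leaves the pair c3 a4 untouched.
my $upper_bytes = uc($utf8_bytes);
print hex_bytes($upper_bytes), "\n";    # 42 c3 a4 48 21

# Decoding first turns the two bytes back into one character that uc() can map.
my $upper_chars = uc(decode('UTF-8', $utf8_bytes));
print hex_bytes($upper_chars), "\n";    # 42 c4 48 21
```

The last line shows a single c4 (Ä) because the decoded string holds characters rather than encoded bytes.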
Language files in Locale::Country::Multilingual are in utf-8.
To make everything work, the correct workflow is:
use utf8;
This pragma tells Perl that all text in your code is actually in utf-8, so the Perl interpreter converts it into its internal string format correctly. Strictly speaking, this is only necessary when you have literals that contain non-ASCII characters, e.g. when you code:
print "Dürüm Döner Kebap\n";
Even if your system does not use utf-8 by default, your Perl programs should be encoded in utf-8. Use an editor where you can set the encoding.
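What the pragma changes can be seen by measuring the same literal with and without it. This is a minimal sketch that assumes the file itself is saved as utf-8; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my ($with_pragma, $without_pragma);

{
    use utf8;    # literals in this block are decoded from utf-8
    $with_pragma = length("Dürüm");      # counts characters
}
{
    no utf8;     # literals in this block stay raw bytes
    $without_pragma = length("Dürüm");   # counts utf-8 bytes
}

print "$with_pragma $without_pragma\n";   # 5 7
```

Each ü occupies two bytes in utf-8, which accounts for the difference of two.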
By default Perl converts the internal string representation into latin1 for input and output, so the print output above would be broken on a non-latin1 system. To switch STDIN, STDOUT and STDERR to utf-8, you can write:
    binmode STDIN, ':utf8';
    binmode STDOUT, ':utf8';
    binmode STDERR, ':utf8';
If your system uses another encoding, e.g. "euc-jp", you can switch a filehandle to that encoding with:
binmode FH, ':encoding(euc-jp)';
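The effect of an IO layer can be observed without touching the real standard handles by writing to an in-memory filehandle. The following is a sketch under that assumption; bytes_written is a hypothetical helper, and it uses the stricter :encoding(UTF-8) spelling instead of :utf8.

```perl
use strict;
use warnings;

my $text = "D\x{F6}ner";   # "Döner", with U+00F6 written as an escape

# Hypothetical helper: write $string through $layer into an in-memory
# file and return the raw bytes that ended up there.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open failed: $!";
    print {$fh} $string;
    close($fh) or die "close failed: $!";
    return $buffer;
}

my $as_utf8   = bytes_written(':encoding(UTF-8)',      $text);
my $as_latin1 = bytes_written(':encoding(iso-8859-1)', $text);

printf "%d %d\n", length($as_utf8), length($as_latin1);   # 6 5
```

The same five characters become six bytes through the utf-8 layer and five through the latin1 layer, which is why the layer has to match what the receiving side expects.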
In a web application don't forget to set the output MIME type as well!
If output goes to a terminal:
use Encode::StdIO;
This module determines your terminal's encoding - even if it is something other than utf-8 - and sets the appropriate IO layers for the three standard IO handles.
use_io_layer => 1
There are two places where this option can be specified: either in the use statement or in the call to new:
    use Locale::Country::Multilingual {use_io_layer => 1};

    my $lcm = Locale::Country::Multilingual->new(
        lang => 'de',
        use_io_layer => 1,
    );
    print uc $lcm->code2country('gb'), "\n";
That should print
VEREINIGTES KÖNIGREICH GROSSBRITANNIEN UND NORDIRLAND
Wow! Even the "ß" has been converted correctly into "SS".
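The case mappings mentioned in this text can be reproduced on plain character strings, independent of the module. This sketch assumes Perl 5.12 or later for the unicode_strings feature and writes the non-ASCII characters as escapes; the sample words are taken from the text above.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # character semantics for uc/lc (Perl 5.12+)

my $name  = "Gro\x{DF}britannien";   # U+00DF is the sharp s
my $upper = uc($name);               # full case mapping: sharp s becomes "SS"

my $austria = "\x{D6}sterreich";     # U+00D6 is the capital O with diaeresis
my $lower   = lc($austria);

# ASCII-only diagnostics, so no output layer is needed here.
print length($name), ' -> ', length($upper), "\n";        # 14 -> 15
printf "U+%04X\n", ord(substr($lower, 0, 1));             # U+00F6
```

The uppercased string is one character longer because a single sharp s maps to two letters.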
perluniintro, Encode::StdIO
Bernhard Graf graf(a)cpan,org
This text is in the public domain.
Bernhard Graf <graf@cpan.org>
Fayland Lam <fayland@gmail.com>
Greg Oschwald <oschwald@cpan.org>
This software is copyright (c) 2014 by Fayland Lam.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.