Convert::Translit, transliterate, build_substitutes - Perl module for string conversion among numerous character sets
    use Convert::Translit;
    $translator = new Convert::Translit($result_chset);
    $translator = new Convert::Translit($orig_chset, $result_chset);
    $translator = new Convert::Translit($orig_chset, $result_chset, $verbose);
    $result_st = $translator->transliterate($orig_st);
    $result_st = Convert::Translit::transliterate($orig_st);
    build_substitutes Convert::Translit();
    Convert::Translit::build_substitutes();
This module converts strings among the 8-bit character sets defined by IETF RFC 1345 (about 128 sets). The RFC document is included so you can look up character set names and aliases; the module also reads it when composing conversion maps. Functions and constructors that fail return undef.
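Because failure is signalled by undef rather than by an exception, a caller may want to check the constructor's result before using it. The following is a minimal sketch under that assumption only; the helper name make_translator is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Convert::Translit): build a
# translator, or die with a readable message when the constructor
# returns undef, which is the documented failure signal.
sub make_translator {
    my ($orig_chset, $result_chset) = @_;
    require Convert::Translit;    # loaded only when the helper is called
    my $translator = Convert::Translit->new($orig_chset, $result_chset);
    defined $translator
        or die "cannot build map from '$orig_chset' to '$result_chset'\n";
    return $translator;
}
```

A caller would then write, for example, make_translator("Latin2", "ascii")->transliterate($cnt_eur_st), and wrap the call in eval if it prefers to handle the failure itself.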
Export_OK Functions:
transliterate() returns a string in $result_chset for an argument string in $orig_chset, transliterating by the map that new() composed.
build_substitutes() rebuilds the file "substitutes", which contains character definitions and the approximate substitutions used when a character in $orig_chset isn't defined in $result_chset. For example, "Latin capital A" may be substituted for "Latin capital A with ogonek". Rebuilding this file takes a long time, but you should never need to do it. Its only source of information is the file "rfc1345".
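The idea of an approximate substitution can be pictured as a lookup by character name with one fallback step. The sketch below is an illustration only: the hashes, their contents, and the function code_for are all hypothetical, and they do not reflect the real format of the "substitutes" file or the module's internals.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my %result_code = (    # characters that the result set defines
    'LATIN CAPITAL LETTER A' => 0x41,
    'LATIN CAPITAL LETTER L' => 0x4C,
);
my %substitute = (     # approximate stand-ins, keyed by character name
    'LATIN CAPITAL LETTER A WITH OGONEK' => 'LATIN CAPITAL LETTER A',
    'LATIN CAPITAL LETTER L WITH STROKE' => 'LATIN CAPITAL LETTER L',
);

# Return the result-set code for a character name, falling back to the
# approximate substitute when the name itself is not defined there.
# Returns undef when neither the name nor a substitute is available.
sub code_for {
    my ($name) = @_;
    return $result_code{$name} if exists $result_code{$name};
    my $approx = $substitute{$name};
    return defined $approx ? $result_code{$approx} : undef;
}

print chr(code_for('LATIN CAPITAL LETTER A WITH OGONEK')), "\n";    # prints "A"
```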
Object methods:
new() creates a new object for converting from $orig_chset to $result_chset, these being names (or aliases) of 8-bit character sets defined in RFC 1345. If only one argument is given, $orig_chset is assumed to be "ascii". If three arguments are given, the third is a verbosity flag. Verbose output lists approximate substitutions and other compromises.
transliterate() is the same as the function of that name.
Convert/Translit/rfc1345 (IETF RFC 1345, June 1992)
Convert/Translit/substitutes
Only one-to-one character mapping is done, so characters with diacritics (like A-ogonek) are never converted to (letter character, diacritic character) pairs; instead they are subject to simplification. If no approximate substitute is available, then an unrelated substitute is chosen, preferably one with the same code value. Undefined $orig_chset characters are translated to a chosen indicator character. Transliteration is not guaranteed to be reversible when substitutions were required: converting back may not restore the original string. An $orig_chset defined as 7-bit is assumed to be repeated to make an 8-bit set (in the style of "extended ascii"); no such adjustment is made for $result_chset. The few mistakes in the RFC document are corrected in the module.
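The loss of information on a round trip can be seen with a hand-written one-to-one map. This sketch does not use the module; the function simplify_latin2 and its four-character table are hypothetical stand-ins for the much larger maps the module composes. It relies only on the ISO 8859-2 (Latin2) code values for A-ogonek (0xA1), a-ogonek (0xB1), L-stroke (0xA3) and l-stroke (0xB3).

```perl
use strict;
use warnings;

# Hypothetical hand-written map, NOT the module's table: four Latin2
# bytes are simplified to plain ASCII letters, one byte for one byte.
sub simplify_latin2 {
    my ($st) = @_;
    (my $out = $st) =~ tr/\xA1\xB1\xA3\xB3/AaLl/;
    return $out;
}

my $latin2 = "\xA3\xB1ka";                  # Polish word with L-stroke and a-ogonek
my $ascii  = simplify_latin2($latin2);      # "Laka", same length as the input
my $again  = $ascii;                        # ASCII bytes are unchanged in Latin2

print $again eq $latin2
    ? "round trip restored the original\n"
    : "round trip lost the diacritics\n";   # this branch is taken
```

The string keeps its length because every character maps to exactly one character, but the diacritics cannot be recovered once they have been simplified away.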
Convert Russian language text from IBM to ASCII encoding:
    $xxx = new Convert::Translit("EBCDIC-Cyrillic", "Cyrillic");
    $ascii_cyr_st = $xxx->transliterate($ibm_cyr_st);
Convert from plain ASCII (default $orig_chset) to Latin2 (Central European):
    $yyy = new Convert::Translit("Latin2");
    $cnt_eur_st = $yyy->transliterate($ascii_st);
Since plain ASCII is a subset of Latin2, nothing is lost in transliteration. But going the other direction requires numerous simplifications:
    $zzz = new Convert::Translit("Latin2", "ascii");
    $ascii_st = $zzz->transliterate($cnt_eur_st);
Back to Latin2 again, although the substitutions probably mean that ($again ne $cnt_eur_st):
    $again = $yyy->transliterate($ascii_st);
The example.pl script converts a Polish language phrase from Latin2 to EBCDIC-US.
Requires Perl version 5. Developed with MacPerl on Macintosh 68040 OS 7.6.1. Tested on Sun Unix 4.1.3.
Genji Schmeder <genji@community.net>
Enjoy in good health. Cieszcie sie dobrym zdrowiem. Que gozen con salud. Benutze es heilsam gern! Genki dewa, yorokobi nasai.
Version 1.03 dated 5 November 1997. Copyright (c) 1997 Genji Schmeder. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Chris Leach, author of EBCDIC.pm
Keld Simonsen, author of RFC 1345