Encode::Escape::Unicode - Perl extension for Encoding of Unicode Escape Sequences
    use Encode::Escape::Unicode;

    $escaped = "What is \\x{D384}? It's Perl!";
    $string = decode 'unicode-escape', $escaped;

    # Now, $string is equivalent to "What is \x{D384}? It's Perl!"

    Encode::Escape::Unicode->demode('python');

    $python_unicode_escape = "And \\u041f\\u0435\\u0440\\u043b? It's Perl, too.";
    $string = decode 'unicode-escape', $python_unicode_escape;

    # Now, $string eq "And \x{041F}\x{0435}\x{0440}\x{043B}? It's Perl, too."
Suppose you have a text data file 'unicode-escape.txt' that contains the following lines:
    What is \x{D384}? It's Perl!\n
    And \x{041F}\x{0435}\x{0440}\x{043B}? It's Perl, too.\n
You want to treat its contents as if they were a normal double-quoted string in source code. Try this:
    use Encode::Escape::Unicode;

    open(my $fh, '<', 'unicode-escape.txt')
        or die "Cannot open unicode-escape.txt: $!";

    while (<$fh>) {
        chomp;
        print encode 'utf8', decode 'unicode-escape', $_;
    }

    close($fh);
The Encode::Escape::Unicode module implements an encoding for escape sequences.
Put simply, it decodes escape sequences into Perl internal strings (\x{0000} to \x{ffff}) and encodes Perl strings into escape sequences.
    Escape Sequences     Description
    ----------------     --------------------------
    \a                   Alarm (beep)
    \b                   Backspace
    \e                   Escape
    \f                   Formfeed
    \n                   Newline
    \r                   Carriage return
    \t                   Tab
    \000     - \377      octal ASCII value. \0, \00, and \000 are equivalent.
    \x00     - \xff      hexadecimal ASCII value. \x0 and \x00 are equivalent.
    \x{0000} - \x{ffff}  hexadecimal Unicode value. \x{0}, \x{00}, \x{000}, and \x{0000} are equivalent.
    \\                   Backslash
    \$                   Dollar sign
    \@                   At sign
    \"                   Double quote
    \                    Escape next character if known, otherwise print it
The escape sequences above belong to the default mode. You do not need to select it explicitly unless you have previously switched to another mode.
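As a rough illustration of what decoding in the default mode amounts to, the following sketch handles part of the table above using only core Perl. The helper name decode_perl_escapes and its lookup table are hypothetical and are not part of Encode::Escape::Unicode; the module's actual implementation may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the module's implementation.
# Single-character escapes from the table above.
my %simple = (
    a => "\a", b => "\b", e => "\e", f => "\f",
    n => "\n", r => "\r", t => "\t",
);

sub decode_perl_escapes {
    my ($text) = @_;
    $text =~ s{
        \\ (?:
            x \{ ([0-9A-Fa-f]+) \}    # braced hexadecimal form
          | x ([0-9A-Fa-f]{1,2})      # short hexadecimal form
          | ([0-7]{1,3})              # octal form
          | (.)                       # any other escaped character
        )
    }{
        defined $1 ? chr(hex $1)
      : defined $2 ? chr(hex $2)
      : defined $3 ? chr(oct $3)
      : exists $simple{$4} ? $simple{$4} : $4
    }gexs;
    return $text;
}

# The single-quoted literal holds a real backslash followed by x{D384}.
my $decoded = decode_perl_escapes('What is \\x{D384}? It\'s Perl!\\n');
```

Unknown escapes fall through to the escaped character itself, which mirrors the last row of the table.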
Python, Java, and C# use the \uxxxx escape sequence for Unicode characters.
    Escape Sequences     Description
    ----------------     --------------------------
    \a                   Alarm (beep)
    \b                   Backspace
    \e                   Escape
    \f                   Formfeed
    \n                   Newline
    \r                   Carriage return
    \t                   Tab
    \000     - \377      octal ASCII value. \0, \00, and \000 are equivalent.
    \x00     - \xff      hexadecimal ASCII value. \x0 and \x00 are equivalent.
    \u0000   - \uffff    hexadecimal Unicode value.
    \\                   Backslash
    \$                   Dollar sign
    \@                   At sign
    \"                   Double quote
    \                    Escape next character if known, otherwise print it
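To make the \uxxxx row concrete, here is a minimal sketch of that single conversion in core Perl. The helper name decode_u_escapes is hypothetical and only approximates what the module does in this mode; it ignores every other escape sequence in the table and does not combine surrogate pairs.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper for illustration only; not the module's implementation.
# Replaces each \uXXXX (exactly four hex digits) with the matching character.
sub decode_u_escapes {
    my ($text) = @_;
    $text =~ s/\\u([0-9A-Fa-f]{4})/chr(hex($1))/ge;
    return $text;
}

# The single-quoted literal holds real backslashes followed by uXXXX.
my $escaped = 'And \\u041f\\u0435\\u0440\\u043b? It\'s Perl, too.';
my $string  = decode_u_escapes($escaped);

# Encode to UTF-8 bytes before printing, as the examples in this document do.
print encode('UTF-8', $string), "\n";
```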
If you have data which contains \uxxxx escape sequences, this will translate them to utf8-encoded characters:
    use Encode::Escape;

    Encode::Escape::demode 'unicode-escape', 'python';

    while (<>) {
        chomp;
        print encode 'utf8', decode 'unicode-escape', $_;
    }
And this will translate \uxxxx to \x{xxxx}:
    use Encode::Escape;

    Encode::Escape::enmode 'unicode-escape', 'perl';
    Encode::Escape::demode 'unicode-escape', 'python';

    while (<>) {
        chomp;
        print encode 'unicode-escape', decode 'unicode-escape', $_;
    }
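For comparison, the same textual rewrite can be sketched with a single substitution in core Perl, without decoding to characters in between. The helper name u_to_x_escapes is hypothetical and is not part of Encode::Escape. Unlike the decode-then-encode round trip above, it touches only \uxxxx sequences and leaves all other text as it is.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the module's implementation.
# Rewrites each \uXXXX as \x{XXXX} without decoding it to a character.
sub u_to_x_escapes {
    my ($text) = @_;
    $text =~ s/\\u([0-9A-Fa-f]{4})/\\x{$1}/g;
    return $text;
}

# The single-quoted literal holds real backslashes followed by uXXXX.
my $converted = u_to_x_escapes('\\u041f\\u0435\\u0440\\u043b');
```

This sketch does not distinguish an escaped backslash that happens to be followed by "u" and four hex digits, so it is only suitable for input known to contain genuine \uxxxx sequences.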
See Encode::Escape.
you, <you at cpan dot org>
Copyright (C) 2007 by you
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.