Text::Undiacritic - remove diacritics from a string
This document describes Text::Undiacritic 0.01
use Text::Undiacritic qw(undiacritic);
$ascii_string = undiacritic( $czech_string );
Changes characters with diacritics into their base characters.
It also maps a character to its base character in cases where Unicode does not provide a decomposition.
For example, none of the '... WITH STROKE' characters, such as 'LATIN SMALL LETTER L WITH STROKE', has a decomposition. For that character the result is 'LATIN SMALL LETTER L'.
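The two cases described above can be illustrated with a minimal sketch that uses only core modules. This is an assumption about one possible approach, not necessarily how Text::Undiacritic is implemented; the helper names strip_diacritics_sketch and base_by_name are hypothetical and not part of the module.

```perl
use strict;
use warnings;
use charnames ':full';
use Unicode::Normalize qw(NFD);

# Hypothetical helper, not part of Text::Undiacritic.
sub strip_diacritics_sketch {
    my ($characters) = @_;

    # Case 1: Unicode provides a decomposition.
    # Decompose canonically, then drop the combining marks.
    my $result = NFD($characters);
    $result =~ s/\p{Mn}//g;

    # Case 2: no decomposition (e.g. '... WITH STROKE').
    # Fall back on the character name for anything still non-ASCII.
    $result =~ s{([^\x00-\x7F])}{ base_by_name($1) }ge;

    return $result;
}

# Hypothetical helper: 'LATIN SMALL LETTER L WITH STROKE'
# becomes the character named 'LATIN SMALL LETTER L'.
sub base_by_name {
    my ($char) = @_;

    my $name = charnames::viacode(ord $char);
    return $char unless defined $name;

    my ($base_name) = $name =~ /\A(.+?) WITH /;
    return $char unless defined $base_name;

    my $code = charnames::vianame($base_name);
    return defined $code ? chr $code : $char;
}

# U+0141 has no decomposition; U+00F3 and U+017A do.
print strip_diacritics_sketch("\x{141}\x{F3}d\x{17A}"), "\n";   # prints "Lodz"
```

Characters whose names contain no 'WITH' part, such as 'LATIN SMALL LETTER SHARP S', pass through this sketch unchanged.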
Removing diacritics is useful for matching text independent of spelling variants.
$ascii_string = undiacritic( $characters );
Removes diacritics from $characters and returns a simplified character string.
The input string must be a character string, i.e. a sequence of decoded Unicode code points rather than encoded bytes.
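As a sketch of what this requirement means in practice, the example below decodes UTF-8 bytes into a character string with the core Encode module before the string would be handed to undiacritic(). The sample data and variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoded bytes, as they might arrive from a file or socket.
# These bytes spell the Czech name "Dvorak" with its two diacritics.
my $bytes = "Dvo\xC5\x99\xC3\xA1k";

# Decode into a character string of Unicode code points.
my $characters = decode('UTF-8', $bytes);

printf "%d bytes, %d characters\n", length $bytes, length $characters;
# prints "8 bytes, 6 characters"

# The decoded string is the kind of input undiacritic() expects:
#   my $ascii_string = undiacritic($characters);
```

Passing the undecoded bytes instead would treat each byte as a separate character, so the accented letters would not be recognised.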
version
charnames
Unicode::Normalize
It has not been tested whether this module gives useful results for scripts other than Latin.
Helmut Wollmersdorfer <WOLLMERS@cpan.org>
Copyright (c) 2007, Helmut Wollmersdorfer <WOLLMERS@cpan.org>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install Text::Undiacritic, copy and paste the appropriate command into your terminal.
cpanm
cpanm Text::Undiacritic
CPAN shell
perl -MCPAN -e shell
install Text::Undiacritic