NAME

Roman::Unicode - Make roman numerals, using the Unicode characters for them

SYNOPSIS

        use Roman::Unicode qw( to_roman is_roman to_perl );

        # Declare first: "my ... if ..." has undefined behavior (see perlsyn)
        my $perl_number;
        $perl_number = to_perl( $roman ) if is_roman( $roman );

        my $roman_number = to_roman( $arabic );

DESCRIPTION

I made this module as a way to demonstrate various Unicode features without mixing in natural-language issues. Surprisingly, roman numerals exercise quite a few of those features. You'll have to read the source to see it in action.

There are many fancy characters in this documentation, so you need a good font that has the right glyphs. The Symbola font is a good one: http://users.teilar.gr/~g1951d/

Functions

is_roman( STRING )

Returns true if the string looks like a valid roman numeral. This works with either the ASCII version or the version that uses the characters in the U+2160 to U+2188 range. You cannot mix uppercase and lowercase numerals in the same string.

to_perl( ROMAN )

If the argument is a valid roman numeral, to_perl returns the Perl number. Otherwise, it returns nothing.

to_roman( PERL_NUMBER )

If the argument is a valid Perl number, even if it is a string, to_roman returns the roman numeral representation. This uses the characters in the U+2160 to U+2188 range.

If the number cannot be represented as roman numerals, this returns nothing. Note that 0 doesn't have a roman numeral representation.

If you want the lowercase version, you can use lc on the result. However, some of the roman numerals don't have lowercase versions.
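As a small illustration of that caveat, here is a sketch that uses only core Perl. The strings are written by hand as stand-ins for values to_roman might return, so the variable names are hypothetical and nothing here calls the module itself.

```perl
use strict;
use warnings;

# Hand-written stand-ins for values that to_roman might return.
my $fifteen       = "\x{2169}\x{2164}";   # ROMAN NUMERAL TEN, ROMAN NUMERAL FIVE
my $five_thousand = "\x{2181}";           # ROMAN NUMERAL FIVE THOUSAND

# U+2160 to U+216F have lowercase partners at U+2170 to U+217F.
my $lower_fifteen = lc $fifteen;

# U+2181 has no lowercase mapping, so lc returns it unchanged.
my $lower_five_thousand = lc $five_thousand;

# Print code points rather than glyphs so the output is plain ASCII.
printf "U+%04X\n", ord($_) for split //, $lower_fifteen . $lower_five_thousand;
```

The first two code points printed are the lowercase numerals; the last is still U+2181.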

to_ascii( ROMAN )

If the argument is a valid roman numeral, to_ascii returns an ASCII representation of it. Most of the numeral code points have compatibility decompositions, so the first step uses NFKD decomposition. For other characters, it uses ASCII art representations:

        Roman       ASCII art
        ------      ----------
        ↁ           |)
        ↂ          ((|))
        ↈ          (((|)))
        ↇ           |))
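The two steps can be sketched with the core Unicode::Normalize module. This is an illustration of the idea, not the module's source: the name sketch_to_ascii and the %ascii_art hash are hypothetical, and the sketch skips the validity check that to_ascii performs.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# The ASCII art from the table above, keyed by code point.
my %ascii_art = (
    "\x{2181}" => '|)',        # ROMAN NUMERAL FIVE THOUSAND
    "\x{2182}" => '((|))',     # ROMAN NUMERAL TEN THOUSAND
    "\x{2187}" => '|))',       # ROMAN NUMERAL FIFTY THOUSAND
    "\x{2188}" => '(((|)))',   # ROMAN NUMERAL ONE HUNDRED THOUSAND
);

# Hypothetical stand-in for to_ascii, without input validation.
sub sketch_to_ascii {
    my( $roman ) = @_;

    # Step 1: NFKD turns U+2160 to U+217F into their ASCII letters.
    my $ascii = NFKD( $roman );

    # Step 2: the higher magnitudes have no decomposition, so map them by hand.
    $ascii =~ s/([\x{2181}\x{2182}\x{2187}\x{2188}])/$ascii_art{$1}/g;

    return $ascii;
}

# TEN THOUSAND, ONE THOUSAND, FIVE
print sketch_to_ascii( "\x{2182}\x{216F}\x{2164}" ), "\n";   # prints ((|))MV
```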

Case mapping

As a demonstration of case mapping, I supply one function that uses Unicode::Casing. You can lexically override the case-mapping functions as described in that module's documentation.

to_roman_lower

A subroutine you can use with Unicode::Casing. It's a bit more special because it turns the higher magnitude characters into ASCII versions. That means that the return value might not be valid according to is_roman. It returns nothing if the input isn't a valid Roman numeral string.

You can also use this as a stand-alone function instead of lc. That's the smart way to do it, but then you don't get to play with Unicode::Casing.

User-defined properties

Perl lets you define your own properties, as documented in perlunicode. This module defines several.

IsRoman

The IsRoman property is a combination of IsUppercaseRoman and IsLowercaseRoman.

IsUppercaseRoman

The IsUppercaseRoman property matches these code points:

        Ⅰ       U+2160      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ
        Ⅴ       U+2164      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ
        Ⅹ       U+2169      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴛᴇɴ
        Ⅼ       U+216C      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪꜰᴛʏ
        Ⅽ       U+216D      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ʜᴜɴᴅʀᴇᴅ
        Ⅾ       U+216E      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ ʜᴜɴᴅʀᴇᴅ
        Ⅿ       U+216F      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ᴛʜᴏᴜsᴀɴᴅ
        ↁ       U+2181      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪᴠᴇ ᴛʜᴏᴜsᴀɴᴅ
        ↂ      U+2182      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴛᴇɴ ᴛʜᴏᴜsᴀɴᴅ
        ↇ       U+2187      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ꜰɪꜰᴛʏ ᴛʜᴏᴜsᴀɴᴅ
        ↈ      U+2188      ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴏɴᴇ ʜᴜɴᴅʀᴇᴅ ᴛʜᴏᴜsᴀɴᴅ

This excludes the other Roman numeral code points, such as Ⅻ (U+216B, ʀᴏᴍᴀɴ ɴᴜᴍᴇʀᴀʟ ᴛᴡᴇʟᴠᴇ) since they are not designed to be part of larger strings of Roman numerals.
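For readers who haven't written a user-defined property, here is a minimal sketch of how one like this can be built following the rules in perlunicode. The name IsUppercaseRomanSketch is hypothetical and the code is not taken from the module; it only reuses the code points listed above.

```perl
use strict;
use warnings;

# A user-defined property is a subroutine whose name starts with "Is" or "In"
# and that returns one hexadecimal code point (or range) per line.
sub IsUppercaseRomanSketch {
    return join '', map { "$_\n" }
        qw( 2160 2164 2169 216C 216D 216E 216F 2181 2182 2187 2188 );
}

my $fourteen = "\x{2169}\x{2160}\x{2164}";   # TEN, ONE, FIVE
my $twelve   = "\x{216B}";                   # ROMAN NUMERAL TWELVE, one code point

for my $string ( $fourteen, $twelve ) {
    my $verdict = $string =~ /\A\p{IsUppercaseRomanSketch}+\z/
        ? 'matches'
        : 'does not match';

    # Print code points rather than glyphs so the output is plain ASCII.
    my $code_points = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
    print "$code_points $verdict\n";
}
```

The string built from Ⅹ, Ⅰ, and Ⅴ matches, while the precomposed Ⅻ does not, because U+216B is not in the list.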

IsLowercaseRoman

The IsLowercaseRoman property is the set of lowercase code points derived from the code points in IsUppercaseRoman. For each code point in IsUppercaseRoman, it checks the Unicode Character Database (UCD) through Unicode::UCD for a lowercase mapping. If there is one, that lowercase code point becomes part of this property.
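The derivation might look something like the following sketch, which uses charinfo from the core Unicode::UCD module. The property names ending in "Sketch" are hypothetical and the code is not the module's source; it also shows one way a combined property such as IsRoman can be assembled from the other two.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Perl may call a property sub while it compiles a pattern, so the lists
# are filled in a BEGIN block to be ready before any pattern needs them.
my( @uppercase, @lowercase );
BEGIN {
    # The code points listed for IsUppercaseRoman in this documentation.
    @uppercase = qw( 2160 2164 2169 216C 216D 216E 216F 2181 2182 2187 2188 );

    # charinfo() gives the simple lowercase mapping as a hex string, or an
    # empty string when the code point has none.
    @lowercase = grep { length } map { charinfo( hex $_ )->{lower} } @uppercase;
}

# Hypothetical property names; each sub returns one hex code point per line.
sub IsUppercaseRomanSketch { return join '', map { "$_\n" } @uppercase }
sub IsLowercaseRomanSketch { return join '', map { "$_\n" } @lowercase }

# A line starting with "+" adds another fully qualified property.
sub IsRomanSketch {
    return "+main::IsUppercaseRomanSketch\n+main::IsLowercaseRomanSketch\n";
}

print "Lowercase code points: @lowercase\n";

my $lower_fourteen = "\x{2179}\x{2170}\x{2174}";   # SMALL TEN, ONE, FIVE
print "lowercase string matches IsRomanSketch\n"
    if $lower_fourteen =~ /\A\p{IsRomanSketch}+\z/;
```

Only the seven code points from U+2160 to U+216F in the list have lowercase mappings, so the four higher magnitudes contribute nothing to the lowercase set.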

LIMITATIONS

By using just the defined roman numeral characters in the Unicode Character Set, you're limited to numbers less than 400,000 (although you could make ↈↈↈↈ if you wanted, since that's not unheard of).
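The arithmetic behind that limit can be checked with a short sketch. It assumes the usual subtractive pairs extend to the higher magnitudes (for example ↂↈ for 90,000), which this documentation does not confirm. The name sketch_value is hypothetical and this is not the module's parser; it does no validation.

```perl
use strict;
use warnings;

# Values taken from the character names in the IsUppercaseRoman list.
my %value = (
    "\x{2160}" => 1,      "\x{2164}" => 5,       "\x{2169}" => 10,
    "\x{216C}" => 50,     "\x{216D}" => 100,     "\x{216E}" => 500,
    "\x{216F}" => 1_000,  "\x{2181}" => 5_000,   "\x{2182}" => 10_000,
    "\x{2187}" => 50_000, "\x{2188}" => 100_000,
);

# Hypothetical helper: add each value, or subtract it when a larger one follows.
sub sketch_value {
    my @values = map { $value{$_} } split //, shift;

    my $total = 0;
    for my $i ( 0 .. $#values ) {
        if( $i < $#values and $values[$i] < $values[$i + 1] ) {
            $total -= $values[$i];
        }
        else {
            $total += $values[$i];
        }
    }

    return $total;
}

# Three ONE HUNDRED THOUSANDs, then 90,000 + 9,000 + 900 + 90 + 9
my $largest = "\x{2188}\x{2188}\x{2188}"
            . "\x{2182}\x{2188}" . "\x{216F}\x{2182}" . "\x{216D}\x{216F}"
            . "\x{2169}\x{216D}" . "\x{2160}\x{2169}";

print sketch_value( $largest ), "\n";          # prints 399999
print sketch_value( "\x{2188}" x 4 ), "\n";    # prints 400000
```

Reaching 400,000 takes a fourth ↈ in a row, which is the repetition the conventional form avoids.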

AUTHOR

brian d foy <brian.d.foy@gmail.com>

This module started with the Roman module, credited to:

OZAWA Sakuro <ozawa at aisoft.co.jp> 1995-1997

Alexandr Ciornii, <alexchorny at gmail.com> 2007

COPYRIGHT

Copyright © 2011-2022, brian d foy <bdfoy@cpan.org>.

You can use this module under the terms of Artistic License 2.0.