NAME

WordLists::Common

SYNOPSIS

        use WordLists::Common qw(pretty_doubles pretty_singles);
        print pretty_doubles( pretty_singles(
                qq{"That's right," she said, "I was told to 'get lost!'".}
        ) );

DESCRIPTION

This module provides common functions and values relevant to wordlists, such as those for normalising parts of speech and typographic dashes and quotes. Exportable functions and values include:

  • @sPosWords, a list of strings that look like parts of speech (to help parse entries such as "head verb", "head up" and "head noun").

  • A function pretty_endash, which replaces space + hyphen + space with space + en dash + space.

  • A function pretty_doubles, which replaces straight double quotes with typographic ('smart') double quotes.

  • A function pretty_singles, which replaces apostrophes and straight single quotes with typographic ('smart') single quotes.

  • A function norm_spacing

  • A function custom_norm which takes several options:

    • lc - if true, lowercases the string.

    • uc - if true, uppercases the string. Overrides lc.

    • trim_space - if true, removes leading and trailing whitespace and condenses runs of whitespace to a single space (\x20).

    • alnum_only - if true, removes every character that is not a letter or a digit.

    • brackets - if this is 'kill', removes any () brackets together with their contents; if 'ignore', removes only the brackets themselves.

    • squares - if this is 'kill', removes any [] brackets together with their contents; if 'ignore', removes only the brackets themselves.

    • accents - if true, removes accents and modifier characters from letters.

    • sb - if true, replaces 'sb' with 'someone'.

    • sth - if true, replaces 'sth' with 'something'.

  • A function generic_norm_hw, which normalises a headword by removing accents and any characters other than [a-z0-9].

  • A function generic_norm_pos for normalising parts of speech so that 'v' and 'verb' match.

  • A function generic_minimal_pos, which normalises parts of speech and reduces them to 'minimal' ones.

  • A function uniques, which reduces a list to its unique members.
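The exact rules used by pretty_endash, pretty_doubles and pretty_singles are not documented here. As a rough illustration of the kind of substitution involved, the following self-contained sketch uses hypothetical stand-ins (sketch_endash, sketch_doubles, sketch_singles). The heuristic it applies, "a quote at the start of the string or after whitespace or an opening bracket opens; any other quote closes", is an assumption and not necessarily what WordLists::Common does.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for pretty_endash, pretty_doubles and pretty_singles.
# The opening/closing heuristic is an assumption; the module may differ.

sub sketch_endash {
    my ($s) = @_;
    # space + hyphen + space -> space + EN DASH (U+2013) + space
    $s =~ s/ - / \x{2013} /g;
    return $s;
}

sub sketch_doubles {
    my ($s) = @_;
    # A straight double quote at the start, or after whitespace or an
    # opening bracket, becomes LEFT DOUBLE QUOTATION MARK (U+201C) ...
    $s =~ s/(^|[\s(\[])"/$1\x{201C}/g;
    # ... and every remaining one becomes RIGHT DOUBLE QUOTATION MARK (U+201D).
    $s =~ s/"/\x{201D}/g;
    return $s;
}

sub sketch_singles {
    my ($s) = @_;
    # Opening single quotes (U+2018) follow the start of the string,
    # whitespace, an opening bracket or an opening double quote.
    $s =~ s/(^|[\s(\[\x{201C}"])'/$1\x{2018}/g;
    # Closing quotes and apostrophes (e.g. "That's") become U+2019.
    $s =~ s/'/\x{2019}/g;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print sketch_doubles(sketch_singles(
    qq{"That's right," she said, "I was told to 'get lost!'".}
)), "\n";
```

Applied in the same order as in the SYNOPSIS, this sketch prints “That’s right,” she said, “I was told to ‘get lost!’”. Converting single quotes first keeps the apostrophe in "That's" from being mistaken for an opening quote.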
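To make the custom_norm options concrete, here is a self-contained sketch using a hypothetical function, sketch_custom_norm, which is not part of the module. The option names come from the list above; the order in which the options are applied, the regular expressions, and the choice to drop brackets together with their contents under 'kill' are all assumptions.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical stand-in for custom_norm. Option names follow the
# documentation; processing order and regexes are assumptions.
sub sketch_custom_norm {
    my ($s, %opt) = @_;
    my $brackets = $opt{brackets} // '';
    my $squares  = $opt{squares}  // '';

    # 'kill' removes bracket pairs with their contents, innermost first,
    # repeating until nested pairs are gone; 'ignore' drops only the brackets.
    if    ($brackets eq 'kill')   { 1 while $s =~ s/\([^()]*\)//g }
    elsif ($brackets eq 'ignore') { $s =~ tr/()//d }
    if    ($squares eq 'kill')    { 1 while $s =~ s/\[[^\[\]]*\]//g }
    elsif ($squares eq 'ignore')  { $s =~ tr/[]//d }

    # Expand dictionary abbreviations before any case change.
    $s =~ s/\bsb\b/someone/g    if $opt{sb};
    $s =~ s/\bsth\b/something/g if $opt{sth};

    # Decompose, then drop combining marks (\p{M}) and modifier letters (\p{Lm}).
    if ($opt{accents}) {
        $s = NFD($s);
        $s =~ s/[\p{M}\p{Lm}]//g;
    }

    # uc takes precedence over lc, as documented.
    if    ($opt{uc}) { $s = uc $s }
    elsif ($opt{lc}) { $s = lc $s }

    # Keep only letters and digits (note: this also removes spaces).
    $s =~ s/[^\p{Alpha}\p{Digit}]//g if $opt{alnum_only};

    # Trim both ends and collapse internal whitespace runs to one \x20.
    if ($opt{trim_space}) {
        $s =~ s/^\s+|\s+\z//g;
        $s =~ s/\s+/\x20/g;
    }
    return $s;
}

print sketch_custom_norm("  lend sb  (informal) [a hand] ",
    brackets => 'kill', squares => 'ignore', sb => 1, trim_space => 1), "\n";
# prints: lend someone a hand
```

Running trim_space last matters in this sketch: removing a bracketed label tends to leave doubled spaces behind, and the final pass collapses them.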
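One way to use headword normalisation is to reduce variant spellings to a common key and then discard duplicates. The sketch below imitates generic_norm_hw and uniques with hypothetical stand-ins, sketch_norm_hw and sketch_uniques. Lowercasing before filtering to [a-z0-9], and keeping the first occurrence of each item in its original order, are assumptions that the documentation above does not state.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical stand-in for generic_norm_hw: strip accents, lowercase
# (an assumption), then keep only [a-z0-9].
sub sketch_norm_hw {
    my ($word) = @_;
    my $s = NFD($word);
    $s =~ s/\p{M}//g;       # drop combining accents left by decomposition
    $s = lc $s;
    $s =~ s/[^a-z0-9]//g;   # remove everything outside [a-z0-9]
    return $s;
}

# Hypothetical stand-in for uniques: keep the first occurrence of each
# item, preserving the original order.
sub sketch_uniques {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @headwords = ("Caf\x{e9}", 'cafe', "CAF\x{c9}", "caf\x{e9}-au-lait");
print join(', ', sketch_uniques(map { sketch_norm_hw($_) } @headwords)), "\n";
# prints: cafe, cafeaulait
```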

BUGS

Please use the GitHub issues tracker.

LICENSE

Copyright 2011-2012 © Cambridge University Press. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.