OCR::PerfectCR - Perfect OCR (if you have perfect input).
    use OCR::PerfectCR;
    use GD;

    my $recognizer = OCR::PerfectCR->new;
    $recognizer->load_charmap_file("charmap");
    my $image = GD::Image->new("example.png") or die "Can't open example.png: $!";
    my $string = $recognizer->recognize($image);
    $recognizer->save_charmap_file("charmap");
OCR::PerfectCR is a fast, highly accurate "optical" character recognition engine requiring minimal training. How does it manage this, despite being written in pure Perl? By ignoring most of the problems. OCR::PerfectCR requires that your input be in perfect shape -- that it hasn't gone into the real world and been scanned, that each image represent one line of text and nothing else, and, hardest of all, that the font have fairly wide spacing. This makes it very useful for converting image-based subtitle formats to text, and probably not much else. However, it is very good at doing that.
OCR::PerfectCR's knowledge about a particular font is encapsulated in a "charmap" file. Each line maps the MD5 sum of the canonical representation of a character (the first 32 characters of the line) to a string (the 34th character onward, up to the newline).
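The line layout described above can be sketched as a small parser. This is only an illustration, not the module's own loader: parse_charmap_line is a hypothetical helper, the MD5 sum is assumed to be written as 32 hex digits, the 33rd character is assumed to be a single separator (a space in the sample), and the sample digest is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical helper: split one charmap line into (md5, string).
# Assumes 32 hex digits, one separator character, then the string.
sub parse_charmap_line {
    my ($line) = @_;
    chomp $line;
    return unless length($line) >= 33;          # too short to hold md5 + separator
    my $md5 = substr($line, 0, 32);             # first 32 characters
    return unless $md5 =~ /\A[0-9a-f]{32}\z/i;  # assumption: hex digest
    my $str = substr($line, 33);                # 34th character onward
    return ($md5, $str);
}

# Placeholder line, not taken from a real charmap.
my ($md5, $str) = parse_charmap_line("d41d8cd98f00b204e9800998ecf8427e A\n");
print "$md5 => $str\n";
```

A real charmap should still be read and written through load_charmap_file and save_charmap_file; the sketch is only meant to make the layout concrete.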
Most methods will die on error, rather than trying to recover and return undef.
Loads a charmap file into memory.
Saves the charmap to a file. Charmap files are always saved and loaded in utf8.
Takes the image (a GD::Image object), and tries to convert it into text. In list context, returns a list of hashrefs, each having a str key, whose value is the string in the charmap for that image. There may also be a color (note the spelling) key, with a value between 0 and 360, representing the color of the text in degrees on the color wheel, or undef meaning grey. The color being missing implies that there is nothing there but background -- that is, that it's whitespace. For non-whitespace characters, there is a key md5, which gives the MD5 sum of the character in canonical form -- that is, its charmap entry. Other keys are purposefully not documented -- if you find them useful, please let me know by filing an RT request.
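The three cases for the color key (missing, undef, or a number) can be told apart with exists and defined. The following is a minimal sketch on made-up data shaped like the documented return value. describe_color is a hypothetical helper, and the sample entries, including the str of the whitespace entry and the md5 values, are assumptions rather than real output.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one entry of the list that recognize()
# is documented to return in list context.
sub describe_color {
    my ($char) = @_;
    return 'whitespace' unless exists $char->{color};   # no color key: background only
    return 'grey'       unless defined $char->{color};  # undef: grey text
    return "hue $char->{color}";                        # degrees on the color wheel
}

# Made-up sample data; real entries would come from
# $recognizer->recognize($image) called in list context.
my @chars = (
    { str => 'H', color => 120,   md5 => 'a' x 32 },
    { str => ' ' },
    { str => 'i', color => undef, md5 => 'b' x 32 },
);

print join(', ', map { describe_color($_) } @chars), "\n";
# prints "hue 120, whitespace, grey"
```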
Characters not in the charmap will have their str set to "\x{FFFD}" eq "\N{REPLACEMENT CHARACTER}", and will be added to the charmap. They will also be saved as png files named md5.png in the current directory, so that a human can look at them and ID them.
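After a run, it can be handy to list which image files need a human to identify them. The sketch below works on made-up data in the documented shape. unknown_png_files is a hypothetical helper, and it assumes that "md5.png" means the character's MD5 sum followed by ".png".

```perl
use strict;
use warnings;

my $UNKNOWN = "\x{FFFD}";   # REPLACEMENT CHARACTER, used for unrecognized characters

# Hypothetical helper: given entries shaped like recognize()'s list-context
# result, return the png file names of the unidentified characters, once each.
sub unknown_png_files {
    my (@chars) = @_;
    my %seen;
    return map  { "$_->{md5}.png" }                  # assumption: file is <md5>.png
           grep { defined $_->{md5} && !$seen{ $_->{md5} }++ }
           grep { defined $_->{str} && $_->{str} eq $UNKNOWN } @chars;
}

# Made-up sample data with placeholder md5 values.
my @chars = (
    { str => 'A',      color => 0, md5 => 'a' x 32 },
    { str => $UNKNOWN, color => 0, md5 => 'c' x 32 },
    { str => $UNKNOWN, color => 0, md5 => 'c' x 32 },
);

print "$_\n" for unknown_png_files(@chars);
```

Once a character has been identified, its string can replace the U+FFFD placeholder in the charmap entry with the matching MD5 sum.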
Just a boring constructor. No parameters.
Please report bugs on http://rt.cpan.org/. If the bug /might possibly/ be because of your input file, please include it with the bug report.
Copyright 2005 James Mastros, james@mastros.biz, JMASTROS, theorbtwo. (Those are all the same person.)
May be used and copied under the same terms as perl itself.
Thanks, castaway, for being you, and diotalevi for a detailed review.