Biblio::Document::Parser::Standard - document parsing functionality
    use Biblio::Document::Parser::Standard;
    use Biblio::Document::Parser::Utils;

    # First fetch the document's text as a single string.
    my $content = Biblio::Document::Parser::Utils::get_content("http://www.foo.com/myfile.pdf");

    my $doc_parser = Biblio::Document::Parser::Standard->new();
    my @references = $doc_parser->parse($content);

    # Print a list of the extracted references.
    foreach (@references) {
        print "-> $_\n";
    }
Biblio::Document::Parser::Standard provides a fairly simple system for extracting references from documents.
Several reference styles are supported, including numeric and indented, and two-column documents are converted into single-column documents before parsing. This is a very experimental module, and it still contains a few hard-coded constants that could probably be improved upon.
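The module's own splitting logic is not reproduced here. As a rough illustration of what handling the numeric style involves, the sketch below splits a block of reference-section text into entries that begin with a "[n]" or "n." label and joins indented continuation lines onto the current entry. The helper name split_numeric_references and the sample references are hypothetical; this is not the parser's actual algorithm.

    use strict;
    use warnings;

    # Hypothetical helper, NOT part of Biblio::Document::Parser::Standard:
    # split reference-section text into entries that start with a numeric
    # label such as "[12]" or "12.", stripping the label from each entry.
    sub split_numeric_references {
        my ($text) = @_;
        my @refs;
        my $current;
        for my $line (split /\n/, $text) {
            next unless $line =~ /\S/;    # skip blank lines
            if ($line =~ /^\s*(?:\[\d+\]|\d+\.)\s+(.*)$/) {
                # A labelled line starts a new reference.
                push @refs, $current if defined $current;
                $current = $1;
            }
            elsif (defined $current) {
                # An unlabelled line continues the current reference.
                $line =~ s/^\s+//;
                $current .= " $line";
            }
            # Lines before the first labelled line are ignored.
        }
        push @refs, $current if defined $current;
        return @refs;
    }

    my $section = <<'END';
    [1] A. Author. A first paper.
        Journal of Examples, 2001.
    [2] B. Writer. Another paper. 2002.
    END

    print "-> $_\n" for split_numeric_references($section);
    # -> A. Author. A first paper. Journal of Examples, 2001.
    # -> B. Writer. Another paper. 2002.

A label check this simple will misfire on a continuation line that happens to begin with something like "2002.", which hints at why heuristics of this kind need tuning.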
The new() method creates a new parser instance.
The parse() method takes a string as input (see the get_content() function in Biblio::Document::Parser::Utils for a way to obtain this), and returns a list of references in plain text suitable for passing to a CiteParser module.
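Because parse() only needs the document text as one string, a document that is already plain text on local disk can be read in slurp mode instead of going through get_content(). The sketch below assumes the file is already plain text (it does no PDF conversion); read_text_file and paper.txt are hypothetical names, and the module calls appear only as comments.

    use strict;
    use warnings;

    # Hypothetical helper, not part of Biblio::Document::Parser:
    # read a whole local text file into one string using core Perl only.
    sub read_text_file {
        my ($path) = @_;
        open my $fh, '<', $path
            or die "Cannot open $path: $!";
        local $/;    # undefine the record separator: slurp mode
        my $content = <$fh>;
        close $fh;
        return defined $content ? $content : '';
    }

    # Usage with the module (requires Biblio::Document::Parser):
    #   use Biblio::Document::Parser::Standard;
    #   my $parser     = Biblio::Document::Parser::Standard->new();
    #   my @references = $parser->parse(read_text_file('paper.txt'));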
- 2003/05/13 Removed Perl warnings generated from parse() by adding checks on the regexps
Mike Jewell <moj@ecs.soton.ac.uk> Tim Brody <tdb01r@ecs.soton.ac.uk>