Lingua::EN::Sentence::Offsets - Finds sentence boundaries, and returns their offsets.
version 0.01_06
    use Lingua::EN::Sentence::Offsets qw/get_offsets get_sentences/;

    my $offsets = get_offsets($text);    ## Get the offsets.
    foreach my $o (@$offsets) {
        my $start    = $o->[0];
        my $length   = $o->[1] - $o->[0];
        my $sentence = substr($text, $start, $length);    ## Get a sentence.
        # ...
    }

    ### or

    my $sentences = get_sentences($text);
    foreach my $sentence (@$sentences) {
        ## do something with $sentence
    }
Takes text input and returns a reference to an array containing pairs of character offsets, corresponding to each sentence's start and end positions.
Takes text input and splits it into sentences, returning a reference to an array of sentence strings.
Adds a user-supplied list of acronyms/abbreviations to the predefined list.
Returns the currently defined list of acronyms.
Overrides the predefined acronyms list with your own list.
Finds additional split points in the middle of previously defined sentences.
Makes minor adjustments to the offsets (leading/trailing whitespace, etc.).
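As an illustration of the kind of adjustment described above, here is a minimal sketch that shrinks each offset pair so it excludes leading and trailing whitespace. The helper name trim_offsets is hypothetical and this is not the module's actual code; it assumes each pair is [start, end) with an exclusive end, as the synopsis suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): shrink each [start, end)
# pair so that it no longer covers leading or trailing whitespace.
sub trim_offsets {
    my ($text, $offsets) = @_;
    my @trimmed;
    for my $o (@$offsets) {
        my ($start, $end) = @$o;
        # Advance the start past leading whitespace.
        $start++ while $start < $end && substr($text, $start, 1) =~ /\s/;
        # Pull the end back over trailing whitespace.
        $end-- while $end > $start && substr($text, $end - 1, 1) =~ /\s/;
        # Drop spans that contained only whitespace.
        push @trimmed, [$start, $end] if $end > $start;
    }
    return \@trimmed;
}

my $text    = "  One.  Two. ";
my $trimmed = trim_offsets($text, [[0, 7], [7, 13]]);
print join(' ', map { sprintf('[%d,%d]', @$_) } @$trimmed), "\n";
```

Spans that turn out to be whitespace only are dropped here; whether the module does the same is an assumption.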
Performs a first, naive delimitation of sentences.
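To give a feel for what a naive first pass might look like, the sketch below treats every run of '.', '!' or '?' that is followed by whitespace or the end of the text as a sentence end. The function name naive_offsets and the exact pattern are assumptions made for illustration; the module's own rules may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): return [start, end) pairs,
# ending a sentence after any run of terminal punctuation.
sub naive_offsets {
    my ($text) = @_;
    my @offsets;
    my $start = 0;
    while ($text =~ /[.!?]+(?=\s|\z)/g) {
        my $end = pos($text);    # position just past the punctuation
        push @offsets, [$start, $end];
        $start = $end;
    }
    # Keep any trailing text that has no final punctuation.
    push @offsets, [$start, length $text] if $start < length $text;
    return \@offsets;
}

my $offsets = naive_offsets("Hi there. See Dr. Smith today!");
printf "[%d,%d]\n", @$_ for @$offsets;
```

This pass wrongly splits after "Dr.", which is the kind of false boundary an acronym/abbreviation list is meant to help with. The resulting spans also still carry leading whitespace, which a later adjustment step would have to remove.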
Given a list of sentence boundary offsets and a text, returns an array with the text split into sentences.
Based on the original module Lingua::EN::Sentence, by Shlomo Yona (SHLOMOY).
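The conversion from offsets to sentences can be sketched with substr, in the same way the synopsis extracts a single sentence. The helper name sentences_from_offsets is hypothetical and it returns an array reference for convenience; each pair is assumed to be [start, end) with an exclusive end.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): extract one substring
# per [start, end) offset pair.
sub sentences_from_offsets {
    my ($text, $offsets) = @_;
    return [ map { substr($text, $_->[0], $_->[1] - $_->[0]) } @$offsets ];
}

my $text      = "One. Two.";
my $sentences = sentences_from_offsets($text, [[0, 4], [5, 9]]);
print "$_\n" for @$sentences;
```

Working with offsets rather than copied strings keeps a link back to the original text, which is useful when annotations must be aligned with the source.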
Lingua::EN::Sentence, Text::Sentence
Andre Santos <andrefs@cpan.org>
This software is copyright (c) 2012 by Andre Santos.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
To install Lingua::EN::Sentence::Offsets, copy and paste the appropriate command into your terminal.
cpanm
cpanm Lingua::EN::Sentence::Offsets
CPAN shell
perl -MCPAN -e shell
install Lingua::EN::Sentence::Offsets
For more information on module installation, please visit the detailed CPAN module installation guide.