Search::WuManber -- A fast algorithm for multi-pattern searching
use Search::WuManber; my $search = Search::WuManber->new([qw(tribute reserved serve distribute)]); my @matches = $search->all( lc "All rights reserved. Distribute freely."); my $match = $search->first(lc "All rights reserved. Distribute freely."); $match = $search->next();
This module implements the Wu-Manber multi-pattern parallel search algorithm.
The list of search patterns passed to new() are prepared for parallel lookup. A perl reference pointing to all internal data is returned. Treat this reference as opaque. first() and next() iterate over all text positions where matches occur. Pattern matches in this context are substring matches. Each match is returned as a reference to a two-element array representing text offset and list index of the matching pattern. Options for new(): * return_string => 1 make the return value of the iterator a three-element array, containing also the search string itself. * case_sensitive => 0 run the search in case insensitive mode (slightly slower).
new()
first()
next()
return_string => 1
case_sensitive => 0
The matches are returned roughly sorted by offset. Offset usually increments, but may jump backwards by the length difference of neighbouring search strings.
New() allocates a constant amount of memory (between ca. 130k and 2MB). This memory can be returned by undef $search;
New()
undef $search;
The iterator is inefficient. first() just calls all() ...,
all()
This implementation uses internal hash-functions, which may not be optimal.
Efficiency of WuManber depends on the minimum length of search strings. Suggested minimum length is 5. new() switches to a slower algorithm if one of the strings has less than 3 characters.
Changes in the list of search patterns are not seen by the search algorithm after new() was called. Changes in the text are undefined. Call first() to restart with a new text.
Sun Wu and Udi Manber (1994) A fast algorithm for multi-pattern searching. Technical Report TR-94-17, University of Arizona. http://webglimpse.net/pubs/TR94-17.pdf
ftp://ftp.cs.arizona.edu/agrep/agrep-2.04.tar.Z
www.snort.org
Rolf Stiebe, Textalgorithmen, WS 2005/6
Text::Scan, Algorithm::AhoCorasick, Algorithm::RabinKarp
Text::Scan
Algorithm::AhoCorasick
Algorithm::RabinKarp
Juergen Weigert <jw@cs.fau.de>
Copyright (c) 2007-2011, Juergen Weigert, openSUSE.org This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.
To install Search::WuManber, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Search::WuManber
CPAN shell
perl -MCPAN -e shell install Search::WuManber
For more information on module installation, please visit the detailed CPAN module installation guide.