The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Search::WuManber -- A fast algorithm for multi-pattern searching

SYNOPSIS

  use Search::WuManber;

  my $search = Search::WuManber->new([qw(tribute reserved serve distribute)]);
  my @matches = $search->all(  lc "All rights reserved. Distribute freely.");
  my $match =   $search->first(lc "All rights reserved. Distribute freely.");
  $match = $search->next();

DESCRIPTION

This module implements the Wu-Manber multi-pattern parallel search algorithm.

The list of search patterns passed to new() are prepared for parallel lookup. A perl reference pointing to all internal data is returned. Treat this reference as opaque. first() and next() iterate over all text positions where matches occur. Pattern matches in this context are substring matches. Each match is returned as a reference to a two-element array representing text offset and list index of the matching pattern. Options for new(): * return_string => 1 make the return value of the iterator a three-element array, containing also the search string itself. * case_sensitive => 0 run the search in case insensitive mode (slightly slower).

The matches are returned roughly sorted by offset. Offset usually increments, but may jump backwards by the length difference of neighbouring search strings.

New() allocates a constant amount of memory (between ca. 130k and 2MB). This memory can be returned by undef $search;

BUGS

The iterator is inefficient. first() just calls all() ...,

This implementation uses internal hash-functions, which may not be optimal.

Efficiency of WuManber depends on the minimum length of search strings. Suggested minimum length is 5. new() switches to a slower algorithm if one of the strings has less than 3 characters.

Changes in the list of search patterns are not seen by the search algorithm after new() was called. Changes in the text are undefined. Call first() to restart with a new text.

REFERENCES

Sun Wu and Udi Manber (1994) A fast algorithm for multi-pattern searching. Technical Report TR-94-17, University of Arizona. http://webglimpse.net/pubs/TR94-17.pdf

ftp://ftp.cs.arizona.edu/agrep/agrep-2.04.tar.Z

www.snort.org

Rolf Stiebe, Textalgorithmen, WS 2005/6

SEE ALSO

Text::Scan, Algorithm::AhoCorasick, Algorithm::RabinKarp

AUTHOR

Juergen Weigert <jw@cs.fau.de>

COPYRIGHT AND LEGALESE

Copyright (c) 2007-2011, Juergen Weigert, openSUSE.org This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.