Bloom::Faster - Perl extension for the C library libbloom.
see INSTALL
    use Bloom::Faster;

    # m = ideal vector size.
    # k = # of hash functions to use.
    my $bloom = new Bloom::Faster({m => 1000000,k => 5});

    # this gives us very tight control of memory usage (a function of m)
    # and performance (a function of k). but in most applications, we won't
    # know the optimal values of either of these. for these cases, it is
    # much easier to supply:
    #
    # n = number of expected elements to check for duplicates,
    # e = acceptable error rate (probability of false positive)
    #
    # my $bloom = new Bloom::Faster({n => 1000000, e => 0.00001});

    while (<>) {
        chomp;
        # Bloom::Faster->add() returns true when the value is a duplicate.
        if ($bloom->add($_)) {
            print "DUP: $_\n";
        }
    }

    if ($bloom->check("foo")) {
        print " foo has been seen ";
    }

    # for annoying backwards-compatibility reasons, we also provide a "test" method.
    # this method is EQUIVALENT to the add() method and should not be used since it's
    # extremely confusing. This method is now deprecated.

    # serialize to disk
    $bloom->to_file("/path/to/file");

    # read from disk
    my $another_bloom = new Bloom::Faster("/path/to/another/file");

    # manually free the data structures
    $bloom->DESTROY;
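To give a feel for how n and e relate to m and k, here is a minimal sketch of the textbook Bloom filter sizing formulas, m = -n ln(e) / (ln 2)^2 and k = (m / n) ln 2. It assumes m is counted in bits. The helper name bloom_params is made up for illustration, and the calculation Bloom::Faster performs internally may differ.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Textbook sizing only; this is an assumption, not Bloom::Faster's own code.
# n = expected number of elements, e = acceptable false-positive rate.
sub bloom_params {
    my ($n, $e) = @_;
    my $ln2 = log(2);
    my $m = ceil(-$n * log($e) / ($ln2 ** 2));   # bits in the vector
    my $k = int(($m / $n) * $ln2 + 0.5);         # hash functions, rounded
    $k = 1 if $k < 1;                            # always use at least one hash
    return ($m, $k);
}

my ($m, $k) = bloom_params(1_000_000, 0.00001);
printf "m = %d bits (about %.1f MiB), k = %d\n", $m, $m / 8 / 1024 / 1024, $k;
```

For the n and e values used in the synopsis this works out to roughly 24 million bits and 17 hash functions, which shows why supplying n and e is usually easier than guessing m and k by hand.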
Bloom filters are a lightweight duplicate detection algorithm proposed by Burton Bloom (http://portal.acm.org/citation.cfm?id=362692&dl=ACM&coll=portal), with applications in stream data processing, among others. Bloom filters are a very cool thing. Where occasional false positives are acceptable, Bloom filters give us the ability to detect duplicates quickly and with little memory.
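The idea can be sketched in a few lines of pure Perl. This is an illustration of the general technique only, not how libbloom or Bloom::Faster is implemented: the SketchBloom package is hypothetical, and deriving the k bit positions from a single MD5 digest by double hashing is an assumption made to keep the example short.

```perl
use strict;
use warnings;
use Digest::MD5 ();

# Hypothetical, illustrative class; not part of Bloom::Faster.
package SketchBloom;

sub new {
    my ($class, $m, $k) = @_;
    my $bits = '';
    vec($bits, $m - 1, 1) = 0;    # pre-extend the bit string to m bits
    return bless { m => $m, k => $k, bits => $bits }, $class;
}

# Derive k bit positions from one digest: (h1 + i * h2) mod m.
sub _positions {
    my ($self, $key) = @_;
    my ($h1, $h2) = unpack('N2', Digest::MD5::md5($key));
    $h2 |= 1;                     # keep the step non-zero
    return map { ($h1 + $_ * $h2) % $self->{m} } 0 .. $self->{k} - 1;
}

# Set the key's bits; return true if all were already set (probable duplicate).
sub add {
    my ($self, $key) = @_;
    my $seen = 1;
    for my $pos ($self->_positions($key)) {
        $seen = 0 unless vec($self->{bits}, $pos, 1);
        vec($self->{bits}, $pos, 1) = 1;
    }
    return $seen;
}

# Report whether all of the key's bits are set, without modifying the filter.
sub check {
    my ($self, $key) = @_;
    for my $pos ($self->_positions($key)) {
        return 0 unless vec($self->{bits}, $pos, 1);
    }
    return 1;
}

package main;

my $filter = SketchBloom->new(10_000, 5);
for my $word (qw(apple banana apple cherry)) {
    print "DUP: $word\n" if $filter->add($word);
}
```

A key that was added is always reported as seen, so there are no false negatives. A key that was never added can still be reported as a duplicate if other keys happen to have set all of its bits, which is the false positive that the error rate e bounds.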
The allocation of memory for the bit vector is handled in the C layer, but Perl's OO capability handles the garbage collection. When a Bloom::Faster object goes out of scope, the vector pointed to by the C structure will be free()d. To do this manually, the DESTROY builtin method can be called.
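The scope-based cleanup described above is ordinary Perl object destruction. The sketch below uses a hypothetical stand-in class, Tracked, which is not part of Bloom::Faster and only records when its DESTROY method runs, so the timing can be observed without the C library installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an object that owns C-side memory.
package Tracked;

our @log;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY {
    my ($self) = @_;
    push @log, "freed $self->{name}";
}

package main;

{
    my $scoped = Tracked->new('scoped');
}   # last reference is gone here, so DESTROY runs automatically

print "$Tracked::log[0]\n";

my $manual = Tracked->new('manual');
undef $manual;   # dropping the last reference triggers DESTROY early
print "$Tracked::log[1]\n";
```

In plain Perl, dropping the last reference with undef is another way to reach the same destructor before the variable leaves scope. Whether that is preferable to calling DESTROY directly on a Bloom::Faster object depends on how the module guards its C-side free(), which this sketch does not attempt to show.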
A Bloom filter Perl module is currently available on CPAN, but it is slow and cannot handle large vectors. This alternative uses a more efficient C library which can handle very large vectors.

EXPORT
None by default.
HASHCNT PRIME_SIZ SIZ
libbloom.so
Peter Alvaro and Dmitriy Ryaboy, <palvaro@cpan.org> <dvryaboy@cpan.org>
Copyright (C) 2006-2009 by Peter Alvaro and Dmitriy Ryaboy
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.
To install Bloom::Faster, copy and paste the appropriate command into your terminal.
cpanm
cpanm Bloom::Faster
CPAN shell
perl -MCPAN -e shell
install Bloom::Faster
For more information on module installation, please visit the detailed CPAN module installation guide.