KeywordsSpider::Core - core of a web spider that searches for keywords
    use KeywordsSpider::Core;

    my $spider = KeywordsSpider::Core->new(
        output_file      => $opened_filehandle,
        links            => \%links,
        keywords         => \@keywords,
        allowed_keywords => \%allowed_keywords,
        debug_enabled    => 1,
        web_depth        => 5,
    );
KeywordsSpider::Core is the core of a web spider: it spiders links and matches their content against keywords. A matched keyword triggers an ALERT written to output_file. Allowed keywords do not trigger an ALERT.
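The matching rule described above can be sketched roughly as follows. This is only an illustration: match_keywords is a hypothetical helper, not a method of KeywordsSpider::Core, and the case-insensitive substring match is an assumption, so the module's real matching may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KeywordsSpider::Core).
# Returns the keywords found in $content and whether any of them
# is outside the allowed set, which is what would raise an ALERT.
sub match_keywords {
    my ( $content, $keywords, $allowed ) = @_;

    # Assumption: plain, case-insensitive substring matching.
    my @found    = grep { $content =~ /\Q$_\E/i } @$keywords;
    my @alerting = grep { !$allowed->{$_} } @found;

    return { found => \@found, alert => ( @alerting ? 1 : 0 ) };
}

my $result = match_keywords(
    'page with wuord1 and word2',
    [ 'wuord1', 'word2', 'word3' ],
    { wuord1 => 1 },
);

printf "found keywords: %d, alert: %d\n",
    scalar @{ $result->{found} }, $result->{alert};
```

In this sketch a page that contains only allowed keywords still reports them as found, but does not set the alert flag.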
Websites are marked by the 'want_spider' parameter in the links hash. They are spidered to 'web_depth' (default 3), and links found in their content are added to the links hash. Other links are only checked for keywords and are not spidered.
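A minimal sketch of this depth-limited traversal, under several assumptions: the %pages map stands in for real HTTP fetching and holds made-up data, discovered links are assumed to inherit 'want_spider' with 'depth' increased by one, and a page whose depth has reached 'web_depth' is assumed not to be expanded further. The module's actual bookkeeping may differ.

```perl
use strict;
use warnings;

# Stand-in for fetching: URL => links found in its content (made-up data).
my %pages = (
    'http://website.sk'   => ['http://website.sk/a'],
    'http://website.sk/a' => ['http://website.sk/b'],
    'http://website.sk/b' => ['http://website.sk/c'],
);

my $web_depth = 2;
my %links     = (
    'http://website.sk' => { want_spider => 1, depth => 0 },
    'http://referer.sk' => { depth       => 0 },
);

my @queue = sort keys %links;
while ( defined( my $url = shift @queue ) ) {
    my $entry = $links{$url};

    # Every URL would be checked for keywords at this point.

    next unless $entry->{want_spider};          # plain links are not spidered
    next if $entry->{depth} >= $web_depth;      # assumed depth cut-off

    for my $found ( @{ $pages{$url} || [] } ) {
        next if exists $links{$found};          # already known
        $links{$found} = { want_spider => 1, depth => $entry->{depth} + 1 };
        push @queue, $found;
    }
}

print "$_ depth=$links{$_}{depth}\n" for sort keys %links;
```

With web_depth set to 2 in this sketch, http://website.sk/a and http://website.sk/b end up in %links, while http://website.sk/c is never added.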
output_file - an opened file handle
keywords - array of keywords you want to find
allowed_keywords - hash of keywords which do not trigger an ALERT, for example:

    my %allowed_keywords = (
        wuord1 => 1,
    );

links - websites and referer URLs you want to spider, for example:

    my %links = (
        'http://website.sk' => {
            'want_spider' => 1,
            'depth'       => 0,
        },
        'http://referer.sk' => {
            'depth' => 0,
        },
    );

Note that the links hash is modified while the spider runs.
debug_enabled - prints debug messages to standard output
web_depth - depth to which a website will be scanned. Default is 3.
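Because the links hash is modified while the spider runs, it can be useful to keep an untouched copy if the original set of links is needed afterwards. The following is a sketch using dclone from the core Storable module. The spider run itself is only simulated here by adding one made-up entry by hand.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %links = (
    'http://website.sk' => { want_spider => 1, depth => 0 },
    'http://referer.sk' => { depth       => 0 },
);

# Deep copy taken before the hash is handed to the spider.
my $original_links = dclone( \%links );

# Simulated side effect of a spider run (made-up URL):
# a link discovered in the content is added to the hash.
$links{'http://website.sk/new.html'} = { want_spider => 1, depth => 1 };

printf "original: %d links, after run: %d links\n",
    scalar keys %$original_links, scalar keys %links;
```

A deep copy is used rather than a plain hash assignment because the values are hash references, which a shallow copy would still share with the hash the spider modifies.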
main method
makes necessary settings to spider website
scans website according to settings
checks if url's content matches keywords
adds links found in the URL's content to the links hash
if debugging is enabled, prints the given string to standard output
    SPIDER http://domain.sk
    this IS NOT counted as alerted
    ----------------------------------------------------------------------
    SPIDER LINKS
    SPIDER http://trololo.sk
    ERROR:404 Not Found
    this IS NOT counted as alerted
    SPIDER LINKS
    SPIDER http://domain.sk/old.html
    possible bad content http://domain.sk/old.html word2
    found keywords: 1
    fetching http://domain.sk/new.html
    ALERT possible bad content http://domain.sk/new.html wuord1 word2
    found keywords: 2
    fetching http://domain.sk/lala.txt
    SKIPPING because of content type or length
    SPIDER http://domain.sk
    this IS counted as alerted
KeywordsSpider -- takes files as arguments and prepares attributes for KeywordsSpider::Core
Copyright 2013 Katarina Durechova
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.