dezibot - parallel web crawler
 # crawl 2 sites
 % dezibot http://dezi.org http://swish-e.org

 # crawl a list of sites
 % dezibot --urls file_with_urls

 # pass in stored config
 % dezibot --config botconfig.pl

 # crawl in parallel
 % dezibot --workers 5 --urls file_with_urls
dezibot is a command line tool wrapping the Dezi::Bot module.
dezibot can:
read from a config file or take options on the command line
read URLs from a file or from @ARGV
spawn multiple parallel spiders
The following options are supported.
Print this message.
Spew lots of information to stderr. Overrides any setting in --config.
Print some status information to stderr. Overrides any setting in --config.
--config file
Read config from file using Config::Any. The parsed config is passed directly to Dezi::Bot->new().
--urls file
Read URLs to crawl from file. Lines starting with whitespace or # are ignored.
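The filtering rule above can be illustrated with a short Python sketch. This is not dezibot's own code (dezibot is written in Perl); the function name `read_urls` and the sample data are hypothetical, and skipping empty lines is an assumption, since the text only mentions lines starting with whitespace or #.

```python
def read_urls(lines):
    """Return the URLs found in an iterable of text lines.

    Lines that start with whitespace or '#' are ignored, as described
    for --urls. Empty lines are skipped too (an assumption).
    """
    urls = []
    for line in lines:
        if not line.strip():            # empty or whitespace-only line
            continue
        if line[0].isspace() or line.startswith("#"):
            continue                    # indented line or comment
        urls.append(line.rstrip())      # drop the trailing newline
    return urls


# Hypothetical contents of a file passed via --urls
sample = [
    "http://dezi.org\n",
    "# a comment\n",
    "  http://indented.example\n",
    "\n",
    "http://swish-e.org\n",
]
print(read_urls(sample))
```

With the sample above, only the two unindented, uncommented URLs are kept.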
--workers n
Spawn n workers to crawl in parallel. The default is to crawl serially. If n is less than the number of URLs, the list of URLs will be sliced and apportioned among the n workers according to --pool_size.
--pool_size n
The maximum number of URLs per worker. The default is the number of URLs divided by the number of workers, but you might want to set n lower in order to minimize wait time between crawls.
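How --workers and --pool_size interact can be sketched in Python. This is only an illustration of the slicing described above, not dezibot's implementation: the function name `slice_urls` is hypothetical, and rounding the default pool size up is an assumption, because the text does not say how an uneven division is handled.

```python
import math


def slice_urls(urls, workers, pool_size=None):
    """Split urls into slices of at most pool_size items.

    When pool_size is not given, it defaults to the number of URLs
    divided by the number of workers (rounded up: an assumption).
    """
    if pool_size is None:
        pool_size = max(1, math.ceil(len(urls) / workers))
    return [urls[i:i + pool_size] for i in range(0, len(urls), pool_size)]


# Ten hypothetical URLs shared among five workers
urls = [f"http://example.com/{i}" for i in range(10)]

default_slices = slice_urls(urls, workers=5)
small_slices = slice_urls(urls, workers=5, pool_size=1)

print([len(s) for s in default_slices])
print(len(small_slices))
```

With the default, each of the five workers receives one slice of two URLs. With a pool size of 1 there are ten single-URL slices, so a worker that finishes early can pick up the next slice instead of idling, which is presumably the wait-time benefit mentioned above.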
Peter Karman, <karman at cpan.org>
Please report any bugs or feature requests to bug-dezi-bot at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi-Bot. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Dezi::Bot
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Dezi-Bot
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Dezi-Bot
CPAN Ratings
http://cpanratings.perl.org/d/Dezi-Bot
Search CPAN
http://search.cpan.org/dist/Dezi-Bot/
Copyright 2013 Peter Karman.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.