WWW::RobotRules::Parser - Just Parse robots.txt
  use WWW::RobotRules::Parser;

  my $p = WWW::RobotRules::Parser->new;
  $p->parse($robots_txt_uri, $text);
  $p->parse_uri($robots_txt_uri);
WWW::RobotRules::Parser allows you to simply parse robots.txt files as described in http://www.robotstxt.org/wc/norobots.html. Unlike WWW::RobotRules (which is very cool), this module does not take your user agent name into consideration when parsing. It just parses the structure and returns a hash containing the whole set of rules, which you can then use however you like.
I mainly wrote this to store away the parsed data structure elsewhere for later use, without having to specify a user agent.
Creates a new instance of WWW::RobotRules::Parser
Given the URI of the robots.txt file and its contents, parses the content and returns a data structure that looks like the following:
  {
    '*' => [ '/private', '/also_private' ],
    'Another UserAgent' => [ '/dont_look' ]
  }
Here the key is the user agent name, and the value is an arrayref of all paths that are disallowed for that user agent.
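As a sketch of typical usage, here is how you might walk the returned structure (the example.com URI and the $text variable holding the raw robots.txt contents are placeholders for illustration):

  use WWW::RobotRules::Parser;

  my $p     = WWW::RobotRules::Parser->new;
  my $rules = $p->parse('http://example.com/robots.txt', $text);

  # Each key is a user agent name; each value is an arrayref
  # of the paths disallowed for that agent.
  while (my ($agent, $paths) = each %$rules) {
      print "$agent:\n";
      print "  Disallow: $_\n" for @$paths;
  }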
Given the URI of the robots.txt file, retrieves and parses the file.
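For example, fetching and parsing in one step might look like the sketch below. Wrapping the call in eval is a defensive assumption on my part; check the module's source for how it actually reports fetch failures:

  # Fetch http://example.com/robots.txt and parse it in one call.
  my $rules = eval { $p->parse_uri('http://example.com/robots.txt') };
  warn "Could not fetch or parse robots.txt: $@" if $@;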
WWW::RobotRules
Copyright (c) 2006-2007 Daisuke Maki <daisuke@endeworks.jp>
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html
To install WWW::RobotRules::Parser, copy and paste the appropriate command in to your terminal.
cpanm
cpanm WWW::RobotRules::Parser
CPAN shell
perl -MCPAN -e shell
install WWW::RobotRules::Parser
For more information on module installation, please visit the detailed CPAN module installation guide.