HTML::Extract - Perl extension for getting text and HTML snippets out of HTML pages in general.
    use HTML::Extract;
    my $extractor = HTML::Extract->new;
    # return a text version of the content
    # (arguments are quoted here so that the call parses as Perl)
    print $extractor->gethtml("http://uri/", "tagname=body", "returntype=text");
This is a simple little Perl module for getting text out of HTML pages. It is designed to be called anywhere you would otherwise be looking for a way to strip away part of a web page (for example, if you are extracting some pieces of text with the intent of placing them elsewhere). It also comes with a little demonstration program that shows how it can be wrapped as a command-line program.
Exports: none.
This module relies on several other modules to do its work: HTML::Element, HTML::TreeBuilder, HTML::TagFilter, LWP::UserAgent and LWP::Simple.
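To give a feel for the kind of work those modules do, here is a minimal sketch that pulls the contents of one tag out of an HTML string using HTML::TreeBuilder directly. It is not HTML::Extract's own code: the helper name extract_tag and its arguments are made up for illustration, and it works on a string rather than fetching a URI.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::Extract): return the contents of the
# first element named $tag, as plain text or as an HTML snippet.
sub extract_tag {
    my ($html, $tag, $return_type) = @_;

    my $tree    = HTML::TreeBuilder->new_from_content($html);
    my $element = $tree->look_down(_tag => $tag);   # first match, or undef

    my $result;
    if ($element) {
        $result = $return_type eq 'text'
                ? $element->as_text    # text only, markup stripped
                : $element->as_HTML;   # the element serialised back to HTML
    }

    $tree->delete;   # free the parse tree explicitly
    return $result;
}

my $page = '<html><head><title>Demo</title></head>'
         . '<body><p>Hello, <b>world</b>!</p></body></html>';

print extract_tag($page, 'body', 'text'), "\n";
```

Passing 'html' (or anything other than 'text') as the third argument returns the markup of the element instead of its text. If the tag is not present, the helper returns undef, so callers should check the result before printing it.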
Emma Tonkin, <cselt@users.sourceforge.net>
Copyright (C) 2006 by Emma Tonkin
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.
To install HTML::Extract, copy and paste the appropriate command into your terminal.
cpanm:
    cpanm HTML::Extract
CPAN shell:
    perl -MCPAN -e shell
    install HTML::Extract
For more information on module installation, please visit the detailed CPAN module installation guide.