HTML::Grabber
    use HTML::Grabber;
    use LWP::Simple;

    my $dom = HTML::Grabber->new( html => get('http://twitter.com/ned0r') );

    $dom->find('.tweet-content')->each(sub {
        my $body = $_->find('.tweet-text')->text;
        my $when = $_->find('.js-tweet-timestamp')->attr('data-time');
        my $link = $_->find('.js-permalink')->attr('href');
        print "$body $when (link: $link)\n";
    });
HTML::Grabber provides a jQuery-style interface to HTML documents. This makes parsing and manipulating HTML documents straightforward for anyone already familiar with jQuery (http://jquery.com).
It uses XML::LibXML for DOM parsing/manipulation and HTML::Selector::XPath for converting CSS expressions into XPath.
Martyn Smith <martyn@dollyfish.net.nz>
All selectors are CSS. They are converted to XPath internally using HTML::Selector::XPath. If a more unusual selector doesn't behave as expected, check that module's documentation to see whether it is supported.
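To give a feel for what that conversion produces, here is a deliberately simplified, hypothetical converter (css_to_xpath is an illustrative name) that handles only three selector forms: a class (.name), an id (#name) and a bare tag name. It is not HTML::Selector::XPath's implementation, and that module's real output may differ in detail. The class form uses the common contains(concat(...)) XPath idiom for matching one token inside a space-separated class attribute.

```perl
use strict;
use warnings;

# Hypothetical, simplified CSS-to-XPath converter for three selector forms.
# NOT HTML::Selector::XPath's code; it only illustrates the idea.
sub css_to_xpath {
    my ($selector) = @_;

    if ($selector =~ /^\.([\w-]+)$/) {
        # .class: any element whose class list contains this exact token
        return sprintf q{//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]}, $1;
    }
    if ($selector =~ /^#([\w-]+)$/) {
        # #id: any element with this id attribute
        return sprintf q{//*[@id='%s']}, $1;
    }
    if ($selector =~ /^([A-Za-z][\w-]*)$/) {
        # tag: any element with this name, anywhere in the document
        return "//$1";
    }
    die "unsupported selector in this sketch: $selector\n";
}

print css_to_xpath('.tweet-text'), "\n";
print css_to_xpath('#main'), "\n";
print css_to_xpath('a'), "\n";
```

Real selectors (combinators, attribute tests, pseudo-classes) are far richer, which is why delegating to a dedicated module is the sensible design.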
Get descendants of each element in the current set of matched elements, filtered by a selector.
Get the immediately preceding sibling of each element in the set of matched elements, optionally filtered by a selector.
Reduce the set of matched elements to those that match the selector.
Filter the current set of matched elements to those that contain the text specified by $match. If you prefer, $match can also be a Regexp (for example, one created with qr//).
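A function that accepts either a plain string or a compiled pattern usually tells them apart with ref, because a qr// object is blessed into the Regexp class. The helper below (text_matches is a hypothetical name, not part of HTML::Grabber) sketches that idea on plain strings. Treating a plain string as a case-sensitive substring test is an assumption of this sketch; the module's exact rule for strings may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: $match may be a plain string or a qr// Regexp.
# Not HTML::Grabber's own code.
sub text_matches {
    my ($text, $match) = @_;
    if (ref($match) eq 'Regexp') {
        # Compiled pattern: apply it directly
        return $text =~ $match ? 1 : 0;
    }
    # Plain string: case-sensitive substring test (assumption of this sketch)
    return index($text, $match) >= 0 ? 1 : 0;
}

my @texts = ('Hello world', 'Goodbye', 'hello again');

my @by_string = grep { text_matches($_, 'Hello') }    @texts;  # ('Hello world')
my @by_regex  = grep { text_matches($_, qr/hello/i) } @texts;  # ('Hello world', 'hello again')

print "@by_string\n@by_regex\n";
```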
Get the parent of each element in the current set of matched elements.
Get the combined text contents of each element in the set of matched elements, including their descendants.
Return the text of each element as a list.
Return the HTML of the currently matched elements.
Return the HTML of each element as a list.
Remove the matched nodes from the DOM tree, returning them.
Get the value of an attribute for the first element in the set of matched elements.
Execute a sub for each matched node. Inside the sub, $_ holds the current node, as in the example at the top of this page.
Execute a sub for each matched node, returning a list containing the result of each call.
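The two iteration methods above differ in what happens to the sub's return value: one runs the sub only for its side effects, the other collects every result into a list. The sketch below illustrates that pattern with a hypothetical Sketch::Set class holding plain strings in place of matched nodes. The method names each and map are used here for illustration, and returning the set from each (so calls can be chained) is a choice of this sketch, not documented behaviour of HTML::Grabber.

```perl
use strict;
use warnings;

# Hypothetical minimal collection illustrating the each/map pattern.
# Plain strings stand in for matched nodes.
package Sketch::Set;

sub new {
    my ($class, @items) = @_;
    return bless { items => [@items] }, $class;
}

# Call $code once per item with $_ aliased to that item;
# return values are ignored. Returns the set (a choice of this sketch).
sub each {
    my ($self, $code) = @_;
    for (@{ $self->{items} }) {
        $code->();
    }
    return $self;
}

# Call $code once per item with $_ aliased to that item,
# collecting each return value into a list.
sub map {
    my ($self, $code) = @_;
    my @results;
    for (@{ $self->{items} }) {
        push @results, $code->();
    }
    return @results;
}

package main;

my $set = Sketch::Set->new(qw(alpha beta gamma));
$set->each(sub { print "node: $_\n" });

my @lengths = $set->map(sub { length $_ });
print "@lengths\n";   # 5 4 5
```

Aliasing $_ through a foreach loop is what lets the callback refer to the current item without an explicit argument, mirroring the $_->find(...) calls in the example at the top of this page.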
Internal method that takes a list of XML::LibXML::Element objects and returns a list with duplicates removed.
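A common Perl way to drop duplicates while keeping the original order is a %seen hash keyed on an identity value. The sketch below (uniq_by is a hypothetical helper, not the module's internal method) keys plain references by Scalar::Util::refaddr. For XML::LibXML nodes that key is not reliable, because two distinct Perl objects can wrap the same underlying node; XML::LibXML::Node's unique_key method (or isSameNode) exists for that comparison, so a key function such as sub { $_[0]->unique_key } would be the analogous choice there.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helper: order-preserving de-duplication by a caller-supplied key.
sub uniq_by {
    my ($key_of, @items) = @_;
    my %seen;
    # Keep an item only the first time its key is seen
    return grep { !$seen{ $key_of->($_) }++ } @items;
}

my ($x, $y) = ({ id => 1 }, { id => 2 });

# $x appears twice in the input but only once in the output
my @unique = uniq_by(sub { refaddr $_[0] }, $x, $y, $x);
print scalar(@unique), "\n";   # 2
```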
To install HTML::Grabber, copy and paste the appropriate command into your terminal.
cpanm
cpanm HTML::Grabber
CPAN shell
perl -MCPAN -e shell
install HTML::Grabber
For more information on module installation, please visit the detailed CPAN module installation guide.