HTML::RelExtor - Extract "rel" and "rev" information from LINK and A tags.
    use HTML::RelExtor;

    my $parser = HTML::RelExtor->new();
    $parser->parse($html);

    for my $link ($parser->links) {
        print $link->href, "\n" if $link->has_rel('nofollow');
    }

    my($canonical) = grep $_->has_rev('canonical'), $parser->links;
    if ($canonical) {
        $shorten_url = $canonical->href;
    }
HTML::RelExtor is an HTML parser module that extracts relationship information from A and LINK tags.
    $parser = HTML::RelExtor->new();
    $parser = HTML::RelExtor->new(base => $base_uri);

Creates a new HTML::RelExtor object.
    $parser->parse($html);

Parses HTML content. See HTML::Parser for other method signatures.
    my @links = $parser->links();
    my @links = $parser->links(rel => 'alternate');
    my @links = $parser->links(rev => 'canonical');

Returns the links that carry a 'rel' or 'rev' attribute as a list of HTML::RelExtor::Link objects. When given a rel or rev parameter, returns only the links that have that rel or rev value.

    # These are equivalent
    @links = $parser->links(rel => 'alternate');
    @links = grep $_->has_rel('alternate'), $parser->links;
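As a fuller illustration of the two equivalent filters, the sketch below parses a small inline document and runs both forms side by side. The HTML snippet, the example.com URLs, and the variable names are made up for this example. The call to eof is HTML::Parser's own method and assumes HTML::RelExtor inherits from HTML::Parser, as the pointer to HTML::Parser above suggests.

```perl
use strict;
use warnings;
use HTML::RelExtor;

# Hypothetical input; the URLs are placeholders.
my $html = <<'HTML';
<html><head>
<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.rss">
<link rel="stylesheet" href="http://example.com/style.css">
</head><body>
<a href="http://example.com/untrusted" rel="nofollow">untrusted</a>
</body></html>
HTML

my $parser = HTML::RelExtor->new();
$parser->parse($html);
$parser->eof;    # signal end of input; HTML::Parser method (assumed inherited)

# Let links() do the filtering ...
my @by_param = $parser->links(rel => 'alternate');

# ... or filter the full list yourself with has_rel().
my @by_grep = grep { $_->has_rel('alternate') } $parser->links;

print $_->href, "\n" for @by_param;
```

Passing the filter to links() is shorter. The grep form is handy when the condition combines several tests, such as a tag check plus a relationship check.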
    my $href = $link->href;

Returns the 'href' attribute of the link.
    my $tag = $link->tag;

Returns the tag name of the link in lowercase, either 'a' or 'link'.
    my $attr = $link->attr;

Returns a hash reference of the tag's attributes.
    my @rel = $link->rel;

Returns the list of 'rel' values. If a link is written as <a href="http://example.com/" rel="tag nofollow">blahblah</a>, the rel() method returns a list containing tag and nofollow.
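To make the rel() list concrete: HTML defines rel and rev as space-separated lists of link types, so a single attribute can carry several relationships. The following is a minimal standalone sketch of that tokenizing and of a has_rel-style membership test. It is not the module's implementation; rel_tokens and has_token are hypothetical helper names used only here.

```perl
use strict;
use warnings;

# Split an attribute value such as "tag nofollow" into its tokens.
# Empty strings from leading whitespace are dropped.
sub rel_tokens {
    my ($value) = @_;
    return () unless defined $value;
    return grep { length } split /\s+/, $value;
}

# Return the number of tokens in $value equal to $wanted (0 means absent).
sub has_token {
    my ($value, $wanted) = @_;
    return scalar grep { $_ eq $wanted } rel_tokens($value);
}

my @rel = rel_tokens('tag nofollow');
print join(', ', @rel), "\n";    # prints "tag, nofollow"

my $found = has_token('tag nofollow', 'nofollow') ? 'yes' : 'no';
print "nofollow present: $found\n";    # prints "nofollow present: yes"
```

Note that the comparison is exact, so 'follow' does not match 'nofollow'. That is the point of testing tokens rather than substrings.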
    my @rev = $link->rev;

Returns the list of 'rev' values.
    if ($link->has_rel('nofollow')) { }

A handy shortcut method to find out whether a link contains a specific relationship.
    if ($link->has_rev('canonical')) { }

A handy shortcut method to find out whether a link contains a specific reverse relationship.
    my $text = $link->text;

Returns the text inside the tag. Only available for A tags; returns undef when called on a LINK tag.
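Because text() returns undef for LINK tags, code that walks all links needs a fallback before printing. The sketch below shows one way to handle that. The input snippet, URLs, and the '(no text)' label are invented for this example, and the eof call assumes HTML::RelExtor inherits that method from HTML::Parser.

```perl
use strict;
use warnings;
use HTML::RelExtor;

# Hypothetical input with one LINK tag and one A tag.
my $html = <<'HTML';
<link rel="alternate" href="http://example.com/feed.rss">
<a href="http://example.com/alice" rel="friend">Alice</a>
HTML

my $parser = HTML::RelExtor->new();
$parser->parse($html);
$parser->eof;    # signal end of input; HTML::Parser method (assumed inherited)

for my $link ($parser->links) {
    my $text = $link->text;
    # text() is undef for LINK tags, so supply a placeholder label.
    my $label = defined $text ? $text : '(no text)';
    printf "%s %s %s\n", $link->tag, $link->href, $label;
}
```

Checking with defined rather than plain truthiness keeps anchor text such as "0" from being replaced by the placeholder.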
Collect A links tagged with rel="friend", as used in XFN (XHTML Friends Network).
    my $p = HTML::RelExtor->new();
    $p->parse($html);

    my @links = map { $_->href }
        grep { $_->tag eq 'a' && $_->has_rel('friend') } $p->links;
TODO: Accept a callback parameter when creating a new instance.
Tatsuhiko Miyagawa <miyagawa at bulknews.net>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
HTML::LinkExtor, HTML::Parser
http://www.w3.org/TR/REC-html40/struct/links.html
http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
http://developers.technorati.com/wiki/RelTag
http://gmpg.org/xfn/11
http://shiflett.org/blog/2009/apr/save-the-internet-with-rev-canonical