NAME

HTML::RSSAutodiscovery - methods for retrieving RSS-ish information from an HTML document.

SYNOPSIS

 use strict;
 use warnings;
 use HTML::RSSAutodiscovery;
 use Data::Dumper;

 my $url = "http://www.diveintomark.org/";

 my $html = HTML::RSSAutodiscovery->new();
 print Dumper($html->parse($url));

 # Mark's gone a bit nuts with this and
 # the list is too long to include here...

 # see the POD for the 'parse' method for
 # details of what it returns.

DESCRIPTION

Methods for retrieving RSS-ish information, such as the <link> elements used for RSS autodiscovery, from an HTML document.

PACKAGE METHODS

__PACKAGE__->new()

Object constructor. Returns an object. Woot!

OBJECT METHODS

$obj->parse($arg)

Parse an HTML document and return RSS-ish <link> information.

$arg may be either:

  • An HTML string, passed as a scalar reference.

  • A URI.

Returns an array reference of hash references whose keys are:

  • title

  • type

  • rel

  • href
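To make the shape of that return value concrete, here is a minimal core-Perl sketch of the same kind of extraction from an HTML string. The helper name find_rss_links is hypothetical and not part of HTML::RSSAutodiscovery. The module itself lists HTML::Parser as a requirement; this sketch swaps in plain regular expressions, so it only copes with reasonably well-formed markup. Which rel and type values count as "RSS-ish" is also an assumption here.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::RSSAutodiscovery. It returns the
# same shape as parse(): an array ref of hash refs keyed by title, type,
# rel and href. Regexes stand in for HTML::Parser, so only reasonably
# well-formed markup is handled.
sub find_rss_links {
    my ($html_ref) = @_;    # scalar reference to an HTML string
    my @links;

    # Walk every <link ...> tag in the document.
    while (${$html_ref} =~ /<link\b([^>]*)>/gi) {
        my $attr_text = $1;
        my %attr;

        # Collect name="value" and name='value' pairs; names are lower-cased.
        while ($attr_text =~ /([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g) {
            $attr{ lc $1 } = defined $2 ? $2 : $3;
        }

        # Assumption: "RSS-ish" means rel contains 'alternate' and the type
        # is one of the feed MIME types listed here.
        next unless ($attr{rel} // '') =~ /\balternate\b/i;
        next unless ($attr{type} // '') =~ m{^application/(?:rss|rdf|atom)\+xml$}i;

        # Copy just the four documented keys into a fresh hash.
        my %link;
        @link{qw(title type rel href)} = @attr{qw(title type rel href)};
        push @links, \%link;
    }

    return \@links;
}
```

With the module itself installed, the corresponding call is $obj->parse(\$html), which passes the HTML string as a scalar reference and returns the same array-of-hashes structure.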

$obj->locate($uri,\%args)

Like the parse method, but performs additional lookups when they are necessary or requested.

Valid arguments are:

  • uri

    String. A live, breathing URI to slurp and parse.

    Required

  • Hash ref whose keys may be

    • noparse

      Boolean. Don't bother parsing the document; this also prevents you from checking for embedded links.

      I don't know why you would want to do this, but you can.

      False, by default.

    • embedded

      Boolean. Check all embedded links ending in '.xml', '.rss' or '.rdf' (and then 'xml', 'rss' or 'rdf') for RSS-ness.

      False, by default, unless the initial parsing of the URI returns no RSS links.

    • embedded_and_remote

      Boolean. Check all embedded links whose root is not the same as $uri for RSS-ness.

      False, by default.

    • syndic8

      Boolean. Check the syndic8 servers for sites matching $uri.

      False, by default, unless the initial parsing of the URI and any embedded links returns no RSS links.
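The candidate selection behind the 'embedded' and 'embedded_and_remote' options can be pictured with the sketch below. The helper names embedded_candidates, host_of and is_remote are hypothetical and not part of the module. Two readings are assumptions: that the second pass matches links which merely contain 'xml', 'rss' or 'rdf', and that a link's "root" is its host name. Fetching each candidate and testing it for RSS-ness (for example with XML::RSS) is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: order embedded hrefs the way the 'embedded' option
# describes. Assumption: the second pass means "contains xml/rss/rdf".
sub embedded_candidates {
    my @hrefs = @_;
    my (@first, @second);
    for my $href (@hrefs) {
        if ($href =~ /\.(?:xml|rss|rdf)$/i) {
            push @first, $href;     # first pass: ends in .xml, .rss or .rdf
        }
        elsif ($href =~ /(?:xml|rss|rdf)/i) {
            push @second, $href;    # second pass: bare xml, rss or rdf
        }
    }
    return (@first, @second);
}

# Hypothetical helper: pull the host out of an absolute URI with a simple
# regex (the URI module would be more robust). Relative links yield ''.
sub host_of {
    my ($uri) = @_;
    return $uri =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i ? lc $1 : '';
}

# Hypothetical helper: true when $href lives on a different host than the
# page $base, i.e. the links 'embedded_and_remote' would add.
sub is_remote {
    my ($href, $base) = @_;
    my $host = host_of($href);
    return $host ne '' && $host ne host_of($base);
}
```

Keeping the two passes in separate lists means the more likely feed URLs (those with a feed-style extension) are checked before the looser matches.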

Returns an array reference of hash references whose keys are:

  • title

  • type

  • rel

  • href

VERSION

1.21

DATE

$Date: 2004/10/17 04:13:06 $

AUTHOR

Aaron Straup Cope

SEE ALSO

Because you shouldn't need all that white space to do cool stuff ;-)

http://diveintomark.org/archives/2002/05/30.html#rss_autodiscovery

http://diveintomark.org/archives/2002/08/15.html

http://diveintomark.org/projects/misc/rssfinder.py.txt

REQUIREMENTS

BASIC

These packages are required to actually parse an HTML document or URI.

  • HTML::Parser

  • LWP::UserAgent

  • HTTP::Request

EMBEDDED

This package is required to check the embedded links in a URI for RSS files. It is not loaded until run-time, so it is not required for basic parsing.

  • XML::RSS

SYNDIC8

This package is required to query the syndic8 servers for RSS files associated with a URI. It is not loaded until run-time, so it is not required for basic parsing.

  • XMLRPC::Lite
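Loading optional dependencies only at run-time is a common Perl idiom: a string require wrapped in eval, so a missing module becomes a false return value instead of a compile-time failure. The sketch below shows that general technique with a hypothetical load_optional helper; it is an illustration, not code taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load an optional module at run-time.
# Returns 1 if it loaded, 0 if it is missing or the name looks unsafe.
sub load_optional {
    my ($module) = @_;
    # Only accept plain Package::Name strings before building a file path.
    return 0 unless $module =~ /^\w+(?:::\w+)*$/;
    (my $file = "$module.pm") =~ s{::}{/}g;    # XML::RSS -> XML/RSS.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Basic parsing never needs XML::RSS; only the embedded-link check does.
if (load_optional('XML::RSS')) {
    print "XML::RSS available: embedded link checks possible\n";
}
else {
    print "XML::RSS missing: falling back to basic parsing\n";
}
```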

LICENSE

Copyright (c) 2002-2004, Aaron Straup Cope. All Rights Reserved.

This is free software; you may use it and distribute it under the same terms as Perl itself.