WWW::Pastebin::Base::Retrieve - base class for modules which implement retrieving of pastes from pastebins
    package WWW::Pastebin::PhpfiCom::Retrieve;

    use base 'WWW::Pastebin::Base::Retrieve';
    use HTML::TokeParser::Simple;
    use HTML::Entities;

    sub _make_uri_and_id {
        # here we get whatever user passed to retrieve()
        # and we need to return the ID of the paste and URI pointing to it
        my ( $self, $id ) = @_;

        $id =~ s{ ^\s+ | (?:http://)? (?:www\.)? phpfi\.com/(?=\d+) | \s+$ }{}xi;

        return $self->_set_error(
            q|Doesn't look like a correct ID or URI to the paste|
        ) if $id =~ /\D/;

        return ( URI->new("http://www.phpfi.com/$id"), $id );
    }

    sub _get_was_successful {
        # this sub actually defaults to $self->_parse( $content );
        # which is fine for most pastebins...
        my ( $self, $content ) = @_;

        my $results_ref = $self->_parse( $content );
        return
            unless defined $results_ref;

        my $content_uri = $self->uri->clone;
        $content_uri->query_form( download => 1 );
        my $content_response = $self->ua->get( $content_uri );
        if ( $content_response->is_success ) {
            $results_ref->{content} = $self->content($content_response->content);
            return $self->results( $results_ref );
        }
        else {
            return $self->_set_error(
                'Network error: ' . $content_response->status_line
            );
        }
    }

    sub _parse {
        # this is the "core", this sub would parse out the content of
        # the paste and return data
        my ( $self, $content ) = @_;

        my $parser = HTML::TokeParser::Simple->new( \$content );

        my %data;
        my %nav = (
            content => '',
            map { $_ => 0 }
                qw(get_info level get_lang is_success get_content check_404)
        );
        while ( my $t = $parser->get_token ) {
            if ( $t->is_start_tag('td') ) {
                $nav{get_info}++;
                $nav{check_404}++;
                $nav{level} = 1;
            }

            # blah blah, blah do some parsin'
            # if you want to see full example see 'examples' directory
            # of this distribution

            elsif ( $nav{get_lang} == 1
                and $t->is_start_tag('option')
                and defined $t->get_attr('selected')
                and defined $t->get_attr('value')
            ) {
                $data{lang} = $t->get_attr('value');
                $nav{is_success} = 1;
                last;
            }
        }

        return $self->_set_error('This paste does not seem to exist')
            if $nav{content} =~ /entry \d+ not found/i;

        return $self->_set_error("Parser error! Level == $nav{level}")
            unless $nav{is_success};

        $data{ $_ } = decode_entities( delete $data{ $_ } )
            for grep { $_ ne 'content' } keys %data;

        return \%data;
    }

    package main;

    my $paster = WWW::Pastebin::PhpfiCom::Retrieve->new;

    $paster->retrieve('http://phpfi.com/302683')
        or die $paster->error;

    print "Paste content is:\n$paster\n";
This module is a base class for modules which provide an interface to fetch pastes from various pastebin sites. How useful this module will be to you depends entirely on the pastebin site you want to interface with. The synopsis shows a version of the WWW::Pastebin::PhpfiCom::Retrieve module (with the parser trimmed down), which requires a bit more work than the usual pastebin sites.
Public methods:
new retrieve error content results ua uri id
Private methods:
_make_uri_and_id _parse _get_was_successful _set_error
Also, the content() method is overloaded for interpolation, so users of your module can interpolate the object in a string to obtain the contents of the retrieved paste.
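The overloading itself lives in the base class and is not shown in this document. As a rough illustration of the general technique only, the following self-contained sketch uses Perl's core overload pragma on a hypothetical My::Paste class (the class name, its constructor and its accessor are assumptions made for this example, not the base class's code):

```perl
use strict;
use warnings;

package My::Paste;    # hypothetical class, for illustration only

# Route stringification ("") to content(), so "$obj" yields the paste text.
use overload q|""| => sub { shift->content };

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Combined accessor/mutator: sets the value when given an argument.
sub content {
    my $self = shift;
    $self->{content} = shift if @_;
    return $self->{content};
}

package main;

my $paste = My::Paste->new( content => 'print "hello";' );

# Interpolating the object calls the overloaded stringification.
my $text = "Paste content is:\n$paste\n";
print $text;
```

A subclass of the base class should not need any of this, since the overload is inherited along with content().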
In general, the smallest module would provide the _make_uri_and_id() and _parse() methods. The _parse() method would either set the content() data accessor or set the error() by using return $self->_set_error('Some error').
Functionality of private methods is described below. Functionality of public methods is described in the "DOCUMENTATION FOR YOUR MODULE" section.
_make_uri_and_id
    sub _make_uri_and_id {
        # here we get whatever user passed to retrieve()
        # and we need to return the ID of the paste and URI pointing to it
        my ( $self, $id ) = @_;

        $id =~ s{ ^\s+ | (?:http://)? (?:www\.)? phpfi\.com/(?=\d+) | \s+$ }{}xi;

        return $self->_set_error(
            q|Doesn't look like a correct ID or URI to the paste|
        ) if $id =~ /\D/;

        return ( URI->new("http://www.phpfi.com/$id"), $id );
    }
The _make_uri_and_id() method will be called internally by the object when the user calls the retrieve() method. The @_ will contain the same elements which the user provided in their call to the retrieve() method. Note: the base class will check the first argument for defined()ness and length() before calling the _make_uri_and_id() method.
This method must return a list of two elements: the first element must be a URI object pointing to the page containing the paste, and the second element must be the ID of the paste. These will be assigned to the uri() and id() public methods.
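To get a feel for which inputs the substitution in the example above lets through, the sketch below pulls the same regex into a standalone helper. The normalize_id() function is hypothetical and exists only for this illustration; a real _make_uri_and_id() would also build the URI object and call _set_error() on failure.

```perl
use strict;
use warnings;

# Hypothetical standalone helper, not part of the base class.
# It applies the same substitution as the _make_uri_and_id() example
# and returns the numeric ID, or nothing if the input is not acceptable.
sub normalize_id {
    my $id = shift;

    # The base class performs an equivalent check before calling
    # _make_uri_and_id(); it is repeated here to keep the sketch standalone.
    return unless defined $id and length $id;

    $id =~ s{ ^\s+ | (?:http://)? (?:www\.)? phpfi\.com/(?=\d+) | \s+$ }{}xi;

    # Anything that is not purely digits at this point is rejected.
    return if $id =~ /\D/;

    return $id;
}

my $from_uri = normalize_id('http://phpfi.com/302683');
my $from_www = normalize_id('http://www.phpfi.com/302683');
my $from_id  = normalize_id('302683');
my $rejected = normalize_id('not-a-paste');

print defined $_ ? "$_\n" : "(rejected)\n"
    for $from_uri, $from_www, $from_id, $rejected;
```

Note that the substitution has no /g modifier, so it runs once and removes only the first alternative that matches.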
_get_was_successful
    sub _get_was_successful {
        # this sub actually defaults to $self->_parse( $content );
        # which is fine for most pastebins...
        my ( $self, $content ) = @_;

        my $results_ref = $self->_parse( $content );
        return
            unless defined $results_ref;

        my $content_uri = $self->uri->clone;
        $content_uri->query_form( download => 1 );
        my $content_response = $self->ua->get( $content_uri );
        if ( $content_response->is_success ) {
            $results_ref->{content} = $self->content($content_response->content);
            return $self->results( $results_ref );
        }
        else {
            return $self->_set_error(
                'Network error: ' . $content_response->status_line
            );
        }
    }
With many pastebins you won't even have to touch the _get_was_successful() method. It defaults to:
    sub _get_was_successful {
        my ( $self, $content ) = @_;
        return $self->results( $self->_parse( $content ) );
    }
It is called inside the retrieve() method once the LWP::UserAgent object has successfully retrieved the pastebin page. Override this method when you need to make more requests, as was the case with the http://phpfi.com/ pastebin shown in the "SYNOPSIS".
_parse
    # See "SYNOPSIS" or the script in the 'examples' directory for an example
The _parse() method is what will be called upon successful retrieval of the page with the paste. Here you would normally parse out anything you need, set the content() accessor/mutator (see the "DOCUMENTATION FOR YOUR MODULE" section) and return a reference to the data you've parsed out. The return value will be available to the user via the results() method.
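The flow described above (parse, set content(), return a data reference, or bail out through _set_error()) can be sketched without the base class. Everything in the following block is an assumption made for illustration: the My::Retriever class, its simplified accessors, and the page markup it expects. It also uses plain regexes to stay dependency-free, whereas the synopsis uses HTML::TokeParser::Simple.

```perl
use strict;
use warnings;

package My::Retriever;    # hypothetical stand-in for a subclass

sub new { return bless {}, shift }

# Simplified accessor/mutators standing in for the ones the base class provides.
sub content { my $s = shift; $s->{content} = shift if @_; return $s->{content} }
sub error   { my $s = shift; $s->{error}   = shift if @_; return $s->{error} }

sub _set_error {
    my ( $self, $message ) = @_;
    $self->error($message);
    return;
}

sub _parse {
    my ( $self, $html ) = @_;

    # Assumed markup: <pre id="paste">...</pre> holds the paste itself.
    my ($paste) = $html =~ m{<pre id="paste">(.*?)</pre>}s
        or return $self->_set_error('This paste does not seem to exist');

    # Assumed markup: <span id="lang">...</span> holds the language name.
    my ($lang) = $html =~ m{<span id="lang">(.*?)</span>}s;

    $self->content($paste);
    return { content => $paste, lang => $lang };
}

package main;

my $retriever = My::Retriever->new;

my $page    = q{<span id="lang">perl</span><pre id="paste">print 42;</pre>};
my $results = $retriever->_parse($page);
my $missing = $retriever->_parse('<p>entry 1 not found</p>');

print "Language: $results->{lang}\n";
print 'Error: ', $retriever->error, "\n" unless defined $missing;
```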
_set_error
do_stuff() or return $self->_set_error('blah');
The _set_error() method is not something you would normally override, as it is just a handy method that sets the error to whatever is passed in the argument and then does a bare return. When a second argument is passed, the first argument will be treated as an HTTP::Response object and the error will be constructed as 'Network error: ' . $first_arg->status_line. The default _set_error() method looks like this:
    sub _set_error {
        my ( $self, $error_or_response_obj, $is_net_error ) = @_;
        if ( defined $is_net_error ) {
            $self->error(
                'Network error: ' . $error_or_response_obj->status_line
            );
        }
        else {
            $self->error( $error_or_response_obj );
        }
        return;
    }
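Both branches of the default can be exercised in isolation. In the sketch below, the _set_error() body is the one shown above; the My::Fetcher and My::FakeResponse classes are hypothetical stand-ins (for a subclass and for HTTP::Response respectively) so that the example runs without the base class or a network connection.

```perl
use strict;
use warnings;

package My::FakeResponse;    # hypothetical stand-in for HTTP::Response

sub new {
    my ( $class, $line ) = @_;
    return bless { line => $line }, $class;
}
sub status_line { return shift->{line} }

package My::Fetcher;         # hypothetical stand-in for a subclass

sub new   { return bless {}, shift }
sub error { my $s = shift; $s->{error} = shift if @_; return $s->{error} }

# Same body as the default _set_error() shown above.
sub _set_error {
    my ( $self, $error_or_response_obj, $is_net_error ) = @_;
    if ( defined $is_net_error ) {
        $self->error(
            'Network error: ' . $error_or_response_obj->status_line
        );
    }
    else {
        $self->error( $error_or_response_obj );
    }
    return;
}

package main;

my $fetcher = My::Fetcher->new;

# The bare return gives undef in scalar context and an empty list in list context.
my $scalar        = $fetcher->_set_error('Some error');
my @list          = $fetcher->_set_error('Some error');
my $plain_message = $fetcher->error;

# Any defined second argument switches to the "network error" form.
$fetcher->_set_error( My::FakeResponse->new('404 Not Found'), 'net' );
my $net_message = $fetcher->error;

print "$plain_message\n$net_message\n";
```

The bare return is what lets a caller write `return $self->_set_error(...)` and still have `retrieve() or die` behave the same in scalar and list context.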
DOCUMENTATION FOR YOUR MODULE
This section describes the functionality of the public methods and is presented in a copy/paste friendly format so you can save yourself some time writing up docs for your module. The word "EXAMPLE" is used in places you need to edit, but make sure to proofread the whole thing anyway.
    =head1 NAME

    WWW::Pastebin::EXAMPLE::Retrieve - a module to retrieve pastes from EXAMPLE website

    =head1 SYNOPSIS

        my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new;

        $paster->retrieve('http://EXAMPLE')
            or die $paster->error;

        print "Paste content is:\n$paster\n";

    =head1 DESCRIPTION

    The module provides an interface to retrieve pastes from EXAMPLE website via Perl.

    =head1 CONSTRUCTOR

    =head2 C<new>

        my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new;

        my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new(
            timeout => 10,
        );

        my $paster = WWW::Pastebin::EXAMPLE::Retrieve->new(
            ua => LWP::UserAgent->new(
                timeout => 10,
                agent   => 'PasterUA',
            ),
        );

    Constructs and returns a brand new juicy WWW::Pastebin::EXAMPLE::Retrieve
    object. Takes two arguments, both are I<optional>. Possible arguments are
    as follows:

    =head3 C<timeout>

        ->new( timeout => 10 );

    B<Optional>. Specifies the C<timeout> argument of L<LWP::UserAgent>'s
    constructor, which is used for retrieving. B<Defaults to:> C<30> seconds.

    =head3 C<ua>

        ->new( ua => LWP::UserAgent->new( agent => 'Foos!' ) );

    B<Optional>. If the C<timeout> argument is not enough for your needs
    of mutilating the L<LWP::UserAgent> object used for retrieving, feel free
    to specify the C<ua> argument which takes an L<LWP::UserAgent> object
    as a value. B<Note:> the C<timeout> argument to the constructor will
    not do anything if you specify the C<ua> argument as well. B<Defaults to:>
    plain boring default L<LWP::UserAgent> object with C<timeout> argument
    set to whatever C<WWW::Pastebin::EXAMPLE::Retrieve>'s C<timeout> argument
    is set to, and with the C<agent> argument set to mimic Firefox.

    =head1 METHODS

    =head2 C<retrieve>

        my $results_ref = $paster->retrieve('http://EXAMPLE/301425')
            or die $paster->error;

        my $results_ref = $paster->retrieve('EXAMPLE301425')
            or die $paster->error;

    Instructs the object to retrieve a paste specified in the argument. Takes
    one mandatory argument which can be either a full URI to the paste you
    want to retrieve or just its ID.
    On failure returns either C<undef> or an empty list depending on the context
    and the reason for the error will be available via C<error()> method.
    On success returns a hashref with the following keys/values:

        EXAMPLE
        EXAMPLE
        EXAMPLE

    =head2 C<error>

        $paster->retrieve('EXAMPLE')
            or die $paster->error;

    On failure C<retrieve()> returns either C<undef> or an empty list depending
    on the context and the reason for the error will be available via C<error()>
    method. Takes no arguments, returns an error message explaining the failure.

    =head2 C<id>

        my $paste_id = $paster->id;

    Must be called after a successful call to C<retrieve()>. Takes no arguments,
    returns the paste ID number of the last retrieved paste regardless of whether
    an ID or a URI was given to C<retrieve()>.

    =head2 C<uri>

        my $paste_uri = $paster->uri;

    Must be called after a successful call to C<retrieve()>. Takes no arguments,
    returns a L<URI> object with the URI pointing to the last retrieved paste
    regardless of whether an ID or a URI was given to C<retrieve()>.

    =head2 C<results>

        my $last_results_ref = $paster->results;

    Must be called after a successful call to C<retrieve()>. Takes no arguments,
    returns the exact same hashref the last call to C<retrieve()> returned.
    See C<retrieve()> method for more information.

    =head2 C<content>

        my $paste_content = $paster->content;

        print "Paste content is:\n$paster\n";

    Must be called after a successful call to C<retrieve()>. Takes no arguments,
    returns the actual content of the paste. B<Note:> this method is overloaded
    for this module for interpolation. Thus you can simply interpolate the
    object in a string to get the contents of the paste.

    =head2 C<ua>

        my $old_LWP_UA_obj = $paster->ua;

        $paster->ua( LWP::UserAgent->new( timeout => 10, agent => 'foos' ) );

    Returns the currently used L<LWP::UserAgent> object used for retrieving
    pastes. Takes one optional argument which must be an L<LWP::UserAgent>
    object, and the object you specify will be used in any subsequent calls
    to C<retrieve()>.

    =head1 SEE ALSO

    L<LWP::UserAgent>, L<URI>
WWW::Pastebin::Base::Create, LWP::UserAgent, URI
Zoffix Znet, <zoffix at cpan.org> (http://zoffix.com, http://haslayout.net)
Please report any bugs or feature requests to bug-www-pastebin-base-retrieve at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Pastebin-Base-Retrieve. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc WWW::Pastebin::Base::Retrieve
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Pastebin-Base-Retrieve
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/WWW-Pastebin-Base-Retrieve
CPAN Ratings
http://cpanratings.perl.org/d/WWW-Pastebin-Base-Retrieve
Search CPAN
http://search.cpan.org/dist/WWW-Pastebin-Base-Retrieve
Copyright 2008 Zoffix Znet, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.