NAME

MP3::PodcastFetch -- Fetch and manage a podcast subscription

SYNOPSIS

 use MP3::PodcastFetch;
 my $feed  = MP3::PodcastFetch->new(-base => '/tmp/podcasts',
                                    -rss  => 'http://www.npr.org/rss/podcast.php?id=500001',
                                    -rewrite_filename => 1,
                                    -upgrade_tag => 'auto');
 $feed->fetch_pods;
 print "fetched ",$feed->fetched," new podcasts\n";
 for my $file ($feed->fetched_files) {
    print $file,"\n";
 }

DESCRIPTION

This package provides a convenient and simple way of mirroring the podcasts described by an RSS feed into a local directory. It was written as the backend for the fetch_pods.pl script.

To use it, create an MP3::PodcastFetch object with the required -base and -rss arguments. The podcasts listed in the RSS subscription file located at the -rss URL will be mirrored into one or more subdirectories located beneath the path at -base. One subdirectory will be created for each channel specified by the RSS. Additional new() arguments control optional features of this module.

Once the object is created, call its fetch_pods() method to download the RSS file, parse it, and mirror the subscribed podcasts locally.

METHODS

This module implements the following methods:

Constructor

 $feed = MP3::PodcastFetch->new(-base=>$base,-rss=>$url, [other args])

The new() method creates a new MP3::PodcastFetch object. Options are as follows:

-base

The base directory for all mirrored podcast files, e.g. "/var/podcasts". Fetched podcast files will be stored in appropriately named subdirectories of this location, one subdirectory per channel. Additional subdirectory levels can be added using the -subdir argument. This argument is required.

-override_channel_dir

By default, each channel's podcasts are stored in a directory named after the channel title. Set this option to use a different directory name instead.

-rss

The URL of the RSS feed to subscribe to. This is usually indicated in web pages as a red "podcast" or "xml" icon. This argument is required.

-verbose

If true, print status messages to STDERR for each podcast file attempted.

-env_proxy

If true, load proxy settings from *_proxy environment variables.

-max

Set the maximum number of podcast episodes to keep.

-keep_old

If true, keep old episodes and skip new ones if -max is exceeded. The default is to delete old episodes to make room for new ones.
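
The interaction between -max and -keep_old can be sketched as a small planning function. This is one plausible reading of the two descriptions above, not the module's own code: plan_update() is a hypothetical helper, and the assumption that both lists are ordered oldest first is made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical planner, not part of MP3::PodcastFetch. Both lists are array
# refs ordered oldest first (an assumption); $max is the -max value and
# $keep_old the -keep_old flag.
sub plan_update {
    my ($existing, $incoming, $max, $keep_old) = @_;
    my @new = @$incoming;
    my (@fetch, @skip, @delete);

    if ($keep_old) {
        # Old episodes stay; new ones are fetched only while there is room.
        my $room = $max - @$existing;
        $room = 0 if $room < 0;
        @fetch = splice(@new, 0, $room);
        @skip  = @new;
    }
    else {
        # New episodes win; the oldest local files are deleted to make room.
        @skip  = splice(@new, 0, @new - $max) if @new > $max;
        @fetch = @new;
        my $excess = @$existing + @fetch - $max;
        $excess = 0 if $excess < 0;
        $excess = @$existing if $excess > @$existing;
        @delete = @{$existing}[0 .. $excess - 1];
    }
    return { fetch => \@fetch, skip => \@skip, delete => \@delete };
}

my $plan = plan_update([qw(ep1 ep2 ep3)], [qw(ep4 ep5)], 4, 0);
print "fetch: @{ $plan->{fetch} }; delete: @{ $plan->{delete} }\n";
```

With three episodes on disk, two new ones in the feed and a limit of four, the default plan deletes the oldest local episode, while passing a true $keep_old instead skips the new episode that does not fit.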

-timeout

How long (in seconds) to wait before timing out slow servers. Applies to both the initial RSS feed fetching and mirroring individual podcast episodes.

-mirror_mode

One of "exists" or "modified-since". The default, "exists", will cause podcast episodes to be skipped if a like-named file already exists. "modified-since" performs a more careful comparison with the corresponding podcast episode on the remote server. The local file will be refreshed if the remote server's version is more recent.
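
The difference between the two modes can be illustrated with a small decision function. This is a minimal sketch under stated assumptions: should_fetch() is a hypothetical helper that takes timestamps as epoch seconds, and it does not reproduce how the module actually queries the remote server.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MP3::PodcastFetch): decide whether an
# episode should be downloaded under each -mirror_mode setting.
#   $mode          'exists' or 'modified-since'
#   $local_mtime   mtime of the local file in epoch seconds, undef if absent
#   $remote_mtime  modification time reported by the server, epoch seconds
sub should_fetch {
    my ($mode, $local_mtime, $remote_mtime) = @_;
    return 1 unless defined $local_mtime;     # no local copy yet
    return 0 if $mode eq 'exists';            # a like-named file is enough
    return 0 unless defined $remote_mtime;    # nothing to compare against
    return $remote_mtime > $local_mtime ? 1 : 0;
}

# A local copy exists and the remote copy is newer.
for my $mode ('exists', 'modified-since') {
    printf "%-15s fetch=%d\n", $mode, should_fetch($mode, 1_000, 2_000);
}
```

In "exists" mode the presence of the local file settles the question; only "modified-since" looks at the timestamps.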

-rewrite_filename

If true, cryptic MP3 names will be replaced with long names based on podcast episode title.

-upgrade_tag

Some podcast files have informative ID3 tags, but many don't. Particularly annoying is the genre, which may be given as "Speech", "Podcast", or anything else. The -upgrade_tag option, if set to a true value, will attempt to normalize the ID3 tags using the information provided by the RSS feed. Specifically, the title will be set to the title of the podcast, the album will be set to the title of the channel (e.g. "New York Times Front Page"), the artist will be set to the channel author (e.g. "The New York Times"), the year will be set to the publication date, the genre will be set to "Podcast" and the comment will be set to the channel description. You can change some of these values using the options -force_genre, -force_album, and -force_artist.

The value of upgrade_tag is one of:

 false     Don't mess with the ID3 tags
 id3v1     Upgrade the ID3 version 1 tag
 id3v2.3   Upgrade the ID3 version 2.3 tag
 id3v2.4   Upgrade the ID3 version 2.4 tag
 auto      Choose the best tag available

Depending on what optional Perl ID3 manipulation modules you have installed, you may be limited in what level of ID3 tag you can update:

 Audio::TagLib            all versions through 2.4
 MP3::Tag                 all versions through 2.3
 MP3::Info                only version 1.0

Choosing "auto" is your best bet. It will dynamically find what Perl modules you have installed, and choose the one that provides the most recent tag version. Omit this argument, or set it to false, to prevent any ID3 tag rewriting from occurring.
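
The kind of probing that "auto" implies can be sketched as follows. This is an illustration only, assuming the preference order given in the table above; best_tag_level() is a hypothetical function and is not taken from the module's source.

```perl
use strict;
use warnings;

# Candidate modules in order of preference, with the highest tag level each
# one is listed as supporting in the table above.
my @candidates = (
    [ 'Audio::TagLib' => 'id3v2.4' ],
    [ 'MP3::Tag'      => 'id3v2.3' ],
    [ 'MP3::Info'     => 'id3v1'   ],
);

# Hypothetical probe: return (module, level) for the first candidate that
# loads, or an empty list if none is installed.
sub best_tag_level {
    for my $candidate (@candidates) {
        my ($module, $level) = @$candidate;
        (my $file = "$module.pm") =~ s{::}{/}g;
        return ($module, $level) if eval { require $file; 1 };
    }
    return;
}

my ($module, $level) = best_tag_level();
print defined $module
    ? "would use $module for $level tags\n"
    : "no ID3 tagging module installed\n";
```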

-force_genre, -force_artist, -force_album

If you have "upgrade_tag" set to a true value (and at least one tag-writing module installed) then each podcast's ID3 tag will be modified to create a consistent set of fields using information provided by the RSS feed. The title will be set to the title of the podcast, the album will be set to the title of the channel (e.g. "New York Times Front Page"), the artist will be set to the channel author (e.g. "The New York Times"), the year will be set to the publication date, the genre will be set to "Podcast" and the comment will be set to the channel description.

You can change some of these values using these three options:

 -force_genre     Change the genre to whatever you specify.
 -force_artist    Change the artist.
 -force_album     Change the album.

Note that if you use ID3v1 tagging (e.g. MP3::Info) then you must choose one of the predefined genres; in particular, there is no genre named "Podcast." You must force something else, like "Speech" instead.

-playlist_handle

A writeable filehandle on a previously-opened .m3u playlist file. The playlist file must already have the "#EXTM3U" top line written into it. The podcast fetch operation will write an appropriate item description for each podcast episode it mirrors.

-playlist_base

If you are writing a playlist and mirroring the podcasts to a removable medium such as an sdcard for later use with a portable music player device, you will need to set this argument to the directory path to each podcast file as it will appear to the music player. For example, if you mount the medium at /mnt/sdcard and keep podcasts in /mnt/sdcard/podcasts, then the -base and -playlist_base options might look like this:

  -base          => '/mnt/sdcard/podcasts',
  -playlist_base => '/podcasts'

For Windows-based devices, you might have to specify a playlist_base using Windows filesystem conventions.
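
The following sketch shows how a playlist file might be prepared and how -playlist_base changes the path written for each episode. It is a sketch under assumptions: playlist_entry() is a hypothetical helper, the "#EXTINF" line follows the common extended M3U convention, and the exact text the module writes may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: build one extended-M3U entry for an episode stored
# under $base, rewriting its path to the one the player sees ($playlist_base).
sub playlist_entry {
    my ($base, $playlist_base, $local_path, $seconds, $title) = @_;
    (my $relative = $local_path) =~ s{^\Q$base\E/*}{};
    return "#EXTINF:$seconds,$title\n$playlist_base/$relative\n";
}

# The playlist must start with the "#EXTM3U" line before the handle is
# given to -playlist_handle. A temporary file stands in for a real playlist.
my ($playlist, $playlist_path) = tempfile(SUFFIX => '.m3u', UNLINK => 1);
print {$playlist} "#EXTM3U\n";
print {$playlist} playlist_entry(
    '/mnt/sdcard/podcasts', '/podcasts',
    '/mnt/sdcard/podcasts/Some_Channel/episode.mp3', 1800, 'Example episode',
);
close $playlist or die "close $playlist_path: $!";
```

In real use the open handle would be passed as -playlist_handle and the module would write the entries itself; the helper only makes the path rewriting visible.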

-subdir

Ordinarily each podcast will be placed in a directory named after its channel, directly underneath the directory specified by -base. If this option is set to a partial path, then additional directory levels will be placed between the base and the channel directory. For instance:

 -base    => '/tmp/podcasts',
 -subdir  => 'News/Daily',

This will place the channel's podcasts in '/tmp/podcasts/News/Daily/channel_name/'.
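
The way the three path components combine can be reproduced with File::Spec. This is a minimal sketch: channel_path() is a hypothetical helper, and the use of catdir() here is a choice made for the example rather than a statement about the module's internals.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: join -base, an optional -subdir and the channel
# directory name into one path using the current platform's conventions.
sub channel_path {
    my ($base, $subdir, $channel_dir) = @_;
    my @middle = defined $subdir && length $subdir ? ($subdir) : ();
    return File::Spec->catdir($base, @middle, $channel_dir);
}

print channel_path('/tmp/podcasts', 'News/Daily', 'channel_name'), "\n";
print channel_path('/tmp/podcasts', undef,        'channel_name'), "\n";
```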

-force_genre, -force_artist, -force_album

If -upgrade_tag is set to true, then you can use these options to force the genre, artist and/or album to desired hard-coded values. By default, genre will be set to "Podcast", and artist and album will be dynamically determined from information provided by the RSS feed, such that the channel name becomes the album and the podcast author becomes the artist.

-use_pub_date

If -use_pub_date is set to true, then podcast files will have their modification times set to match the publication time specified in the RSS feed. Otherwise they will retain the modification time they carry on the site they are downloaded from.
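
Setting a file's modification time is done in Perl with the built-in utime(). The sketch below shows the effect on a temporary file; stamp_with_pub_date() is a hypothetical helper, and the publication time is assumed to be already converted to epoch seconds (parsing the RSS date string is left out).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: set both access and modification time of $path to
# the publication time, given in epoch seconds.
sub stamp_with_pub_date {
    my ($path, $epoch) = @_;
    utime($epoch, $epoch, $path) or die "utime $path: $!";
    return;
}

# A temporary file stands in for a downloaded episode.
my ($fh, $episode) = tempfile(SUFFIX => '.mp3', UNLINK => 1);
close $fh or die "close $episode: $!";

my $pub_date = 1_150_000_000;    # arbitrary example timestamp
stamp_with_pub_date($episode, $pub_date);
print "mtime is now ", scalar localtime((stat $episode)[9]), "\n";
```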

-fetch_callback

If you provide a coderef to -fetch_callback this routine will be invoked on every file fetched immediately after the file is created. It will be called with two arguments corresponding to the MP3::PodcastFetch object, and the complete path to the fetched file:

   my $callback = sub {
       my ($feed,$filepath) = @_;
       print STDERR "$filepath successfully fetched\n";
   };

   $feed = MP3::PodcastFetch->new(-base           => $base,
                                  -rss            => $url,
                                  -fetch_callback => $callback);

-delete_callback

Similar to -fetch_callback except that the passed coderef is called on every deleted file immediately after the file is deleted.
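
A delete callback might look like the sketch below. It assumes that the coderef receives the same two arguments as -fetch_callback (the feed object and the file path); the text above only says "similar", so treat the argument list as an assumption. The callback is invoked by hand here so that the example runs without the module or a network connection.

```perl
use strict;
use warnings;

# Collect the paths of deleted episodes, e.g. to clean up a playlist later.
my @removed;
my $on_delete = sub {
    my ($feed, $filepath) = @_;    # assumed argument order
    push @removed, $filepath;
    print STDERR "$filepath deleted\n";
};

# With the module installed, the coderef would be passed to the constructor:
#   MP3::PodcastFetch->new(-base            => $base,
#                          -rss             => $url,
#                          -delete_callback => $on_delete);

# Manual invocation for demonstration; the path is a made-up example.
$on_delete->(undef, '/tmp/podcasts/Some_Channel/old_episode.mp3');
print scalar(@removed), " file(s) recorded as deleted\n";
```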

Read/write accessors

The following are read/write accessors (get and/or set the corresponding option). Each takes the form:

 $old_value = $feed->accessor([$new_value])

Where $new_value is optional.

$feed->base
$feed->subdir
$feed->override_channel_dir
$feed->rss
$feed->timeout
$feed->mirror_mode
$feed->verbose
$feed->env_proxy
$feed->rewrite_filename
$feed->upgrade_tags
$feed->keep_old
$feed->playlist_handle
$feed->playlist_base
$feed->force_genre
$feed->force_artist
$feed->force_album
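
The calling convention shown above (return the old value, optionally store a new one) can be demonstrated with a stand-in class. Demo::Feed is hypothetical and exists only for this example; it is not how MP3::PodcastFetch implements its accessors.

```perl
use strict;
use warnings;

# Hypothetical stand-in class with one accessor that follows the
# "$old_value = $obj->accessor([$new_value])" convention.
package Demo::Feed;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub verbose {
    my $self = shift;
    my $old  = $self->{verbose};
    $self->{verbose} = shift if @_;    # set only when an argument is given
    return $old;
}

package main;

my $demo   = Demo::Feed->new(verbose => 0);
my $before = $demo->verbose(1);    # sets the new value, returns the old one
my $now    = $demo->verbose;       # plain read
print "verbose was $before, is now $now\n";
```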

Common methods

The following methods are commonly used in end-user scripts:

$feed->fetch_pods

Mirror the subscribed podcast episodes into the base directory specified in new(). After calling it, use the fetched() and errors() methods to find out how many podcasts were successfully mirrored and whether there were any errors. Use the fetched_files() method to get the names of the newly fetched podcasts.

@files = $feed->fetched_files

This method will return the complete paths to each of the podcast episodes successfully fetched by the preceding call to fetch_pods().

@files = $feed->deleted_files

This method will return the complete paths to each of the podcast episodes successfully deleted by the preceding call to fetch_pods().

$feed->fetched

The number of episodes fetched/refreshed.

$feed->skipped

The number of episodes skipped.

$feed->deleted

The number of episodes deleted because they are either no longer mentioned in the subscription file or exceed the per-feed limit.

$feed->errors

The number of episodes not fetched because of an error.
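
The four counters can be combined into a one-line report after fetch_pods() returns. In the sketch below, summarize() is a hypothetical helper that works with any object providing the fetched, skipped, deleted and errors methods; Demo::Counts is a stand-in with made-up numbers so the example runs without a real feed.

```perl
use strict;
use warnings;

# Hypothetical helper: format the counters documented above.
sub summarize {
    my ($feed) = @_;
    return sprintf "%d fetched, %d skipped, %d deleted, %d errors",
        $feed->fetched, $feed->skipped, $feed->deleted, $feed->errors;
}

# Stand-in object with invented counts, used instead of a live feed.
{
    package Demo::Counts;
    sub new     { return bless { fetched => 3, skipped => 10, deleted => 1, errors => 0 }, shift }
    sub fetched { return $_[0]{fetched} }
    sub skipped { return $_[0]{skipped} }
    sub deleted { return $_[0]{deleted} }
    sub errors  { return $_[0]{errors} }
}

print summarize(Demo::Counts->new), "\n";
```

In a real script the MP3::PodcastFetch object would be passed to summarize() right after calling fetch_pods().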

Internal Methods

These methods are intended for internal use but can be overridden in subclasses in order to change their behavior.

$feed->update($channel)

Update all episodes contained in the indicated MP3::PodcastFetch::Feed::Channel object (this object is generated by fetch_pods() in the course of downloading and parsing the RSS file).

$feed->bump_fetched($value)
$feed->bump_error($value)
$feed->bump_deleted($value)
$feed->bump_skipped($value)

Increase the fetched, error, deleted and skipped counters by $value, or by 1 if not specified.

$feed->mirror($dir,$items,$channel)

Mirror a list of podcast episodes into the indicated directory. $dir is the absolute path to the directory to mirror the episodes into, $items is an array ref of MP3::PodcastFetch::Feed::Item objects, and $channel is a MP3::PodcastFetch::Feed::Channel object.

$feed->mirror_url($ua,$url,$filename,$item,$channel)

Fetch a single podcast episode. Arguments are:

 $ua        An LWP::UserAgent object
 $url       The URL of the podcast episode to mirror
 $filename  The local filename for the episode (may already exist)
 $item      The corresponding MP3::PodcastFetch::Feed::Item object
 $channel   The corresponding MP3::PodcastFetch::Feed::Channel object

$feed->log(@msg)

Log the strings provided in @msg to STDERR. Logging is controlled by the -verbose setting.

$feed->log_error(@msg)

Log the errors provided in @msg to STDERR. Logging occurs even if -verbose is false.

$feed->add_file($path)

Record that we successfully mirrored the podcast episode indicated by $path.

$feed->write_playlist($filename,$item,$channel)

Write an entry into the current playlist indicating that $filename is ready to be listened to. $item and $channel are the MP3::PodcastFetch::Feed::Item and Channel objects respectively.

$feed->fix_tags($filename,$item,$channel)

Fix the ID3 tags in the newly-downloaded podcast episode indicated by $filename. $item and $channel are the MP3::PodcastFetch::Feed::Item and Channel objects respectively.

$duration = $feed->get_duration($filename,$item)

This method is used to provide extended information for .m3u playlists.

Get the duration, in seconds, of the podcast episode given by $filename. If an ID3 tagging library is available, the duration will be calculated from the MP3 file directly. Otherwise, it will fall back to using the duration specified by the RSS feed's MP3::PodcastFetch::Feed::Item object. Many RSS feeds do not specify the duration, in which case get_duration() will return 0.

$filename = $feed->make_filename($url,$title)

Create a filename for the episode located at $url based on its $title or the last component of the URL, depending on -rewrite_filename argument provided to new().

$path = $feed->generate_directory($channel)

Create a directory for the channel specified by the provided MP3::PodcastFetch::Feed::Channel object, respecting the values of -base and -subdir. The path is created in an OS-independent way, using File::Spec->catfile(). The directory will be created if it doesn't already exist. If it already exists and is not writeable, the method errors out.

$dirname = $feed->channel_dir($channel)

Generate a directory name based on the provided channel object's title, unless it is overridden by the -override_channel_dir value.

$safe_str = $feed->safe_str($unsafe_str)

This method generates OS-safe path components from channel and podcast titles. It replaces whitespace and other odd characters with underscores.
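
The kind of substitution described here can be sketched with a single regular expression. The character class below is an assumption chosen for illustration (word characters, dots and hyphens are kept, and each run of anything else becomes one underscore); the set of characters the module actually replaces may differ, and safe_str_sketch() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration only: replace each run of
# characters outside [A-Za-z0-9_.-] (plus other word characters) with "_".
sub safe_str_sketch {
    my ($unsafe) = @_;
    (my $safe = $unsafe) =~ s/[^\w.\-]+/_/g;
    return $safe;
}

print safe_str_sketch('New York Times: Front Page'), "\n";
```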

SEE ALSO

fetch_pods.pl, MP3::PodcastFetch::Feed, MP3::PodcastFetch::Feed::Channel, MP3::PodcastFetch::Feed::Item, MP3::PodcastFetch::TagManager, MP3::PodcastFetch::XML::SimpleParser

AUTHOR

Lincoln Stein <lstein@cshl.org>.

Copyright (c) 2006 Lincoln Stein

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See DISCLAIMER.txt for disclaimers of warranty.