NAME

WWW::ImageSpool - Cache images of interest from the web.

SYNOPSIS

        use WWW::ImageSpool;

        mkdir("/var/tmp/imagespool", 0700);
        
        my $spool = WWW::ImageSpool->new
        (
         limit => 3,
         searchlimit => 10,
         max => 5 * 1048576,
         dictionary => "sushi.txt",
         verbose => 1,
         dir => "/var/tmp/imagespool"
        );

        $spool->run()
          while($spool->uptime < 86400);
        

DESCRIPTION

When a WWW::ImageSpool object's run() method is called, it randomly picks keywords out of a chosen dictionary file and attempts to download images from the web by searching on those keywords. (Currently only Google Image Search is used, via Guillaume Rousse's WWW::Google::Images module, but the internals have been set up to make it easy to hook into other engines in the future.) Images are stored in the specified directory. If the directory grows beyond the maximum size, the oldest files in the directory are deleted.

The intended purpose behind this module is to supply images on demand for any piece of software that wants abstract images, such as screensavers, webpage generators, or voice synthesizers (wouldn't it be cool if a voice synthesizer extracted all the popular nouns out of a book and scrolled by pertinent images as it read to you?).

Constructor

new(%args)

Creates and returns a new WWW::ImageSpool object.

Required parameters:

dir => $dir

Directory to hold the image files in. WWW::ImageSpool will delete files out of this directory when it reaches the maximum size, so there shouldn't be anything in there that you want to keep.
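
A private, dedicated directory is therefore safest. For example, a sketch using the standard File::Temp module to create one:

        use File::Temp qw(tempdir);
        use WWW::ImageSpool;

        # A fresh directory that nothing else writes to:
        my $dir = tempdir("imagespool-XXXXXX", TMPDIR => 1);

        my $spool = WWW::ImageSpool->new(dir => $dir);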

Optional parameters:

limit => $limit

Maximum number of images to fetch from any one keyword search. Defaults to 3.

searchlimit => $searchlimit

Maximum number of search results to ask the search engine for. limit results will be picked at random out of the list that the search engine returns. The default is search-engine-specific (50 for Google). Most search engines return results in the same order each time they are called with the same keywords, so if you are using a small dictionary file it is generally a good idea to make this much higher than limit.
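
For example, with a small dictionary you might widen the candidate pool; the values here are only illustrative:

        # Pick 2 images at random out of up to 40 Google results per
        # keyword, so repeat searches on the same word are less likely
        # to fetch the same files every time.
        my $spool = WWW::ImageSpool->new
        (
         dir         => "/var/tmp/imagespool",
         limit       => 2,
         searchlimit => 40,
        );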

consume => 0 | 1

WWW::ImageSpool re-loads the dictionary file whenever it is modified, or whenever it runs out of words. With consume set to 0, WWW::ImageSpool never runs out of words, because it can re-use them as often as it wants. With consume set to 1, WWW::ImageSpool deletes each word from its internal list as it uses it, ensuring that every word in the dictionary is used once before any word is used twice.

consume is set to 1 by default.
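
For instance, to let the same word be picked again before the dictionary is exhausted (a minimal sketch; everything but consume is left at its default):

        my $spool = WWW::ImageSpool->new
        (
         dir     => "/var/tmp/imagespool",
         consume => 0,
        );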

retry => $retry

How many times to retry image-searching or fetching operations if they fail.

The actual maximum number of attempts is ($retry * $retry): WWW::ImageSpool will try up to $retry times to find a word with good search results, and then, with that word, will try up to $retry times to download images, stopping as soon as at least one image is successfully downloaded (or the retries are exhausted).

retry is set to 5 by default.
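
The behavior is roughly equivalent to this nested loop (a simplified sketch, not the module's actual code; find_word() and fetch_images() are hypothetical stand-ins for its internals):

        my $retry = 5;
        my $got = 0;
        WORD: for (1 .. $retry)            # up to $retry words...
        {
                my $word = find_word() or next WORD;
                for (1 .. $retry)          # ...up to $retry fetches per word
                {
                        $got = fetch_images($word);
                        last WORD if $got; # stop once something was stored
                }
        }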

minx => $minx, miny => $miny

Minimum X / Y resolution of images to return. Smaller images are discarded.

By default, minx is set to 160, and miny is set to 120.

max => $bytes

Maximum size of the spool directory, in bytes. If the total size of all files in that directory ever goes over this size, the oldest file in the directory is deleted to make more room.
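
The pruning step is conceptually similar to this standalone sketch (not the module's actual code), which sorts by modification time and unlinks the oldest files first:

        use File::stat;

        # Delete the oldest files until the directory fits under $max:
        sub prune
        {
                my ($dir, $max) = @_;
                my @files = sort { stat($a)->mtime <=> stat($b)->mtime }
                            glob("$dir/*");
                my $total = 0;
                $total += -s $_ for @files;
                while ($total > $max and @files)
                {
                        my $victim = shift @files;  # oldest first
                        $total -= -s $victim;
                        unlink $victim;
                }
        }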

dictionary => $file

Path to the dictionary file to use. Defaults to "/usr/share/dict/words".

verbose => 0 - 4

Level of verbosity. Defaults to 0, which prints nothing. 1 prints a logfile-like status line for each iteration of run(). 2 prints each word that is picked, and advises if WWW::ImageSpool picked a file that already exists in the spool. 3-4 print more verbose debugging information.

Parameters for making WWW::ImageSpool re-entrant:

These parameters are only really useful if you are creating and destroying WWW::ImageSpool objects throughout the lifespan of an application, but want your statistics to carry over from one object to the next:

n => $n

How many iterations of run() the application has done so far.

s => $s

UNIX timestamp of when the application first called run() on a WWW::ImageSpool object.

l => $l

UNIX timestamp of when the application last called run() on a WWW::ImageSpool object.

got => $got

How many images have been downloaded and stored over the life of the application (including ones that have been deleted).
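
For example, to carry statistics from one object into its replacement (a sketch; it assumes the accessor values are passed straight back to new()):

        my $old = WWW::ImageSpool->new(dir => "/var/tmp/imagespool");
        $old->run() for (1 .. 10);

        # Hand the accumulated statistics over to a fresh object:
        my $new = WWW::ImageSpool->new
        (
         dir => "/var/tmp/imagespool",
         n   => $old->n,
         s   => $old->s,
         l   => $old->l,
         got => $old->got,
        );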

Methods

run()

Pick a new keyword and attempt to download up to limit images from an image search.

Returns the actual number of images downloaded and stored.

s()

Returns the UNIX timestamp of the object's first operation.

l()

Returns the UNIX timestamp of the object's last operation.

n()

Returns how many times run() has been called on this object.

uptime()

Returns the number of seconds between the object's first operation and its last operation.

lag()

Returns the number of seconds between the object's last operation and the current time.

got()

Returns the total number of images that have been downloaded and stored by this object, including images that have been deleted.
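
Taken together, the accessors make it easy to log progress. A sketch, assuming $spool was constructed as in the SYNOPSIS:

        while ($spool->uptime < 3600)
        {
                my $new = $spool->run();
                printf("run %d: %d new image(s), %d stored so far, up %ds\n",
                       $spool->n, $new, $spool->got, $spool->uptime);
        }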

BUGS

If the dictionary file suddenly disappears, WWW::ImageSpool does not handle it very gracefully.

TODO

There should be a size limit on individual files, enforced with a HEAD request before each file is actually downloaded.
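
Such a check might look like this (a hypothetical sketch using LWP::UserAgent; this is not what the module currently does, and $url, $max_file_size, and download() are placeholders):

        use LWP::UserAgent;

        my $ua   = LWP::UserAgent->new;
        my $res  = $ua->head($url);
        my $size = $res->header('Content-Length');

        # Only fetch the file if the server reports a reasonable size:
        download($url) if defined $size and $size <= $max_file_size;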

Underlying modules (WWW::ImageSpool::Source::Google, WWW::ImageSpool::Dictionary, etc.) need to be documented.

Support for multiple "Source" and "Dictionary" objects in one "ImageSpool" object.

Per-run() control over the search configuration.

NOTE

This module may violate the terms of service of some search engines or content providers. Use at your own risk.

VERSION

0.01

LICENSE

Copyright 2004, Tyler "Crackerjack" MacDonald <tyler@yi.org>. This is free software; you may redistribute it under the same terms as Perl itself.