NAME

Sub::Slice - split long-running tasks into manageable chunks

SYNOPSIS

        # Client
        # Assume methods in the Server:: package are magically remoted
        my $token = Server::create_token();
        for(1 .. MAX_ITERATIONS) {
                Server::do_work($token);
                last if $token->{done};
        }

        # Server
        # Imagine this is on a remote machine
        package Server;
        use Sub::Slice;

        sub create_token {
                # create a new job:
                my $job = new Sub::Slice(
                        backend         => 'Filesystem',
                        storage_options => {
                                path  => '/var/tmp/myproject/',
                        }
                );
                return $job->token;
        }

        sub do_work {
                my $token = shift;
                # loading an existing job:
                my $job = new Sub::Slice(
                        token           => $token,
                        backend         => 'Filesystem',
                        storage_options => {
                                path  => '/var/tmp/myproject/',
                        }
                );

                at_start $job
                        sub {
                                $job->store('foo', '1');
                                $job->store('bar', { abc => 'def' });
                                # store data, initialise
                                $job->set_estimate(10); # estimate number of steps
                                return ( $job->fetch('foo') );
                        };

                my $foo = $job->fetch('foo');

                at_stage $job "stage_one",
                        sub {
                                my $bar = $job->fetch('bar');
                                # do stuff
                                $job->next_stage('stage_two') if $some_condition;
                        };

                at_stage $job "stage_two",
                        sub {
                                # ...do more stuff...
                                # mark job as ready to be deleted
                                $job->done() if $job->count() == $job->estimate();
                        };

                return $job->return_value(); #Pass back any return value from coderefs
        }

DESCRIPTION

Sub::Slice breaks up a long process into smaller chunks that can be executed one at a time over a stateless protocol such as HTTP/SOAP so that progress may be reported. This means that the client can display progress or cancel the operation part-way through.

It works by the client requesting a token from the server, and passing the token back to the server on each iteration. The token passed to the client contains status information which the client can use to determine if the job has completed/failed and to display status/error messages.

Within the routine called on each iteration, the server defines a set of coderefs, one of which will be called for a given iteration. In addition the server may define coderefs to be called at the start and end of the job. The server may provide the client with an estimate of the number of iterations the job is likely to take.

It is possible to balance performance/usability by modifying the number of iterations that will be executed before returning progress to the client.

METHODS

new( %options )

Create a new job object. Valid options are:

token

A token for an existing job (optional)

iterations

The number of chunks to execute before saving the state and returning. Defaults to '1'. This value may be overridden later on by setting the value in the token. Set to 0 for unlimited.

backend

The storage backend. This should be either a fully qualified package name or a short name. A name with no namespace is assumed to be in the Sub::Slice::Backend namespace (e.g. Database would be interpreted as Sub::Slice::Backend::Database). Defaults to Sub::Slice::Backend::Filesystem.
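The naming rule above can be sketched as a small function. This is only an illustration of the rule as documented, not Sub::Slice's actual code; resolve_backend is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper that mirrors the documented naming rule.
sub resolve_backend {
    my ($name) = @_;

    # No backend given: fall back to the documented default.
    return 'Sub::Slice::Backend::Filesystem'
        unless defined $name && length $name;

    # A name that already contains a namespace is used as-is.
    return $name if $name =~ /::/;

    # A short name is looked up under Sub::Slice::Backend.
    return "Sub::Slice::Backend::$name";
}

print resolve_backend('Database'), "\n";
print resolve_backend('My::Own::Backend'), "\n";
```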

pin_length

The size of the random PIN used to sign the token. Default is 1e9.

random_pin ($l)

Generates a random PIN of length $l. We do this using rand(). You might want to override this method if you require cryptographic-quality randomness for your environment.

auto_blob_threshold

If this is set, any strings longer than this number of bytes will be stored as BLOBs automatically (possibly taking advantage of a more efficient BLOB storage mechanism offered by the backend). Note that this does not apply when you store references, only to strings of characters/bytes.
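The rule described for auto_blob_threshold could look roughly like the following. This is a sketch under assumptions: should_store_as_blob is a hypothetical name, and the values are assumed to be byte strings, so that length() counts bytes.

```perl
use strict;
use warnings;

# Hypothetical predicate: should this value be stored as a BLOB?
sub should_store_as_blob {
    my ($value, $threshold) = @_;

    return 0 unless defined $threshold;    # option not set
    return 0 unless defined $value;        # nothing to measure
    return 0 if ref $value;                # references are never auto-blobbed

    # Assumes $value is a byte string, so length() is a byte count.
    return length($value) > $threshold ? 1 : 0;
}

print should_store_as_blob('x' x 2048, 1024), "\n";
```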

storage_options

A hash of configuration options for the backend storage. See the POD of the backend module (default is Sub::Slice::Backend::Filesystem).

If a token is supplied, new() returns the existing job object with the session data for that token.

METHODS DEFINING STAGES OF ITERATION

at_start $job \&coderef

Code to initialise the job. This isn't counted as an iteration and will only run once per job.

at_stage $job $stage_name, \&coderef

Executes \&coderef up to iterations times, if $stage_name is the current stage and if the number of executions in the current session is not greater than iterations. It is currently required that you have at least one at_stage defined.

If the current stage hasn't been set with next_stage(), it will implicitly be set to the first at_stage block that is seen.
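To make the dispatch rule concrete, here is a toy model of it in plain Perl. It does not use Sub::Slice: toy_at_stage, toy_do_work and the keys of the $state hash are all hypothetical, and the real module may differ in details (for instance, in whether a newly selected stage can run within the same call).

```perl
use strict;
use warnings;

# Toy stand-in for at_stage; $state plays the role of the persisted job.
sub toy_at_stage {
    my ($state, $stage_name, $code) = @_;

    # The first at_stage block seen becomes the current stage by default.
    $state->{stage} = $stage_name unless defined $state->{stage};
    return unless $state->{stage} eq $stage_name;

    # Run the coderef until the per-call budget is used up
    # (an iterations value of 0 means no limit).
    while (!$state->{done}
        && $state->{stage} eq $stage_name
        && ($state->{iterations} == 0 || $state->{ran} < $state->{iterations}))
    {
        $code->($state);
        $state->{ran}++;
        $state->{count}++;
    }
    return;
}

# One "call from the client": every stage is declared, one may run.
sub toy_do_work {
    my ($state) = @_;
    $state->{ran} = 0;    # executions in this call
    toy_at_stage($state, 'stage_one',
        sub { $_[0]{stage} = 'stage_two' if $_[0]{count} >= 1 });
    toy_at_stage($state, 'stage_two', sub { $_[0]{done} = 1 });
    return $state;
}

my $state = { iterations => 1, count => 0, done => 0 };
toy_do_work($state) for 1 .. 4;
print "stage=$state->{stage} count=$state->{count} done=$state->{done}\n";
```

In this model, stage_one runs on the first two calls, stage_two runs on the third, and the fourth call does nothing because the job is already done.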

at_end $job \&coderef

Code to run after the last iteration (unless the job is aborted before then). This isn't counted as an iteration and will only run once per job. It's typically used as a "commit" stage.

If a job dies in one of these blocks, Sub::Slice sets $job->abort($@) and rethrows the exception. Note that at_end may not be run if a job is aborted during one of the earlier stages. See Sub::Slice::Manual for an example of defensive coding to prevent resources allocated in at_start leaking if the job is aborted part-way through.
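The "record the failure, then rethrow" behaviour described above follows a common Perl pattern, sketched here without Sub::Slice. The run_block helper and the $state hash are hypothetical; in the real module the failure is recorded through $job->abort($@).

```perl
use strict;
use warnings;

# Hypothetical wrapper: run a block, note any failure, then rethrow it.
sub run_block {
    my ($state, $code) = @_;

    my $ok = eval { $code->(); 1 };
    unless ($ok) {
        my $err = $@;             # copy before anything can reset $@
        $state->{abort} = 1;      # stands in for $job->abort($err)
        $state->{error} = $err;
        die $err;                 # the caller still sees the exception
    }
    return;
}

my $state  = { abort => 0, error => '' };
my $caught = eval { run_block($state, sub { die "disk full\n" }); 1 } ? '' : $@;
print "aborted: $state->{error}";
```

Because the exception is rethrown, code that allocates resources in at_start cannot rely on at_end to release them. Any cleanup has to be reachable from the failure path as well.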

ACCESSOR METHODS

$job->token()

Returns the token object for this job. The token object will be updated automatically as stages of the sub execute. The token has the following properties which the client can make use of:

done

Read/write boolean value. Is the job done? Setting this to 1 on the client will cause iterations on the server to cease, and any at_end cleanup to be done.

abort

Read-only boolean value. Was the job aborted on the server?

error

Read-only. Error message if the job was aborted.

count

Read-only. Number of iterations performed so far.

estimate

Read-only. An estimate of the total number of iterations that will be performed. This may not be totally accurate, depending on whether new work is "discovered" as the iterations proceed.

status

Read-only. Status message.

stage

Read-only. The next stage that the job will run.

iterations

A write-only property the client can use to control the number of iterations run on the server in the next call. This overrides the default number of iterations set in the Sub::Slice constructor.
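As an illustration of how a client might use these properties, the sketch below polls a simulated server and reports progress. The token is treated as a hash reference, as in the SYNOPSIS. fake_do_work is a hypothetical stand-in for the remoted server routine and only imitates how the fields change.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the remoted server call: advances the job by
# up to $token->{iterations} steps and updates the token in place.
sub fake_do_work {
    my ($token) = @_;
    my $steps = $token->{iterations} || 1;
    for (1 .. $steps) {
        last if $token->{done};
        $token->{count}++;
        $token->{done} = 1 if $token->{count} >= $token->{estimate};
    }
    $token->{status} = "processed $token->{count} of $token->{estimate}";
    return $token;
}

# Client side: call the server until the job is done or has aborted.
sub run_client {
    my ($token, $max_calls) = @_;
    my $calls = 0;
    while ($calls < $max_calls) {
        $token = fake_do_work($token);
        $calls++;
        die "job aborted: $token->{error}\n" if $token->{abort};
        printf "%3d%% %s\n",
            100 * $token->{count} / $token->{estimate}, $token->{status};
        last if $token->{done};
    }
    return $calls;
}

my $token = { count => 0, estimate => 10, done => 0, abort => 0, error => '' };
$token->{iterations} = 5;    # ask for 5 iterations per round trip
my $calls = run_client($token, 100);
```

A larger iterations value means fewer round trips but coarser progress reports. A smaller value gives finer-grained progress at the cost of more calls.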

$job->id()

Returns the ID of the job (issued by the new_id function in the backend). This is mainly of interest if you are writing a backend and need to get the ID from a job.

$job->count()

Returns the total number of iterations that have been executed.

$job->estimate()

Returns an estimate of how many iterations are required for the job.

$job->is_done()

Returns a boolean value. Is the job done?

$job->stage()

Returns the name of the executing code block, as set by next_stage()

$job->fetch( $key )

Returns the user data stored under $key. If no data is found against $key, it automatically tries fetch_blob to see if data was stored as a blob.

$job->fetch_blob($key)

Returns a lump of data stored using store_blob - see the MUTATOR METHODS.

$job->return_value()

Returns the return value of the stage coderef. It helps you avoid mistakes like this:

        sub do_work {
                my $job = new Sub::Slice(token => shift());     
                at_stage $job 'mystage', sub {
                        #  do stuff
                        return 'abc' #only returns 1 level up
                };
                #nothing is returned from do_work
        }

The caller of do_work() will not receive the value returned inside the 'mystage' sub {}. This might be better written as:

        sub do_work {
                my $job = new Sub::Slice(token => shift());
                at_stage $job 'mystage', sub {
                        #  do stuff
                        return 'abc' #only returns 1 level up
                };
                return $job->return_value(); # 'abc'
        }

MUTATOR METHODS THAT SET VALUES IN THE TOKEN

$job->set_estimate( $int )

Populates the estimate field in the token with an estimate of how many iterations are required for this job to complete.

$job->done()

Mark the job as completed successfully. This sets the done flag in the token. Serialised object data will be removed when the object is destroyed.

$job->abort( $reason )

Mark the job as aborted. This sets the abort flag in the token. The optional $reason message will be stored in the token's error string. Serialised object data will be removed when the object is destroyed.

$job->status( $status_text )

Set the status field in the token. This might be useful to inform users about what is about to happen in the next iteration of the job.

OTHER MUTATOR METHODS

$job->next_stage( $stage_name )

Tell the $job object that the next time the routine is called, it should execute the block named $stage_name. Unless next_stage is set, the first at_stage block will be executed.

$job->store( $key => $value, $key2 => $value2, ... )

Store some user data in the object. $value can be a scalar containing any perl data type (such as hash/array references) - it will be automatically serialised.

Note that some objects may not be suited to serialisation. For example if an object is blessed into a package that is required at runtime, when it is deserialised, the required package may not actually be loaded.

There may also be issues serialising some objects like DBI database handles and XML::Parser objects, although this is potentially backend-specific (Filesystem uses Storable, and some objects may provide serialisation hooks).

$value is optional (if not specified, $value will be set to undef).
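Since the Filesystem backend is described as using Storable, a round trip through Storable gives a rough idea of what does and does not survive serialisation. This sketch calls Storable directly rather than going through Sub::Slice, and it uses a code reference as the problem value, which is a different case from the DBI and XML::Parser objects mentioned above.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Plain nested data survives a freeze/thaw round trip as a deep copy.
my $value = { abc => 'def', list => [ 1, 2, 3 ] };
my $copy  = thaw( freeze($value) );
print "abc => $copy->{abc}\n";

# With Storable's default settings a code reference cannot be frozen,
# so freeze() throws an exception here.
my $ok    = eval { freeze( { callback => sub { 1 } } ); 1 };
my $error = $ok ? '' : $@;
print "could not serialise: $error" unless $ok;
```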

$job->store_blob($key => $blob)

Allows large lumps of data to be stored efficiently by the back end.

VERSION

        $Revision: 1.48 $ on $Date: 2005/11/23 14:31:51 $ by $Author: colinr $

AUTHOR

Simon Flack and John Alden with additions by Tim Sweetman <cpan _at_ bbc _dot_ co _dot_ uk>

COPYRIGHT

(c) BBC 2005. This program is free software; you can redistribute it and/or modify it under the GNU GPL.

See the file COPYING in this distribution, or http://www.gnu.org/licenses/gpl.txt