
NAME

Parallel::Supervisor - Manage a collection of child (worker) processes

SYNOPSIS

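    use strict;
    use warnings;
    use Parallel::Supervisor;

    my $supervisor = Parallel::Supervisor->new;

    $supervisor->prepare("Child-1", "pwd");
    $supervisor->prepare("Child-2", "ls -lh");

    # "|| {}" ends the loop cleanly once no ready child is left
    LAUNCH: while (my %child = %{ $supervisor->get_next_ready || {} }) {
        my $childpid = fork();
        die "fork failed: $!" unless defined $childpid;
        if ($childpid == 0) { # child process
            close $child{parent_reader};
            # select() does not affect system(); dup the pipe onto STDOUT
            open STDOUT, '>&', $child{child_writer} or die "dup failed: $!";
            system($child{cmd}); # do the work
            exit; # don't let children play in the LAUNCH loop
        }
        # parent process
        $supervisor->attach($child{id}, $childpid);
        close $child{child_writer};
    }

    # wait for each worker, then mark it finished
    my %pids = %{ $supervisor->get_pids };    # id => pid
    CLEANUP: foreach my $id (sort keys %pids) {
        waitpid($pids{$id}, 0);
        $supervisor->detach($pids{$id});
    }

    # read what each worker wrote (fine for short output like this)
    my %readers = %{ $supervisor->get_readers };    # id => parent_reader
    foreach my $id (sort keys %readers) {
        my $fh = $readers{$id};
        print "$id: $_" while <$fh>;
    }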

DESCRIPTION

This module provides a simple way to manage a collection of jobs and monitor their output. The Supervisor tracks whether each job has been launched and provides a pipe that lets the parent read from the child. Each record has a unique name (id), an associated command, and a pair of non-blocking IO handles forming the pipe.

It is up to the caller to attach the pipe ends, run the command, and ensure it completes (see SEE ALSO below for modules to assist with this aspect). Once a job is launched, its pid is passed to the supervisor via the attach method, marking the job as active. When the job is completed, the calling code should detach the job (i.e. remove its pid and move it to the "finished" state).
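The lifecycle above (ready, then attached, then finished) can be pictured as a small state machine. The sketch below is a hypothetical toy model written only to illustrate the documented rules: prepare() refuses a name already in use, detach() looks a job up by pid, and forget() refuses an attached job. The package name ToyLifecycle and its state_of() helper are invented for illustration; this is not Parallel::Supervisor's implementation.

```perl
use strict;
use warnings;

# Hypothetical toy model of the documented job lifecycle
# (ready -> attached -> finished). NOT Parallel::Supervisor's own code.
package ToyLifecycle;

sub new { return bless { jobs => {} }, shift }

# prepare: refuse to overwrite an existing name, returning undef
sub prepare {
    my ($self, $name, $cmd) = @_;
    return undef if exists $self->{jobs}{$name};
    $self->{jobs}{$name} = { id => $name, cmd => $cmd, state => 'ready' };
    return 1;
}

# attach: only a ready job can become attached (running)
sub attach {
    my ($self, $name, $pid) = @_;
    my $job = $self->{jobs}{$name} or return undef;
    return undef unless $job->{state} eq 'ready';
    @$job{qw(state pid)} = ('attached', $pid);
    return 1;
}

# detach: takes a pid, so search the attached jobs for it
sub detach {
    my ($self, $pid) = @_;
    for my $job (values %{ $self->{jobs} }) {
        next unless $job->{state} eq 'attached' && $job->{pid} == $pid;
        $job->{state} = 'finished';
        return 1;
    }
    return undef;
}

# forget: delete the record entirely, but never while it is attached
sub forget {
    my ($self, $name) = @_;
    my $job = $self->{jobs}{$name} or return undef;
    return undef if $job->{state} eq 'attached';
    delete $self->{jobs}{$name};
    return 1;
}

# hypothetical helper: report a job's state without autovivifying it
sub state_of {
    my ($self, $name) = @_;
    my $job = $self->{jobs}{$name};
    return $job ? $job->{state} : undef;
}

package main;

my $sup = ToyLifecycle->new;
$sup->prepare('Child-1', 'pwd');
print defined $sup->prepare('Child-1', 'ls') ? "replaced\n" : "duplicate refused\n";
$sup->attach('Child-1', 4242);
print $sup->state_of('Child-1'), "\n";   # attached
$sup->detach(4242);
print $sup->state_of('Child-1'), "\n";   # finished
```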

The command associated with the job is a scalar, and it is up to the caller to determine how it is invoked. The example above uses fork() and system(), but the module has been tested with Parallel::Jobs and Parallel::ForkManager.

The above example is a simple illustration of the module's semantics. In a more practical setting (e.g. using Parallel::Jobs), the parent could launch a series of long-running jobs, continuously monitor all workers' output and/or process status, and respond to the results by spawning new workers or sending a signal to a child by pid. The detach() and forget() methods could be called from the run_on_finish() callback of Parallel::ForkManager, for example.
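Signalling a child by pid needs only core Perl. The sketch below is an illustration under the assumption of a POSIX system: a sleeping child stands in for a long-running job whose pid would have been passed to attach(), and the parent sends it SIGTERM, reaps it, and decodes the wait status. Nothing here calls Parallel::Supervisor; after the waitpid() the caller would normally detach() that pid.

```perl
use strict;
use warnings;
use POSIX qw(SIGTERM);

# Sketch: stop a worker by pid and confirm how it ended (POSIX only).
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    sleep 30;                     # stand-in for a long-running job
    exit 0;
}

kill 'TERM', $pid or die "kill failed: $!";
waitpid($pid, 0);                 # reap it; $? now holds its wait status
my $signal = $? & 127;            # low 7 bits: the terminating signal number
print "worker $pid ended by signal $signal\n";
```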

Only one pipe is created, allowing the parent to read the child's output. Bidirectional IPC might be a nice enhancement, but is currently considered beyond the scope of this module. If you find this a drawback, you might be looking for a more robust solution, such as POE, which provides a full-featured, event-driven multitasking framework.
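For readers unfamiliar with one-way pipes, the following core-Perl sketch shows the general shape: the child keeps only the write end, the parent keeps only the (non-blocking) read end. It is an assumption that the handles created by prepare() behave like this pair; the variable names simply mirror the documented child_writer and parent_reader fields, and this is not the module's own code.

```perl
use strict;
use warnings;
use IO::Handle;

# Sketch: one pipe, child writes, parent reads.
pipe(my $parent_reader, my $child_writer) or die "pipe failed: $!";
$parent_reader->blocking(0);     # non-blocking read end

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                 # child: write end only
    close $parent_reader;
    print {$child_writer} "hello from child\n";
    close $child_writer;         # flushes and signals EOF to the parent
    exit 0;
}

close $child_writer;             # parent must drop its copy to ever see EOF
waitpid($pid, 0);                # the tiny output fits in the pipe buffer
my $output = join '', <$parent_reader>;
close $parent_reader;
print $output;
```

Closing the unused end in each process matters: if the parent kept its copy of the write end open, the read end would never report end-of-file.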

METHODS

new

Instantiate a collection with an empty list of jobs.

prepare($name, $cmd)

Add a job to your collection with the given name and command. The job is considered "ready" until it is attached or forgotten (see below).

$name can be used for tracking the task; for example, Parallel::ForkManager can use this identifier in its callbacks.

$cmd will be invoked by your code, so it can be anything you want to execute in a standardized way, e.g. within eval() or system(), or via Parallel::Jobs::start_job.

Passing prepare() a $name that is already in use does not replace the existing child (after all, it may be running!); prepare() returns undef instead. See the forget method, below.

is_ready($name)

Returns 1 if a job with this name has been prepared but not yet attached, or undef if there is no such name or if the process is running.

is_attached($name)

Looks for a running process with the given name and returns the pid if the process has been attached, or undef.

attach($name, $pid)

Associate the given name with the given pid and consider this child to be "alive" or running.

detach($pid)

Consider the child with the given pid to no longer be alive - change job state from attached to finished.

forget($name)

Delete the child with the given name entirely. This allows a new child to be created with a previously-used name, for example. Returns undef if the child is attached.

reset

Like calling new all over again: deletes all records.

get_child($name)

Returns a hashref to the record for the given name. The record consists of:

    id                  - the name identifying this unique child
    cmd                 - the command this child will run
    child_writer        - the write-end of the pipe
    parent_reader       - the read-end of the pipe

get_names

Returns an arrayref of the names (the id field) of all attached children. See CAVEATS, below.

get_pids

Return a hashref of id => pid of all attached children.

get_readers

Return a hashref of id => parent_reader filehandles for all attached or finished children. Useful for iterating through all children to read their output.
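When several workers run at once, their readers can be watched together with IO::Select instead of being read one at a time. In the sketch below, %readers is built by hand as a stand-in for the id => parent_reader map that get_readers() is documented to return; the two forked children and their messages are invented for illustration.

```perl
use strict;
use warnings;
use IO::Select;

# Sketch: multiplex several workers' pipes with IO::Select.
# %readers mimics an id => read-handle map; it is built by hand here.
my (%readers, %pid_of);
for my $name ('Child-1', 'Child-2') {
    pipe(my $r, my $w) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {               # child: report and exit
        close $r;
        print {$w} "$name done\n";
        close $w;
        exit 0;
    }
    close $w;                      # parent keeps only the read end
    $readers{$name} = $r;
    $pid_of{$name}  = $pid;
}

my %name_of = map { fileno($readers{$_}) => $_ } keys %readers;
my $select  = IO::Select->new(values %readers);
my %output;
while ($select->count) {
    for my $fh ($select->can_read) {
        my $n = sysread($fh, my $buf, 4096);
        if ($n) {                  # got data: append it to that worker's log
            $output{ $name_of{fileno $fh} } .= $buf;
            next;
        }
        $select->remove($fh);      # EOF or error: stop watching this pipe
        close $fh;
    }
}
waitpid($pid_of{$_}, 0) for keys %pid_of;
print "$_: $output{$_}" for sort keys %output;
```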

get_all_ready

Return a hashref of all the prepared children which have not yet been attached. The hash keys are the names of each record (i.e. "id"), while the record itself also contains the id (as above), for consistency.

get_next_ready

The most useful way of iterating through the collection. Returns only one record for a ready child (i.e. prepared but not attached). Ready children are sorted with Perl's default sort() behaviour (a string comparison) and the first record is returned. This does NOT pop() the record from the collection: you must call attach() with the child's pid (or forget() the child). Otherwise a while loop over get_next_ready() will never terminate.
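Because Perl's default sort() compares strings, numbered job names may not be launched in numeric order. The sketch below assumes the sort is applied to the job names and uses a plain hash as a hypothetical stand-in for the ready records; deleting a key stands in for attach() or forget() taking a job out of the ready pool.

```perl
use strict;
use warnings;

# Sketch: "sort the ready names, take the first one" with default sort().
my %ready = map { $_ => { id => $_ } } qw(Child-2 Child-10 Child-1);

my ($next) = sort keys %ready;
print "$next\n";                  # Child-1

delete $ready{'Child-1'};         # stand-in for attach()/forget()
($next) = sort keys %ready;
print "$next\n";                  # Child-10, not Child-2

# Zero-padding the numeric part makes string order match numeric order
my @padded = sort map { sprintf 'Child-%02d', $_ } 10, 2, 1;
print "@padded\n";                # Child-01 Child-02 Child-10
```

If launch order matters, zero-padded names are the simplest way to keep it predictable.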

CAVEATS

Undocumented getters are defined to directly access the object's data structures, but they are mainly for internal use. The methods described above should be sufficient for interacting with the module. The method names() should not be confused with get_names(). The former returns a hash of pid => name pairs, while the latter returns an array of job names (aka "id"). In either case, only attached processes are returned.
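Converting between the two shapes is a one-liner, which helps when, for instance, waitpid() hands back a pid and you need the job's name. In this sketch %by_name is a hypothetical id => pid map like the one get_pids() is documented to return; reversing it gives pid => id pairs. This relies on both names and pids being unique.

```perl
use strict;
use warnings;

# Sketch: switch between id => pid and pid => id.
my %by_name = ('Child-1' => 4101, 'Child-2' => 4102);
my %by_pid  = reverse %by_name;          # pid => id

print "$by_pid{4102}\n";                 # Child-2
my @names = sort values %by_pid;         # just the ids, like a name list
print "@names\n";                        # Child-1 Child-2
```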

The method get_next_ready() is useful for iterating through the collection, but, unlike a true iterator, it does not traverse elements in the collection simply by calling it. For the `next ready' item to change, the item must be taken out of the `ready' pool using attach() (or forget()). For this reason, use caution - calling get_next_ready() without attaching or forgetting that process could cause an infinite loop.

This module was written primarily for POSIX systems. It may run on Win32, but has not been tested extensively on that platform. Feedback and patches are welcome.

SEE ALSO

The following may be helpful reading for users of this module, or for those considering it:

    perldoc perlipc

    perldoc perlfork

    Parallel::Jobs

    Parallel::ForkManager

    Parallel::Runner

    subs::Parallel

    http://perl.plover.com/FAQs/Buffering.html

The following may be of interest as alternatives to this module, or as part of a different approach to executing parallel jobs:

    POE

    Parallel::Simple

    Supervisor

    IPC::Run

    Proc::Launcher

    Parallel::Iterator

    Parallel::Workers

    Qudo::Parallel::Manager

COPYRIGHT

 (c) 2010, Kevin Semande
 This program is free software; you can redistribute it and/or
 modify it under the same terms as Perl itself.

AUTHOR

 Kevin Semande <perldev@26a.net>

 With thanks to Nadim Khemir and others for feedback and corrections.
