The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

GRID::Machine::Process - Forking via SSH

SYNOPSIS

     use GRID::Machine;
     
     my $host = $ENV{GRID_REMOTE_MACHINE};
     my $machine = GRID::Machine->new( host => $host );
     
     my ($N, $np, $pi)  = (1000, 4, 0);
     for (0..$np-1) {
        $machine->fork( q{
            my ($id, $N, $np) = @_;
              
            my $sum = 0;
            for (my $i = $id; $i < $N; $i += $np) {
                my $x = ($i + 0.5) / $N;
                $sum += 4 / (1 + $x * $x);
            }
            $sum /= $N; 
         },
         args => [ $_, $N, $np ],
       );
     }
     
     $pi += $machine->waitall()->result for 1..$np;
     
     print "pi = $pi\n";

DESCRIPTION

The fork method

The fork method of GRID::Machine objects can be used to fork a process in the remote machine, as shown in the following example:

  $ cat -n fork5.pl 
     1  #!/usr/bin/perl -w
     2  use strict;
     3  use GRID::Machine;
     4  use Data::Dumper;
     5  
     6  my $host = $ENV{GRID_REMOTE_MACHINE};
     7  my $machine = GRID::Machine->new( host => $host );
     8  
     9  my $p = $machine->fork( q{
    10  
    11     print "stdout: Hello from process $$. args = (@_)\n";
    12     print STDERR "stderr: Hello from process $$\n";
    13  
    14     use List::Util qw{sum};
    15     return { s => sum(@_), args => [ @_ ] };
    16   },
    17   args => [ 1..4 ],
    18  );
    19  
    20  # GRID::Machine::Process objects are overloaded
    21  print "Doing something while $p is still alive  ...\n" if $p; 
    22  
    23  my $r = $machine->waitpid($p);
    24  
    25  print "Result from process '$p': ",Dumper($r),"\n";
    26  print "GRID::Machine::Process::Result objects are overloaded in a string context:\n$r\n";

When executed, the former program produces an output similar to this:

  $ perl fork5.pl 
  Doing something while 5220:5230:some.machine:5234:5237 is still alive  ...
  Result from process '5220:5230:some.machine:5234:5237': $VAR1 = bless( {
                   'machineID' => 0,
                   'stderr' => 'stderr: Hello from process 5237
  ',
                   'descriptor' => 'some.machine:5234:5237',
                   'status' => 0,
                   'waitpid' => 5237,
                   'errmsg' => '',
                   'stdout' => 'stdout: Hello from process 5237. args = (1 2 3 4)
  ',
                   'results' => [ { 'args' => [ 1, 2, 3, 4 ], 's' => 10 } ]
                 }, 'GRID::Machine::Process::Result' );

  GRID::Machine::Process::Result objects are overloaded in a string context:
  stdout: Hello from process 5237. args = (1 2 3 4)
  stderr: Hello from process 5237

The fork method returns a GRID::Machine::Process object. The first argument must be a string containing the code that will be executed by the forked process in the remote machine. Such code is always called in a list context. The fork method admits the following arguments:

  • stdin

    The name of the file to which stdin will be redirected

  • stdout

    The name of the file to which stdout will be redirected. If not specified a temporary file will be used

  • stderr

    The name of the file to which stderr will be redirected. If not specified a temporary file will be used

  • result

    The name of the file to which the result computed by the child process will be dumped. If not specified a temporary file will be used

  • args

    The arguments for the code executed by the remote child process

GRID::Machine::Process objects

GRID::Machine::Process objects have been overloaded. In a string context a GRID::Machine::Process object produces the concatenation hostname:clientPID:remotePID. In a boolean context it returns true if the process is alive and false otherwise. This way, the execution of line 21 in the program above:

    21  print "Doing something while $p is still alive  ...\n" if $p; 

produces an output like:

  Doing something while 5220:5230:some.machine:5234:5237 is still alive  ...

if the remote process is still alive. The descriptor of the process 5220:5230:some.machine:5234:5237 is a colon separated sequence of five components:

1 - The PID of the local process executing GRID::Machine
2 - The PID of the local process in charge of the connection with the remote machine
3 - The name of the remote machine
4 - The PID of the remote process executing GRID::Machine::REMOTE
5 - The PID of the child process created by fork

When evaluated in a boolean context, a GRID::Machine::Process returns 1 if it is alive and 0 otherwise.

The waitpid method

The waitpid method waits for the GRID::Machine::Process received as first argument to terminate. Additional FLAGS as in perl waitpid can be passed as arguments. It returns a GRID::Machine::Process::Result object, whose attributes contain:

  • stdout

    A string containing the output to STDOUT of the remote child process

  • stderr

    A string containing the output to STDERR of the remote child process

  • results

    The list of values returned by the child process. The forking code is always called in a list context.

  • status

    The value associated with $? as returned by the remote child process.

  • waitpid

    The value returned by the Perl waitpid function when synchronized with the remote child process. It is usually the value is either the pid of the deceased process, or -1 if there was no such child process. On some systems, a value of 0 indicates that there are processes still running.

  • errmsg

    The child error as in $@

  • machineID

    The logical identifier of the associated GRID::Machine. By default, 0 if it was the first GRID::Machine created, 1 if it was the second, etc.

The waitall method

It is similar to waitpid but instead waits for any child process.

Behaves like the wait(2) system call on your system: it waits for a child process to terminate and returns

  • The GRID::Machine::Process::Result object associated with the deceased process if it was called via the GRID::Machine fork method, or

  • The PID of the deceased process if there is no GRID::Machine::Process associated (it was called using an ordinary fork)

  • -1 if there are no child processes. Note that a return value of -1 could mean that child processes are being automatically reaped, as described in perlipc.

See an example:

  $ cat -n wait1.pl 
     1  #!/usr/bin/perl -w
     2  use strict;
     3  use GRID::Machine;
     4  use Data::Dumper;
     5  
     6  my $host = $ENV{GRID_REMOTE_MACHINE};
     7  my $machine = GRID::Machine->new( host => $host );
     8  
     9  my $p = $machine->fork( q{
    10  
    11     print "stdout: Hello from process $$. args = (@_)\n";
    12     print STDERR "stderr: Hello from process $$\n";
    13  
    14     use List::Util qw{sum};
    15     return { s => sum(@_), args => [ @_ ] };
    16   },
    17   args => [ 1..4 ],
    18  );
    19  
    20  # GRID::Machine::Process objects are overloaded
    21  print "Doing something while $p is still alive  ...\n" if $p; 
    22  
    23  my $r = $machine->waitall();
    24  
    25  print "Result from process '$p': ",Dumper($r),"\n";
    26  print "GRID::Machine::Process::Result objects are overloaded in a string context:\n$r\n";

When executed produces:

  $ perl wait1.pl 
  Doing something while 1271:1280:local:1284:1287 is still alive  ...
  Result from process '1271:1280:local:1284:1287': $VAR1 = bless( {
                   'machineID' => 0,
                   'stderr' => 'stderr: Hello from process 1287
  ',
                   'descriptor' => 'local:1284:1287',
                   'status' => 0,
                   'waitpid' => 1287,
                   'errmsg' => '',
                   'stdout' => 'stdout: Hello from process 1287. args = (1 2 3 4)
  ',
                   'results' => [ { 'args' => [ 1, 2, 3, 4 ], 's' => 10 } ]
                 }, 'GRID::Machine::Process::Result' );

  GRID::Machine::Process::Result objects are overloaded in a string context:
  stdout: Hello from process 1287. args = (1 2 3 4)
  stderr: Hello from process 1287

The following example uses the fork method and waitall to compute in parallel a numerical approach to the value of the number pi:

  $ cat -n waitpi.pl 
     1  #!/usr/bin/perl -w
     2  use strict;
     3  use GRID::Machine;
     4  
     5  my $host = $ENV{GRID_REMOTE_MACHINE};
     6  my $machine = GRID::Machine->new( host => $host );
     7  
     8  my ($N, $np, $pi)  = (1000, 4, 0);
     9  for (0..$np-1) {
    10     $machine->fork( q{
    11         my ($id, $N, $np) = @_;
    12           
    13         my $sum = 0;
    14         for (my $i = $id; $i < $N; $i += $np) {
    15             my $x = ($i + 0.5) / $N;
    16             $sum += 4 / (1 + $x * $x);
    17         }
    18         $sum /= $N; 
    19      },
    20      args => [ $_, $N, $np ],
    21    );
    22  }
    23  
    24  $pi += $machine->waitall()->result for 1..$np;
    25  
    26  print "pi = $pi\n";

The async method

The async method it is quite similar to the fork method but receives as arguments the name of a GRID::Machine method and the arguments for this method. It executes asynchronously the method. It returns a GRID::Machine::Process object. Basically, the call

            $m->async($subname => @args) 

is equivalent to:

            $m->fork($subname.'(@_)' args => [ @args ] ) 

The following example uses async to compute in parallel an approximation to the value of pi:

  $ cat -n async.pl 
     1  #!/usr/bin/perl -w
     2  use strict;
     3  use GRID::Machine;
     4  
     5  my $host = $ENV{GRID_REMOTE_MACHINE};
     6  my $machine = GRID::Machine->new( host => $host );
     7  
     8  $machine->sub(sumareas => q{
     9         my ($id, $N, $np) = @_;
    10           
    11         my $sum = 0;
    12         for (my $i = $id; $i < $N; $i += $np) {
    13             my $x = ($i + 0.5) / $N;
    14             $sum += 4 / (1 + $x * $x);
    15         }
    16         $sum /= $N; 
    17  });
    18  
    19  my ($N, $np, $pi)  = (1000, 4, 0);
    20  
    21  $machine->async( sumareas =>  $_, $N, $np ) for (0..$np-1);
    22  $pi += $machine->waitall()->result for 1..$np;
    23  
    24  print "pi = $pi\n";

GRID::Machine::Process::Result objects

In a string context a GRID::Machine::Process::Result object produces the concatenation of its output to STDOUT followed by its output to STDERR. In a boolean context it evaluates according to its result attribute. It evaluates to true if it is an array reference with more than one element or if the only element is true. Otherwise it is false.

About the pi Example Used in this Manual

In this manual we have used several times as an example the computation of an approach to number Pi (3.14159...) using numerical integration. To understand it, take into account that the area under the curve 1/(1+x**2) between 0 and 1 is Pi/4 = (3.1415...)/4 as it shows the following debugger session:

  pp2@nereida:~/public_html/cgi-bin$ perl -wde 0
  main::(-e:1):   0
    DB<1>  use Math::Integral::Romberg 'integral'
    DB<2> p integral(sub { my $x = shift; 4/(1+$x*$x) }, 0, 1);
  3.14159265358972

The module Math::Integral::Romberg provides the function integral that allow us to compute the area of a given function in some interval. In fact - if you remember your high school days - it is easy to see the reason: the integral of 4/(1+$x*$x) is 4*arctg($x) and so its area between 0 and 1 is given by:

        4*(arctg(1) - arctg(0)) = 4 * arctg(1) = 4 * Pi / 4 = Pi

This is not, in fact, a good way to compute Pi, but makes a good example of how to exploit several machines to fulfill a task.

To compute the area under 4/(1+$x*$x) we have divided up the interval [0,1] into sub-intervals of size 1/N and add up the areas of the small rectangles with base 1/N and height the value of the curve 4/(1+$x*$x) in the middle of the interval. The following debugger session illustrates the idea:

  pp2@nereida:~$ perl -wde 0
  main::(-e:1):   0
  DB<1> use List::Util qw(sum)
  DB<2> $N = 6
  DB<3> @divisions = map { $_/$N } 0..($N-1)
  DB<4> sub f { my $x = shift; 4/(1+$x*$x) }
  DB<5> @halves = map { $_+0.5/$N } @divisions
  DB<6> $area = sum(map { f($_)/$N } @halves)
  DB<7> p $area
  3.14390742722244

To optimize the execution time we distribute the sum in line 6 $area = sum(map { f($_)/$N } @halves) among several processes. The processes are numbered from 0 to np-1. Each process sums up the areas of roughly N/np intervals. We can spedup the computation if each process is allocated to a different processor (or core).

AUTHOR

Casiano Rodriguez Leon <casiano@ull.es>

ACKNOWLEDGMENTS

This work has been supported by CEE (FEDER) and the Spanish Ministry of Educacion y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.

I wish to thank Paul Evans for his IPC::PerlSSH module: it was the source of inspiration for this module. To Alex White, Dmitri Kargapolov, Eric Busto and Erik Welch for their contributions. To the Perl Monks, and the Perl Community for generously sharing their knowledge. Finally, thanks to Juana, Coro and my students at La Laguna.

LICENCE AND COPYRIGHT

Copyright (c) 2007 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.

These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.