PBS::Client - Perl interface to submit jobs to Portable Batch System (PBS).
    # Load this module
    use PBS::Client;

    # Create a client object
    my $client = PBS::Client->new();

    # Specify the job
    my $job = PBS::Client::Job->new(
        queue => <job queue>,
        mem   => <memory requested>,
        .....
        cmd   => <command list in array reference>,
    );

    # Optionally, re-organize the commands into a number of batches
    $job->pack(numQ => <number of batches>);

    # Submit the job
    $client->qsub($job);
This module provides a Perl interface to submit jobs to a Portable Batch System (PBS) server. PBS is software that allocates the resources of a network to batch jobs. This module lets you submit jobs on the fly.
To submit jobs by PBS::Client, you need to prepare two objects: the client object and the job object. The client object connects to the server and submits jobs (described by the job object) by the method qsub.
The job object specifies various properties of a job (or a group of jobs). Properties that can be specified include job name, CPU time, memory, priority, job inter-dependency and many others.
This module attempts to adopt the philosophy of Perl: it tries to understand what you want to do and to give you the least surprise, so you can usually do the same thing in more than one way. This is also one reason that this document is lengthy.
Three basic steps:

1. Create a client object, e.g.,

    my $client = PBS::Client->new();

2. Create a job object, e.g.,

    my $job = PBS::Client::Job->new(cmd => ['pwd', 'date']);

3. Submit the job by qsub(), e.g.,

    $client->qsub($job);
There are other methods and options of the client object and the job object. Most of the options are optional; when omitted, default values are used. The only mandatory option is cmd, which tells the client object what commands to submit.
    $pbs = PBS::Client->new(
        server => $server    # PBS server name (optional)
    );
The client object is created by the new method. The name of the PBS server may optionally be supplied. If it is omitted, the default server is assumed.
A job (as a job object) is submitted to PBS by the method qsub:
my $pbsid = $pbs->qsub($job_object);
An array reference of PBS job IDs is returned.
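For example, a minimal sketch (the commands are arbitrary, and a reachable PBS server is assumed):

```perl
use PBS::Client;

my $client = PBS::Client->new();
my $job    = PBS::Client::Job->new(cmd => ['./a1.out', './a2.out']);

# qsub() returns an array reference of PBS job IDs,
# one ID per submitted job
my $ids = $client->qsub($job);
print "submitted job $_\n" for @$ids;
```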
    $job = PBS::Client::Job->new(
        # Job declaration options
        wd        => $wd,        # working directory, default: cwd
        name      => $name,      # job name, default: pbsjob.sh
        script    => $script,    # job script name, default: pbsjob.sh
        shell     => $shell,     # shell path, default: /bin/sh
        account   => $account,   # account string

        # Resources options
        partition => $partition, # partition
        queue     => $queue,     # queue
        begint    => $begint,    # beginning time
        host      => $host,      # host used to execute
        nodes     => $nodes,     # execution nodes, default: 1
        ppn       => $ppn,       # process per node
        pri       => $pri,       # priority
        nice      => $nice,      # nice value
        mem       => $mem,       # requested total memory
        pmem      => $pmem,      # requested per-process memory
        vmem      => $vmem,      # requested total virtual memory
        pvmem     => $pvmem,     # requested per-process virtual memory
        cput      => $cput,      # requested total CPU time
        pcput     => $pcput,     # requested per-process CPU time
        wallt     => $wallt,     # requested wall time

        # IO options
        stagein   => $stagein,   # files staged in
        stageout  => $stageout,  # files staged out
        ofile     => $ofile,     # standard output file
        efile     => $efile,     # standard error file
        maillist  => $mails,     # mail address list
        mailopt   => $options,   # mail options, combination of a, b, e

        # Command options
        vars      => {%name_values}, # name-value of env variables
        cmd       => [@commands],    # command to be submitted
        prev      => {               # job before
            ok    => $job1,          # successful job before this job
            fail  => $job2,          # failed job before this job
            start => $job3,          # started job before this job
            end   => $job4,          # ended job before this job
        },
        next      => {               # job follows
            ok    => $job5,          # next job after this job succeeded
            fail  => $job6,          # next job after this job failed
            start => $job7,          # next job after this job started
            end   => $job8,          # next job after this job ended
        },

        # Job tracer options
        tracer    => $on,        # job tracer, either on / off (default)
        tfile     => $tfile,     # tracer report file
    );
Two points may be noted:
$job = PBS::Client::Job->new(cmd => [@commands]);
is equivalent to
    $job = PBS::Client::Job->new();
    $job->cmd([@commands]);
wd

Full path of the working directory, i.e. the directory where the command(s) are executed. The default value is the current working directory.
name

Job name. It can have at most 15 characters. It cannot contain spaces, and the first character must be alphabetic. If not specified, it follows the script name.
script

Filename prefix of the job script to be generated. The PBS job ID is appended to the filename as the suffix.

Example: script => test.sh would generate a job script like test.sh.12345 if the job ID is '12345'.

The default value is pbsjob.sh.
shell

This option lets you set the shell path. The default path is /bin/sh.
account

Account string. This is meaningful if you need to specify which account you are using to submit the job.
partition

Partition name. This is meaningful only for clusters with partitions. If it is omitted, the default value is assumed.
queue

Queue to which jobs are submitted. If omitted, the default queue is used.
begint

The date-time at which the job begins to queue. The format is either "[[[[CC]YY]MM]DD]hhmm[.SS]" or "[[[[CC]YY-]MM-]DD] hh:mm[:SS]".

Example:

    $job->begint('200605231448.33');
    # or equivalently
    $job->begint('2006-05-23 14:48:33');

This feature is experimental. It may not be supported in later versions.
host

You can specify the host on which the job will be run.
nodes

Nodes used. It can be an integer (declaring the number of nodes used), a string or array reference (declaring which nodes are used), or a hash reference (declaring which nodes are used and how many processes run on each node).
Examples:
Integer
nodes => 3
means that three nodes are used.
String / array reference
    # string representation
    nodes => "node01 + node02"

    # array representation
    nodes => ["node01", "node02"]
means that nodes "node01" and "node02" are used.
Hash reference
nodes => {node01 => 2, node02 => 1}
means that "node01" is used with 2 processes, and "node02" with 1 process.
ppn

Maximum number of processes per node. The default value is 1.
pri

Priority of the job in queueing: the higher the priority, the shorter the queueing time. The priority must be an integer between -1024 and +1023 inclusive. The default value is 0.
nice

Nice value of the job during execution. It must be an integer between -20 (highest priority) and 19 (lowest). The default value is 10.
mem

Maximum physical memory used by all processes. The unit can be b (bytes), w (words), kb, kw, mb, mw, gb or gw. If it is omitted, the default value is used. Please see also pmem, vmem and pvmem.

pmem

Maximum per-process physical memory. The same units apply. If it is omitted, the default value is used. Please see also mem, vmem and pvmem.

vmem

Maximum virtual memory used by all processes. The same units apply. If it is omitted, the default value is used. Please see also mem, pmem and pvmem.

pvmem

Maximum per-process virtual memory. The same units apply. If it is omitted, the default value is used. Please see also mem, pmem and vmem.
cput

Maximum total CPU time used by all processes. Values are specified in the form [[hours:]minutes:]seconds[.milliseconds]. For example,

    $job->cput('00:30:00');

refers to 30 minutes of CPU time. Please see also pcput and wallt.
pcput

Maximum per-process CPU time. Values are specified in the form [[hours:]minutes:]seconds[.milliseconds]. Please see also cput and wallt.
wallt

Maximum wall-clock time used. Values are specified in the form [[hours:]minutes:]seconds[.milliseconds]. Please see also cput and pcput.
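Putting the resource limits together, a sketch of a job requesting one hour of wall time, 30 minutes of per-process CPU time and 512 MB of memory (the command and the limits are arbitrary):

```perl
use PBS::Client;

my $job = PBS::Client::Job->new(
    cmd   => './a.out',
    wallt => '01:00:00',   # at most 1 hour of wall-clock time
    pcput => '00:30:00',   # at most 30 minutes of CPU time per process
    mem   => '512mb',      # at most 512 MB of physical memory in total
);
```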
stagein

Specify which files need to be staged (copied) in before the job starts. It may be a string, an array reference or a hash reference. For example, to stage in from01.file and from02.file on the remote host "fromMachine" and rename them to to01.file and to02.file respectively, the following three representations are equivalent:
String
    stagein => "to01.file@fromMachine:from01.file,".
               "to02.file@fromMachine:from02.file"
Array
    stagein => ['to01.file@fromMachine:from01.file',
                'to02.file@fromMachine:from02.file']
Hash
    stagein => {'fromMachine:from01.file' => 'to01.file',
                'fromMachine:from02.file' => 'to02.file'}
stageout

Specify which files need to be staged (copied) out after the job finishes. As with stagein, it may be a string, an array reference or a hash reference.

String

    stageout => "from01.file@toMachine:to01.file,".
                "from02.file@toMachine:to02.file"

Array

    stageout => ['from01.file@toMachine:to01.file',
                 'from02.file@toMachine:to02.file']

Hash

    stageout => {'from01.file' => 'toMachine:to01.file',
                 'from02.file' => 'toMachine:to02.file'}
ofile

Path of the file for standard output. The default filename is like jobName.o12345 if the job name is 'jobName' and its ID is '12345'. Please see also efile.
efile

Path of the file for standard error. The default filename is like jobName.e12345 if the job name is 'jobName' and its ID is '12345'. Please see also ofile.
maillist

This option declares who (as a list of email addresses) will receive mail about the job. The default is the job owner. The situations in which the server sends email are set by the mailopt option shown below.

For more than one email address, maillist can be either a comma-separated string or an array reference.
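For example, a sketch using hypothetical addresses:

```perl
use PBS::Client;

my $job = PBS::Client::Job->new(
    cmd      => './a.out',
    # a comma-separated string of addresses...
    maillist => 'user1@example.com, user2@example.com',
);

# ...or, equivalently, an array reference:
# maillist => ['user1@example.com', 'user2@example.com']
```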
mailopt

This option declares under what situations the server sends email. It can be any combination of a (mail is sent if the job is aborted), b (mail is sent when the job begins to run) and e (mail is sent when the job finishes). For example,

    mailopt => "b, e"
    # or lazily,
    mailopt => "be"

means that mail will be sent when the job begins to run and when it finishes.

The default is a.
vars

This option declares the environment variables exported to the job. It can be a string, an array reference or a hash reference.
Example: to export the following variables to the job:

    A
    B = b
    C
    D = d

you may use one of the following ways:

String

    vars => "A, B=b, C, D=d"

Array reference

    vars => ['A', 'B=b', 'C', 'D=d']

Hash reference

    vars => {A => '', B => 'b', C => '', D => 'd'}

Mixed

    vars => ['A', 'C', {B => 'b', D => 'd'}]
cmd

Command(s) to be submitted. It can be a 2D or 1D array reference, or a string. For a 2D array reference, each row is a separate PBS job, while the elements of the same row are commands executed one by one within the same job. For a 1D array reference, each element is a command submitted as a separate job. If it is a string, the string is taken to be the single command to be executed.
2D array reference
cmd => [["./a1.out"], ["./a2.out" , "./a3.out"]]
means that a1.out would be executed as one PBS job, while a2.out and a3.out would be executed one by one in another job.
1D array reference
cmd => ["./a1.out", "./a2.out"]
means that a1.out would be executed as one PBS job and a2.out would be another. Therefore, this is equivalent to

    cmd => [["./a1.out"], ["./a2.out"]]
String

    cmd => "./a.out"

means that the command a.out would be executed. Equivalently, it can be written as

    cmd => [["./a.out"]]  # as a 2D array
    # or
    cmd => ["./a.out"]    # as a 1D array
prev

Hash reference which declares the job(s) to be executed beforehand. The hash can have four possible keys: start, end, ok and fail. start declares job(s) which must have started execution. end declares job(s) which must have ended. ok declares job(s) which must have finished successfully. fail declares job(s) which must have failed. Please see also next.
$job1->prev({ok => $job2, fail => $job3})
means that $job1 is executed only after $job2 exits normally and $job3 exits with an error.
next

Hash reference which declares the job(s) to be executed afterwards. The hash can have four possible keys: start, end, ok and fail. start declares job(s) to run after this job starts. end declares job(s) to run after this job ends. ok declares job(s) to run after this job finishes successfully. fail declares job(s) to run after this job fails. Please see also prev.
$job1->next({ok => $job2, fail => $job3})
means that $job2 would be executed after $job1 exits normally; otherwise $job3 would be executed instead.
tracer

Trace when and where the job executes. It takes the value on or off (default). If it is turned on, an extra tracer report file is generated. It records when the job started, where it ran, when it finished and how long it took.
tfile

Path of the tracer report file. The default filename is like jobName.t12345 if the job name is 'jobName' and its ID is '12345'. Please see also ofile and efile.
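A sketch of turning the tracer on (the command and the report filename are arbitrary):

```perl
use PBS::Client;

my $job = PBS::Client::Job->new(
    cmd    => './a.out',
    tracer => 'on',          # generate a tracer report
    tfile  => 'a_trace.log', # write the report to this file
);
```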
pbsid

Return the PBS job ID(s) of the job(s). It returns the IDs only after the job(s) have been submitted to PBS. The returned value is an integer if cmd is a string. If cmd is an array reference, a reference to the array of IDs is returned. For example,
$pbsid = $job->pbsid;
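When cmd was an array reference, a sketch of collecting all the IDs after submission (assuming a $job object submitted as above):

```perl
# cmd was an array reference, so pbsid() returns a
# reference to the array of PBS job IDs
my $ids = $job->pbsid;
print "PBS job ID: $_\n" for @$ids;
```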
pack

pack is used to rearrange the commands among different queues (PBS jobs). Two options, numQ and cpq, can be set. numQ specifies the number of jobs over which the commands will be distributed. For example,
$job->pack(numQ => 8);
distributes the commands among 8 jobs. On the other hand, the cpq (abbreviation of "commands per queue") option rearranges the commands such that each job has the specified number of commands. For example,
$job->pack(cpq => 8);
packs the commands such that each job has 8 commands, until no commands are left.
Job objects can be copied by the copy method:
my $new_job = $old_job->copy;
The new job object ($new_job) is identical to, but independent of the original job object ($old_job).
copy can also take the number of copies to be generated as an argument. For example,
my @copies = $old_job->copy(3);
makes three identical copies.
Hence, the following two statements are the same:
    my $new_job   = $old_job->copy;
    my ($new_job) = $old_job->copy(1);
You want to run a.out in the current working directory in the default queue:

    use PBS::Client;

    my $pbs = PBS::Client->new;
    my $job = PBS::Client::Job->new(
        cmd => './a.out',
    );
    $pbs->qsub($job);
You need to submit a list of commands to PBS. They are stored in the Perl array @jobs. You want to execute them one by one on a single CPU:

    use PBS::Client;

    my $pbs = PBS::Client->new;
    my $job = PBS::Client::Job->new(
        cmd => [\@jobs],
    );
    $pbs->qsub($job);
You have 3 groups of commands, stored in @jobs_a, @jobs_b and @jobs_c. You want to execute each group on a different CPU:

    use PBS::Client;

    my $pbs = PBS::Client->new;
    my $job = PBS::Client::Job->new(
        cmd => [
            \@jobs_a,
            \@jobs_b,
            \@jobs_c,
        ],
    );
    $pbs->qsub($job);
You have 3 groups of commands, stored in @jobs_a, @jobs_b and @jobs_c. You want to re-organize them into 4 batches:

    use PBS::Client;

    my $pbs = PBS::Client->new;
    my $job = PBS::Client::Job->new(
        cmd => [
            \@jobs_a,
            \@jobs_b,
            \@jobs_c,
        ],
    );
    $job->pack(numQ => 4);
    $pbs->qsub($job);
You have 3 batches of commands, stored in @jobs_a, @jobs_b and @jobs_c. You want to re-organize them such that each batch has 4 commands:

    use PBS::Client;

    my $pbs = PBS::Client->new;
    my $job = PBS::Client::Job->new(
        cmd => [
            \@jobs_a,
            \@jobs_b,
            \@jobs_c,
        ],
    );
    $job->pack(cpq => 4);
    $pbs->qsub($job);
You want to customize the requested resources rather than using the defaults:

    use PBS::Client;

    my $pbs = PBS::Client->new;
    my $job = PBS::Client::Job->new(
        account   => <account name>,
        partition => <partition name>,
        queue     => <queue name>,
        wd        => <working directory of the commands>,
        name      => <job name>,
        script    => <name of the generated script>,
        pri       => <priority>,
        mem       => <memory>,
        cput      => <maximum CPU time>,
        wallt     => <maximum wall clock time>,
        prologue  => <prologue script>,
        epilogue  => <epilogue script>,
        cmd       => <commands to be submitted>,
    );
    $pbs->qsub($job);
You want to run a1.out, then run a2.out if a1.out finishes successfully; otherwise run a3.out and a4.out:

    use PBS::Client;

    my $pbs  = PBS::Client->new;
    my $job1 = PBS::Client::Job->new(cmd => "./a1.out");
    my $job2 = PBS::Client::Job->new(cmd => "./a2.out");
    my $job3 = PBS::Client::Job->new(cmd => ["./a3.out", "./a4.out"]);
    $job1->next({ok => $job2, fail => $job3});
    $pbs->qsub($job1);
If you want to execute a single command, you need not write a script. The simplest way is to use the script run included in this package. For example,

    run "./a.out --debug > a.dat"

would submit a job executing the command a.out with the option --debug, redirecting the output to the file a.dat.
The options of the job object, such as the requested resources, can be edited by

    run -e

A more detailed manual can be viewed by

    run -m
Data::Dumper, File::Temp
This module has only been tested with OpenPBS on Linux. However, it was written to fit as many Unix-like operating systems with PBS installed as possible.
Please email kwmak@cpan.org with bug reports or other suggestions.
PBS official website: http://www.openpbs.com
Ka-Wai Mak <kwmak@cpan.org>
Copyright (c) 2006-2007, 2010-2011 Ka-Wai Mak. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install PBS::Client, copy and paste the appropriate command into your terminal.

cpanm

    cpanm PBS::Client

CPAN shell

    perl -MCPAN -e shell
    install PBS::Client

For more information on module installation, please visit the detailed CPAN module installation guide.