NAME

Getopt::Euclid - Executable Uniform Command-Line Interface Descriptions

VERSION

This document describes Getopt::Euclid version 0.2.0

SYNOPSIS

    use Getopt::Euclid;

    if ($ARGV{-i}) {
        print "Interactive mode...\n";
    }

    for my $x (0..$ARGV{-size}{h}-1) {
        for my $y (0..$ARGV{-size}{w}-1) {
            do_something_with($x, $y);
        }
    }

    __END__

    =head1 NAME

    yourprog - Your program here

    =head1 VERSION

    This documentation refers to yourprog version 1.9.4

    =head1 USAGE

        yourprog [options]  -s[ize]=<h>x<w>  -o[ut][file] <file>

    =head1 REQUIRED ARGUMENTS

    =over

    =item  -s[ize]=<h>x<w>    

    Specify size of simulation

    =for Euclid:
        h.type:    int > 0
        h.default: 24
        w.type:    int >= 10
        w.default: 80

    =item  -o[ut][file] <file>    

    Specify output file

    =for Euclid:
        file.type:    writable
        file.default: '-'

    =back

    =head1 OPTIONS

    =over

    =item  -i

    Specify interactive simulation

    =item  -l[[en][gth]] <l>

    Length of simulation [default: 99]

    =for Euclid:
        l.type:    int > 0
        l.default: 99

    =item --version

    =item --usage

    =item --help

    =item --man

    Print the usual program information

    =back

    Remainder of documentation starts here...

    =head1 AUTHOR

    Damian Conway (DCONWAY@CPAN.org)

    =head1 BUGS

    There are undoubtedly serious bugs lurking somewhere in this code.
    Bug reports and other feedback are most welcome.

    =head1 COPYRIGHT

    Copyright (c) 2005, Damian Conway. All Rights Reserved.
    This module is free software. It may be used, redistributed
    and/or modified under the terms of the Perl Artistic License
    (see http://www.perl.com/perl/misc/Artistic.html)
  
  

DESCRIPTION

Getopt::Euclid uses your program's own documentation to create a command-line argument parser. This ensures that your program's documented interface and its actual interface always agree.

To use the module, you simply write:

    use Getopt::Euclid;

at the top of your program.

When the module is loaded within a regular Perl program, it will:

  1. locate any POD in the same file,

  2. extract information from that POD, most especially from the =head1 REQUIRED ARGUMENTS and =head1 OPTIONS sections,

  3. build a parser that parses the arguments and options the POD specifies,

  4. remove the command-line arguments from @ARGV and parse them, and

  5. put the results in the global %ARGV variable (or into specifically named optional variables, if you request that -- see "Exporting Option Variables").

As a special case, if the module is loaded within some other module (i.e. from within a .pm file), it still locates and extracts POD information, but instead of parsing @ARGV immediately, it caches that information and installs an import() subroutine in the caller module. That new import() acts just like Getopt::Euclid's own import, except that it adds the POD from the caller module to the POD of the callee.

All of which just means you can put some or all of your CLI specification in a module, rather than in the application's source file. See "Module Interface" for more details.

INTERFACE

Program Interface

You write:

    use Getopt::Euclid;

and your command-line is parsed automagically.

In the simplest case there are no options to pass and nothing is exported. It just works. (For the optional import specifiers, see "Exporting Option Variables" and "Minimalist keys".)

Module Interface

You write:

    use Getopt::Euclid;

and your module will then act just like Getopt::Euclid (i.e. you can use your module instead of Getopt::Euclid), except that your module's POD will also be prepended to the POD of any module that loads yours. In other words, you can use Getopt::Euclid in a module to create a standard set of CLI arguments, which can then be added to any application simply by loading your module.

To accomplish this trick Getopt::Euclid installs an import() subroutine in your module. If your module already has an import() subroutine defined, terrible things happen. So don't do that.

POD Interface

This is where all the action is.

When Getopt::Euclid is loaded in a non-.pm file, it searches that file for the following POD documentation:

=head1 NAME

Getopt::Euclid ignores the name specified here. In fact, if you use the standard --help, --usage, --man, or --version arguments (see "Standard arguments"), the module replaces the name specified in this POD section with the actual name by which the program was invoked (i.e. with $0).

=head1 USAGE

Getopt::Euclid ignores the usage line specified here. If you use the standard --help, --usage, or --man arguments, the module replaces the usage line specified in this POD section with a usage line that reflects the actual interface that the module has constructed.

=head1 VERSION

Getopt::Euclid extracts the current version number from this POD section. To do that it simply takes the first substring that matches <digit>.<digit> or <digit>_<digit>. It also accepts one or more additional trailing .<digit> or _<digit>, allowing for multi-level and "alpha" version numbers such as:

    =head1 VERSION
    
    This is version 1.2.3
    

or:

    =head1 VERSION
    
    This is alpha release 1.2_34
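
As a rough illustration of the matching rule described above, the following sketch pulls a version string out of the text of a VERSION section. It is not the module's actual code: the helper name extract_version is hypothetical, and the pattern assumes that each "digit" component may be a run of several digits.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Getopt::Euclid's interface.
sub extract_version {
    my ($version_section) = @_;

    # First run of digits that is followed by one or more
    # '.<digits>' or '_<digits>' components
    my ($version) = $version_section =~ m{ ( \d+ (?: [._] \d+ )+ ) }x;

    return $version;    # undef if nothing version-like was found
}

print extract_version('This is version 1.2.3'), "\n";           # 1.2.3
print extract_version('This is alpha release 1.2_34'), "\n";    # 1.2_34
```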
    
=head1 REQUIRED ARGUMENTS

Getopt::Euclid uses the specifications in this POD section to build a parser for command-line arguments. That parser requires that every one of the specified arguments is present in any command-line invocation. See "Specifying arguments" for details of the specification syntax.

The actual headings that Getopt::Euclid can recognize here are:

    =head1 [STD|STANDARD] REQUIRED [ARG|ARGUMENT][S]

=head1 OPTIONS

Getopt::Euclid uses the specifications in this POD section to build a parser for command-line arguments. That parser does not require that any of the specified arguments is actually present in a command-line invocation. Again, see "Specifying arguments" for details of the specification syntax.

Typically a program will specify both REQUIRED ARGUMENTS and OPTIONS, but there is no requirement that it supply both, or either.

The actual headings that Getopt::Euclid recognizes here are:

    =head1 [STD|STANDARD] OPTION[AL|S] [ARG|ARGUMENT][S]

=head1 COPYRIGHT

Getopt::Euclid prints this section whenever the standard --version option is specified on the command-line.

The actual heading that Getopt::Euclid recognizes here is any heading containing any of the words "COPYRIGHT", "LICENCE", or "LICENSE".

Specifying arguments

Each required or optional argument is specified in the POD in the following format:

    =item ARGUMENT_STRUCTURE

    ARGUMENT_DESCRIPTION

    =for Euclid:
        ARGUMENT_OPTIONS
        PLACEHOLDER_CONSTRAINTS

Argument structure

  • Each argument is specified as an =item.

  • Any part(s) of the specification that appear in square brackets are treated as optional.

  • Any parts that appear in angle brackets are placeholders for actual values that must be specified on the command-line.

  • Any placeholder that is immediately followed by ... may be repeated as many times as desired.

  • Any whitespace in the structure specifies that any amount of whitespace (including none) is allowed at the same position on the command-line.

  • A vertical bar indicates the start of an alternative variant of the argument.

For example, the argument specification:

    =item -i[n] [=] <file> | --from <file>

indicates that any of the following may appear on the command-line:

    -idata.txt    -i data.txt    -i=data.txt    -i = data.txt
                                     
    -indata.txt   -in data.txt   -in=data.txt   -in = data.txt

    --from data.txt

as well as any other combination of whitespacing.

Any of the above variations would cause all three of:

    $ARGV{'-i'}
    $ARGV{'-in'}
    $ARGV{'--from'}
    

to be set to the string 'data.txt'.

You could allow the optional = to also be an optional colon by specifying:

    =item -i[n] [=|:] <file>

Optional components may also be nested, so you could write:

    =item -i[n[put]] [=] <file>

which would allow -i, -in, and -input as synonyms for this argument and would set all three of $ARGV{'-i'}, $ARGV{'-in'}, and $ARGV{'-input'} to the supplied file name.

The point of setting every possible variant within %ARGV is that this allows you to use a single key (say $ARGV{'-input'}), regardless of how the argument is actually specified on the command-line.
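
To make the set of variant keys concrete, here is a minimal sketch that expands the optional square-bracket parts of a flag into every spelling it allows. It is only an illustration under stated assumptions: expand_variants is a hypothetical helper rather than anything the module provides, and it handles bracketed parts only (not placeholders, whitespace, or a top-level vertical bar).

```perl
use strict;
use warnings;

# Hypothetical helper: list every spelling allowed by [...] groups.
sub expand_variants {
    my ($spec) = @_;

    # Find the first innermost [...] group (one with no brackets inside it)
    if (my ($pre, $inner, $post)
            = $spec =~ m{\A (.*?) \[ ([^\[\]]*) \] (.*) \z}xs) {
        my %seen;

        # Try the group with each of its alternatives, and with it left out
        return grep { !$seen{$_}++ }
               map  { expand_variants($pre . $_ . $post) }
               (split(/\|/, $inner), '');
    }

    return $spec;    # no optional parts left
}

my @variants = expand_variants('-i[n[put]]');
print join(' ', sort @variants), "\n";    # -i -in -input
```

Each string in the resulting list corresponds to one of the keys that the text above says would be set in %ARGV.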

Repeatable arguments

Normally Getopt::Euclid only accepts each specified argument once, the first time it appears in @ARGV. However, you can specify that an argument may appear more than once, using the repeatable option:

    =item file=<filename>

    =for Euclid:
        repeatable

When an argument is marked repeatable the corresponding entry of %ARGV will not contain a hash reference, but rather an array of hash references, one for each repetition.

Boolean arguments

If an argument has no placeholders it is treated as a boolean switch and its entry in %ARGV will be true if the argument appeared in @ARGV.

For a boolean argument, you can also specify variations that are false, if they appear. For example, a common idiom is:

    =item --print

    Print results

    =item --noprint

    Don't print results

These two arguments are effectively the same argument, just with opposite boolean values. However, as specified above, only one of $ARGV{'--print'} and $ARGV{'--noprint'} will be set.

As an alternative you can specify a single argument that accepts either value and sets both appropriately:

    =item --[no]print

    [Don't] print results

    =for Euclid:
        false: --noprint

With this specification, if --print appears in @ARGV, then $ARGV{'--print'} will be true and $ARGV{'--noprint'} will be false. On the other hand, if --noprint appears in @ARGV, then $ARGV{'--print'} will be false and $ARGV{'--noprint'} will be true.

The specified false values can follow any convention you wish:

    =item [+|-]print

    =for Euclid:
        false: -print

or:

    =item -report[_no[t]]

    =for Euclid:
        false: -report_no[t]

et cetera.

Multiple placeholders

An argument can have two or more placeholders:

    =item -size <h> <w>

The corresponding command line argument would then have to provide two values:

    -size 24 80

Multiple placeholders can optionally be separated by literal characters (which must then appear on the command-line). For example:

    =item -size <h>x<w>

would then require a command-line of the form:

    -size 24x80

If an argument has two or more placeholders, the corresponding entry in %ARGV becomes a hash reference, with each of the placeholder names as one key. That is, the above command-line would set both $ARGV{'-size'}{'h'} and $ARGV{'-size'}{'w'}.
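
The following sketch shows one way a multi-placeholder specification could be turned into a pattern that fills such a hash. It is an illustration only, not the parser that Getopt::Euclid builds: parse_with_spec is a hypothetical helper, it takes the argument as a single string, and it ignores optional parts, types, and defaults.

```perl
use strict;
use warnings;

# Hypothetical helper: match $input against a specification such as
# '-size <h>x<w>' and return a hash of placeholder values.
sub parse_with_spec {
    my ($spec, $input) = @_;

    my @names;
    my $pattern = '';
    for my $piece (grep { length } split /(<\w+>|\s+)/, $spec) {
        if ($piece =~ /\A<(\w+)>\z/) {      # placeholder: capture a value
            push @names, $1;
            $pattern .= '(\S+)';
        }
        elsif ($piece =~ /\A\s+\z/) {       # whitespace: any amount, or none
            $pattern .= '\s*';
        }
        else {                              # anything else is literal
            $pattern .= quotemeta $piece;
        }
    }

    my @values = $input =~ /\A$pattern\z/ or return;

    my %result;
    @result{@names} = @values;    # keys carry no angle brackets
    return \%result;
}

my $size = parse_with_spec('-size <h>x<w>', '-size 24x80');
print "$size->{h} by $size->{w}\n";    # 24 by 80
```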

Optional placeholders

Placeholders can be specified as optional as well:

    =item -size <h> [<w>]

This specification then allows either:

    -size 24

or:

    -size 24 80

on the command-line. If the second placeholder value is not provided, the corresponding $ARGV{'-size'}{'w'} entry is set to undef. See also "Placeholder defaults".

Unflagged placeholders

If an argument consists of a single placeholder with no "flag" marking it:

    =item <filename>

then the corresponding entry in %ARGV will have a key the same as the placeholder (including the surrounding angle brackets):

    if ($ARGV{'<filename>'} eq '-') {
        $fh = \*STDIN;
    }

The same is true for any more-complicated arguments that begin with a placeholder:

    =item <h> [x <w>]

The only difference in the more-complex cases is that, if the argument has any additional placeholders, the entire entry in %ARGV becomes a hash:

    my $total_size
        = $ARGV{'<h>'}{'h'} * $ARGV{'<h>'}{'w'};

Note that, as in earlier multi-placeholder examples, the individual second-level placeholder keys don't retain their angle-brackets.

Repeated placeholders

Any placeholder that is immediately followed by ..., like so:

    =item -lib <files>...

    =item <offsets>...

    =for Euclid:
        offsets.type: integer > 0

will match as many times as possible, but at least once. Note that this implies that an unconstrained repeated unflagged placeholder (see "Placeholder constraints" and "Unflagged placeholders") will consume the rest of the command-line, and so should be specified last in the POD.

If a placeholder is repeated, the corresponding entry in %ARGV will then be an array reference, with each individual placeholder match in a separate element. For example:

    for my $lib (@{ $ARGV{'-lib'} }) {
        add_lib($lib);
    }

    warn "First offset is: $ARGV{'<offsets>'}[0]";
    my $first_offset = shift @{ $ARGV{'<offsets>'} };

Placeholder constraints

You can specify that the value provided for a particular placeholder must satisfy a particular set of restrictions by using a =for Euclid block. For example:

    =item -size <h>x<w>

    =for Euclid:
        h.type: integer
        w.type: integer

specifies that both the <h> and <w> must be given integers. You can also specify an operator expression after the type name:

    =for Euclid:
        h.type: integer > 0
        w.type: number <= 100

specifies that <h> has to be given an integer that's greater than zero, and that <w> has to be given a number (not necessarily an integer) that's no more than 100.

These type constraints have two alternative syntaxes:

    PLACEHOLDER.type: TYPE BINARY_OPERATOR EXPRESSION

as shown above, and the more general:

    PLACEHOLDER.type: TYPE [, EXPRESSION_INVOLVING(PLACEHOLDER)]

Using the second syntax, you could write the previous constraints as:

    =for Euclid:
        h.type: integer, h > 0
        w.type: number,  w <= 100

In other words, the first syntax is just sugar for the most common case of the second syntax. The expression can be as complex as you wish and can refer to the placeholder as many times as necessary:

    =for Euclid:
        h.type: integer, h > 0 && h < 100
        w.type: number,  Math::is_prime(w) || w % 2 == 0

Note that the expressions are evaluated in the package main namespace, so it's important to qualify any subroutines that are not in that namespace. Furthermore, any subroutines used must be defined (or loaded from a module) before the use Getopt::Euclid statement.
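
One plausible way to picture how such an expression is tested is to substitute the supplied value for each bare occurrence of the placeholder name and evaluate the result as Perl. The sketch below does exactly that. It is an assumption about the general technique, not the module's actual implementation, and satisfies is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical checker: substitute the placeholder name, then evaluate.
sub satisfies {
    my ($name, $value, $expr) = @_;

    # Replace each bare occurrence of the placeholder name with '$value',
    # so the evaluated code refers to the lexical variable above
    (my $code = $expr) =~ s{\b\Q$name\E\b}{\$value}g;

    my $ok = eval $code;
    die "Invalid constraint '$expr': $@" if $@;

    return $ok ? 1 : 0;
}

print satisfies('h', 24,  'h > 0 && h < 100'), "\n";    # 1
print satisfies('h', 150, 'h > 0 && h < 100'), "\n";    # 0
```

Because the expression is evaluated as code, any subroutine it calls has to exist by the time the check runs, which is consistent with the note above about defining subroutines before the use Getopt::Euclid statement.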

Standard placeholder types

Getopt::Euclid recognizes the following standard placeholder types:

    Name            Placeholder value...        Synonyms
    ============    ====================        ================

    integer         ...must be an integer       int    i

    +integer        ...must be a positive       +int   +i
                    integer
                    (same as: integer > 0)

    0+integer       ...must be a positive       0+int  0+i
                    integer or zero
                    (same as: integer >= 0)

    number          ...must be a number         num    n

    +number         ...must be a positive       +num   +n
                    number
                    (same as: number > 0)

    0+number        ...must be a positive       0+num  0+n
                    number or zero
                    (same as: number >= 0)

    string          ...may be any string        str    s
                    (default type)

    readable        ...must be the name         input  in
                    of a readable file

    writeable       ...must be the name         writable output out
                    of a writeable file
                    (or of a non-existent
                    file in a writeable
                    directory)
                    
    /<regex>/       ...must be a string
                    matching the specified
                    pattern

Placeholder type errors

If a command-line argument's placeholder value doesn't satisfy the specified type, an error message is automatically generated. However, you can provide your own message instead, using the .type.error specifier:

    =for Euclid:
        h.type:        integer, h > 0 && h < 100
        h.type.error:  <h> must be between 0 and 100 (not h)

        w.type:        number,  Math::is_prime(w) || w % 2 == 0
        w.type.error:  Can't use w for <w> (must be a prime or an even number)

Whenever an explicit error message is provided, any occurrence within the message of the placeholder's unbracketed name is replaced by the placeholder's value (just as in the type test itself).
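
The substitution rule can be sketched as a single regex replacement that skips the bracketed form of the name. This is an illustrative sketch rather than the module's code, and type_error_message is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: fill a .type.error template with the bad value.
sub type_error_message {
    my ($name, $value, $template) = @_;

    # Replace the bare name, but leave '<name>' untouched
    (my $message = $template) =~ s{(?<!<)\b\Q$name\E\b(?!>)}{$value}g;

    return $message;
}

print type_error_message(
    'h', 150, '<h> must be between 0 and 100 (not h)'
), "\n";
# prints: <h> must be between 0 and 100 (not 150)
```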

Placeholder defaults

You can also specify a default value for any placeholders that aren't given values on the command-line (either because their argument isn't provided at all, or because the placeholder is optional within the argument).

For example:

    =item -size <h>[x<w>]

    Set the size of the simulation

    =for Euclid:
        h.default: 24
        w.default: 80

This ensures that if no <w> value is supplied:

    -size 20

then $ARGV{'-size'}{'w'} is set to 80.

Likewise, if the -size argument is omitted entirely, both $ARGV{'-size'}{'h'} and $ARGV{'-size'}{'w'} are set to their respective default values.
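
The effect of these defaults can be pictured as filling in every placeholder that is still undefined after parsing. The sketch below shows that idea with plain hashes. The helper apply_defaults and the hand-built hashes are hypothetical stand-ins, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: fill in any placeholder that has no value yet.
sub apply_defaults {
    my ($given, $defaults) = @_;

    my %filled = %{$given};
    for my $name (keys %{$defaults}) {
        # '//=' assigns only when the current value is undefined,
        # so an explicit 0 on the command-line is preserved
        $filled{$name} //= $defaults->{$name};
    }

    return \%filled;
}

my %defaults = (h => 24, w => 80);

# As if the command-line were: -size 20
my $partial = apply_defaults({ h => 20, w => undef }, \%defaults);
print "$partial->{h} x $partial->{w}\n";    # 20 x 80

# As if -size were omitted entirely
my $omitted = apply_defaults({}, \%defaults);
print "$omitted->{h} x $omitted->{w}\n";    # 24 x 80
```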

The default value can be any valid Perl compile-time expression:

    =item -pi=<pi value>

    =for Euclid:
        pi value.default: atan2(0,-1)

Argument cuddling

Getopt::Euclid allows any "flag" argument to be "cuddled". A flag argument consists of a single non-alphanumeric character, followed by a single alphanumeric character:

    =item -v

    =item -x

    =item +1

    =item =z

Cuddling means that two or more such arguments can be concatenated after a single common non-alphanumeric. For example:

    -vx

Note, however, that only flags with the same leading non-alphanumeric can be cuddled together. Getopt::Euclid would not allow:

    -vxz

That's because cuddling is recognized by progressively removing the second character of the cuddle. In other words:

    -vxz

becomes:

    -v -xz

which becomes:

    -v -x z

which will fail, unless a z argument has also been specified.

On the other hand, if the argument:

    =item -e <cmd>

had been specified, the module would accept:

    -vxe'print time'

as a cuddled version of:

    -v -x -e'print time'
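
The peeling process can be sketched as a loop that splits off one flag at a time while what follows still begins with another declared flag. This is a simplified model under stated assumptions, not the module's actual algorithm: uncuddle is a hypothetical helper, and the set of declared flags is passed in by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: split a cuddled token into separate arguments.
# %{$known} holds every declared single-character flag, e.g. '-v'.
sub uncuddle {
    my ($token, $known) = @_;

    my @parts;
    while (my ($lead, $first, $rest)
               = $token =~ m{\A ([^A-Za-z0-9]) ([A-Za-z0-9]) (.+) \z}xs) {
        # Peel a flag only if the remainder also starts with a declared
        # flag that shares the same leading character
        my $next = $lead . substr($rest, 0, 1);
        last if !$known->{"$lead$first"} || !$known->{$next};

        push @parts, "$lead$first";
        $token = "$lead$rest";    # re-attach the shared leading character
    }

    return (@parts, $token);
}

my %known = map { $_ => 1 } qw( -v -x -e =z );

print join(' ', uncuddle('-vx', \%known)), "\n";                 # -v -x
print join(' ', uncuddle("-vxe'print time'", \%known)), "\n";    # -v -x -e'print time'
print join(' ', uncuddle('-vxz', \%known)), "\n";                # -v -xz
```

In the last case the sketch stops at -xz, because no -z flag was declared (only =z). That leftover token is what an ordinary argument parser would then have to accept or reject.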

Exporting Option Variables

By default, the module only stores arguments into the global %ARGV hash. You can request that options be exported as variables into the calling package by using the special ':vars' specifier:

    use Getopt::Euclid qw( :vars );

That is, if your program accepts the following arguments:

    -v
    --mode <modename>
    <infile>
    <outfile>
    --auto-fudge <factor>      (repeatable)
    --also <a>...
    --size <w>x<h>

Then these variables will be exported:

    $ARGV_v
    $ARGV_mode
    $ARGV_infile
    $ARGV_outfile
    @ARGV_auto_fudge
    @ARGV_also
    %ARGV_size          # With entries $ARGV_size{w} and $ARGV_size{h}

For options that have multiple variants, only the longest variant is exported.

The type of variable exported (scalar, hash, or array) is determined by the type of the corresponding value in %ARGV. Command-line flags and arguments that take single values will produce scalars, arguments that take multiple values will produce hashes, and repeatable arguments will produce arrays.

If you don't like the default prefix of "ARGV_", you can specify your own, such as "opt_", like this:

    use Getopt::Euclid qw( :vars<opt_> );

The major advantage of using exported variables is that any misspelling of argument variables in your code will be caught at compile-time by use strict.
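
The naming rule implied by the list above can be sketched as follows: strip the decorations from the argument name, turn the remaining punctuation into underscores, add the prefix, and choose the sigil from the kind of value stored in %ARGV. This is an inference from the examples in this section, not the module's code, and exported_name is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: predict the variable name for an %ARGV entry.
sub exported_name {
    my ($key, $value, $prefix) = @_;
    $prefix = 'ARGV_' if !defined $prefix;

    (my $name = $key) =~ s{\A\W+|\W+\z}{}g;    # drop '--', '<', '>' etc.
    $name =~ s{\W}{_}g;                        # '-' inside a name becomes '_'

    my $sigil = ref($value) eq 'ARRAY' ? '@'
              : ref($value) eq 'HASH'  ? '%'
              :                          '$';

    return $sigil . $prefix . $name;
}

print exported_name('--auto-fudge', [ 1, 2 ]), "\n";
print exported_name('--size', { w => 80, h => 24 }), "\n";
print exported_name('<infile>', 'data.txt'), "\n";
print exported_name('-v', 1, 'opt_'), "\n";
```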

Standard arguments

Getopt::Euclid automatically provides four standard arguments to any program that uses the module. The behaviours of these arguments are "hard-wired" and cannot be changed, not even by defining your own arguments of the same name.

The standard arguments are:

--usage

This argument causes the program to print a short usage summary and exit.

--help

This argument causes the program to print a longer usage summary (including a full list of required and optional arguments) and exit.

--man

This argument causes the program to print the complete POD documentation for the program and exit. If the standard output stream is connected to a terminal and the Pod::Text module is available, the POD is formatted before printing. If the IO::Page or IO::Pager::Page module is available, the formatted documentation is then paged.

If standard output is not connected to a terminal or Pod::Text is not available, the POD is not formatted.

--version

This argument causes the program to print the version number of the program (as specified in the =head1 VERSION section of the POD) and any copyright information (as specified in the =head1 COPYRIGHT POD section) and then exit.

Minimalist keys

By default, the keys of %ARGV will match the program's interface exactly. That is, if your program accepts the following arguments:

    -v
    --mode <modename>
    <infile>
    <outfile>
    --auto-fudge

Then the keys that appear in %ARGV will be:

    '-v'
    '--mode'
    '<infile>'
    '<outfile>'
    '--auto-fudge'

In some cases, however, it may be preferable to have Getopt::Euclid set up those hash keys without "decorations". That is, to have the keys of %ARGV be simply:

    'v'
    'mode'
    'infile'
    'outfile'
    'auto_fudge'

You can arrange this by loading the module with the special ':minimal_keys' specifier:

    use Getopt::Euclid qw( :minimal_keys );

Note that, in rare cases, using this mode may cause you to lose data (for example, if the interface specifies both a --step and a <step> option). The module throws an exception if this happens.
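
The key reduction and the clash it can cause are easy to model. The sketch below strips the decorations from each key and dies if two arguments collapse to the same minimal key. The stripping rules are inferred from the example keys above rather than taken from the module's source, and both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a decorated key to its minimal form.
sub minimal_key {
    my ($key) = @_;
    $key =~ s{\A\W+}{};    # leading decoration such as '--' or '<'
    $key =~ s{\W+\z}{};    # trailing decoration such as '>'
    $key =~ s{\W}{_}g;     # remaining punctuation becomes '_'
    return $key;
}

# Hypothetical helper: map minimal keys to original keys, detecting clashes.
sub minimal_keys {
    my (@keys) = @_;

    my %original_for;
    for my $key (@keys) {
        my $min = minimal_key($key);
        die "minimalist mode caused arguments "
          . "'$original_for{$min}' and '$key' to clash\n"
            if exists $original_for{$min};
        $original_for{$min} = $key;
    }

    return \%original_for;
}

my $map = minimal_keys(qw( -v --mode <infile> <outfile> --auto-fudge ));
print join(' ', sort keys %{$map}), "\n";
# prints: auto_fudge infile mode outfile v
```

Calling minimal_keys('--step', '<step>') dies, since both arguments reduce to the key 'step'.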

DIAGNOSTICS

Compile-time diagnostics

The following diagnostics are mainly caused by problems in the POD specification of the command-line interface:

Getopt::Euclid was unable to access POD

Something is horribly wrong. Getopt::Euclid was unable to read your program to extract the POD from it. Check your program's permissions, though it's a mystery how perl was able to run the program in the first place, if it's not readable.

.pm file cannot define an explicit import() when using Getopt::Euclid

You tried to define an import() subroutine in a module that was also using Getopt::Euclid. Since the whole point of using Getopt::Euclid in a module is to have it build an import() for you, supplying your own import() as well defeats the purpose.

Unknown specification: %s

You specified something in a =for Euclid section that Getopt::Euclid didn't understand. This is often caused by typos, or by reversing a placeholder.type or placeholder.default specification (that is, writing type.placeholder or default.placeholder instead).

Unknown type (%s) in specification: %s
Unknown .type constraint: %s

Both these errors mean that you specified a type constraint that Getopt::Euclid didn't recognize. This may have been a typo:

    =for Euclid:
        count.type: inetger

or else the module simply doesn't know about the type you specified:

    =for Euclid:
        count.type: complex

See "Standard placeholder types" for a list of types that Getopt::Euclid does recognize.

Invalid .type constraint: %s

You specified a type constraint that isn't valid Perl. For example:

    =for Euclid:
        max.type: integer not equals 0

instead of:

    =for Euclid:
        max.type: integer != 0

Invalid .default value: %s

You specified a default value that isn't valid Perl. For example:

    =for Euclid:
        curse.default: *$@!&

instead of:

    =for Euclid:
        curse.default: '*$@!&'

Invalid constraint: %s (No <%s> placeholder in argument: %s)

You attempted to define a .type constraint for a placeholder that didn't exist. Typically this is the result of the misspelling of a placeholder name:

    =item -foo <bar>

    =for Euclid:
        baz.type: integer

or a =for Euclid: that has drifted away from its argument:

    =item -foo <bar>

    =item -verbose

    =for Euclid:
        bar.type: integer

Getopt::Euclid loaded a second time

You tried to load the module twice in the same program. Getopt::Euclid doesn't work that way. Load it only once.

Unknown mode ('%s')

The only arguments that a use Getopt::Euclid statement accepts are ':minimal_keys' (see "Minimalist keys") and ':vars' (see "Exporting Option Variables"). You specified something else instead (or possibly forgot to put a semicolon after use Getopt::Euclid).

Internal error: minimalist mode caused arguments '%s' and '%s' to clash

Minimalist mode removes certain characters from the keys that are returned in %ARGV. This can mean that two command-line options (such as --step and <step>) map to the same key (i.e. 'step'). This in turn means that one of the two options has overwritten the other within the %ARGV hash. The program developer should either turn off ':minimal_keys' mode within the program, or else change the name of one of the options so that the two no longer clash.

Run-time diagnostics

The following diagnostics are caused by problems in parsing the command-line:

Missing required argument(s): %s

At least one argument specified in the REQUIRED ARGUMENTS POD section wasn't present on the command-line.

Invalid %s argument. %s must be %s but the supplied value (%s) isn't.

Getopt::Euclid recognized the argument you were trying to specify on the command-line, but the value you gave to one of that argument's placeholders was of the wrong type.

Unknown argument: %s

Getopt::Euclid didn't recognize an argument you were trying to specify on the command-line. This is often caused by command-line typos or an incomplete interface specification.

CONFIGURATION AND ENVIRONMENT

Getopt::Euclid requires no configuration files or environment variables.

DEPENDENCIES

  • File::Spec::Functions

  • List::Util

INCOMPATIBILITIES

None reported.

BUGS AND LIMITATIONS

No bugs have been reported.

Please report any bugs or feature requests to bug-getopt-euclid@rt.cpan.org, or through the web interface at http://rt.cpan.org.

AUTHOR

Damian Conway <DCONWAY@cpan.org>

LICENCE AND COPYRIGHT

Copyright (c) 2005, Damian Conway <DCONWAY@cpan.org>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

DISCLAIMER OF WARRANTY

BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.