
TITLE

DRAFT: Synopsis 19: Command Line Interface

AUTHORS

    Jerry Gay <jerry.gay@rakudoconsulting.com>

VERSION

    Created: 12 Dec 2008

    Last Modified: 26 Oct 2009
    Version: 26

This draft document describes the Perl 6 command line interface, which has changed extensively from previous versions of Perl in order to increase clarity, consistency, and extensibility. Many of the syntax revisions are extensions, so you'll find that much of the Perl 5 syntax embedded in your muscle memory will still work.

Notable features described in the sections below include:

  • A smart default command-line processor in the core

  • All options have a long, descriptive name for increased clarity

  • Common options have a short, single-character name, and allow clustering

  • Extended option syntax provides the ability to set boolean true/false

  • New ++ metasyntax allows options to be passed through to subsystems

This interface to Perl 6 is special in that it occurs at the intersection of the program and the operating system's command line shell, and thus is not accessed via a consistent syntax everywhere. A few assumptions are made here, which will hopefully stand the test of time: All command-line arguments are assumed to be in Unicode unless proven otherwise; and Perl is born of Unix, and as such the syntax presented in this document is expected to work in a Unix-style shell. To explore the particularities of other operating systems, see Synopsis 25 (TBD).

Command Line Elements

The command line is broken down into two basic elements: a program, and arguments. Each command line element is whitespace separated, so elements containing whitespace must be quoted. The program processes the arguments and performs the requested actions. It looks something like /usr/bin/perl6, parrot perl6.pbc, or rakudo, and is followed by zero or more arguments. Perl 6 does not do any processing of the program portion of the command line, but it is made available at run-time via the PROCESS::<$PROGRAM_NAME> variable.

Command line arguments are broken down into options and values. Each option may take zero or more values. After all options have been processed, the remaining values (if any) generally consist of the name of a script for Perl to execute, followed by arguments for that script. If no values remain, Perl 6 implicitly opens STDIN to read the script. If you wish to pass arguments to a script read from STDIN, you must specify STDIN by name (- on most operating systems).
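
The handling of the remaining values can be sketched as follows. This is a minimal Python illustration rather than anything the synopsis prescribes: the function name split_script is hypothetical, and it assumes option processing has already consumed every option.

```python
def split_script(values):
    """Split the values left after option processing into (script, script_args).

    An empty list means the script is read from STDIN, represented
    here by "-" (its name on most operating systems).
    """
    if not values:
        return "-", []
    return values[0], list(values[1:])


print(split_script(["hello.pl", "a", "b"]))  # script followed by its arguments
print(split_script([]))                      # no values: script comes from STDIN
print(split_script(["-", "a"]))              # STDIN named explicitly, so "a" reaches the script
```

Naming STDIN explicitly is what makes the third call possible: without the leading "-", the value "a" would be taken as the script name.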

Backward (In)compatibility

You may find yourself typing your favorite Perl 5 options, even after Christmas has arrived. As you'll see below, common options are provided which behave similarly. Less common options, however, may not be available or may have changed syntax. If you provide Perl with unrecognized command-line syntax, Perl gives you a friendly error message. If the unrecognized syntax is a valid Perl 5 option, Perl provides helpful suggestions to allow you to perform the same action using the current syntax.

Unchanged Syntactic Features

Several features have not changed from Perl 5, including:

  • The most common options have a single-character short name

  • Single-character options may be clustered with the same syntax and semantics

  • Many command-line options behave similarly, for example:

      Option...                            Still means...
      -a                                   Autosplit
      -c                                   Check syntax
      -e *line*                            Execute
      -F *expression*                      Specify autosplit field separator
      -h                                   Display help and exit
      -I *directory*[,*directory*[,...]]   Add include paths
      -n                                   Act like awk
      -p                                   Act like sed
      -S                                   Search PATH for script
      -T                                   Enable taint mode
      -v                                   Display version info
      -V                                   Display verbose config info

    All of these options have extended syntax, and some may have slightly different semantics, so see "Option Reference" below for the details.

Removed Syntactic Features

Some Perl 5 command-line features are no longer available, either because there's a new and different way to do it in Perl 6, or because they're no longer relevant. Here's a breakdown of what's been removed:

-0 *octal/hex*

Sets input record separator. Missing due to lack of specification in Synopsis 16. There is a comment about this in the "Notes" section at the end of this document.

-C *number/list*

Control Unicode features. Perl 6 has Unicode semantics, and assumes a UTF-8 command-line interface (until proven otherwise, at which point this functionality may be readdressed).

-d, -dt, -d:foo, -D, etc.

Debugging commands. Replaced with the ++BUG metasyntactic option.

-E *line*

Execute a line of code, with all features enabled. This is specific to Perl 5.10, and not relevant to Perl 6, where -e performs this function.

-i *extension*

Modify files in-place. Haven't thought about it enough to add yet, but I'm certain it has a strong following. {{TODO review decision here}}

-l

Enable automatic line-ending processing. This is the default behavior.

-M *module*, -m *module*, etc.

use/no module. Replaced by --use.

-P

Obsolete. Removed.

-s

Enable rudimentary switch parsing. By default, Perl 6 parses the arguments passed to a script using the signature supplied by the user in the MAIN routine (see "Declaring a MAIN subroutine" in S06-subroutines).

-t

Enable taint warnings mode. Taint mode needs more thought, but it's much more likely that the -T switch will take options rather than use a second command-line flag to implement related behavior.

-u

Obsolete. Removed.

-U

Allow unsafe operations. This is extremely dangerous and infrequently used, and doesn't deserve its own command-line option.

-w

Enable warnings. This is the default behavior.

-W

Enable all warnings. This is infrequently used, and doesn't deserve its own command-line option.

-X

Disable all warnings. This is infrequently used, and doesn't deserve its own command-line option.
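
One way to produce the "helpful suggestions" for removed Perl 5 options is a simple lookup table built from the list above. The sketch below is in Python and is only an illustration: the table name, the function name suggest, and the wording of each hint are assumptions, and only options whose replacement is stated above are included.

```python
# Hypothetical lookup table derived from the removed options listed above.
# The hint wording is an assumption, not something the synopsis prescribes.
REMOVED_PERL5_OPTIONS = {
    "-d": "use the ++BUG metasyntactic option",
    "-D": "use the ++BUG metasyntactic option",
    "-E": "use -e, which performs this function",
    "-l": "automatic line-ending processing is the default",
    "-M": "use --use",
    "-m": "use --use",
    "-w": "warnings are enabled by default",
}


def suggest(option):
    """Return a hint for a removed Perl 5 option, or None if there is none."""
    # Match on the two-character switch so that -d:foo or -MFoo also hit.
    return REMOVED_PERL5_OPTIONS.get(option[:2])


print(suggest("-d:foo"))
print(suggest("-MFoo"))
print(suggest("-z"))
```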

Options and Values

Command line options are parsed using the following rules:

  • Options must begin with one of the following symbols: --, -, or :.

  • Options are case sensitive. -o and -O are not the same option.

  • All options have a multi-character, descriptive name for increased clarity. Multi-character option names always begin with -- or :.

  • Common options have a short, one-character name for speed. Single-character names always begin with -.

  • Single-character options may be clustered. -ab means -a -b. When a single-character option which requires a value is clustered, the option may appear only in the final position of the cluster.

  • Options may be negated with /, for example --/name, :/name, -/n. Negated single-character options cannot appear in a cluster. In practice, negated options are rare anyway, as most boolean options default to False.

  • Option names follow Perl 6 identifier naming convention, except ' is not allowed, and single-character options may be any character or number.

  • The special option -- signals the parser to stop option processing. Arguments following a bare -- (with no identifier) are always parsed as a list of values, even if they look like valid options.
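
The rules above can be sketched in Python for the simplest case, boolean options. This is an illustration under stated assumptions, not a reference parser: the function name parse_flags is hypothetical, option values (--name=value) are deliberately ignored, and any argument that does not look like an option is treated as a value.

```python
def parse_flags(args):
    """Parse boolean options only; return (flags, values).

    Simplification: option values (--name=value) are not handled here.
    """
    flags, values = {}, []
    for i, arg in enumerate(args):
        if arg == "--":
            # Bare "--" stops option processing; the rest are values.
            values.extend(args[i + 1:])
            break
        if arg.startswith("--") or arg.startswith(":"):
            # Multi-character name, optionally negated with "/".
            body = arg[2:] if arg.startswith("--") else arg[1:]
            negated = body.startswith("/")
            flags[body[1:] if negated else body] = not negated
        elif arg.startswith("-") and len(arg) > 1:
            body = arg[1:]
            if body.startswith("/"):
                # Negated single-character options cannot be clustered.
                if len(body) != 2:
                    raise ValueError("cannot negate a cluster: " + arg)
                flags[body[1]] = False
            else:
                # A cluster such as -ab means -a -b.
                for ch in body:
                    flags[ch] = True
        else:
            values.append(arg)
    return flags, values


flags, values = parse_flags(["-ab", "--/name", ":verbose", "-/n", "--", "-x", "file"])
print(flags)
print(values)
```

Because names are stored as plain dictionary keys, -o and -O remain distinct, which matches the case-sensitivity rule.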

Delimited options allow you to transparently pass one or more options through to a subsystem, as specified by the special options that delimit those options. They are parsed according to the following rules:

  • The opening and closing delimiters begin with two or more plus characters, for example ++. You'll usually use two plus characters, but more are allowed to avoid ambiguity when nesting delimited options.

  • Opening and closing delimited option names follow option identifier naming convention, defined above.

  • If the closing delimiter is omitted, the rest of the command line is consumed.

  • Inside a delimited option, the -- option does not suppress searching for the closing delimiter. That is, only the rest of the arguments within the delimiters are treated as values.

  • Eager matching semantics are used, so the first closing delimiter found completes the match.

  • Delimited options cannot be negated. However, the final delimiter takes a slash indicating the termination of the delimited processing, much like a closing HTML tag.

These options are made available in dynamic variables matching their name, and are invisible to MAIN() except as %*OPTS<name>. For example:

  ++PARSER --setting=Perl6-autoloop-no-print ++/PARSER

is available inside your script as %*OPTS<PARSER>, and contains --setting=Perl6-autoloop-no-print. Since eager matching is used, if you need to pass something like:

  ++foo -bar ++foo baz ++/foo ++/foo

you'll end up with

  %*OPTS<foo> = '-bar ++foo baz';

which is probably not what you wanted. Instead, add extra + characters

  +++foo -bar ++foo baz ++/foo +++/foo

which will give you

  %*OPTS<foo> = '-bar ++foo baz ++/foo';

allowing you to properly nest delimited options.
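
The eager matching described above can be checked with a short Python sketch. The function name delimited_content and its parameters are hypothetical; the sketch assumes the command line has already been split on whitespace and that the caller knows the delimiter name and the number of plus characters.

```python
def delimited_content(args, name, pluses=2):
    """Return the text between an opening delimiter and its closing delimiter.

    Raises ValueError if the opening delimiter is not present.
    """
    opening = "+" * pluses + name
    closing = "+" * pluses + "/" + name
    start = args.index(opening)
    tail = args[start + 1:]
    # Eager matching: the first closing delimiter completes the match.
    # If the closing delimiter is omitted, the rest of the line is consumed.
    end = tail.index(closing) if closing in tail else len(tail)
    return " ".join(tail[:end])


print(delimited_content("++foo -bar ++foo baz ++/foo ++/foo".split(), "foo"))
print(delimited_content("+++foo -bar ++foo baz ++/foo +++/foo".split(), "foo", pluses=3))
```

The first call stops at the inner ++/foo, which is the surprising result described above; the second call uses three plus characters and so captures the nested pair intact.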

The actual storage location of %*OPTS may be either in PROCESS::<%OPTS> or GLOBAL::<%OPTS>, depending on how the process sets up its interpreters.

Values are parsed with the following rules:

  • Values are passed to options with the following syntax: --option=value or --option value.

  • Values containing whitespace must be enclosed in quotes, for example -O="spacey value".

  • Multiple values are passed using commas without intervening whitespace, as in --option=val1,'val 2',etc, or by specifying multiple instances of the option, as in --option=val1 --option='val 2'.
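
A rough Python sketch of these value rules follows. It is an illustration only: the function name collect_values and the takes_value parameter are hypothetical, the input is assumed to contain nothing but options and their values, and shlex.split stands in for the quoting a Unix-style shell would perform.

```python
import shlex


def collect_values(args, takes_value):
    """Gather option values from arguments the shell has already split.

    takes_value names the options that accept the "--option value" form.
    Note: a value that itself contains a comma cannot be expressed here.
    """
    values = {}
    i = 0
    while i < len(args):
        name, sep, raw = args[i].lstrip("-").partition("=")
        if not sep and name in takes_value and i + 1 < len(args):
            # "--option value" form: the value is the next argument.
            i += 1
            raw = args[i]
        # Commas separate multiple values; repeated options accumulate.
        values.setdefault(name, []).extend(raw.split(",") if raw else [])
        i += 1
    return values


argv = shlex.split("--option=val1,'val 2',etc --option='val 3'")
print(collect_values(argv, {"option"}))
print(collect_values(shlex.split('-O="spacey value"'), {"O"}))
```

Both ways of passing multiple values end up in the same list, so the comma form and the repeated-option form can be mixed freely in this sketch.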

Remaining arguments

Any remaining arguments to the Perl 6 program are placed in the @*ARGS array.

Option Reference

Perl 6 options, descriptions, and services.

Synopsis

  multi sub perl6(
    Bool :a($autoloop-comb),
    Bool :c($check-syntax),
    Bool :$doc,
         :e($execute),
         :F($autoloop-delim),
    Bool :h($help),
         :I(@include),
         :L($language),
    Bool :n($autoloop-no-print),
         :O($output-format),
    Bool :p($autoloop-print),
    Bool :S($search-path),
    Bool :T($taint),
         :u($use),
    Bool :v($version),
    Bool :V($verbose-config),
    Bool :x($extract-from-text),
  );
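
The short-to-long correspondence declared in the signature above can be written out as a lookup table. The Python sketch below is only a convenience illustration: the names SHORT_TO_LONG and long_form are hypothetical, and the table simply transcribes the signature (--doc has no short name, so it does not appear).

```python
# Transcribed from the signature above: short name -> long name.
SHORT_TO_LONG = {
    "a": "autoloop-comb",
    "c": "check-syntax",
    "e": "execute",
    "F": "autoloop-delim",
    "h": "help",
    "I": "include",
    "L": "language",
    "n": "autoloop-no-print",
    "O": "output-format",
    "p": "autoloop-print",
    "S": "search-path",
    "T": "taint",
    "u": "use",
    "v": "version",
    "V": "verbose-config",
    "x": "extract-from-text",
}


def long_form(option):
    """Translate a short option such as "-n" to its long form.

    Anything that is not a known single-character option is returned unchanged.
    """
    if len(option) == 2 and option[0] == "-" and option[1] in SHORT_TO_LONG:
        return "--" + SHORT_TO_LONG[option[1]]
    return option


print(long_form("-n"))
print(long_form("-V"))
print(long_form("--doc"))
```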

Reference

--autoloop-comb, -a

When used with -n or -p, implicitly combs input and assigns the result to @_ within the loop produced by the -n or -p.

An alternate pattern for comb may be specified with --autoloop-delim, a.k.a. -F.

++CMD --command-line-parser *parser* ++/CMD

Add a command-line processor. When this option is parsed, it immediately triggers an action that affects or replaces the command-line parser. Therefore, it is a good idea to put this option as early as possible in the argument list.

--check-syntax, -c

Check syntax, then exit. Desugars to -e 'CHECK{ compiles_ok(); exit; }'.

--doc

Look up Perl documentation in Pod format. Desugars to -e 'CHECK{ compiles_ok(); dump_perldoc(); }'. @*ARGS contains the arguments passed to perl6, and is available at CHECK time, so dump_perldoc() can respond to command-line options.

{{TODO may create a ++DOC subsystem here. also, may use -d for short name, even though it clashes with perl 5}}

++BUG [*switches*, *flags*] ++/BUG

Set switches and flags for the debugger.

Note: The debugger needs further specification.

--execute, -e *line*

Execute a single-line program. Multiple -e options may be chained together, each one representing an input line with an implicit newline at the end.

If you wish to run in lax mode, without strictures and warnings enabled, pass a value of '6;' to the first -e on the command line, like -e '6;'. See Synopsis 11 for details.

--autoloop-delim, -F *expression*

Pattern to split on (used with -a). Substitutes an expression for the default split function, which is {split ' '}. Accepts Unicode strings (as long as your shell lets you pass them). Allows passing a closure (e.g. -F "{use Text::CSV}"). Awk's not better any more :)

--help, -h

Print summary of options. Desugars to ++CMD --print-help --exit ++/CMD.

--include, -I *directory*[,*directory*[,...]]

Prepend directories to @*INC, for searching ad hoc libraries. Searching the standard library follows the policies laid out in Synopsis 11.

--language, -L *dsl*

Set the domain specific language for parsing the script file. (That is, specify the setting (often known as the prelude) for the program.) Desugars to ++PARSER --setting=*dsl* ++/PARSER.

--autoloop-no-print, -n

Act like awk. Desugars to ++PARSER --setting=Perl6-autoloop-no-print ++/PARSER.

--output-format, -O *format*

Emit compiler output to STDOUT in the specified format, rather than invoking the compiled code immediately. This option is implementation-specific, so consult the documentation for your Perl 6 implementation for further details.

--autoloop-print, -p

Act like sed. Desugars to ++PARSER --setting=Perl6-autoloop-print ++/PARSER.

--search-path, -S

Use PATH environment variable to search for script specified on command-line.

--taint, -T

Turns on "taint" checking. See Synopsis 23 for details. This option commits very early, so put it as early on the command line as possible.

--use, -u *module*

--use *module* and -u *module* desugar to -e 'use *module*'. Specify version info and import symbols by appending info to the module name:

  -u'Sense:auth<cpan:JRANDOM>:ver<1.2.1> <common @horse>'

You'll need the quotes so your shell doesn't complain about redirection. There is no special command-line syntax for 'no *module*'; use -e instead.

--version, -v

Display program name, version, patchlevel, etc. Desugars to ++CMD -v ++/CMD ++PARSER -v ++/PARSER ++BUG -v ++/BUG.

--verbose-config, -V

Display configuration details. Desugars to ++CMD -V ++/CMD ++PARSER -V ++/PARSER ++BUG -V ++/BUG.

--extract-from-text, -x

Run program embedded in Unicode text. Scan for the first line starting with #! and containing the word perl, and start there instead. This is useful for running a program embedded in a larger message. (In this case you would indicate the end of the program using the =END block, as defined in Synopsis 26.)

Desugars to ++PARSER --Perl6-extract-from-text ++/PARSER.

Metasyntactic Options

Metasyntactic options are a subset of delimited options used to pass arguments to an underlying component of Perl. Perl itself does not parse these options, but makes them available to run-time components via the %*META-ARGS dynamic variable.

Standard in Perl 6 are three underlying components, CMD, PARSER, and BUG. Implementations may expose other components via this interface, so consult the documentation for your Perl 6 implementation.

  On command line...                   Subsystem gets...
   ++X a -b  ++/X                      a -b

  # Nested options
  +++X a -b   ++X -c ++/X -d e +++/X   a -b ++X -c ++/X -d e

  # More than once (both are valid, but the second form is preferred)
   ++X a -b  ++/X -c  ++X -d e  ++/X   a -b -d e
  +++X a -b +++/X -c  ++X -d e  ++/X   a -b -d e
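
The table above can be reproduced with a small Python sketch that gathers what each subsystem receives, including the case where the same subsystem is addressed more than once. The function name collect_subsystems is hypothetical, and the sketch assumes whitespace-split arguments; it is not a description of how any implementation does this.

```python
import re


def collect_subsystems(args):
    """Return ({subsystem: [args, ...]}, remaining_args)."""
    collected, remaining = {}, []
    i = 0
    while i < len(args):
        # Opening delimiter: two or more "+" followed by a name.
        m = re.fullmatch(r"(\+{2,})([^+/]\S*)", args[i])
        if m is None:
            remaining.append(args[i])
            i += 1
            continue
        pluses, name = m.groups()
        closing = pluses + "/" + name
        # Scan to the first matching closing delimiter (or the end).
        j = i + 1
        while j < len(args) and args[j] != closing:
            j += 1
        # Repeated uses of the same subsystem accumulate in order.
        collected.setdefault(name, []).extend(args[i + 1:j])
        i = j + 1
    return collected, remaining


print(collect_subsystems("++X a -b ++/X -c ++X -d e ++/X".split()))
print(collect_subsystems("+++X a -b ++X -c ++/X -d e +++/X".split()))
```

In the first call the -c between the two delimited groups stays with the ordinary arguments, while the subsystem X sees a -b -d e.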

Environment Variables

Environment variables may be used to the same effect as command-line arguments.

PATH

Used in executing subprocesses, and for finding the program if the -S switch is used.

PERL6LIB

A list of directories in which to look for ad hoc Perl library files.

Note: this is speculative, as library loading is not yet specified, except insofar as S11 mandates various behaviors incompatible with mere directory probing.

PERL6OPT

Default command-line arguments. Arguments found here are prepended to the list of arguments provided on the command-line.
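
The prepending behavior can be sketched in Python as below. The function name effective_args is hypothetical, and the use of shell-like splitting for the variable's contents is an assumption, since this document does not say how PERL6OPT is tokenized.

```python
import os
import shlex


def effective_args(argv, environ=os.environ):
    """Prepend the arguments found in PERL6OPT to the command-line arguments."""
    # Assumption: PERL6OPT is split with shell-like quoting rules.
    defaults = shlex.split(environ.get("PERL6OPT", ""))
    return defaults + list(argv)


print(effective_args(["-e", "say 1"], {"PERL6OPT": "-T --include=lib"}))
print(effective_args(["script.pl"], {}))
```

Because the defaults come first, an option given explicitly on the command line appears later in the list than the same option from PERL6OPT.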

References

http://perldoc.perl.org/perlrun.html
http://search.cpan.org/~jv/Getopt-Long-2.37/lib/Getopt/Long.pm
http://search.cpan.org/~dconway/Getopt-Euclid-v0.2.0/lib/Getopt/Euclid.pm
http://perlcabal.org/syn/S06.html#Declaring_a_MAIN_subroutine
http://search.cpan.org/src/AUDREYT/Perl6-Pugs-6.2.13/docs/Pugs/Doc/Run.pod
http://haskell.org/ghc/docs/latest/html/users_guide/using-ghc.html
http://java.sun.com/j2se/1.4.2/docs/tooldocs/windows/java.html

Notes

I'd like to be able to adjust the input record separator from the command line, for instance to specify the equivalent of Perl 5's $/ = \32768;. So far, I don't have a solution, but perhaps pass a closure that evaluates to an Int? This should try to use whatever option does the same thing to a new filehandle when S16 is further developed.

Sandboxing? maybe -r

Env var? maybe -E. Could be posed in terms of substituting a different setting.