NAME

Regexp::Fields - named capture groups

SYNOPSIS

  use Regexp::Fields qw(my);
  use strict;

  my $rx = qr/Time: (?<hrs>..):(?<min>..):(?<sec>..)/;
  if (/$rx/) {
      print "The time was: $&{hrs}:$&{min}:$&{sec}\n";
      # or: "The time was: $hrs:$min:$sec\n";
      # or: "The time was: $1:$2:$3\n";
  }

DESCRIPTION

Regexp::Fields adds the extended (?<name> ...) pattern to Perl's regular expression language. This works like an ordinary pair of capturing parens, but after a match you can use $&{name} instead of $1 (or whichever $N) to get at the captured substring.

The %{&} hash is global, like all punctuation variables. Like $1 and friends, it's dynamically scoped and bound to the "last match".

This looks familiar

The syntax is borrowed from the .NET regex library.

Differences from .NET include the following:

  • Regexp::Fields ignores whitespace between the field name and the subpattern. To match leading whitespace, you'll need to use a backslash or a character class.

      /(?<space> [ ])/;   # matches one space

  • The digit variables aren't reordered.

      "12" =~ /(?<one>1)(2)/;    # $2 is "2"

  • Regexp::Fields doesn't support named backreferences (which are on the TODO list) or field names in conditional tests (which aren't).

Lexical variables and the my pragma

When a regex is compiled with use Regexp::Fields 'my' in effect, a lexical variable for each field will be implicitly declared. After a successful match the variables will be set to the captured substrings, just like the corresponding values of %{&}. After a failed match attempt they'll always be undef.

This is not the case with %{&} or the digit variables. After a failed match, those might refer to a regex in some other part of your program. The lexical match variables work differently because they are bound once and forever to the regex where they were declared.

  use Regexp::Fields qw(my);

  my $f = qr/(?<foo> foo)/;         # implicitly: my $foo
  my $b = qr/(?<bar> bar)/;         # implicitly: my $bar

  if (/$f/ and /$b/) {              # now $1 is "bar"
    print "Matched $foo and $bar";  # but $foo and $bar are both set!
  }
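The behaviour of the digit variables after a failed match can be seen in plain Perl, without Regexp::Fields loaded. This is only a minimal sketch; the sample string and the names $text, $after_hit, $matched and $after_miss are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Time: 12:34:56";

# A successful match sets $1.
$text =~ /Time: (..)/;
my $after_hit = $1;

# A failed match leaves $1 alone, so it still reports the capture
# from the earlier, unrelated regex.
my $matched    = $text =~ /Date: (..)/;    # false
my $after_miss = $1;                        # same value as $after_hit

print "after hit: $after_hit, after miss: $after_miss\n";
```

A lexical field variable declared through the my pragma would be undef after the second, failed match instead.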

Binding the lexicals to their own regex has some advantages, but it comes with new drawbacks of its own.

First of all, Perl's lexical variables aren't visible until the statement after they're declared. This means you can't use the lexical "field" variables in (?{...}) or (??{...}) blocks, or on the replacement side of s///.
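The visibility rule itself is ordinary Perl and can be sketched without the module. In the snippet below (the names $name and $seen are illustrative only), the inner declaration cannot see itself on the right-hand side of its own statement, which is the same reason an implicitly declared field variable is not yet available inside the regex or replacement that declares it.

```perl
use strict;
use warnings;

my $name = "outer";
my $seen;
{
    # The new $name is not in scope until the next statement, so the
    # interpolated $name on the right still refers to the outer one.
    my $name = "inner($name)";
    $seen = $name;
}

print "$seen\n";    # prints "inner(outer)"
```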

Second, this wouldn't have done the Right Thing:

  # [initialize $f and $b as above]

  if (/$f|$b/) {                 # WRONG
    print "Matched $foo or $bar";  # neither is set by the combined regex
  }

When the two qr// variables are interpolated like this, a new regex is compiled at runtime. The lexicals are still bound to the regexes held in $f and $b, and not to this new regex that combines them.
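That interpolation produces a separate regex can be checked in plain Perl. The sketch below does not load Regexp::Fields and uses simplified patterns without fields; $combined is a name chosen for illustration. The exact stringified form depends on the Perl version, so no output is shown.

```perl
use strict;
use warnings;

my $f = qr/foo/;
my $b = qr/bar/;

# Interpolating the two objects builds a new pattern string, which
# Perl compiles into a third, separate regex object.
my $combined = qr/$f|$b/;

print "f:        $f\n";
print "b:        $b\n";
print "combined: $combined\n";
```

The combined pattern embeds the text of both originals but is a distinct compiled regex, so anything bound to $f or $b is not bound to it.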

And third, this won't do what you want either:

  while (<>) {
    for my $p (@lists) {
      next unless /(?<pat> $p)/; # WRONG
      print "Matched: $pat\n";
    }
  }

Here the regex is compiled at runtime because of the interpolated $p variable, and by then it's too late to declare the lexicals.

In all these cases you should use the dynamically-scoped %{&} instead.
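As a sketch of that advice, the loop from the third case could be written with $&{pat} in place of the lexical. It assumes Regexp::Fields is installed on a Perl it supports and that the module can be loaded without the 'my' argument, as the wording above suggests; the contents of @lists are hypothetical.

```perl
use strict;
use warnings;
use Regexp::Fields;            # assumption: no 'my', so no lexicals are declared

my @lists = ('foo', 'bar');    # hypothetical patterns

while (<>) {
  for my $p (@lists) {
    next unless /(?<pat> $p)/;     # compiled at runtime, which is fine here
    print "Matched: $&{pat}\n";    # read the field from %{&}
  }
}
```

Because %{&} is bound to the last successful match, $&{pat} should be read right after the match that set it.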

Functions

install()

Install the modified regex engine.

uninstall()

Uninstall the modified regex engine.

DIAGNOSTICS

Sequence (?<name... not terminated

(F) You started a (?<name> ...) pattern but forgot the >.

Illegal character in (?<name> ...)

(F) Field names must start with a letter, and can contain only letters, numbers and underscores.

Field '%s' masks earlier declaration in same regex

(W) You used the same field name twice in a single regex. You can still access the first field with $DIGIT, but not with $&{name}.

"%s" variable %s masks earlier declaration in same "%s"

(W) With the my directive in effect, each field implicitly declares a lexical variable. See perldiag for a full description of the warning.

Identifier too long

(F) You used a field name longer than Perl allows for a simple identifier. See perldiag.

Sequence (?<%s...) not recognized

(F) You tried to compile a regex containing the (?<name> ...) extended pattern, but Regexp::Fields wasn't installed at the time. You can reinstall it at runtime with the install() function.

corrupted regex program

(F) You compiled a regex with Regexp::Fields installed, but tried to execute it with the standard regex engine. You can reinstall it at runtime with the install() function.

Warning: Use of '%s' without parens is ambiguous

(W) Since '%' is the modulo operator as well as the hash sigil, the parser suggests that keys %& could mean keys-modulo-and rather than keys-HASH. Likewise with each().

You can hush the warning by adding parentheses (i.e. keys(%&)) or curly braces (keys %{&}). See perldiag for a more complete description of this warning.

AUTHOR

Steve Grazzini (grazz@pobox.com)

THANKS

Thanks to Andrew Sterling Hanenkamp.

BUGS

Mail them to the author.

Known deficiencies include:

  • The 'my' pragma doesn't work in 5.6.1.

  • You need to reinstall the modified regex engine every time you create a new thread.

  • There's a scoping problem when /g is used with /m or /s.

COPYRIGHT AND LICENSE

Copyright (c) 2003, Steve Grazzini. All rights reserved.

This module is free software; you can copy, modify and/or redistribute it under the same terms as Perl itself.