NAME

Text::Filter - base class for objects that can read and write text lines

SYNOPSIS

A plethora of tools exist that operate as filters: they get data from a source, operate on this data, and write possibly modified data to a destination. In the Unix world, these tools can be chained using a technique called pipelining, where the output of one filter is connected to the input of another filter. Some non-Unix worlds are reported to have similar provisions.

To create Perl modules for filter functionality seems trivial at first. Just open the input file, read and process it, and write output to a destination file. But for really reusable modules this approach is too simple. A reusable module should not read and write files itself, but rely on the calling program to provide input as well as to handle the output.

Text::Filter is a base class for modules that have in common that they process text lines by reading from some source (usually a file), manipulating the contents and writing something back to some destination (usually some other file).

This module can be used on its own, but it is most powerful when used as a base class for other modules. See the EXAMPLE section for an extensive example.

DESCRIPTION

The main purpose of the Text::Filter class is to abstract out the details of how input and output must be done. Although in most cases input will come from a file and output will be written to a file, advanced modules require more detailed control over input and output. For example, the module could be called from another module, in which case the callee could be allowed to process only part of the input. Or a program could have prepared data in an array and want to call the module to process this data as if it were read from a file. The input stream also provides pushback functionality to make peeking at the input easy.

Text::Filter can be used on its own as a convenient input/output handler. For example:

    use Text::Filter;
    my $filter = Text::Filter->new(input => *STDIN, output => *STDOUT);
    my $line;
    while ( defined($line = $filter->readline) ) {
        $filter->writeline($line);
    }

Or, even simpler:

    use Text::Filter;
    Text::Filter->run(input => *STDIN, output => *STDOUT);

Its real power shows when such a program is turned into a module for optimal reuse.

When creating a module that is to process lines of text, it can be derived from Text::Filter, for example:

    package MyFilter;
    use base 'Text::Filter';

The constructor method must then call the new() method of the Text::Filter class to set up the base class. This is conveniently done by calling SUPER::new(). A hash containing attributes must be passed to this method; some of these attributes will be used by the base class setup.

    sub new {
        my $class = shift;
        # ... fetch non-attribute arguments from @_ ...
        # Create the instance, using the attribute arguments.
        my $self = $class->SUPER::new(@_);

Finally, the newly created object must be re-blessed into the desired class, and returned:

        # Rebless into the desired class.
        bless($self, $class);
    }
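Put together, the constructor fragments above give a complete derived class. The following is a minimal sketch: the LineNumberer package and its number() method are hypothetical names chosen for illustration and are not part of Text::Filter.

```perl
use strict;
use warnings;

# Hypothetical subclass, used only for illustration.
package LineNumberer;
use base 'Text::Filter';

sub new {
    my $class = shift;
    # Let the base class set up input and output handling.
    my $self = $class->SUPER::new(@_);
    # Rebless into the derived class and return the object.
    return bless($self, $class);
}

# Copy input to output, prefixing each line with its number.
sub number {
    my $self = shift;
    my $n = 0;
    while ( defined(my $line = $self->readline) ) {
        $self->writeline(sprintf("%3d  %s", ++$n, $line));
    }
}

package main;

my @in  = ("first\n", "second\n");
my @out;
LineNumberer->new(input => \@in, output => \@out)->number;
print @out;
```

Because the caller supplies input and output, the same class works unchanged with file names, file handles, or in-memory data.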

When creating new instances of this class, the attributes input and output can be used to specify how input and output are to be handled. Several possible values can be supplied for these attributes.

For input:

  • A scalar, containing a file name. The named file will be opened, and input lines will be read using <>.

  • A file handle (glob). Lines will be read using <>.

  • An instance of class IO::File. Lines will be read using <>.

  • A reference to an array. Input lines will be shift()ed from the array.

  • A reference to a scalar. Input lines will be taken from the contents of the scalar (which will be modified). When exhausted, it will be set to undefined.

  • A reference to an anonymous subroutine. This routine will be called to get the next line of data.

The default is to read input using the <> operator.

For output:

  • A scalar, containing a file name. The named file will be created automatically, and output lines will be written using print().

  • A file handle (glob). Lines will be written using print().

  • An instance of class IO::File. Lines will be written using print().

  • A reference to an array. Output lines will be push()ed into the array. The array will be initialised to () if necessary.

  • A reference to a scalar. Output lines will be appended to the scalar. The scalar will be initialised to "" if necessary.

  • A reference to an anonymous subroutine. This routine will be called to append a line of text to the destination.

The default is to write output to STDOUT.
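As an illustration of the in-memory variants listed above, the following sketch reads from an array reference and writes to a scalar reference. The variable names are arbitrary, and the behaviour shown simply follows the descriptions in the two lists.

```perl
use strict;
use warnings;
use Text::Filter;

my @in = ("alpha\n", "beta\n", "gamma\n");
my $out;    # left undefined; the filter initialises it to ""

my $filter = Text::Filter->new(input => \@in, output => \$out);

# Lines are shift()ed from @in and appended to $out.
while ( defined(my $line = $filter->readline) ) {
    $filter->writeline(uc $line);
}

print $out;
```

Note that the input array is consumed: once the loop finishes, @in is empty.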

Additional attributes can be used to specify actions to be performed after the data is fetched, or prior to being written. For example, line endings can be stripped upon input and added back upon output.

CONSTRUCTOR

The constructor is called new() and takes a hash with attributes as its parameter.

The following attributes are recognized and used by the constructor; all others are ignored.

The constructor will return a blessed hash containing all the original attributes, plus some new attributes. The names of the new attributes all start with _filter_, and these attributes should not be touched.

input

This designates the input source. The value must be a scalar (containing a file name), a file handle (either a glob or an instance of class IO::File), an array reference, a scalar reference, or a reference to a subroutine, as described above.

If a subroutine is specified, it must return the next line to be processed, and undef at end.
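A subroutine input source might look like the sketch below. The closure over @pending is just one hypothetical way to produce lines; any routine that returns a line per call and undef at the end should fit the contract described above.

```perl
use strict;
use warnings;
use Text::Filter;

my @pending = ("one\n", "two\n", "three\n");
my @seen;

my $filter = Text::Filter->new(
    # Return the next line, or undef once @pending is exhausted.
    input  => sub { return shift @pending },
    output => \@seen,
);

while ( defined(my $line = $filter->readline) ) {
    $filter->writeline($line);
}

print scalar(@seen), " lines copied\n";
```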

input_postread

This attribute can be used to select an action to be performed after the data has been read. Its prime purpose is to handle line endings (e.g. remove a trailing newline).

The value can be 'none' or 0 (no action), 'chomp' or 1 (standard chomp() operation), or a reference to a subroutine. Default value is 0 (no chomping).

If the value is a reference to a subroutine, this will be called with the text line that was just read as its only argument, and it must return the new contents of the text line. If it returns undef, this line will be skipped.
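As a sketch of a subroutine used for input_postread, the routine below drops comment lines by returning undef and strips trailing whitespace from the rest. The comment convention (lines starting with #) is an assumption made for the example, and the expected result assumes $/ has its default value.

```perl
use strict;
use warnings;
use Text::Filter;

my @in = ("# comment\n", "keep me   \n", "  # indented comment\n", "also kept\n");
my @out;

Text::Filter->run(
    input          => \@in,
    output         => \@out,
    input_postread => sub {
        my $line = shift;
        return undef if $line =~ /^\s*#/;   # skip comment lines
        $line =~ s/\s+\z//;                 # strip trailing whitespace and newline
        return $line;
    },
    output_prewrite => 'newline',           # append $/ again on output
);

print @out;
```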

filter

If specified, a reference to a subroutine that performs filtering. It will be called after input_postread, with the text line that was just read as its only argument, and it must return the new contents of the text line. If it returns undef, this line will be skipped.

output

This designates the output. The value must be a scalar (containing a file name), a file handle (either a glob or an instance of class IO::File), an array reference, a scalar reference, or a reference to a subroutine, as described above.

Note: when a file name is passed, a > will be prepended if necessary.

output_prewrite

This attribute can be used to select an action to be performed just before the data is added to the output. Its prime purpose is to handle line endings (e.g. add a trailing newline). The value can be 'none' or 0 (no action), 'newline' or 1 (append the value of $/ to the line), or a reference to a subroutine. Default value is 0 (no action).

If the value is 'newline' or 1, and the value of $/ is "" (paragraph mode), two newlines will be added.

If the value is a reference to a subroutine, this will be called with the text line as its only argument, and it must return the new contents of the line to be output. If it returns undef, no output occurs.
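The 'chomp' and 'newline' settings are typically used as a pair, so that the processing code never has to deal with line endings. A minimal sketch, assuming $/ has its default value of "\n":

```perl
use strict;
use warnings;
use Text::Filter;

my @in = ("apple\n", "banana\n");
my @out;

my $filter = Text::Filter->new(
    input           => \@in,
    input_postread  => 'chomp',     # strip the line ending after reading
    output          => \@out,
    output_prewrite => 'newline',   # append $/ before writing
);

while ( defined(my $line = $filter->readline) ) {
    # $line carries no trailing newline here, so it can be wrapped safely.
    $filter->writeline("<$line>");
}

print @out;
```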

CLASS METHODS

Text::Filter->run([ attributes ])

This creates a temporary filter object using the attributes as in new(), and runs its run method.

INSTANCE METHODS

$filter->readline

If there is anything in the pushback buffer, this is returned and the pushback buffer is marked empty.

Otherwise, returns the next line from the input stream, or undef if there is no more input.

$filter->pushback($line)

Pushes a line of text back to the input stream. Returns the line.

$filter->peek

Peeks at the input. Short for pushback(readline()).
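Pushback is useful when a loop has to read one line too many to find out where a section ends. The sketch below collects leading header lines and hands the first non-header line back to the input stream; the idea that header lines start with # is an assumption made for the example.

```perl
use strict;
use warnings;
use Text::Filter;

my @in = ("# title\n", "# author\n", "body 1\n", "body 2\n");
my (@header, @body);

my $filter = Text::Filter->new(input => \@in, output => \@body);

# Collect leading '#' lines; stop at the first line that is not one.
while ( defined(my $line = $filter->readline) ) {
    if ( $line =~ /^#/ ) {
        push @header, $line;
        next;
    }
    $filter->pushback($line);   # not a header line: hand it back
    last;
}

# The pushed-back line is the first one returned by readline here.
while ( defined(my $line = $filter->readline) ) {
    $filter->writeline($line);
}

print "header: ", scalar(@header), ", body: ", scalar(@body), "\n";
```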

$filter->writeline ($line)

Adds $line to the output stream.

$filter->set_input($input [ , $postread ])

Sets the input method to $input. If the optional argument $postread is defined, sets the input line postprocessing strategy as well.

$filter->set_output($output [ , $prewrite ])

Sets the output method to $output. If the optional argument $prewrite is defined, sets the output line preprocessing strategy as well.
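Since set_input and set_output operate on an existing object, the source or destination can be changed part-way through processing. The following sketch sends the first line to one destination and redirects the remainder to another; the data and variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::Filter;

my @in = ("summary\n", "detail 1\n", "detail 2\n");
my (@first, $rest);

my $filter = Text::Filter->new(input => \@in, output => \@first);

# The first line goes to the original destination.
my $summary = $filter->readline;
$filter->writeline($summary);

# Redirect everything that follows to a scalar.
$filter->set_output(\$rest);
while ( defined(my $line = $filter->readline) ) {
    $filter->writeline($line);
}

print $rest;
```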

$filter->run( [ filter ])

This will run the readline/writeline loop. Optionally a filter argument (see CONSTRUCTOR, above) can be passed if filtering is desired and not yet otherwise designated.

EXAMPLE

This example shows how to filter empty and whitespace lines.

    use Text::Filter;
    Text::Filter->run(filter => sub { my $line = shift;
                                      return unless $line =~ /\S/;
                                      return $line;
                                  });

This is an example of how to use Text::Filter as a base class.

It implements a module that provides a single instance method: grep(), that performs some kind of grep(1)-style function (how surprising!).

An exported convenience function grepper() is also provided for easy access to do 'the right thing' in the most common case.

    package Grepper;

    use strict;
    use base qw(Exporter Text::Filter);
    our @EXPORT;

    # Setup.
    BEGIN {
        @EXPORT = qw(grepper);
    }

    # Constructor. Major part of the job is done by the superclass.
    sub new {
        my $class = shift;

        # Create a new instance by calling the superclass constructor.
        my $self = $class->SUPER::new(@_);
        # The superclass constructor will take care of handling
        # the input and output attributes, and setup everything for
        # handling the IO.

        # Bless the object into the desired class.
        bless ($self, $class);

        # And return it.
        $self;
    }

    # Instance method, just an example. No magic.
    sub grep {
        my $self = shift;
        my $pat = shift;
        my $line;
        while ( defined($line = $self->readline) ) {
            $self->writeline($line) if $line =~ $pat;
        }
    }

    # Exported function, for convenience.
    # Usage: grepper (<input file>, <output file>, <pattern>);
    sub grepper {
        my ($input, $output, $pat) = @_;

        # Create a Grepper object.
        my $grepper = Grepper->new(input => $input, output => $output);

        # Call its grep method.
        $grepper->grep ($pat);
    }

AUTHOR AND CREDITS

Johan Vromans (jvromans@squirrel.nl) wrote this module.

COPYRIGHT AND DISCLAIMER

This program is Copyright 1998,2013 by Squirrel Consultancy. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of either: a) the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version, or b) the "Artistic License" which comes with Perl.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See either the GNU General Public License or the Artistic License for more details.