NAME

Filter::Heredoc - Search and filter embedded here documents

VERSION

Version 0.01

SYNOPSIS

    use 5.010;
    use Filter::Heredoc qw( hd_getstate hd_init hd_labels );
    use Filter::Heredoc::Rule qw( hd_syntax );
    
    my $line;
    my %state;
    
    # Get the defined labels to compare with the returned state
    my %label = hd_labels();

    # Read files line-by-line and print only the here documents
    while (defined( $line = <> )) {
        %state = hd_getstate( $line );
        print $line if ( $state{statemarker} eq $label{heredoc} );
        if ( eof ) {
            close( ARGV );
            hd_init(); # Prevent state errors from propagating to the next file
        }
    }

    # Test a line (is this an opening delimiter line?)
    $line = q{cat <<END_USAGE};
    %state = hd_getstate( $line ); 
    print "$line\n" if ( $state{statemarker} eq $label{ingress} );
    
    # Load a syntax helper rule (shell script is built in)
    hd_syntax ( 'pod' );

DESCRIPTION

This is the core module for Filter::Heredoc. If you're not looking to extend or alter the behavior of this module, you probably want to look at filter-heredoc instead.

Filter::Heredoc provides subroutines to search and print here documents. Here documents (also called "here docs") allow a type of input redirection from some following text. This is often used to embed short text messages (or configuration files) within shell scripts.

This module extracts here documents from POSIX IEEE Std 1003.1-2008 compliant shell scripts. Perl has derived a similar syntax, but it differs in many details.

Rules can be added to improve here document extraction, i.e. to prevent "false positives". Filter::Heredoc::Rule exports an additional subroutine to load and unload rules.

This version supports a basic POD rule. The current subroutines can be tested on Perl scripts if the code constructs use a near-POSIX form of here documents. That said, don't rely on the current version for Perl, since it's still in a very early phase of development.

Concept to parse here documents.

This is a line-by-line state machine design. Reading a script from beginning to end results in the following state changes:

    Source --> Here document --> Source
    

What tells a source line apart from a here document line? Nothing! However, by adding an opening and a closing delimiter state and tracking the previous state, we can identify what is source and what is a here document:

    Source --> Ingress --> Here document --> Egress --> Source

In reality, POSIX defines a few more state changes. One example is the script below, shown with added state labels:

    S]   #!/bin/bash --posix
    I]   cat <<eof1; cat <<eof2
    H]   Hi,
    E]   eof1
    H]   Helene.
    E]   eof2
    S]

Naturally, when bash runs this script, only the here document text is printed:

    Hi,
    Helene.
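
The labelled script above can be reproduced with a small toy classifier. This is only a sketch of the concept, not the module's implementation: the subroutine name classify_lines is hypothetical, delimiters are assumed to be plain unquoted words, and leading tabs before a closing delimiter are not handled.

```perl
use strict;
use warnings;

# Toy line classifier illustrating the state machine idea.
# Labels: S = source, I = ingress, H = here document, E = egress.
sub classify_lines {
    my @lines = @_;
    my @pending;    # delimiters still waiting for their closing line
    my @labels;

    for my $line (@lines) {
        chomp( my $text = $line );

        if (@pending) {
            # Inside a region: only the next expected delimiter ends it.
            if ( $text eq $pending[0] ) {
                shift @pending;
                push @labels, 'E';
            }
            else {
                push @labels, 'H';
            }
            next;
        }

        # In source: collect every delimiter word after '<<' or '<<-'.
        @pending = $text =~ /<<-?\s*(\w+)/g;
        push @labels, @pending ? 'I' : 'S';
    }
    return @labels;
}

my @script = (
    "#!/bin/bash --posix\n",
    "cat <<eof1; cat <<eof2\n",
    "Hi,\n",
    "eof1\n",
    "Helene.\n",
    "eof2\n",
    "\n",
);
my @labels = classify_lines(@script);
print "$labels[$_]]   $script[$_]" for 0 .. $#script;
```

The sketch yields the label sequence S, I, H, E, H, E, S for this script. Note that a naive pattern like this one also matches a left-shift operator such as '<<' in arithmetic, which is the kind of "false positive" that syntax rules are meant to prevent.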

SUBROUTINES

Filter::Heredoc exports the following subroutines only on request.

    hd_getstate   # returns a label based on the argument (text line)
    hd_labels     # reads out and (optionally) defines new labels
    hd_init       # flushes the internal state machine
    

Filter::Heredoc::Rule exports one subroutine to load and unload syntax rules.

    hd_syntax             # load/unload a script syntax rule

hd_getstate

This routine determines the new state, based on the previous state and the text line passed as the argument.

    %state = hd_getstate( $line );
    

Returns a hash with the following keys/values:

    statemarker:       Holds a label that represents the state of the line.
    
    blockdelimiter:    Holds the delimiter which belongs to a 'region'.
    
    is_tabremovalflag: If the redirector had a trailing minus (<<-), this
                       value is true for the current line.

A here document 'region' is defined as all here document lines bracketed by the ingress (opening delimiter) line and the egress (terminating delimiter) line. The delimiter of a region may or may not be unique within the file.
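
As a sketch of how the returned keys might be used together, the example below collects here document lines and strips leading tabs when is_tabremovalflag is set. The helper name heredoc_text is hypothetical, and the example assumes that the flag is true on the here document lines of a '<<-' region.

```perl
use strict;
use warnings;
use Filter::Heredoc qw( hd_getstate hd_labels );

my %label = hd_labels();

# Hypothetical helper: returns the here document lines of a script,
# with leading tabs removed where the tab removal flag is set.
sub heredoc_text {
    my @found;
    for my $line (@_) {
        my %state = hd_getstate($line);
        next if $state{statemarker} ne $label{heredoc};

        my $text = $line;
        $text =~ s/\A\t+// if $state{is_tabremovalflag};
        push @found, $text;
    }
    return @found;
}

print heredoc_text( "cat <<-END\n", "\tindented text\n", "END\n" );
```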

To prevent unreliable results, only pass a text line as an argument. Use file test operators if reading input lines from a file:

    if ( -T $file ) {
      print "$file 'looks' like a plain text file to me.\n";
    }

This function throws exceptions on a few fatal internal errors. These are trappable. See ERRORS below for the messages printed.
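
One way to trap these exceptions is a block eval. The wrapper below is a minimal sketch: the name safe_getstate is hypothetical, and resetting with hd_init() after a failure is a design choice of this example rather than something the module requires.

```perl
use strict;
use warnings;
use Filter::Heredoc qw( hd_getstate hd_init );

# Hypothetical wrapper: returns the state hash, or an empty list on error.
sub safe_getstate {
    my ($line) = @_;
    my %state;
    my $ok = eval { %state = hd_getstate($line); 1 };
    if ( !$ok ) {
        # Report only the first line of the message, not the trace back.
        my ($first) = split /\n/, ( $@ || 'unknown error' );
        warn "hd_getstate failed: $first\n";
        hd_init();    # start again from the 'source' state
        return;
    }
    return %state;
}

my %state = safe_getstate(undef);    # traps the 'undef' exception
print "no state returned\n" if !%state;
```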

hd_labels

Gets, or optionally sets, the unique labels for the four possible states.

    %label = hd_labels();
    %label = hd_labels( %newlabel );

The following hash shows the default internal label assignments.

    %label = (
            source  => 'S',
            ingress => 'I',
            heredoc => 'H',
            egress  => 'E',
    );
  

Returns a hash with the current label assignment.
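
A short sketch of replacing the defaults follows. The lowercase labels are arbitrary choices for illustration, and it is an assumption that any four unique values are accepted.

```perl
use strict;
use warnings;
use Filter::Heredoc qw( hd_getstate hd_labels );

# Assumed replacement labels; the values are illustrative only.
my %newlabel = (
    source  => 's',
    ingress => 'i',
    heredoc => 'h',
    egress  => 'e',
);
my %label = hd_labels(%newlabel);

# Prefix each line with the label that hd_getstate reports for it.
for my $line ( "cat <<END\n", "some text\n", "END\n" ) {
    my %state = hd_getstate($line);
    print "$state{statemarker}] $line";
}
```

Comparing against the returned %label, rather than against hard-coded letters, keeps calling code working whichever labels are in effect.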

hd_init

Sets the internal state machine to 'source' and empties all internal state arrays.

    hd_init();

When reading more than one file, call this function before the next file to prevent any state faults from propagating to the next file's input. Currently it always returns an $EMPTY_STR (q{}), but this may change to indicate a state error from previous files.
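
The sketch below shows one way to process several files with explicit file handles, resetting the state machine after each one. The helper name heredoc_lines is hypothetical, and files that fail the -T test are simply skipped.

```perl
use strict;
use warnings;
use Filter::Heredoc qw( hd_getstate hd_init hd_labels );

my %label = hd_labels();

# Hypothetical helper: returns the here document lines read from $fh,
# then resets the state machine so the next file starts in 'source'.
sub heredoc_lines {
    my ($fh) = @_;
    my @found;
    while ( defined( my $line = <$fh> ) ) {
        my %state = hd_getstate($line);
        push @found, $line if $state{statemarker} eq $label{heredoc};
    }
    hd_init();
    return @found;
}

for my $file (@ARGV) {
    next unless -T $file;    # only plain text files
    open my $fh, '<', $file or do { warn "Can't open $file: $!\n"; next };
    print heredoc_lines($fh);
    close $fh;
}
```

Without the hd_init() call, a file that ends inside an unterminated here document would leave the state machine in the 'heredoc' state for the first lines of the next file.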

ERRORS

hd_getstate throws the following exceptions.

  • undef

    If the text line argument is undef, the following message, including a full trace back, is printed:

        Passed argument to function is undef.
        Can't determine state from an undef argument.
        

    Ensure that only a plain text line is supplied as an argument.

  • Invalid ingress state change

    If the state machine concludes that a change was from Ingress to Ingress, the following message, including a full trace back, is printed:

        Current state is Ingress, and passed line say we shall change
        to Ingress again. Not allowed change i.e. Ingress --> Ingress
        

    If this happens, please report it as a bug and describe how to reproduce it.

  • Invalid egress state change

    If the state machine concludes that a change was from Egress to Egress, the following message, including a full trace back, is printed:

        Current state is Egress, and passed line say we shall change
        to Egress again. Not allowed change i.e. Egress --> Egress.
        

    If this happens, please report it as a bug and describe how to reproduce it.

DEPENDENCIES

Filter::Heredoc only requires Perl 5.10 (or any later version).

AUTHOR

Bertil Kronlund, <bkron at cpan.org>

BUGS AND LIMITATIONS

Filter::Heredoc complies with the here document syntax of *nix POSIX shells. Non-compliant shells, e.g. on the MSWin32 platform, are not supported.

Please report any bugs or feature requests to http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Filter-Heredoc or at <bug-filter-heredoc at rt.cpan.org>.

SEE ALSO

Overview of here documents and their usage: http://en.wikipedia.org/wiki/Here_document

The IEEE Std 1003.1-2008 standard can be found here: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

Filter::Heredoc::Rule(3), filter-heredoc(1)

Filter::Heredoc::Cookbook(3) discusses e.g. how to embed POD as here documents in shell scripts so that they carry their own documentation.

LICENSE AND COPYRIGHT

Copyright 2011, Bertil Kronlund

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.