
NAME

Data::All - Access to data in many formats from many places

WARNING! This is a preview release. Version 0.040 is the first version without the Spiffy and IO::All dependencies. These changes are fresh and need more testing, but I decided to update CPAN because the previous version (0.036) is broken. Treat this release as a novelty until the preview status is removed.

SYNOPSIS 1 (short)

    use Data::All;

    #   Create an instance of Data::All for file data:
    #   read sample.csv and write sample.tab
    my $input1 = Data::All->new(
        source => { path => 'sample.csv', profile => 'csv' },
        target => { path => 'sample.tab', profile => 'tab', ioconf => ['file', 'w'] }
    );

    #   $rec now contains an arrayref of hashrefs for the records in sample.csv.
    my $rec = $input1->read();

    #   Convert "source" to "target" and include the field names
    $input1->convert(print_fields => 1);

    

SYNOPSIS 2 (long)

    use Data::All;
    
    my $dsn1     = 'DBI:mysql:database=mysql;host=YOURHOST;';
    my $dsn2     = 'DBI:Pg:database=SOMEOTHERDB;host=YOURHOST;';
    my $query1   = 'SELECT `Host`, `User`, `Password` FROM user';
    my $query2   = 'INSERT INTO users ("Password", "User", "Host") VALUES(?,?,?)';
    
    my %db1 = 
    (   path        => [$dsn1, 'user', 'pass', $query1],
        ioconf      => ['db', 'r' ]
    );
    
    #   Notice how the parameters can be sent as a well-ordered arrayref
    #   or as an explicit hashref. 
    my %db2 = 
    (   path        => [$dsn2, 'user', 'pass', $query2],
        ioconf      => { type => 'db', perms => 'w' },
        fields      => ['Password', 'User', 'Host']
    );
    
    #   This is an explicit csv format. It is the same as using
    #   profile => 'csv'. NOTE: the 'w' in the perms is significant, as it
    #   determines how the file is opened and locked.
    my %file1 = 
    (
        path        => ['/tmp/', 'users.csv'],
        ioconf      => ['plain', 'rw'],
        format      => {
            type    => 'delim', 
            break   => "\n", 
            delim   => ',', 
            quote   => '"', 
            escape  => '\\',
        }
    );
    
    #   The only significant difference here is with_original => 1.
    #   This tells Data::All to include the original record as a field
    #   value. The field name is _ORIGINAL. This is useful for processing
    #   data when the original source must be available for auditing.
    my %file2 = 
    (
        path        => '/tmp/users.fixed',
        ioconf      => { type => 'plain', perms => 'w', with_original => 1 },
        format      => { 
            type    => 'fixed', 
            break   => "\n", 
            lengths => [32,16,64]
        },
        fields      => ['pass','user','host']
    );
    
    #   Create an instance of Data::All for database data.
    #   Note: parameters can also be a hash or hashref
    my $input2 = Data::All->new({
        source => \%db1, 
        target => \%db2,
        print_fields => 0,              #   Do not output field name record
        atomic => 1                     #   Load the input completely before outputting
    });
    
    $input2->convert();                 #   Save the mysql data to the postgresql table 
    $input2->convert(target => \%file1);    #   And also save it to a CSV format
    $input2->convert(target => \%file2);    #   And also save it to a fixed format
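The 'delim' settings in %file1 (delim, quote and escape) describe how one line is split into fields. The following is a minimal sketch of how those three settings can interact. split_delim() is a hypothetical helper written for illustration; it is not Data::All's own parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::All): split one delimited line
# into fields using delim/quote/escape settings like those in %file1.
sub split_delim {
    my ($line, %fmt) = @_;
    my ($delim, $quote, $escape) = @fmt{qw(delim quote escape)};
    my @fields;
    my $field    = '';
    my $in_quote = 0;
    my @chars    = split //, $line;

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($c eq $escape && $i + 1 < @chars) {
            # The escape character makes the next character literal
            $field .= $chars[++$i];
        }
        elsif ($c eq $quote) {
            # Quotes toggle "inside a quoted field" and are not kept
            $in_quote = !$in_quote;
        }
        elsif ($c eq $delim && !$in_quote) {
            # An unquoted delimiter ends the current field
            push @fields, $field;
            $field = '';
        }
        else {
            $field .= $c;
        }
    }
    push @fields, $field;    # the last field has no trailing delimiter
    return @fields;
}

my @f = split_delim('"Smith, J",jsmith,\"host\"',
    delim => ',', quote => '"', escape => '\\');
print join('|', @f), "\n";    # prints: Smith, J|jsmith|"host"
```

This sketch does not handle CSV-style doubled quotes ("") as an escaped quote; it relies on the explicit escape character instead.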
    
    

DESCRIPTION

Data::All is based on a few abstract concepts. A line is a record, and a group of records is a collection. This lets a common record-storage concept be used across any number of data sources (delimited file, XML over a socket, a database table, etc.).
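The record/collection idea can be illustrated in plain Perl. This sketch is not Data::All's internal representation. It simply assumes that every record becomes a hashref keyed by field name and that a collection is an arrayref of such records, regardless of where the data came from.

```perl
use strict;
use warnings;

# Illustration of the record/collection idea (not Data::All internals):
# whatever the source, each record becomes a hashref keyed by field name,
# and a collection is an arrayref of such records.
my @fields = qw(user host);

# Source 1: rows as they might come back from a database handle (arrayrefs)
my @db_rows = (['root', 'localhost'], ['guest', 'example.org']);
my $from_db = [ map { my %r; @r{@fields} = @$_; \%r } @db_rows ];

# Source 2: lines from a comma-delimited file
my @lines = ("admin,db.example.org");
my $from_file = [ map { my %r; @r{@fields} = split /,/, $_, -1; \%r } @lines ];

# The same code can now process records from either source
for my $rec (@$from_db, @$from_file) {
    print "$rec->{user} on $rec->{host}\n";
}
```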

Supported formats: delimited and fixed (for filesystem types)
Supported sources: local filesystem, database
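In the fixed format each field occupies a set number of characters, given by a lengths list (as in %file2 in SYNOPSIS 2). The sketch below uses Perl's built-in unpack to read such a line and mimics the with_original behaviour by hand. parse_fixed() is a hypothetical helper, not Data::All's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not Data::All's implementation): turn one fixed-width
# line into a hashref, using lengths/fields like those of %file2.
sub parse_fixed {
    my ($line, $lengths, $fields, $with_original) = @_;
    # Build an unpack template such as "A32 A16 A64". "A" strips the
    # trailing spaces used to pad fixed-width columns.
    my $template = join ' ', map "A$_", @$lengths;
    my %rec;
    @rec{@$fields} = unpack $template, $line;
    $rec{_ORIGINAL} = $line if $with_original;    # keep the raw record for auditing
    return \%rec;
}

# Writing is the reverse: left-justify and pad each value to its width
my $line = sprintf '%-32s%-16s%-64s', 'secret', 'root', 'localhost';
my $rec  = parse_fixed($line, [32, 16, 64], ['pass', 'user', 'host'], 1);
print "$rec->{user}\@$rec->{host}\n";    # prints: root@localhost
```

Note that sprintf pads short values but does not truncate values longer than the field width, so a real writer would need to decide how to handle overlong values.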

Similar to AnyData, but more suited to converting data from and to various sources than to reading data and playing with it. It can be thought of as an extension of IO::All: IO::All gives you access to data sources, while Data::All gives you access to the data itself.

Conversion now happens record by record by default. You can set the mode explicitly by passing atomic => 1 (load the input completely before writing) or atomic => 0 (record by record, the default) to new() or convert().
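The difference between the two modes is the order of reads and writes. This sketch is a hypothetical convert_records() helper, not Data::All's convert(). It assumes a reader callback that returns one record per call (undef at the end) and a writer callback that stores one record.

```perl
use strict;
use warnings;

# Hypothetical helper (not Data::All's convert()): $next returns one record
# per call (undef at the end) and $write stores one record in the target.
sub convert_records {
    my (%args) = @_;
    my ($next, $write, $atomic) = @args{qw(next write atomic)};

    if ($atomic) {
        # atomic => 1: load the whole input before writing anything
        my @all;
        while (defined(my $rec = $next->())) { push @all, $rec }
        $write->($_) for @all;
    }
    else {
        # atomic => 0 (default): write each record as soon as it is read
        while (defined(my $rec = $next->())) { $write->($rec) }
    }
}

# Record the order of reads (R) and writes (W) for two input records
sub trace_order {
    my ($atomic) = @_;
    my @input = ({ n => 1 }, { n => 2 });
    my @log;
    convert_records(
        atomic => $atomic,
        next   => sub { my $r = shift @input; push @log, "R$r->{n}" if $r; return $r },
        write  => sub { push @log, "W$_[0]{n}" },
    );
    return "@log";
}

printf "atomic=%d: %s\n", $_, trace_order($_) for 0, 1;
# prints:
# atomic=0: R1 W1 R2 W2
# atomic=1: R1 R2 W1 W2
```

Record-by-record processing keeps memory use flat, while atomic processing guarantees that nothing is written unless the whole input could be read.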

TODO LIST

Current major development areas are the interface and format stability. Upcoming development will focus on breadth of features (more formats, more sources, ease of use, reliable subclassing, documentation/tests, and speed).

Misc:

- TODO: Allow a buffer to give some flexibility between record by record and atomic processing.
- TODO: Add ability to create temporary files.
- TODO: Allow handling record fields with arrayrefs for anon / non-hash access.
- TODO: Default values for fields (avoid undef db errors).
- TODO: Allow modifying data in memory and saving it back to a file.
- TODO: Consider using a standard internal structure, so every source is converted into this structure (hash, Stone?).
- TODO: Add SQL as a readable input and output.
- TODO: Expose format functions to Data::All users so simple single record conversion can be thoroughly utilized.

KNOWN BUGS

- The record separator does not currently work properly because it is hardcoded to newline (for delimited and fixed formats).
- The examples/* aren't always 100% in sync with the latest changes to Data::All.
- If the first column is empty, it may screw up Data::All::Format::Delim (it will return undef for that column and the remaining columns will shift left).
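Until the empty-first-column bug is fixed, one way to keep every value in its column when splitting a simple delimited line (no quoting) is Perl's split with a negative limit. This is a workaround sketch with a hypothetical fields_from_line() helper; it does not describe how Data::All::Format::Delim currently works.

```perl
use strict;
use warnings;

# Hypothetical helper (not Data::All's code): map a delimited line onto
# field names without letting empty columns shift values to the left.
sub fields_from_line {
    my ($line, $delim, @names) = @_;
    # split always keeps leading empty fields; the negative limit also
    # keeps trailing ones, so every column stays in its position.
    my @values = split /\Q$delim\E/, $line, -1;
    my %rec;
    @rec{@names} = @values;
    return \%rec;
}

my $rec = fields_from_line(",root,localhost", ',', qw(pass user host));
# $rec->{pass} is '' (not undef), and user/host keep their positions
```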

AUTHOR

Delano Mandelbaum, <delano<AT>cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009 by Delano Mandelbaum

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.3 or, at your option, any later version of Perl 5 you may have available.