DBIx::TableLoader - Easily load a database table from a data set
version 1.003
    my $dbh = DBI->connect(@connection_args);
    DBIx::TableLoader->new(dbh => $dbh, data => $data)->load();
    # interact with new database table full of data in $dbh
This module tries to provide a fast and simple (but very configurable) interface for taking a set of data and loading it into a database table.
Common uses would be to take data from a file (like a CSV) and load it into a SQLite table. (For that specific case see DBIx::TableLoader::CSV.)
In most cases simply calling load() is sufficient, but all methods are documented below for completeness.
new() creates a new instance. It accepts a hash or hashref of options.
This module is very configurable but tries to use good defaults in the hopes that you won't need to configure too much in most cases.
Most likely needed options:
dbh - A database handle as returned by DBI->connect()
data - An arrayref of arrayrefs of data (which will be the input records)
See "OPTIONS" for the full list.
base_defaults() returns a hashref of the options defined by the base class and their default values.
defaults() returns a hashref of additional options defined by a subclass.
my $columns = $loader->columns; # [ ['column1', 'data type'], ['column two', 'data type'] ]
Returns an arrayref of the columns. Each element is an arrayref of column name and column data type.
my $column_names = $loader->column_names; # ['column1', 'column two']
Returns an arrayref of the column names.
create() executes a CREATE TABLE SQL statement on the database handle.
create_prefix() generates the opening of the CREATE TABLE statement (everything before the column specifications).
Defaults to "CREATE $table_type TABLE $quoted_name (".
create_sql() generates the SQL for the CREATE TABLE statement by concatenating "create_prefix", the column definitions, and "create_suffix".
Can be overridden in the constructor.
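The concatenation can be illustrated with plain strings. This is only a sketch of the idea, not the module's code: quote_ident() is a simplified stand-in for "quote_identifier" in DBI, build_create_sql() is a hypothetical helper, and joining the column definitions with a comma and a space is an assumption.

```perl
use strict;
use warnings;

# Simplified stand-in for $dbh->quote_identifier(); real drivers may quote differently.
sub quote_ident {
    my ($name) = @_;
    (my $quoted = $name) =~ s/"/""/g;
    return qq{"$quoted"};
}

# Hypothetical helper: prefix + column definitions + suffix.
sub build_create_sql {
    my (%args) = @_;

    # Skip the table type when it is empty so no double space appears.
    my @words = grep { length }
        ('CREATE', $args{table_type} // '', 'TABLE', quote_ident($args{name}));
    my $prefix = join(' ', @words) . ' (';

    # Each column is [ name, data type ].
    my $columns = join(', ',
        map { quote_ident($_->[0]) . ' ' . $_->[1] } @{ $args{columns} });

    my $suffix = ')';
    return $prefix . $columns . $suffix;
}

my $sql = build_create_sql(
    table_type => 'TEMP',
    name       => 'data',
    columns    => [ ['first_name', 'text'], ['last_seen', 'date'] ],
);
print "$sql\n";   # CREATE TEMP TABLE "data" ("first_name" text, "last_seen" date)
```

If the generated statement is not what your database needs, passing a complete statement to the constructor is the simpler route.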
create_suffix() generates the closing of the CREATE TABLE statement (everything after the column specifications).
Defaults to ")".
default_name() returns the default (base) name for the table.
This is mostly for subclasses where a useful table name can be determined from the input (like a filename). In this module it defaults to 'data'.
This gets concatenated together with "name_prefix" and "name_suffix" in "name".
Columns that have not been given an explicit data type will be defined using the default_column_type.
You can pass a value explicitly to the constructor, or it will try to determine an appropriate (string) type based on the database driver (using "default_sql_data_type").
If all else fails it will default to text (which works for SQLite, PostgreSQL, MySQL, and some others).
default_sql_data_type() returns the value passed to "type_info" in DBI to query the database driver for an appropriate default column type.
Defaults to DBI::SQL_LONGVARCHAR.
determine_column_types() goes through the columns and converts any scalar column name to an arrayref of column name and default_column_type. It modifies the object in place and returns nothing. It is called automatically from the constructor.
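A standalone sketch of this normalization might look like the following. normalize_columns() is a hypothetical function written for illustration; unlike the method described above, it returns a new arrayref instead of modifying an object.

```perl
use strict;
use warnings;

# Hypothetical helper: turn bare column names into [ name, type ] pairs,
# leaving columns that already carry a type untouched.
sub normalize_columns {
    my ($columns, $default_type) = @_;
    return [
        map { ref($_) eq 'ARRAY' ? [ @$_ ] : [ $_, $default_type ] } @$columns
    ];
}

my $columns = normalize_columns(
    [ 'first_name', 'last_name', [ 'last_seen', 'date' ] ],
    'text',
);

# Prints one "name type" pair per line.
printf "%s %s\n", @$_ for @$columns;
```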
drop() executes the DROP TABLE statement on the database handle.
drop_prefix() returns the portion of the SQL statement before the table name.
Defaults to DROP TABLE.
drop_sql() generates the SQL for the DROP TABLE statement by concatenating "drop_prefix", "quoted_name", and "drop_suffix".
Alternatively drop_sql can be set in the constructor if you need something more complex.
drop_suffix() returns the portion of the SQL statement after the table name.
Empty by default.
get_raw_row() is called from "get_row" to retrieve the next row of raw data. Subclasses will override this method according to the input data format.
It should return undef when there are no more rows.
my $row = $loader->get_row();
Returns a single row of data at a time (as an arrayref). This method will be called repeatedly until it returns undef. The returned arrayref will be flattened and passed to "execute" in DBI.
handle_invalid_row() is called from "get_row" when a row is determined to be invalid (when "validate_row" throws an error).
If the handle_invalid_row option was not specified in the constructor this method is a no-op: the original row will be returned (and eventually passed to "execute" in DBI).
Possible values for the handle_invalid_row option:
die - Calls die() with the error message
warn - Calls warn() with the error message and returns the row unmodified
code ref
If it's a subroutine reference it is called as a method, receiving the loader object, the error message, and the row:
$handler->($loader, $error, $row);
The handler should either die to cease processing, return false to skip this row and get the next one, or return a (possibly modified) row that will be passed to "execute" in DBI.
This allows you to, for example, write to a log when a bad row is found without aborting your transaction:
    handle_invalid_row => sub {
      my ($self, $error, $row) = @_;
      $logger->log(['Bad row: %s: %s', $error, $row]);
      return; # return false to skip this row and move to the next one
    }
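The skip-on-false behavior of a handler can be tried without a database. In this sketch, validate() and the surrounding loop are hypothetical stand-ins for the module's own validation and row fetching, the error message text is invented, and an array takes the place of a real logger.

```perl
use strict;
use warnings;

my @log;    # stands in for a real logger

# Handler with the signature described above: ($loader, $error, $row).
my $handler = sub {
    my ($self, $error, $row) = @_;
    push @log, "Bad row: $error: @$row";
    return;    # false: skip this row and move on
};

# Hypothetical validation: the row must have exactly $expected fields.
sub validate {
    my ($row, $expected) = @_;
    die sprintf("Row has %d fields when %d are expected\n", scalar @$row, $expected)
        unless @$row == $expected;
    return $row;
}

my @input = (
    ['polar', 'bear', '2010-08-15'],
    ['short'],
    ['blue', 'duck', '2009-07-30'],
);

my @accepted;
for my $row (@input) {
    my $valid = eval { validate($row, 3) };
    if (!$valid) {
        chomp(my $error = $@);
        # undef stands in for the loader object in this sketch
        $valid = $handler->(undef, $error, $row);
    }
    push @accepted, $valid if $valid;
}

printf "%d accepted, %d logged\n", scalar @accepted, scalar @log;
```

A handler that returned a repaired arrayref instead of false would see that row added to the accepted list.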
insert_sql() generates the INSERT SQL statement that will be passed to "prepare" in DBI.
insert_all() executes an INSERT statement on the database handle for each row of data. It calls "prepare" in DBI using "insert_sql" and then calls "execute" in DBI once for each row returned by "get_row".
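A statement suitable for this prepare-once, execute-per-row pattern uses one placeholder per column. The sketch below builds such a string; build_insert_sql() is a hypothetical helper and the exact text the module generates may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: takes an already-quoted table name and an arrayref
# of already-quoted column names, and emits one "?" placeholder per column.
sub build_insert_sql {
    my ($quoted_table, $quoted_columns) = @_;
    return sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $quoted_table,
        join(', ', @$quoted_columns),
        join(', ', ('?') x @$quoted_columns);
}

my $sql = build_insert_sql('"data"', ['"column1"', '"column two"']);
print "$sql\n";   # INSERT INTO "data" ("column1", "column two") VALUES (?, ?)
```

With a real handle, the string would be passed to $dbh->prepare() and each row's values to $sth->execute().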
my $number_of_rows = $loader->load();
Loads data into the database table. This is a wrapper that does the most commonly needed things in a single method call. If the transaction setting is true (the default) the actions will be wrapped in a transaction. The actions, in order, are:
"drop" (if configured)
"create" (if configured)
"insert_all"
Returns the number of rows inserted.
name() returns the full table name (concatenation of name_prefix, name, and name_suffix).
prepare_data() is called from "new" after the object is blessed. Any preparation work specific to the type of data should be done here.
This is mostly a hook for subclasses and does very little in this module.
quoted_name() returns the full, quoted table name. It passes the catalog, schema, and name attributes to "quote_identifier" in DBI.
my $quoted_names = $loader->quoted_column_names(); # ['"column1"', '"column two"']
Returns an arrayref of column names quoted by the database driver.
validate_row() is called from "get_row" to check that the provided row is valid.
It may die on any error; the error will be caught in "get_row" and passed to "handle_invalid_row".
The return value works like that of "handle_invalid_row": on success, the valid row (possibly modified) should be returned. If a false value is returned, "get_row" will attempt to get another row.
Currently this only checks that the number of fields in the row matches the number of columns expected, but other checks may be added in the future. Subclasses can override this to define their own validations (though calling the original superclass method is recommended).
This module is very [excessively] configurable. In most cases the default values will be sufficient, but you should be able to customize the object to fit your needs.
Frequently Used Options:
columns - Arrayref of column definitions
Each element can be an arrayref of column name and data type or just a string for the column name and "default_column_type" will be used. If not passed in the first row of data will be assumed to be column names.
columns => ['first_name', 'last_name', ['last_seen', 'date']]
dbh - A database handle as returned by DBI->connect()
This module probably isn't useful without one.
data - An arrayref of arrayrefs of data to populate the table
Subclasses may define more appropriate options and ignore this parameter. If you're using this base class, you'll probably need this (unless you provide your own get_row coderef).
data => [ ['polar', 'bear', '2010-08-15'], ['blue', 'duck', '2009-07-30'] ]
Less common options that are available when you desire extra tweaking power:
create - Boolean; Whether or not to perform the CREATE TABLE statement
Defaults to true.
default_column_type - Default data type for each column
This will be used for each column that does not explicitly define a data type. The default will be determined from the database driver using default_sql_data_type. See "default_column_type".
default_column_type => 'CHAR(50)'
drop - Boolean to execute a DROP TABLE statement before CREATE TABLE
Defaults to false. Set it to true if the named table already exists and you want to recreate it.
get_row - A sub (coderef) that will override "get_raw_row"
You can use this if your input data is in a different format than the module expects (to split a string into an arrayref, for instance). This is called like a method (the object will be $_[0]). The return value will be passed to map_rows if both are present.
    # each record is a line from a log file;
    # use the m// operator in list context to capture desired fields
    get_row => sub { my $s = <$io>; [ $s =~ m/^(\d+)\s+"([^"]+)"\s+(\S+)$/ ] }
NOTE: If you use get_row and don't pass data you will probably want to pass columns (otherwise columns will be taken from the first call to get_row).
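A slightly fuller version of such a coderef can be run on its own. This sketch reads from an in-memory filehandle and returns undef once the input is exhausted, which matches the end-of-data convention described for "get_row"; the log line format and the driving loop are made up for illustration.

```perl
use strict;
use warnings;

# In-memory stand-in for a log file; the line format is invented.
my $log = <<'END';
200 "GET /index.html" 10.0.0.1
404 "GET /missing" 10.0.0.2
END
open my $io, '<', \$log or die "Cannot open in-memory file: $!";

# Called like a method, so the loader object would arrive in $_[0].
my $get_row = sub {
    my $line = <$io>;
    return unless defined $line;    # undef: no more rows
    chomp $line;
    # m// in list context returns the captured fields
    return [ $line =~ m/^(\d+)\s+"([^"]+)"\s+(\S+)$/ ];
};

# Hypothetical driver loop standing in for the loader.
my @rows;
while (my $row = $get_row->()) {
    push @rows, $row;
}
print join(' | ', @$_), "\n" for @rows;
```

A line that does not match the pattern produces an empty arrayref here, so its field count would not match the expected columns.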
grep_rows - A sub (coderef) to determine if a row should be used or skipped
Named after the built-in grep function. It will receive the row as an arrayref in $_[0]. (The row will also be available in $_ for consistency with the built-in grep.) The object will be passed as $_[1] in case you want it. If it returns a true value the row will be used. If it returns false the next row will be fetched and the process will repeat (until all rows have been exhausted).
    grep_rows => sub { $_->[1] =~ /something/ } # accept the row if it matches
    grep_rows => sub { my ($row, $obj) = @_; do_something(); } # 2 variables
handle_invalid_row - How to handle invalid rows.
Can be die, warn, or a sub (coderef). See "handle_invalid_row" for more details. Default is to ignore (in which case DBI will likely error).
map_rows - A sub (coderef) to filter/mangle a row before use
Named after the built-in map function. It will receive the row as an arrayref in $_[0]. (The row will also be available in $_ for consistency with the built-in map.) The object will be passed as $_[1] in case you want it. It should return an arrayref (which will be used as the row).
    map_rows => sub { [ map { uc $_ } @$_ ] } # uppercase all the fields
    map_rows => sub { my ($row, $obj) = @_; do_something(); } # 2 variables
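The calling convention shared by grep_rows and map_rows (row in $_[0] and in $_, object in $_[1]) can be mimicked in a few lines. call_with_row() is a hypothetical helper, and applying the filter before the transformation is a choice made for this sketch rather than a statement about the module's internal order.

```perl
use strict;
use warnings;

# Hypothetical helper: pass the row as the first argument, alias it to $_,
# and pass the object (undef in this sketch) as the second argument.
sub call_with_row {
    my ($callback, $row, $object) = @_;
    local $_ = $row;
    return $callback->($row, $object);
}

my $grep_rows = sub { $_->[1] =~ /bear/ };          # keep rows whose 2nd field matches
my $map_rows  = sub { [ map { uc $_ } @$_ ] };      # uppercase all the fields

my @rows = ( ['polar', 'bear'], ['blue', 'duck'] );

my @out;
for my $row (@rows) {
    next unless call_with_row($grep_rows, $row);
    push @out, call_with_row($map_rows, $row);
}

print join(', ', @$_), "\n" for @out;   # POLAR, BEAR
```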
name - Table name
Defaults to 'data'. Subclasses may provide a more useful default.
table_type - String that will go before TABLE in "create_prefix"
A useful value might be TEMPORARY or TEMP. This is probably database driver dependent, so use an appropriate value.
transaction - Boolean
All the operations in "load" will be wrapped in a transaction by default. Set this option to false to disable this.
Options that will seldom be necessary but are available for completeness and/or consistency:
catalog - Table catalog
Passed to "quote_identifier" in DBI to get the full, quoted table name. None by default.
create_prefix - The opening of the SQL statement
See "create_prefix". Overwrite if you need something more complex.
create_sql - The CREATE TABLE statement
See "create_sql". Overwrite if you need something more complex.
create_suffix - The closing of the SQL statement
See "create_suffix". Overwrite if you need something more complex.
default_sql_data_type - Default SQL standard data type
If default_column_type is not supplied it will be determined by asking the database driver for a type corresponding to DBI::SQL_LONGVARCHAR. Alternate values can be passed (DBI::SQL_VARCHAR() for instance). See "default_sql_data_type".
drop_prefix - The opening of the SQL statement
See "drop_prefix". Overwrite if you need something more complex.
drop_sql - The DROP TABLE statement
Will be constructed if not provided. See "drop_sql".
drop_suffix - The closing of the SQL statement
See "drop_suffix". Overwrite if you need something more complex.
name_prefix - String prepended to table name
Probably mostly useful in subclasses where name is determined automatically.
name_suffix - String appended to table name
quoted_name - Full table name, properly quoted
Only necessary if you need something more complicated than $dbh->quote_identifier($catalog, $schema, $table) (see "quote_identifier" in DBI).
schema - Table schema
This module was designed to be subclassed for use with specific data input formats.
DBIx::TableLoader::CSV is a prime example. It is the entire reason this base module was designed.
Subclasses will likely want to override the following methods:
"defaults" - a hashref of additional acceptable options (and default values)
"default_name" - if you can determine a good default name from the input
"get_raw_row" - to return the next row of data
"prepare_data" - to initialize your object/data (open the file, etc.)
Be sure to check out the code for DBIx::TableLoader::CSV. Also see a very simple example in t/subclass.t.
I frequently found data sets that were difficult to view or analyze (CSV, log files, etc.) and preferred to load them into a database for its powerful, familiar processing abilities.
I once chose to use MySQL because its built-in LOAD DATA command read the malformed CSV I was given and the .import command in SQLite did not.
I wrote this module so that I'd never have to make such a choice again. I wanted to be able to use the power of Text::CSV to make sure I could take any CSV I ever got and load it into SQLite easily.
I tried to make this module a base class to be able to handle various formats.
This is more of a list of ideas than features that are planned.
Allow a custom column name transformation sub to be passed in
Use "decamelize" in String::CamelCase by default?
Allow extra columns (like id) to be added and/or generated
Option to scan the data to guess appropriate data types for each column
Make a SQLite function so that you could call this from a dbish command line?
Allow UPDATE statements and specify the key columns (for the WHERE clause)
DBIx::TableLoader::CSV
You can find documentation for this module with the perldoc command.
perldoc DBIx::TableLoader
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
Search CPAN
The default CPAN search engine, useful to view POD in HTML format.
http://search.cpan.org/dist/DBIx-TableLoader
RT: CPAN's Bug Tracker
The RT ( Request Tracker ) website is the default bug/issue tracking system for CPAN.
http://rt.cpan.org/NoAuth/Bugs.html?Dist=DBIx-TableLoader
CPAN Ratings
The CPAN Ratings is a website that allows community ratings and reviews of Perl modules.
http://cpanratings.perl.org/d/DBIx-TableLoader
CPAN Testers
The CPAN Testers is a network of smokers who run automated tests on uploaded CPAN distributions.
http://www.cpantesters.org/distro/D/DBIx-TableLoader
CPAN Testers Matrix
The CPAN Testers Matrix is a website that provides a visual overview of the test results for a distribution on various Perls/platforms.
http://matrix.cpantesters.org/?dist=DBIx-TableLoader
CPAN Testers Dependencies
The CPAN Testers Dependencies is a website that shows a chart of the test results of all dependencies for a distribution.
http://deps.cpantesters.org/?module=DBIx::TableLoader
Please report any bugs or feature requests by email to bug-dbix-tableloader at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DBIx-TableLoader. You will be automatically notified of any progress on the request by the system.
https://github.com/rwstauner/DBIx-TableLoader
git clone https://github.com/rwstauner/DBIx-TableLoader.git
Randy Stauner <rwstauner@cpan.org>
This software is copyright (c) 2011 by Randy Stauner.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.