The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Oracle::SQLLoader - object interface to Oracle's sqlldr

DESCRIPTION

Oracle::SQLLoader provides an object wrapper to the most commonly used functionality of Oracle's SQL*Loader bulk loader utility. It tries to dwim as best as possible by using defaults to all of the various sqlldr options.

The module currently supports the loading of a single table from a single file. The file can be either fixed length or delimited. For a delimited file load, just add the names of the destination columns in the order the fields appears in the data file and optionally supply a data type. For a fixed length load, supply the destination column name; the combination of the field starting offset and field length, or the field start and end offsets in the data file; and an optional data type.

Besides letting you skip the Oracle docs, the module provides a lot of useful stats and return codes by parsing the sqlldr output.

SYNOPSIS

  use Oracle::SQLLoader qw/$CHAR $INT $DECIMAL $DATE/;

  ### load a simple comma-delimited file to a single table
  $ldr = new Oracle::SQLLoader(
                                infile => '/tmp/test.dat',
                                terminated_by => ',',
                                username => $user,
                                password => $pass,
                                sid => $sid
                               );

  $ldr->addTable(table_name => 'test_table');
  $ldr->addColumn(column_name => 'first_col');
  $ldr->addColumn(column_name => 'second_col');
  $ldr->addColumn(column_name => 'third_col');
  $ldr->executeLoader() || warn "Problem executing sqlldr: $@\n";

  # stats
  $skipped = $ldr->getNumberSkipped();

  $read = $ldr->getNumberRead();

  $rejects = $ldr->getNumberRejected();

  $discards = $ldr->getNumberDiscarded();

  $loads = $ldr->getNumberLoaded();

  $beginTS = $ldr->getLoadBegin();

  $endTS = $ldr->getLoadEnd();

  $runtimeSecs = $ldr->getElapsedSeconds();

  $secsOnCpu = $ldr->getCpuSeconds();




  #### a fixed length example
  $flldr = new Oracle::SQLLoader(
                                 infile => '/tmp/test.fixed',
                                 username => $user,
                                 password => $pass,
                                 );
  $flldr->addTable(table_name => 'test_table');

  $flldr->addColumn(column_name => 'first_col',
                    field_offset => 0,
                    field_length => 4,
                    column_type => $INT);

  $flldr->addColumn(column_name => 'second_col',
                    field_offset => 4,
                    field_end => 9);

  $flldr->addColumn(column_name => 'third_col',
                    field_offset => 9,
                    field_end => 14,
                    column_type => $CHAR);

  $flldr->addColumn(column_name => 'timestamp',
                    field_offset => 9,
                    field_length => 13,
                    column_type => $DATE,
                    date_format => "YYYYMMDD HH24:MI");

  $flldr->executeLoader() || warn "Problem executing sqlldr: $@\n";

  # stats
  $skipped = $ldr->getNumberSkipped();

  $read = $ldr->getNumberRead();

  $rejects = $ldr->getNumberRejected();

  $discards = $ldr->getNumberDiscarded();

  $loads = $ldr->getNumberLoaded();

  $beginTS = $ldr->getLoadBegin();

  $endTS = $ldr->getLoadEnd();

  $runtimeSecs = $ldr->getElapsedSeconds();

  $secsOnCpu = $ldr->getCpuSeconds();

AUTHOR

Ezra Pagel <ezra@cpan.org>

CONTRIBUTIONS

John Huckelba - fix for single record files scenario

Craig Pearlman <cpearlman@healthmarketscience.com> - added various fixes, most importantly initializing and generating the options clause correctly.

COPYRIGHT

The Oracle::SQLLoader module is Copyright (c) 2006 Ezra Pagel.

The Oracle::SQLLoader module is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

Modifications to Oracle::SQLLoader are Copyright (c) 2006 Health Market Science, though this code remains free under the terms of the original license.

SQL*Loader is Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

API

IMPORTABLE VARIABLES

Column Data Types
$CHAR - data to be interpreted as character. empty data is treated as null
$INT - data to be interpreted as integer. empty data is treated as 0
$DECIMAL - data to be interpreted as floating point. empty data is treated as null
$DATE - date fields. these want a valid format, but will default to DD-MON-YY
Table Load Modes
$APPEND - add new rows into the table
$INSERT - adds rows into an empty table, or error out if the table has data
$REPLACE - delete all existing rows from the table and then load
$TRUNCATE - truncate the table (no rollback!) and then load

PUBLIC METHODS

new()

create a new Oracle::SQLLoader object

mandatory arguments
infile - the name and full filesystem path to the input file
username - the oracle username for this load
password - the password for this username
common arguments
control_file - supply an explicit file name to use when generating the control file; useful when cleanup is set to false.
terminated_by - if you're planning on loading a delimited file, just provide the character used as a delimiter and Oracle::SQLLoader will presume that it's a delimited load from here on out. if this option is not supplied, a fixed length file format will be used.
direct - bypass parsing sql and write directly to the tablespace on the filesystem. this is considerably faster than conventional loads, but can only be performed on a local instance. if this argument is not supplied, conventional load will be used.
loadmode - one of the above Load Modes: $APPEND, $INSERT, $REPLACE, $TRUNCATE (default $APPEND)
other optional arguments - you can ususally accept the defaults and pretend that these don't exist.
bindsize - byte size of conventional load buffer; also sets readsize (default 256000)
columnarrayrows - number of rows to buffer for direct loads (default 5000)
errors - the maximum number of errors to accept before failing the load (default 50)
load - the maximum number of records to load (default ALL)
rows - the number of rows to load before committing (default 64)
skip - the number of records to allow to be skipped (default 0)
skip_index_maintenance - for direct loads, don't rebuild indexes (default false)
skip_unusable_indexes - skips rebuilding unusable indexes (default false)
streamsize - (direct) the byte size of the load stream (default 256000)
badfile -
discardfile -
eol -
offset_from - do your offsets start at position 0 or 1
enclosed_by - are there some sort of enclosing characters, double-quotes perhaps?
cleanup - want to leave the load files on disk, maybe for testing/auditing?

addTable()

identify a table to be loaded. multiple adds with the same table name clobbers any old definition.

mandatory arguments
table name - the name of the table to load
optional arguments
when_clauses
continue_clauses
terminated_clauses
enclosed_clauses
nullcols

addColumn()

add a column to be loaded

mandatory arguments for all load types
column_name - the name of the column to be loaded
mandatory arguments for fixed length load types
field_offset - starting position of the data in the input file
and one of:
field_end - the ending position of the data in the input file
field_length - the length of the field, measured from field_offset
optional arguments for all load types
table_name - the name of the table that this column belongs to. if no table name is specified, default is the last known table name. if no previous table name exists, croak.
column_type - $CHAR, $INT, $DECIMAL, or $DATE; defaults to $CHAR
date_format - the TO_DATE format for a $DATE column; defaults to "DD-MON-YY"
column_length - on occassion, it's useful to specify the length of the field; for some reason, this is required when loading large strings (e.g. CHAR(3000))

executeLoader()

generate a control file and execute an sqlldr job. this is a blocking call. if you don't care about load statistics, you can always fork it off. returns 1 if sqlldr ran successfully, 0 if there were errors or warnings

checkLogfile()

parse an sqlldr logfile and be store results in object status

optional arguments
$logfile - the file to parse; defaults to the object's current logfile

STATUS METHODS

getNumberSkipped()

returns the number of records skipped , or undef if no stats are known

getNumberRead()

returns the number of read from all input files, or undef if no stats are known

getNumberRejected()

returns the number of records rejected, or undef if no stats are known

getNumberDiscarded()

returns the number of records discarded, or undef if no stats are known

getNumberLoaded()

returns the number of records successfully loaded, or undef if no stats are known

getLastRejectMessage()

returns the last known rejection message, if any

getLoadBegin()

the time that the job began represented as epoch timestamp

getLoadEnd()

the time that the job finished represented as epoch timestamp

getElapsedSeconds()

returns the number if seconds elapsed during load

getCpuSeconds()

returns the number if seconds on cpu during load

getErrors()

returns a listref of any SQL*Loader-specific error codes and messages that were reported in the load logs, e.g the instance is down (SQL*Loader-128), the table does not exist (SQL*Loader-941), or the username/password was wrong (SQL*Loader-101). see the 'SQL*Loader Messages' section of your Oracle docs

getLastError()

returns the last known SQL*Loader error code and message, or an empty string

CONTENT GENERATION METHODS

generateControlfile()

based on the current configuration options, generate a control file

UTILITY METHODS

findProgram()

searches ORACLE_HOME and PATH environment variables for an executable program. returns the full path and file name of the first match, or undef if not found. can be invoked as a class or instance method.

Oracle::SQLLoader->findProgram('sqlldr') or $ldr->findProgram('sqlldr.exe')

mandatory arguments
$executable - the name of the program to search for

checkEnvironment()

ensure that ORACLE_HOME is set and that the sqlldr binary is present and executable. can be invoked as a class or instance method.

Oracle::SQLLoader->findProgram('sqlldr') or $ldr->findProgram('sqlldr.exe')

PRIVATE METHODS

_generateSessionClause()

generate the session-wide information for a control file

_generateTablesClause()

generate table and column information for a control file

_generateDataClause()

generate any input data for a control file

initDefaults()

the loader defaults are almost directly from the sqlldr usage dumps