
NAME

Data::Sync - A simple metadirectory/datapump module (advanced usage)

DESCRIPTION

The basic functionality of Data::Sync is described in the main POD documentation. This documentation details more complex or rarely used functionality. You may have a requirement to customise Data::Sync by subclassing it, you may wish to have more granular control over data flow, or you may just want to use individual functions of the module as a sort of toolkit. If any of these are the case, you're reading the right document.

EXTRA METHODS

get

 my $AoH = $sync->get();

This function triggers the read function defined in the Data::Sync constructor, and returns the resulting data as an array of hashes (see perlref if you are unfamiliar with how to access this kind of data structure).
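
As a rough sketch (the source definition and the NAME attribute below are assumptions, not part of any real schema), the returned structure can be walked like any array of hashrefs:

 # a minimal sketch; $dbhandle and the NAME attribute are assumptions
 my $sync = Data::Sync->new();
 $sync->source($dbhandle,"select * from sourcetable");

 my $AoH = $sync->get();

 # each element of @$AoH is a hashref of attributename=>value pairs
 for my $record (@$AoH)
 {
         print $record->{'NAME'},"\n";
 }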

put

 my $result = $sync->put($AoH);

This function writes the content of $AoH (an array of hashes) to the target defined in the Data::Sync constructor, and returns the result (1 for success).
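
For illustration only (the target parameters and attribute names here are assumptions; see the main POD for the full set of target options), a hand-built record set can be written directly:

 # a hand-built record set; target parameters and attribute names are assumptions
 my $sync = Data::Sync->new();
 $sync->target($dbhandle,{table=>"targettable",index=>"NAME"});

 my $AoH = [
         { NAME=>"jbloggs", PHONE=>"01234 567890" },
         { NAME=>"asmith",  PHONE=>"09876 543210" },
 ];

 my $result = $sync->put($AoH);
 print "write failed\n" unless $result;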

hashrecord

 my $hash = $sync->hashrecord($hashref,$arrayref);

This function is used internally by Data::Sync to detect changes in records (for minimising writes), but is also accessible for other uses. It takes a hashref of the db record, and an arrayref of attributes to include in the hash. It returns a hex representation of the MD5 hash of the concatenated record values. You can also call it with

 my $hash = Data::Sync->hashrecord($hashref,$arrayref);
 

(i.e. it does not depend on values set in the object constructor).
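For example (the record contents here are invented), comparing hashes built from the same attribute list shows whether those attributes have changed, while changes to attributes outside the list are ignored:

 # the record contents here are invented for illustration
 my $record = { NAME=>"jbloggs", PHONE=>"01234 567890", MODIFIED=>"2007-01-01" };

 # hash only NAME and PHONE; a change to MODIFIED won't affect the digest
 my $before = Data::Sync->hashrecord($record,["NAME","PHONE"]);

 $record->{'PHONE'} = "01234 999999";
 my $after = Data::Sync->hashrecord($record,["NAME","PHONE"]);

 print "record changed\n" if $before ne $after;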

remap

 my $remapped = $sync->remap($AoH);
 

Remaps the field names, as defined in $sync->mappings. Takes array of hashes, returns array of hashes with the hash keys renamed.
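
A short sketch (the field names are assumptions):

 # rename SURNAME to LASTNAME in every record; field names are assumptions
 my $sync = Data::Sync->new();
 $sync->mappings(SURNAME=>"LASTNAME");

 my $AoH = [ { GIVENNAME=>"Joe", SURNAME=>"Bloggs" } ];
 my $remapped = $sync->remap($AoH);

 # $remapped->[0] is now { GIVENNAME=>"Joe", LASTNAME=>"Bloggs" }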

runtransform

 my $transformed = $sync->runtransform($AoH);
 

Runs the transformations defined in $sync->transform. Takes array of hashes, returns array of hashes. You can use this function in conjunction with mappings as a utility function to perform recursive transformations of data:

 my $sync = Data::Sync->new();
 $sync->transform(name=>"lowercase");
 my $transformed = $sync->runtransform($AoH);
 

will recurse through a data structure (an array of arrays, array of hashes, array of hashes of arrays of hashes etc) of arbitrary depth, performing the transform on all hash values where the key matches (including every value of hash values containing anonymous arrays).
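
For instance (the data below is invented), a record whose 'name' value is an anonymous array is still transformed element by element:

 my $sync = Data::Sync->new();
 $sync->transform(name=>"lowercase");

 # 'name' here holds an anonymous array; the values are invented
 my $AoH = [ { name=>["Joe Bloggs","J Bloggs"], dept=>"Sales" } ];
 my $transformed = $sync->runtransform($AoH);

 # both elements of the 'name' array are now lowercased;
 # 'dept' is untouched because its key doesn't match the transform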

makebuiltattributes

 my $transformed = $sync->makebuiltattributes($AoH);
 

Create attributes in the data set as defined in buildattributes().
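
A sketch of the intended flow, assuming a buildattributes() template of the form described in the main POD (the %FIELD% placeholders and field names here are assumptions):

 # field names and the template form are assumptions; see buildattributes() in the main POD
 my $sync = Data::Sync->new();
 $sync->buildattributes(FULLNAME=>"%GIVENNAME% %SURNAME%");

 my $AoH = [ { GIVENNAME=>"Joe", SURNAME=>"Bloggs" } ];
 my $built = $sync->makebuiltattributes($AoH);

 # each record now also carries a FULLNAME attribute built from the template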

scanhashtable

 my $hashed = $self->scanhashtable($AoH);
 

Check the dataset against the stored hashtable, to filter out unchanged records. Returns a dataset with unchanged records removed. Requires that hashattributes has been set in the target definition.

hashrecord

 my $hash = $self->hashrecord(\%record,[ATTRIB,ATTRIB]);
 

Returns a hex digest MD5 hash of the attributes in the record.

getdeletes

 my @deletes = $self->getdeletes();
 

Returns a list of TARGETINDEX=>value entries for all entries that have been deleted. This is a function of hashing, so deletes will only be detected if the following steps are followed (the run method does this):

 1) read from source
 2) transform etc
 3) hash the entries
 

Note that you don't need to write for deletes to be detected - the hashing of entries is done by scanhashtable before the write to target function is called. This means that using get() and scanhashtable() you can set the deltas to the current state without performing a write.
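
Putting those steps together (the source definition, attribute names and hash store configuration are assumptions and must be set up as described in the main POD), deletes can be detected without writing to the target:

 # sketch only; $sync's source/target definitions, hashattributes and
 # hash store are assumed to be configured as described in the main POD
 my $delta = $sync->get();                # 1) read from source
 $delta = $sync->remap($delta);           # 2) transform etc.
 $delta = $sync->runtransform($delta);
 $delta = $sync->scanhashtable($delta);   # 3) hash the entries, dropping unchanged records

 my @deletes = $sync->getdeletes();       # TARGETINDEX=>value pairs for deleted entries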

batch update mode

 $sync->source($handle,{        batchsize=>x,
                                controls=>[$ldapcontrol]        } );

Batch update mode is only used in the standard (i.e. $sync->run based) usage of Data::Sync. It's primarily intended to handle asynchronous, persistent/paged LDAP searches, or SQL database queries where you want to see the updates as quickly as possible (without necessarily waiting for them all to complete). For details on how to construct persistent & asynchronous LDAP controls, see Net::LDAP::Control. This will read a batch from the handle, perform the operation, read the next batch from the handle, and so on. Note that with a DBI handle, this will still be working against an entire record set matching your criteria, so the memory advantages are limited.
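
A sketch of the sort of thing intended, using a paged control (the connection details, search parameters and batch size are assumptions; see Net::LDAP::Control::Paged for the control itself):

 use Net::LDAP;
 use Net::LDAP::Control::Paged;

 # connection details, search parameters and batch size are assumptions
 my $ldap = Net::LDAP->new("ldap.example.com") or die $@;
 $ldap->bind;

 my $paged = Net::LDAP::Control::Paged->new(size=>250);

 $sync->source($ldap,{  base=>"ou=people,dc=example,dc=com",
                        filter=>"(objectclass=person)",
                        batchsize=>250,
                        controls=>[$paged]  } );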

subclassing Data::Sync

For whatever reason, you may want to subclass Data::Sync - perhaps to implement RDBMS specific read or write functions, or to add new functionality. Data::Sync is (or should be) subclass safe. The main functions you are likely to want to overload are:

 readdbi
 writedbi
 readldap
 writeldap

each will be passed the following parameters in sequential order:

 self
 db/ldap handle
 anonymous hash of parameters (see the 'source' and 'target' methods)
 

and called by get/put or run().

You might also want to overload ::run with your own variant. ::run calls the read, remap, transform and write methods sequentially (if you look at ::run, it also calls sourceToAoH - this is a vital call to convert a DBI or LDAP handle into an array of hashes record set, before remapping and transforming the records).

You may also wish to use a database other than SQLite to hash records, in which case you need to overload getdeletes() and scanhashtable(). See the code for more details.
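
A skeleton of the sort of subclass intended (the package name and the body of the overridden method are assumptions; the parameters are as listed above):

 package My::Sync;

 use strict;
 use warnings;
 use base 'Data::Sync';

 # package name and method body are assumptions; parameters are as listed above
 sub writedbi
 {
         my ($self,$handle,$params) = @_;

         # RDBMS specific write logic goes here, or fall back to the parent
         return $self->SUPER::writedbi($handle,$params);
 }

 1;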

EXAMPLE USES

 my $source1 = Data::Sync->new();
 $source1->source($dbhandle,"select * from sourcetable1");
 $source1->mappings(NAME=>"NEWNAME");
 $source1->transform(NEWNAME=>"lowercase");
 
 my $source2 = Data::Sync->new();
 $source2->source($dbhandle2,"select * from sourcetable2");
 $source2->mappings(FULLNAME=>"NEWNAME");
 $source2->transform(NEWNAME=>"lowercase");
 
 my $target=Data::Sync->new();
 $target->target($dbhandle3,{index=>"NEWNAME"});
 
 my $set1 = $source1->get();
 $set1 = $source1->remap($set1);
 $set1 = $source1->runtransform($set1);
 
 my $set2 = $source2->get();
 $set2 = $source2->remap($set2);
 $set2 = $source2->runtransform($set2);
 
 my @recordset = (@$set1,@$set2);
 $target->put(\@recordset);
 

This would read the two defined sources, remap and transform their contents, join the two datasets together, and write the combined set to the target.