
NAME

Pg::Explain::StringAnonymizer - Class to anonymize sets of strings

VERSION

Version 0.66

SYNOPSIS

This module provides a way to turn a defined set of strings into an anonymized version of it that has 4 properties:

  • the same original string should give the same output string (within the same input set)

  • strings shouldn't be very long

  • it shouldn't be possible to reverse the operation

  • generated strings should be easy to read, and easy to distinguish between themselves.

The first and third points can be achieved easily with a hashing function (md5, sha), but the generated hashes violate the fourth point, and sometimes also the second.
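To illustrate why plain hashes are not enough, here is a minimal sketch that uses only the core Digest::MD5 and Digest::SHA modules. It is not part of this module; it only shows how long the hex digests are.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );
use Digest::SHA qw( sha1_hex );

# The same input always gives the same digest, and the digest is one-way.
my $md5  = md5_hex( 'depesz' );
my $sha1 = sha1_hex( 'depesz' );

# Hex digests are fixed-length strings of [0-9a-f], which makes them
# long and hard to tell apart at a glance.
printf "md5:  %s (%d characters)\n", $md5,  length $md5;
printf "sha1: %s (%d characters)\n", $sha1, length $sha1;
```

An md5 hex digest is 32 characters long and a sha1 hex digest is 40, whatever the length of the input.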

Example of usage:

    my $anonymizer = Pg::Explain::StringAnonymizer->new();
    $anonymizer->add( 'a', 'b', 'c');
    $anonymizer->add( 'depesz' );
    $anonymizer->add( [ "any strings", "are possible" ] );
    $anonymizer->finalize();

    print $anonymizer->anonymized( 'a' ), "\n";

    my $full_dictionary = $anonymizer->anonymization_dictionary();

METHODS

new

Object constructor, doesn't take any arguments.

add

Adds new string(s) to anonymization list.

Strings can be given either as a list or as an ArrayRef.

Note that one cannot add() more elements to the anonymized set after finalization (a call to the finalize() method).

If such a call is made (add() after finalize()), it raises an exception.

finalize

Finalizes string set creation, and creates anonymized versions.

It has to be called after some number of add() calls, so that it has something to work on.

After running finalize() one cannot add() more strings.

Also, before finalize() you cannot call the anonymized() or anonymization_dictionary() methods.

anonymized

Returns the anonymized version of the given string, or undef if the string wasn't previously added to the anonymization set.

If it is called before finalize(), it raises an exception.

anonymization_dictionary

Returns a hash reference containing all input strings and their anonymized versions, like:

    {
        'original1' => 'anon1',
        'original2' => 'anon2',
        ...
        'originalN' => 'anonN',
    }

If it is called before finalize(), it raises an exception.

INTERNAL METHODS

_hash

Converts the given string into an array of 32 integers in the range 0..31.

This is done by taking the 160-bit sha1 checksum of the string, splitting it into 32 segments of 5 bits each, and converting each segment into an integer.
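The splitting described above can be sketched as follows. This is an illustration under assumptions, not the module's actual code: hash_to_segments is a hypothetical name, and the real _hash may order bits or segments differently.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha1 );

# Hypothetical helper, not the module's actual _hash implementation.
sub hash_to_segments {
    my ( $string ) = @_;

    # sha1() returns 20 raw bytes; unpack 'B*' turns them into
    # a 160-character string of '0' and '1'.
    my $bits = unpack 'B*', sha1( $string );

    # 160 bits / 5 bits per segment = 32 segments, each in 0..31.
    return map { oct( '0b' . $_ ) } $bits =~ m/(.{5})/g;
}

my @segments = hash_to_segments( 'depesz' );
printf "%d segments: %s\n", scalar @segments, join ',', @segments;
```

Because 32 times 5 is exactly 160, every bit of the checksum is used and no padding is needed.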

_word

Returns the n-th word from the number-to-word translation dictionary.

_make_prefixes

Scans the given keys and changes their values (in the ->{'strings'} hash) to the shortest unique prefix.
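One way to find shortest unique prefixes is sketched below. It assumes each string has already been hashed into an array of integers; shortest_unique_prefixes and its data layout are hypothetical and may not match how the module stores its data.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash ref of name => array ref of integers,
# returns a hash ref of name => shortest prefix that no other entry shares.
sub shortest_unique_prefixes {
    my ( $hashed ) = @_;

    # Count how many entries start with each possible prefix.
    my %seen;
    for my $ints ( values %{ $hashed } ) {
        for my $len ( 1 .. scalar @{ $ints } ) {
            $seen{ join ',', @{ $ints }[ 0 .. $len - 1 ] }++;
        }
    }

    my %prefix_for;
    for my $name ( keys %{ $hashed } ) {
        my @ints = @{ $hashed->{ $name } };

        # Fall back to the full array if no shorter prefix is unique.
        $prefix_for{ $name } = [ @ints ];
        for my $len ( 1 .. scalar @ints ) {
            my @candidate = @ints[ 0 .. $len - 1 ];
            next if $seen{ join ',', @candidate } > 1;
            $prefix_for{ $name } = \@candidate;
            last;
        }
    }
    return \%prefix_for;
}

my $prefixes = shortest_unique_prefixes(
    {
        'a' => [ 3, 7, 1 ],
        'b' => [ 3, 9, 4 ],
        'c' => [ 5, 2, 8 ],
    }
);
print "$_ => [", join( ',', @{ $prefixes->{ $_ } } ), "]\n" for sort keys %{ $prefixes };
```

Here 'a' and 'b' share the first integer, so both need two elements, while 'c' is unique after one.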

_stringify

Converts arrays of integers (prefixes for hashed words) into strings.
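A rough sketch of how integer prefixes could become readable strings is shown below. The word list, the "_" separator and the name stringify_prefix are all assumptions made for illustration; the module's real dictionary and output format may differ.

```perl
use strict;
use warnings;

# Hypothetical 32-entry dictionary, one word per value in 0..31.
# The module's real word list may differ.
my @WORDS = qw(
    alpha bravo charlie delta echo foxtrot golf hotel
    india juliet kilo lima mike november oscar papa
    quebec romeo sierra tango uniform victor whiskey xray
    yankee zulu two three four five six seven
);

# Hypothetical helper: maps each integer to its word and joins the words
# with "_" (the separator is an assumption as well).
sub stringify_prefix {
    my ( $ints ) = @_;
    return join '_', map { $WORDS[ $_ ] } @{ $ints };
}

print stringify_prefix( [ 3, 7 ] ), "\n";    # prints "delta_hotel"
```

Short words from a small fixed dictionary keep the output easy to read and easy to tell apart, which is the fourth property listed in the SYNOPSIS.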

AUTHOR

hubert depesz lubaczewski, <depesz at depesz.com>

BUGS

Please report any bugs or feature requests to depesz at depesz.com.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Pg::Explain::StringAnonymizer

COPYRIGHT & LICENSE

Copyright 2011 hubert depesz lubaczewski, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.