
NAME

DoubleBlind - Perl extension for data-obfuscation in double-blind experiments.

SYNOPSIS

  use DoubleBlind;

  sub cb($$$) {
    my ($n, $id, $label) = @_;
    rename "f$id.txt", "g$label.txt" or die "Cannot rename f$id.txt: $!";
  }
  print process_shuffled \&cb, 55, 1;

DESCRIPTION

The intent is to simplify double-blind experiments in a "friendly" environment, where the experimenter is known not to try consciously to break the "encoding". (For example, this may work when one experiments on oneself, or when the generated "label" can be hidden from the subject.) The decoding can easily be done with a calculator, but (with the exception of major computational savants) cannot be done unconsciously.

Several items are generated; each has a "secret" id (an integer from a user-specified interval) and a "public" label (a decimal fraction). A caller-supplied callback function is executed with these data; it is expected to prepare the experimental data and mark it with the label.

In the simplest case, the callback does all the work itself. For example, given files named f1.txt .. f55.txt, this code renames them to files with names like g2342.461.txt:

  sub cb($$$) {
    my ($n, $id, $label) = @_;
    rename "f$id.txt", "g$label.txt" or die "Cannot rename f$id.txt: $!";
  }
  print process_shuffled \&cb, 55, 1;

(Additionally, the code prints the decoding instructions.) In more complicated cases, the callback might, for example, output instructions for a third party to label the experimental data.
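
For the third-party case, a callback might only collect instructions instead of touching any files. The sketch below is a hypothetical illustration: the name cb_for_assistant, the @instructions sheet and its wording are assumptions, not part of DoubleBlind. The callback is invoked by hand with made-up arguments so that the block runs without the module.

```perl
use strict;
use warnings;

# Hypothetical instruction sheet for an assistant (not part of DoubleBlind).
# It pairs secret ids with public labels, so the experimenter must not read it.
my @instructions;

# Hypothetical callback with the documented signature ($n, $id, $label).
sub cb_for_assistant {
    my ($n, $id, $label) = @_;
    push @instructions,
        sprintf 'Step %d: mark sample #%d with label %s', $n, $id, $label;
    return;
}

# Made-up arguments (not real encodings); in real use one would pass
# \&cb_for_assistant to process_shuffled and hand the sheet to the assistant.
cb_for_assistant(1, 285, '1766.433');
cb_for_assistant(2, 17,  '2003.456');
print "$_\n" for @instructions;
```

Numbering the steps by $n rather than by $id keeps the sheet in the randomized call order.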

As an additional convenience, the items are supplied to the callback in randomized order (the call order is the argument $n of the callback above). For example, one could apply one of 55 transformations to each of the files above, chosen based on $n.
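
One way to use $n is sketched below. It assumes a hypothetical list of three transformation names (placeholders, not options of any real tool) and a hypothetical %applied record; neither is part of DoubleBlind. The callback cycles through the list by call number, so, since the call order is randomized, the transformations end up assigned independently of the ids. The record maps each secret id to its transformation and, like the decoding instructions, should stay sealed until the results are collected.

```perl
use strict;
use warnings;

# Placeholder transformation names (hypothetical).
my @transforms = qw(original low-bitrate high-bitrate);

# Hypothetical sealed record: secret id => transformation applied to it.
my %applied;

sub cb_transform {
    my ($n, $id, $label) = @_;
    # $n runs over 1..$items, so ($n - 1) modulo the list size cycles
    # through the transformations in call order.
    my $t = $transforms[ ($n - 1) % @transforms ];
    $applied{$id} = $t;
    # A real callback would now produce "g$label.txt" from "f$id.txt"
    # by applying $t; this sketch only records the choice.
    return $t;
}

# Made-up arguments (not real encodings), invoked by hand for illustration.
cb_transform(1, 17, '2000.123');
cb_transform(2, 4,  '2003.456');
cb_transform(4, 9,  '2010.789');
```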

It should work for up to 1e4 items. (For best results, use 0 as the start index when the number of items is a power of 10, so that the last ID, and hence N below, has one digit fewer; the last item ID should not exceed 999999.) Since no attempt at speed optimization has been made, large collections of items may take noticeable computation time.

process_shuffled($callback, $items, $start)

Generates $items items, each with an item ID and an item label. The item IDs are the $items consecutive integers starting at $start. An item label is a decimal fraction of about 2000 with 3 digits after the decimal point. The item ID can be recovered as the last N digits before the decimal point in the square of the label, where N is the number of digits in the ID of the last item.
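
The documentation does not say how a label is chosen for a given ID, so the following is only a hypothetical sketch of one way to produce labels with the stated property; it is not DoubleBlind's algorithm, which may, for example, randomize its choice and produce labels below 2000. The helper encode_id (an assumed name) scans labels upward from 2000.000 in steps of 0.001 and returns the first whose square ends in the ID over its last $digits integer digits. As a further assumption, it requires the first decimal digit of the square to lie between 2 and 7, one way to keep the integer part robust to the calculator error of up to 2 units mentioned below.

```perl
use strict;
use warnings;

# Hypothetical encoder, NOT DoubleBlind's own algorithm: return the first
# label >= 2000.000 (3 decimals) whose square has $id as its last $digits
# digits before the decimal point.
sub encode_id {
    my ($id, $digits) = @_;
    my $mod = 10 ** $digits;
    die "id $id does not fit in $digits digits\n" if $id < 0 || $id >= $mod;
    for (my $k = 2_000_000; ; $k++) {               # $k = label * 1000
        my $sq   = $k * $k;                         # label**2 * 10**6, exact
        my $frac = $sq % 1_000_000;                 # 6 decimals of label**2
        my $int  = ($sq - $frac) / 1_000_000;       # integer part of label**2
        next unless $int % $mod == $id;
        # Assumption: keep the first decimal of the square in 2..7, so that
        # an error of up to 2 units there cannot change the integer part.
        next unless $frac >= 200_000 && $frac < 800_000;
        return sprintf '%d.%03d', int($k / 1000), $k % 1000;
    }
}

print encode_id(285, 4), "\n";    # a label for id 285 when ids have 4 digits
```

Scaling the label by 1000 keeps all arithmetic in exact integers, so the digit test does not depend on floating-point rounding.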

For example, the label 1766.433 (whose square is 3120285.543489) may correspond to the ID 285 if the IDs run from 1 to 5000, so that N = 4. (For decoding, the calculator should preferably show at least one extra digit after the decimal point in the square; errors of up to 2 units in that digit are tolerated.) In the absence of a calculator, the squaring can be done with Perl, as in

  perl -wle "print 1766.433**2"
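
The decoding rule can also be applied mechanically. The sketch below uses a hypothetical helper, decode_label (not part of DoubleBlind), which scales the label by 1000 and works in exact integer arithmetic, so floating-point rounding in the square cannot shift the integer part. It reproduces the example above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DoubleBlind): recover an item ID from a
# label by squaring it and keeping the last N integer digits.
sub decode_label {
    my ($label, $last_id) = @_;   # $last_id: the largest ID, which fixes N
    my ($int, $frac) = $label =~ /^(\d+)\.(\d{3})$/
        or die "Label must have exactly 3 decimal places: $label\n";
    my $k      = $int * 1000 + $frac;                  # label * 1000
    my $sq     = $k * $k;                              # label**2 * 10**6
    my $sq_int = ($sq - $sq % 1_000_000) / 1_000_000;  # integer part of label**2
    my $n      = length $last_id;                      # N = digits of last ID
    return $sq_int % 10 ** $n;
}

print decode_label('1766.433', 5000), "\n";   # prints 285
```

Passing the largest ID rather than N itself mirrors the description: N is the number of digits in the ID of the last item.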

The callback is a reference to a function taking 3 arguments: the call number (increasing from 1 to $items), the id, and the label.

EXPORT

None by default.

SEE ALSO

The file ex.pl in the distribution contains a complete real-life example of usage: a check of which audio storage options are suitable for your acoustic environment. Together with the instructions inside this script, one can create a CD with a double-blind sample of

AUTHOR

Ilya Zakharevich, <ilyaz@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2008 by Ilya Zakharevich

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.