NAME

Crypt::IDA::ShareFile - Archive file format for Crypt::IDA module

SYNOPSIS

  use Crypt::IDA::ShareFile ":default";
 
  @list  = sf_split( ... );
  $bytes = sf_combine ( ... );

DESCRIPTION

This module implements a file format for creating, storing and distributing shares created with Crypt::IDA. Created files contain share data and (by default) the corresponding transform matrix row used to split the input file. This means that share files are (again, by default) stand-alone and may be recombined later without needing any other stored key or the involvement of the original issuer.

In addition to creating a number of shares, the module can also break the input file into several chunks before processing, in a manner similar to multi-volume PKZIP, ARJ or RAR archives. Each chunk may be split into shares using a different transform matrix. Individual groups of chunks may be re-assembled independently, as they are collected and the quorum for each is satisfied.

EXPORT

No routines are exported by default. All routines may be called by prefixing the routine names with the module name, eg:

 $foo=Crypt::IDA::ShareFile::sf_split(...)

Alternatively, routines can be exported by adding ":default" to the "use" line, in which case the routine names do not need to be prefixed with the module name, ie:

  use Crypt::IDA::ShareFile ":default";
 
  $foo=sf_split(...)
  # ...

Some extra ancillary routines can also be exported with the ":extras" (just the extras) or ":all" (":extras" plus ":default") parameters to the use line. See the section "ANCILLARY ROUTINES" for details.

SPLIT OPERATION

The template for a call to sf_split, showing all possible inputs and default values, is as follows:

 @list=sf_split(
         shares => undef,
         quorum => undef,
         width => 1,
         filename => undef,
         # supply a key, a matrix or neither
         key => undef,
         matrix => undef,
         # misc options
         version => 1,          # header version
         rand => "/dev/urandom",
         bufsize => 4096,
         save_transform => 1,
         # chunking methods; pick one at most
         n_chunks => undef,
         in_chunk_size => undef,
         out_chunk_size => undef,
         out_file_size => undef,
         # allow creation of a subset of shares, chunks
         sharelist => undef,    # [ $row1, $row2, ... ]
         chunklist => undef,    # [ $chunk1, $chunk2, ... ]
         # specify pattern to use for share filenames
         filespec => undef,     # default value set later on
   );

The minimal set of inputs is:

 @list=sf_split(
         shares => $number_of_shares,
         quorum => $quorum_value,
         filename => "filename",
   );

The function returns a list of [$key,$mat,$bytes_read,@output_files] listrefs corresponding to each chunk that was created, or undef in the case of an error.

The n_chunks, in_chunk_size, out_chunk_size and out_file_size options allow control over how (or if) the input file is broken into chunks. At most one of these options may be specified. The n_chunks option divides the input into the specified number of chunks, which will be of (more-or-less) equal size.
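
For example, a minimal sketch (the input file name is illustrative) which splits a file into three chunks of five shares each, any three shares being enough to recover a chunk, and then reports on each chunk created:

  use Crypt::IDA::ShareFile ":default";

  my @list = sf_split(
      shares   => 5,
      quorum   => 3,
      filename => "input.dat",
      n_chunks => 3,
  );
  die "sf_split failed\n" unless defined $list[0];

  # one [$key, $mat, $bytes_read, @output_files] listref per chunk
  foreach my $chunk (@list) {
      my ($key, $mat, $bytes_read, @output_files) = @$chunk;
      print "read $bytes_read bytes -> @output_files\n";
  }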

The filespec option allows control over the naming of output files. By default this is set to '%f-%c-%s.sf' when a file is being split into several chunks, or '%f-%s.sf' when no chunking is performed. Before creating the output files, the '%f', '%c' and '%s' patterns are replaced by:

%f: input file name
%c: chunk number
%s: share number
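
As an illustration, this substitution is performed by the sf_sprintf_filename routine described under "ANCILLARY ROUTINES" below; the chunk and share numbers here are hypothetical, and zero-based numbering is an assumption of this sketch:

  use Crypt::IDA::ShareFile ":all";

  # expand the default multi-chunk pattern for chunk 0, share 2
  my $name = sf_sprintf_filename('%f-%c-%s.sf', "input.dat", 0, 2);
  # $name is now "input.dat-0-2.sf"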

If an error is encountered during the creation of one set of shares in a multi-chunk job, then the routine returns immediately without attempting to split any other remaining chunks.

COMBINE OPERATION

The template for a call to sf_combine, showing all possible inputs and default values, is as follows:

 $bytes_written = sf_combine (
     infiles => undef,          # [ $file1, $file2, ... ]
     outfile => undef,          # "filename"
     # If specified, the following must agree with the values stored
     # in the sharefiles. There's normally no need to set these.
     quorum => undef,
     width => undef,
     # optional matrix, key parameters
     key => undef,
     matrix => undef,
     shares => undef,           # required if key supplied
     sharelist => undef,        # required if key supplied
     # misc options
     bufsize => 4096,
    );

The minimal set of inputs is:

 $bytes_written = sf_combine (
     infiles => [ $file1, $file2, ... ],
     outfile => $output_filename
 );

The return value is the number of bytes written to the output file, or undef in the case of an error.

The current version of the module only supports combining a single chunk with each call to sf_combine. Apart from being used in the call to open the input files, the routine does not examine the input filenames at all, since all the information necessary to combine the file is expected to be contained within the files themselves (along with any key/matrix parameters passed in, where this information is not stored in the files).
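
A minimal sketch of combining a multi-chunk file, assuming the caller has already grouped the share files by chunk (the grouping and file names here are illustrative); since each chunk is written at its stored offset, every call can target the same output file:

  my @chunk_groups = (
      [ "f-0-0.sf", "f-0-1.sf", "f-0-2.sf" ],   # shares of chunk 0
      [ "f-1-0.sf", "f-1-1.sf", "f-1-2.sf" ],   # shares of chunk 1
  );
  my $total = 0;
  foreach my $group (@chunk_groups) {
      # each call combines exactly one chunk into the output file
      my $bytes = sf_combine(
          infiles => $group,
          outfile => "output.dat",
      );
      die "sf_combine failed\n" unless defined $bytes;
      $total += $bytes;
  }
  print "wrote $total bytes in total\n";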

Chunks may be combined in any order. When the final chunk is processed, any padding bytes added to it during the sf_split routine will be removed by truncating the output file.

ANCILLARY ROUTINES

The extra routines are exported by using the ":extras" or ":all" parameter with the initial "use" module line. The extra routines are as follows:

 $filename = sf_sprintf_filename($format,$infile,$chunk,$share);

This routine creates share file names from the given parameters. It is used internally by sf_split.

 @chunk_info=sf_calculate_chunk_sizes(
      quorum => undef,
      width => undef,
      filename => undef,
      # misc options
      version => 1,             # header version
      save_transform => 1,      # whether to store transform in header
      # chunking method: pick at most one
      n_chunks => undef,
      in_chunk_size => undef,
      out_chunk_size => undef,
      out_file_size => undef,
 );

This returns a list of hashrefs containing information about chunk sizes, ranges and so on, with one element for each chunk which would be created with the given parameters. All input values match those which would be passed to sf_split, except for the save_transform value, which specifies whether the transform matrix row for each share should be stored within the file header. Each hash in the returned list has the following keys:

  • chunk_start: first byte of chunk

  • chunk_next: first byte of next chunk

  • chunk_size: chunk_next - chunk_start

  • file_size: share file size, including header

  • opt_final: is this the last chunk in the file?

  • padding: number of padding bytes in (final) chunk

This routine is used internally by sf_split to calculate chunk sizes. It is made available to callers since it may be useful to know in advance how large the output files will be before any shares are created, such as where there is limited space (eg, a network share or CD image) for the output shares.
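
For instance, a short sketch (with illustrative parameter values) which sums the projected share file sizes before any splitting is done, to check whether a target volume has room for the output:

  my @chunks = sf_calculate_chunk_sizes(
      quorum   => 3,
      width    => 1,
      filename => "input.dat",
      n_chunks => 3,
  );

  # every share of a given chunk has the same file size, so the
  # space needed for n shares is n times the sum of file_size values
  my $per_share = 0;
  $per_share += $_->{file_size} for @chunks;

  my $n_shares = 5;
  printf "%d shares of each chunk need %d bytes\n",
         $n_shares, $n_shares * $per_share;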

KEY MANAGEMENT

Provided the default settings are used, created sharefiles will have all the information necessary to reconstruct the file once sufficient shares have been collected. For systems where an alternative scheme is required, see the discussion in the Crypt::IDA man page.

TECHNICAL DETAILS

File Format

Each share file consists of a header and some share data. For the current version of the file format (version 1), the header format is as follows:

  Bytes   Name           Value
  2       magic          marker for "Share File" format; "SF" = 0x53, 0x46
  1       version        file format version = 1
  1       options        options bits (see below)
  1-2     k,quorum       quorum k-value
  1-2     s,security     security level (ie, field width, in bytes)
  var     chunk_start    absolute offset of chunk in file
  var     chunk_next     absolute offset of next chunk in file
  var     transform      transform matrix row (optional)

All values stored in the file header (and the share data) are stored in network (big-endian) byte order.

The options bits are as follows:

  Bit     name           Settings
  0       opt_large_k    Large (2-byte) k value?
  1       opt_large_w    Large (2-byte) w value?
  2       opt_final      Final chunk in file? (1=full file/final chunk)
  3       opt_transform  Is transform data included?
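
To make the layout concrete, here is a sketch (not the module's own parser) of reading the fixed leading fields of a version-1 header with unpack; the variable-width offset fields which follow them are described next:

  my ($sharefile) = @ARGV;
  open my $fh, '<:raw', $sharefile or die "open: $!";

  # magic (2 bytes), version (1 byte), options (1 byte)
  read($fh, my $buf, 4) == 4 or die "short header";
  my ($magic, $version, $options) = unpack "a2 C C", $buf;
  die "not a sharefile\n" unless $magic eq "SF";
  die "unknown version\n" unless $version == 1;

  my $large_k   = $options & 0x01;   # bit 0: 2-byte k value?
  my $large_w   = $options & 0x02;   # bit 1: 2-byte w value?
  my $final     = $options & 0x04;   # bit 2: final chunk in file?
  my $has_xform = $options & 0x08;   # bit 3: transform data included?

  # k and s are 1 or 2 bytes each, big-endian, per the option bits
  read($fh, $buf, $large_k ? 2 : 1);
  my $k = unpack($large_k ? "n" : "C", $buf);
  read($fh, $buf, $large_w ? 2 : 1);
  my $s = unpack($large_w ? "n" : "C", $buf);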

All file offsets are stored in a variable-width format. They are stored as the concatenation of two values:

  • the number of bytes required to store the offset, and

  • the actual file offset.

So, for example, the offset 0 is represented as a single zero byte, while the offset 0x4321 is represented as the three hex bytes "02", "43", "21".
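
The scheme is easy to express with pack and unpack. A short sketch, illustrating the format just described rather than the module's internal code:

  # encode: big-endian value with leading zero bytes stripped,
  # prefixed by a count byte (offset 0 encodes as a lone zero byte)
  sub encode_offset {
      my $off   = shift;
      my $bytes = pack "N", $off;    # 32-bit big-endian
      $bytes =~ s/^\0+//;            # drop leading zero bytes
      return pack("C", length $bytes) . $bytes;
  }

  # decode from a filehandle: count byte, then that many offset bytes
  sub decode_offset {
      my $fh = shift;
      read($fh, my $len, 1) == 1 or die "short read";
      $len = unpack "C", $len;
      return 0 if $len == 0;
      read($fh, my $raw, $len) == $len or die "short read";
      my $off = 0;
      $off = $off * 256 + $_ for unpack "C*", $raw;
      return $off;
  }

  # encode_offset(0x4321) yields the bytes 02 43 21, as above

Note that using pack "N" restricts offsets to 32 bits, matching the current sub-4GB limitation described under "LIMITATIONS".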

Note that the chunk_next field is 1 greater than the actual offset of the chunk end. In other words, each chunk ranges from the byte starting at chunk_start up to, but not including, the byte at chunk_next. That is why the field is called chunk_next rather than chunk_end.

LIMITATIONS

The current implementation is limited to handling input files less than 4GB in size. This is merely a limitation of the current header handling code, and the restriction may be removed in a later version.

Currently, the only chunking options available are no chunking at all, or breaking the file into a given number of chunks (with the n_chunks option).

FUTURE VERSIONS

It is possible that the following changes/additions will be made in future versions:

  • implement a routine to scan a given list of input files and group them into batches that can be passed to sf_combine (as well as weeding out sub-quorum batches and broken, overlapping or non-compliant files);

  • implement a file format that can keep track of created sharefiles along with parameters used to create them;

  • implement encryption of input data stream along with dispersed storage of decryption key in share files;

  • implement a cryptographic accumulator to eliminate the possibility of a cheater presenting an invalid share at the combine stage;

  • implement regular checksum/hash to detect damage to share data;

  • implement storing a row number with shares in the case where the transform data for that share is not stored in the sharefile header;

  • implement a scatter/gather function to disperse shares over the network and download (some of) them again to reconstitute the file (probably in a new module);

  • implement a network-based peer protocol which allows peers with the same file to co-ordinate key generation, so that they can generate compatible shares and also share the burden of ShareFile creation and distribution (long-term goal);

  • I'm open to suggestions on any of these features or any other feature that anybody might want ...

SEE ALSO

See the documentation for Crypt::IDA for more details of the underlying algorithm for creating and combining shares.

This distribution includes two command-line scripts called rabin-split.pl and rabin-combine.pl which provide simple wrappers to access all functionality of the module.

AUTHOR

Declan Malone, <idablack@sourceforge.net>

COPYRIGHT AND LICENSE

Copyright (C) 2009 by Declan Malone

This package is free software; you can redistribute it and/or modify it under the terms of version 2 (or, at your discretion, any later version) of the "GNU General Public License" ("GPL").

Please refer to the file "GNU_GPL.txt" in this distribution for details.

DISCLAIMER

This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.