The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Algorithm::Diff::Any - Perl module to find differences between files

VERSION

Version 1.001 ($Id: Any.pm 10595 2009-12-23 00:29:52Z FREQUENCY@cpan.org $)

DESCRIPTION

This is a simple module to select the best available implementation of the standard diff algorithm, which works by effectively trying to solve the Longest Common Subsequence (LCS) problem. This algorithm is described in: A Fast Algorithm for Computing Longest Common Subsequences, CACM, vol.20, no.5, pp.350-353, May 1977.

However, it is algorithmically rather complicated to solve the LCS problem; for arbitrary sequences, it is an NP-hard problem. Simply comparing two strings together of lengths n and m is O(n x m). Consequently, this means the algorithm necessarily has some tight loops, which, for a dynamic language like Perl, can be slow.

In order to speed up processing, a fast (C/XS-based) implementation of the algorithm's core loop was implemented. It can confer a noticable performance advantage (benchmarks show a 54x speedup for the compact_diff routine).

SYNOPSIS

  use Algorithm::Diff::Any;

  my $diff = Algorithm::Diff::Any->new(\@seq1, \@seq2);

For complete usage details, see the Object-Oriented interface description for the Algorithm::Diff module.

PURPOSE

The intent of this module is to provide single simple interface to the two (presumably) compatible implementations of this module, namely, Algorithm::Diff and Algorithm::Diff::XS.

If, for some reason, you need to determine what version of the module is actually being included by Algorithm::Diff::Any, then:

  print 'Backend type: ', $Algorithm::Diff::Any::DRIVER, "\n";

In order to force use of one or the other, simply load the appropriate module:

  use Algorithm::Diff::XS;
  my $diff = Algorithm::Diff::XS->new();
  # or
  use Algorithm::Diff;
  my $diff = Algorithm::Diff->new();

COMPATIBILITY

This module was tested under Perl 5.10.1, using Debian Linux. However, because it's Pure Perl and doesn't do anything too obscure, it should be compatible with any version of Perl that supports its prerequisite modules.

If you encounter any problems on a different version or architecture, please contact the maintainer.

EXPORTABLE FUNCTIONS

The following functions are available for import into your namespace:

  • prepare

  • LCS

  • LCSidx

  • LCS_length

  • diff

  • sdiff

  • compact_diff

  • traverse_sequences

  • traverse_balanced

For full documentation, see the relevant functional descriptions in the Pure Perl implementation, Algorithm::Diff.

METHODS

new

  Algorithm::Diff::Any->new( \@seq1, \@seq2, \%opts );

Creates a Algorithm::Diff::Any object, based upon either the optimized C/XS version of the algorithm, Algorithm::Diff::XS, or falls back to the Pure Perl implementation, Algorithm::Diff.

Example code:

  my $diff = Algorithm::Diff::Any->new( \@seq1, \@seq2 );
  # or with options
  my $diff = Algorithm::Diff::Any->new( \@seq1, \@seq2, \%opts );

This method will return an appropriate Algorithm::Diff::Any object or throw an exception on error.

Next

  $diff->Next( $count )

See Algorithm::Diff for method documentation.

Prev

  $diff->Prev( $count )

See Algorithm::Diff for method documentation.

Reset

  $diff->Reset( $pos )

See Algorithm::Diff for method documentation.

Copy

  $diff->Copy( $pos, $newBase )

See Algorithm::Diff for method documentation.

Base

  $diff->Base( $newBase )

See Algorithm::Diff for method documentation.

Diff

  $diff->Diff( )

See Algorithm::Diff for method documentation.

Same

See Algorithm::Diff for method documentation.

Code example:

  $diff->Same( )

Items

  $diff->Items( $seqNum )

See Algorithm::Diff for method documentation.

Range

  $diff->Range( $seqNum, $base )

See Algorithm::Diff for method documentation.

Min

  $diff->Min( $seqNum, $base )

See Algorithm::Diff for method documentation.

Max

  $diff->Max( $seqNum, $base )

See Algorithm::Diff for method documentation.

Get

  $diff->Get( @names )

See Algorithm::Diff for method documentation.

AUTHOR

Jonathan Yu <jawnsy@cpan.org>

CONTRIBUTORS

Your name here ;-)

ACKNOWLEDGEMENTS

  • Many thanks go to the primary authors and maintainers of the Pure Perl implementation of this algorithm, notably:

    • Mark-Jason Dominus <mjd-perl-diff@plover.com>

    • Ned Konz <perl@bike-nomad.com>

    • Tye McQueen <tyemq@cpan.org>

  • Thanks to Audrey Tang <cpan@audreyt.org>, author of Algorithm::Diff::XS, for recognizing the value of Joe Schaefer's <apreq-dev@httpd.apache.org> work on Algorithm::LCS

  • Neither the Pure Perl nor C/XS-based implementations of this module would have been possible without the work of James W. Hunt (Stanford University) and Thomas G. Szymanski (Princeton University), authors of the often-cited paper for computing longest common subsequences.

    In their abstract, they claim that a running time of O(n log n) can be expected, with a worst-case time of O(n^2 log n) for two subsequences of length n.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Algorithm::Diff::Any

You can also look for information at:

REPOSITORY

You can access the most recent development version of this module at:

http://svn.ali.as/cpan/trunk/Algorithm-Diff-Any

If you are a CPAN developer and would like to make modifications to the code base, please contact Adam Kennedy <adamk@cpan.org>, the repository administrator. I only ask that you contact me first to discuss the changes you wish to make to the distribution.

FEEDBACK

Please send relevant comments, rotten tomatoes and suggestions directly to the maintainer noted above.

If you have a bug report or feature request, please file them on the CPAN Request Tracker at http://rt.cpan.org. If you are able to submit your bug report in the form of failing unit tests, you are strongly encouraged to do so.

SEE ALSO

Algorithm::Diff, the classic reference implementation for finding the differences between two chunks of text in Perl. It is based on the algorithm described in A Fast Algorithm for Computing Longest Common Subsequences, CACM, vol.20, no.5, pp.350-353, May 1977.

Algorithm::Diff::XS, the C/XS optimized version of Algorithm::Diff, which will be used automatically if available.

CAVEATS

KNOWN BUGS

There are no known bugs as of this release.

LIMITATIONS

  • It is not currently known whether Algorithm::Diff (Pure Perl version) and Algorithm::Diff::XS (C/XS implementation) produce the same output. The algorithms may not be equivalent (source code-wise) so they may produce different output under some as-yet-undiscovered conditions.

  • Any potential performance gains will be limited by those features implemented by Algorithm::Diff::XS. As of time of writing, this is limited to the cdiff subroutine.

QUALITY ASSURANCE METRICS

TEST COVERAGE

  -------------------------- ------ ------ ------ ------ ------ ------
  File                        stmt   bran   cond   sub    pod   total
  -------------------------- ------ ------ ------ ------ ------ ------
  lib/Algorithm/Diff/Any.pm  100.0  100.0  100.0  100.0  100.0  100.0
  Total                      100.0  100.0  100.0  100.0  100.0  100.0

LICENSE

Copyright (C) 2009 by Jonathan Yu <jawnsy@cpan.org>

This package is distributed under the same terms as Perl itself. Please see the LICENSE file included in this distribution for full details of these terms.

DISCLAIMER OF WARRANTY

This software is provided by the copyright holders and contributors "AS IS" and ANY EXPRESS OR IMPLIED WARRANTIES, including, but not limited to, the IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.

In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.