NAME

XML::Diff -- XML DOM-Tree based Diff & Patch Module

SYNOPSIS

  my $diff = XML::Diff->new();

  # to generate a diffgram of two XML files, use compare.
  # $old and $new can be filepaths, XML as a string,
  # XML::LibXML::Document or XML::LibXML::Element objects.
  # The diffgram is a XML::LibXML::Document by default.
  my $diffgram = $diff->compare(
                                -old => $old_xml,
                                -new => $new_xml,
                               );

  # To patch an XML document, use patch. $old and $diffgram
  # follow the same formatting rules as compare.
  # The resulting XML is a XML::LibXML::Document by default.
  my $patched = $diff->patch(
                             -old      => $old,
                             -diffgram => $diffgram,
                            );

DESCRIPTION

This module provides methods for generating and applying an XML diffgram of two related XML files. The basis of the algorithm is tree-wise comparison using the DOM model as provided by XML::LibXML.

The diffgram is well-formed XML in the XVCS namespace and supports update, insert, delete and move operations. It is meant to be readable by both humans and machines, and it uses XPath expressions to locate the nodes to operate on. See the DIFFGRAM section below for the exact syntax.

The motivation for this module and the algorithm it uses are discussed in MOTIVATIONS below.

PUBLIC METHODS

new (Constructor)

The constructor takes no arguments. It simply creates the object on which the compare and patch methods are called.

compare

Compares two XML DOM trees and returns a diffgram for converting one into the other. By default the diffgram is returned as an XML::LibXML::Document object. However, there are a number of switches to alter this behavior.

-old

The old document to compare. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.

-new

The new document to compare. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.

-asString

If provided, the diffgram is returned as a string, serialized via the toString(1) method of XML::LibXML.

-asFile

Takes the file path to write the diffgram to.
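
As a minimal sketch of the output switches described above, the following compares two small in-memory documents and requests the diffgram first as a string and then as a file. The sample XML and the temporary file are illustrative assumptions, and -asString is assumed to accept any true value, since the text above only says "if provided".

```perl
use strict;
use warnings;
use XML::Diff;
use File::Temp qw(tempfile);

# Sample documents (illustrative only).
my $old_xml = '<root><item>one</item><item>two</item></root>';
my $new_xml = '<root><item>one</item><item>three</item></root>';

# -asString: ask for the serialized diffgram instead of a document object.
# Using a fresh object per call is a precaution in this sketch, not a
# documented requirement.
my $diffgram_str = XML::Diff->new()->compare(
    -old      => $old_xml,
    -new      => $new_xml,
    -asString => 1,
);

# -asFile: write the diffgram to the given path.
my ($fh, $diffgram_path) = tempfile(SUFFIX => '.xml');
close $fh;
XML::Diff->new()->compare(
    -old    => $old_xml,
    -new    => $new_xml,
    -asFile => $diffgram_path,
);

print $diffgram_str;
```

Without either switch, compare returns an XML::LibXML::Document, as shown in the SYNOPSIS.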

patch

Applies a diffgram to an XML document to generate a new XML document. By default the result is returned as an XML::LibXML::Document object. However, there are a number of switches to alter this behavior.

-old

The old document to patch. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.

-diffgram

The diffgram to apply. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.

-asString

If provided, the new document is returned as a string, serialized via the toString(1) method of XML::LibXML.

-asFile

Takes the file path to write the new document to.
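
The following is a minimal sketch of a compare and patch round trip, using only the switches documented above. The sample documents are illustrative assumptions, and -asString is assumed to accept any true value.

```perl
use strict;
use warnings;
use XML::Diff;

# Sample documents (illustrative only).
my $old_xml = '<root><item>one</item><item>two</item></root>';
my $new_xml = '<root><item>one</item><item>three</item></root>';

my $diff = XML::Diff->new();

# Generate the diffgram (an XML::LibXML::Document by default).
my $diffgram = $diff->compare(
    -old => $old_xml,
    -new => $new_xml,
);

# Apply the diffgram to the old document and ask for the result as a string.
my $patched_str = $diff->patch(
    -old      => $old_xml,
    -diffgram => $diffgram,
    -asString => 1,
);

print $patched_str;
```

The patched output should carry the content of the new document. Its exact serialization (whitespace, XML declaration) depends on XML::LibXML.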

DIFFGRAM

The diffgram is an XML document in the xvcs namespace. Its root is always <xvcs:diffgram xmlns:xvcs="http://www.xvcs.org/">. Diff operations are attached below the root in order of application. Order is significant because the default version of the diffgram identifies nodes by XPath expression: applying the diffgram changes the XML document, so an expression may not yet be valid, or may no longer be valid, at a different point in the diffgram (see KNOWN PROBLEMS for a discussion of this limitation).
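
Because the operations are plain child elements of the root, a diffgram can be inspected with XML::LibXML alone. The sketch below parses a hand-written diffgram (assembled from the examples in this section, not produced by the module) and lists its operations in document order, which is the order of application. It assumes the namespace URI shown above.

```perl
use strict;
use warnings;
use XML::LibXML;

# A hand-written diffgram in the format described in this section.
my $diffgram_xml = <<'XML';
<xvcs:diffgram xmlns:xvcs="http://www.xvcs.org/">
  <xvcs:update id="18" first-child-of="/root/block[2]/list/item[2]">
    <xvcs:old-value>Old Value</xvcs:old-value>
    <xvcs:new-value>New Value</xvcs:new-value>
  </xvcs:update>
  <xvcs:insert id="34" follows="/root/block[1]">
    <block><node>value</node></block>
  </xvcs:insert>
</xvcs:diffgram>
XML

my $doc = XML::LibXML->load_xml(string => $diffgram_xml);

# Bind the xvcs prefix so XPath queries can address the operations.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(xvcs => 'http://www.xvcs.org/');

# Direct children of the root, in document order.
my @operations;
for my $op ($xpc->findnodes('/xvcs:diffgram/xvcs:*')) {
    my $target = $op->getAttribute('follows')
        // $op->getAttribute('first-child-of');
    push @operations, sprintf('%s -> %s', $op->localname, $target);
}

print "$_\n" for @operations;
```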

The supported diffgram operations are:

xvcs:update

Update operations cover a number of sub-operations: Text node changes and attribute additions, deletions and modifications. An example of a Text node change is:

  <xvcs:update id="18" first-child-of="/root/block[2]/list/item[2]">
    <xvcs:old-value>Old Value</xvcs:old-value>
    <xvcs:new-value>New Value</xvcs:new-value>
  </xvcs:update>

Attribute updates are:

  <xvcs:update id="31" first-child-of="/root/block[5]">
    <xvcs:attr-insert name="some_attribute" value="new value"/>
  </xvcs:update>
  <xvcs:update id="32" first-child-of="/root/block[6]">
    <xvcs:attr-insert name="some_attribute2" value="old value"/>
  </xvcs:update>
  <xvcs:update id="33" first-child-of="/root/block[6]">
    <xvcs:attr-update name="some_attribute3"
      old-value="old value" new-value="new value"/>
  </xvcs:update>

xvcs:delete

  <xvcs:delete id="29" follows="/root/block[3]">
    <block>
      <node>value</node>
    </block>
  </xvcs:delete>

xvcs:move

  <xvcs:move id="11" follows="/root/block[1]">
    <xvcs:source first-child-of="/root"/>
  </xvcs:move>

xvcs:insert

  <xvcs:insert id="34" follows="/root/block[1]">
    <block>
      <node>value</node>
    </block>
  </xvcs:insert>

All operations share the same attributes to identify the operation:

id

The xvcs:id of the node affected (currently used only internally).

follows

The XPath to the prior sibling of the node affected. We use relative identification because the destination of an insert or move is not the location of an existing node. The other operations follow the same approach for consistency and to allow an operation to be reversed easily.

first-child-of

If the node has no prior sibling, we use the XPath to its parent and note that the operation affects the first child of that parent.

text

Since XPath does not have an expression for locating a text node, nodes following Text nodes are identified by the XPath to the prior sibling that is an Element, together with the text attribute, which tells the operation to skip the next text node before starting.
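
To illustrate the relative addressing described above, here is a sketch that resolves a follows or first-child-of XPath with XML::LibXML and places a new element accordingly. The helper place_node and the sample document are hypothetical and written for this illustration; this is not the module's own implementation, and the text attribute is not handled.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: place $new_node using one of the addressing attributes.
sub place_node {
    my ($doc, $new_node, %where) = @_;

    if (defined $where{follows}) {
        # follows: the XPath names the prior sibling of the affected node.
        my ($sibling) = $doc->findnodes($where{follows})
            or die "No node matches $where{follows}";
        $sibling->parentNode->insertAfter($new_node, $sibling);
    }
    elsif (defined $where{'first-child-of'}) {
        # first-child-of: the XPath names the parent of the affected node.
        my ($parent) = $doc->findnodes($where{'first-child-of'})
            or die "No node matches $where{'first-child-of'}";
        # With an undefined reference node, insertBefore appends.
        $parent->insertBefore($new_node, $parent->firstChild);
    }
    else {
        die "Need either follows or first-child-of";
    }
    return $new_node;
}

# Sample document (illustrative only).
my $doc = XML::LibXML->load_xml(
    string => '<root><block id="a"/><block id="b"/></root>');

my $first = $doc->createElement('block');
$first->setAttribute(id => 'first');
place_node($doc, $first, 'first-child-of' => '/root');

# After the insert above, /root/block[2] is the element with id "a".
my $after_a = $doc->createElement('block');
$after_a->setAttribute(id => 'after-a');
place_node($doc, $after_a, follows => '/root/block[2]');

my @ids = map { $_->getAttribute('id') } $doc->findnodes('/root/block');
print join(',', @ids), "\n";
```

The second call shows why order of application matters: the same XPath, /root/block[2], selects a different element once the first insert has been applied.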

KNOWN PROBLEMS

  • Does not handle any Node Types Other than Element, Attribute and Text

  • Diffgram operations are not guaranteed to be atomic

  • Delete Operations on Nodes between two Text nodes are not reversible

MOTIVATIONS

The algorithm used in this module is loosely based on the one described by Gregory Cobena in his doctoral dissertation on XyDiff. The decision to create a new implementation of this algorithm, rather than an XS interface to the existing XyDiff code, was based on wanting a Perl implementation with fewer external dependencies and greater flexibility to add divergent features (such as using XPath for node identification rather than XIDs).

PRIVATE METHODS

This section is mostly for reference if you are going through the code; it serves no purpose if you only want to use the exposed interface.

_getDoc

_buildTree

_weightmatch

_propagateMatch

_matchParents

_markChanges

_registerChange

_processChange

_local_move

_setDiff

_attachInstructions

_applyAction

_applyInsert

_insertRegister

_applyUpdate

_applyDelete

_applyMove

_applyMoveUnbind

_applyMoveBind

_debug

AUTHOR

Arne Claassen <cpan@unixmechanix.com>

VERSION

0.04

COPYRIGHT

2004 Arne F. Claassen, UnixMechanix.com, All rights reserved.