NAME

CAM::XML - Encapsulation of a simple XML data structure

LICENSE

Copyright 2006 Clotho Advanced Media, Inc., <cpan@clotho.com>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SYNOPSIS

  use CAM::XML;
  
  my $pollTag = CAM::XML->new('poll');
  
  # @questions is assumed to hold hashrefs with 'text' and 'choices' keys
  foreach my $q (@questions) {
    my $questionTag = CAM::XML->new('question');
    
    $questionTag->add(-text => $q->{text});
    my $choicesTag = CAM::XML->new('choices');
    
    foreach my $c (@{$q->{choices}}) {
      my $choiceTag = CAM::XML->new('choice');
      $choiceTag->setAttributes('value', $c->{value});
      $choiceTag->add(-text => $c->{text});
      $choicesTag->add($choiceTag);
    }
    $questionTag->add($choicesTag);
    $pollTag->add($questionTag);
  }
  print CAM::XML->header();
  print $pollTag->toString();

DESCRIPTION

This module reads XML into, and writes XML from, a simple object model. It is optimized for ease of creating code that interacts with XML.

This module is not as powerful or as standards-compliant as, say, XML::LibXML, XML::SAX, or XML::DOM, but it's darn easy to use. I recommend it to people who just want to read or write a quick but valid XML file and don't want to bother with the bigger modules.

In our experience, this module is actually easier to use than XML::Simple because the latter makes some assumptions about XML structure that prevent it from handling all XML files well. YMMV.

However, one exception to the simplicity claimed above is our implementation of a subset of XPath. That's not very simple. Sorry.

CLASS METHODS

$pkg->parse($xmlstring)
$pkg->parse(-string => $xmlstring)
$pkg->parse(-filename => $xmlfilename)
$pkg->parse(-filehandle => $xmlfilehandle)

Parse an incoming stream of XML into a CAM::XML hierarchy. This method just hands the first argument off to XML::Parser, so it can accept any style of argument that XML::Parser can. Note that XML::Parser says the filehandle style should pass an IO::Handle object. This can be called as a class method or an instance method.

Additional meaningful flags:

  -cleanwhitespace => 1

Traverse the document and remove non-significant whitespace, as per removeWhitespace().

  -xmlopts => HASHREF

Any options in this hash are passed directly to XML::Parser.

NOTE: this method does NOT work well on subclasses. I tried, but failed, to fix it up. The problem is that CAM::XML::XMLTree has to be able to instantiate an object of this class, but there's no really good way to communicate with it yet.
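
As a minimal sketch of the calling styles above, parsing a string with whitespace cleanup might look like the following. The sample document and variable names are illustrative only, and the check for a false return value is an assumption about how a failed parse is reported.

```perl
use strict;
use warnings;
use CAM::XML;

# Illustrative input; any well-formed XML string would do.
my $xmlstring = <<'XML';
<poll>
  <question id="q1">Favorite color?</question>
</poll>
XML

# -string and -cleanwhitespace are the flags documented above.
my $root = CAM::XML->parse(-string => $xmlstring, -cleanwhitespace => 1);

# Defensive check: this sketch assumes a failed parse yields a false value.
die "Could not parse XML\n" if !$root;

print $root->getName(), "\n";    # name of the root tag, here "poll"
```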

$pkg->new($tagname)
$pkg->new($tagname, attr1 => $value1, attr2 => $value2, ...)

Create a new XML tag. Optionally, you can set tag attributes at the same time.

$pkg->header()
$self->header()

Return a string containing the following XML declaration, suffixed by a newline:

  <?xml version="1.0" encoding="UTF-8" standalone="no" ?>

INSTANCE METHODS

$self->getName()

Returns the name of the node.

$self->setAttributes(attr1 => $value1, attr2 => $value2, ...)

Set the value of one or more XML attributes. If any keys are duplicated, only the last one set is recorded.

$self->deleteAttribute($key)

Remove the specified attribute if it exists.

$self->getAttributeNames()

Returns a list of the names of all the attributes of this node. The names are returned in arbitrary order.

$self->getAttributes()

Returns a hash of all attributes.

$self->getAttribute($key)

Returns the value of the named attribute, or undef if it does not exist.
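
The attribute methods above can be combined as in this small sketch. The tag and attribute names are made up for illustration.

```perl
use strict;
use warnings;
use CAM::XML;

# Set two attributes at construction time.
my $choice = CAM::XML->new('choice', value => 'a', lang => 'en');

# Overwrite one attribute and add another.
$choice->setAttributes(value => 'b', weight => 3);

# Remove an attribute that is no longer wanted.
$choice->deleteAttribute('lang');

# Names come back in arbitrary order, so sort them for stable output.
my @names = sort $choice->getAttributeNames();
print "@names\n";    # value weight

# getAttributes() returns the whole set as a hash.
my %attrs = $choice->getAttributes();
print "$_=$attrs{$_}\n" for sort keys %attrs;

# A missing attribute yields undef, so test with defined().
print defined $choice->getAttribute('lang') ? "lang set\n" : "lang not set\n";
```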

$self->getChildren()

Returns an array of XML nodes and text objects contained by this node.

$self->getChild($index)

Returns a child of this node. The argument is a zero-based index. Returns undef if the index is not valid.

$self->getChildNodes()

Returns an array of XML nodes contained by this node (that is, unlike getChildren(), text nodes are ignored).

$self->getChildNode($index)

Returns a CAM::XML child of this node (that is, unlike getChild(), text nodes are ignored). The argument is a zero-based index. Returns undef if the index is not valid.

$self->setChildren($node1, $node2, ...)

Removes all the children from this node and replaces them with the supplied values. All of the values MUST be CAM::XML or CAM::XML::Text objects, or this method will abort and return false before any changes are made.
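
The difference between the "children" and "child nodes" accessors is easiest to see on a tag with mixed content. This is a sketch with made-up tag names; it only prints counts rather than asserting them.

```perl
use strict;
use warnings;
use CAM::XML;

# Mixed content: text, then a tag, then more text.
my $para = CAM::XML->new('p');
$para->add(-text => 'Hello ');
$para->add(CAM::XML->new('b', class => 'loud'));
$para->add(-text => '!');

my @children = $para->getChildren();      # text objects and tags
my @nodes    = $para->getChildNodes();    # tags only
printf "%d children, %d of them tags\n", scalar @children, scalar @nodes;

# Index 0 among the tags is <b>, even though it is not the first child overall.
print $para->getChildNode(0)->getName(), "\n";    # b

# Out-of-range indexes return undef.
print "no such child\n" if !defined $para->getChild(99);

# Replace all children at once; every value must be a CAM::XML
# or CAM::XML::Text object.
$para->setChildren(CAM::XML->new('i'), CAM::XML->new('u'));
my @replaced = $para->getChildNodes();
print join(q{ }, map { $_->getName() } @replaced), "\n";
```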

$self->add(CAM::XML instance)
$self->add(-text => $text)
$self->add(-cdata => $text)
$self->add(-xml => $rawxml)
$self->add(<multiple elements of the above types>)

Add content within the current tag. Order of addition may be significant. This content can be any one of 1) subsidiary XML tags (CAM::XML), 2) literal text content (-text or -cdata), or 3) pre-formatted XML content (-xml).

In -text and -cdata content, any reserved characters will be automatically escaped. Those two modes differ only in their XML representation: -cdata is more human-readable if there are a lot of "&", "<" and ">" characters in your text, whereas -text is usually more compact for short strings. These strings are not escaped until output.

Content in -xml mode is parsed in as CAM::XML objects. If it is not valid XML, a warning will be emitted and the add will fail.
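
A short sketch of the three content modes, plus a multi-argument call, follows. The tag names and strings are illustrative, and no particular serialized output is claimed beyond what the text above describes.

```perl
use strict;
use warnings;
use CAM::XML;

my $note = CAM::XML->new('note');

# Reserved characters are passed in raw; escaping happens at output time.
$note->add(-text => 'Fish & chips');

# CDATA keeps operator-heavy text readable in the serialized form.
$note->add(-cdata => 'if (a < b && b > c) { swap(a, c) }');

# Raw XML is parsed into CAM::XML objects before being attached.
$note->add(-xml => '<ref id="1">see also</ref>');

# Several items may be passed in one call; they are added in order.
$note->add(CAM::XML->new('hr'), -text => 'The end');

print $note->toString(), "\n";
```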

$self->removeWhitespace()

Clean out all non-significant whitespace. Whitespace is deemed non-significant if it is bracketed by tags. This might not be true in some data formats (e.g. HTML) so don't use this function carelessly.
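
To illustrate, the sketch below parses a pretty-printed document and compares the number of children before and after cleanup. The input is invented for the example, and the exact "before" count depends on how the parser reports whitespace, so it is printed rather than assumed.

```perl
use strict;
use warnings;
use CAM::XML;

# Pretty-printed input: the newlines and indentation between tags are not data.
my $root = CAM::XML->parse(
    -string => "<list>\n  <item>one</item>\n  <item>two</item>\n</list>");
die "Could not parse XML\n" if !$root;

# Count children (text objects and tags) before and after the cleanup.
my $before = () = $root->getChildren();
$root->removeWhitespace();
my $after = () = $root->getChildren();

print "children before: $before, after: $after\n";
print $root->toString(), "\n";
```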

$self->getInnerText()

For the given node, descend through all of its children and concatenate all the text values that are found. If none, this method returns an empty string (not undef).
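
For instance, text nested at different depths is flattened into one string, as in this sketch (tag names are illustrative):

```perl
use strict;
use warnings;
use CAM::XML;

# Build the equivalent of <p>Hello <b>big</b> world</p>.
my $bold = CAM::XML->new('b');
$bold->add(-text => 'big');

my $para = CAM::XML->new('p');
$para->add(-text => 'Hello ');
$para->add($bold);
$para->add(-text => ' world');

print $para->getInnerText(), "\n";    # Hello big world

# A tag with no text anywhere beneath it gives an empty string, not undef.
my $empty = CAM::XML->new('br');
print "empty string\n" if $empty->getInnerText() eq q{};
```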

$self->getNodes(-tag => $tagname)
$self->getNodes(-attr => $attrname, -value => $attrvalue)
$self->getNodes(-path => $path)

Return an array of CAM::XML objects representing nodes that match the requested properties.

A path is a syntactic path into the XML document, something like an XPath expression:

  '/' divides nodes
  '//' means any number of nodes
  '/[n]' means the nth child of a node (1-based)
  '<tag>[n]' means the nth instance of this node
  '/[-n]' means the nth child of a node, counting backward
  '/[last()]' means the last child of a node (same as [-1])
  '/[@attr="value"]' means a node with this attribute value
  '/text()' means all of the text data inside a node
            (note this returns just one node, not all the nodes)

For example, /html/body//table/tr[1]/td/a[@target="_blank"] searches an XHTML body for all tables, and returns all anchor nodes in the first row which pop new windows.

Please note that while this syntax resembles XPath, it is FAR from a complete (or even correct) implementation. It's useful for basic delving into an XML document, however.
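
The three query styles might be used as in the sketch below. The document is invented for the example, and the two path expressions are assumptions built from the syntax table above rather than tested recipes, so the results are printed instead of being presumed.

```perl
use strict;
use warnings;
use CAM::XML;

my $doc = CAM::XML->parse(-string =>
      '<poll><question id="q1">'
    . '<choice value="a">Red</choice><choice value="b">Blue</choice>'
    . '</question></poll>');
die "Could not parse XML\n" if !$doc;

# By tag name.
my @choices = $doc->getNodes(-tag => 'choice');

# By attribute name and value.
my @blue = $doc->getNodes(-attr => 'value', -value => 'b');

# By path, using the <tag>[n] and [@attr="value"] forms listed above.
my @first   = $doc->getNodes(-path => '/poll/question/choice[1]');
my @by_attr = $doc->getNodes(-path => '/poll/question/choice[@value="b"]');

# Show whatever each query matched.
for my $node (@choices, @blue, @first, @by_attr) {
    printf "%s: %s\n", $node->getName(), $node->getInnerText();
}
```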

$self->toString([OPTIONS])

Serializes the tag and all subsidiary tags into an XML string. This is called recursively on any subsidiary CAM::XML objects. Note that the XML header is not prepended to this output.

The following optional arguments apply:

  -formatted => boolean
        If true, the XML is indented nicely.  Otherwise, no whitespace
        is inserted between tags.
  -textformat => boolean
        Only relevant if -formatted is true.  If false, this prevents
        the formatting of pure text values.
  -level => number
        Indents this tag by the number of levels indicated.  This implies
        -formatted => 1
  -indent => number
        The number of spaces to indent per level if the output is
        formatted.  By default, this is 2 (i.e. two spaces).

Example: -formatted => 0

   <foo><bar>Baz</bar></foo>

Example: -formatted => 1

   <foo>
     <bar>
       Baz
     </bar>
   </foo>

Example: -formatted => 1, -textformat => 0

   <foo>
     <bar>Baz</bar>
   </foo>

Example: -formatted => 1, -textformat => 0, -indent => 4

   <foo>
       <bar>Baz</bar>
   </foo>
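
The examples above could be generated by a script along these lines. This is a sketch: the tag names match the examples, and whether formatted output ends with a newline is not assumed.

```perl
use strict;
use warnings;
use CAM::XML;

# Build <foo><bar>Baz</bar></foo>.
my $bar = CAM::XML->new('bar');
$bar->add(-text => 'Baz');

my $foo = CAM::XML->new('foo');
$foo->add($bar);

# toString() never emits the XML header, so print it separately when needed.
print CAM::XML->header();

# Compact output on a single line.
print $foo->toString(-formatted => 0), "\n";

# Indented tags with text kept inline, at the default and a wider indent.
print $foo->toString(-formatted => 1, -textformat => 0), "\n";
print $foo->toString(-formatted => 1, -textformat => 0, -indent => 4), "\n";
```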

ENCODING

It is assumed that all text will be UTF-8. This includes any tag names, attribute keys and values, text content, and raw XML content that are added to the data structure.

CODING

This module has just over 97% code coverage in its regression tests, as reported by Devel::Cover via perl Build testcover. The remaining few percent is mostly error conditions and a few conditional defaults on internal methods.

This module passes most of the Perl Best Practices guidelines, as enforced by Perl::Critic v0.14. A notable exception is the legacy use of camelCase subroutine names.

AUTHOR

Clotho Advanced Media Inc., cpan@clotho.com

Primary Developer: Chris Dolan