NAME

XML::Simple::DTDReader - Simple XML file reading based on their DTDs

SYNOPSIS

  use XML::Simple::DTDReader;

  my $ref = XMLin("data.xml");

Or the object oriented way:

  require XML::Simple::DTDReader;

  my $xsd = XML::Simple::DTDReader->new;
  my $ref = $xsd->XMLin("data.xml");

DESCRIPTION

XML::Simple::DTDReader aims to be a drop-in replacement for XML::Simple, but with several aspects of its behavior controlled by the XML document's DTD. Specifically, array folding and array forcing are inferred from the DTD.

Currently, only XMLin is supported; support for XMLout is planned for later releases.

XMLin()

Parses XML-formatted data and returns a reference to a data structure that contains the same information in a more readily accessible form. (Skip down to "EXAMPLES" for sample code.) The XML must have a valid <!DOCTYPE> declaration.

XMLin() accepts an optional XML specifier, which can be one of the following:

A filename

If the filename contains no directory components, XMLin() will look for the file in the current directory. Note that the filename '-' can be used to parse from STDIN, e.g.:

  $ref = XMLin('/etc/params.xml');

undef

If there is no XML specifier, XMLin() will check the script directory for a file with the same name as the script but with the extension '.xml', e.g.:

  $ref = XMLin();

A string of XML

A string containing XML (recognized by the presence of '<' and '>' characters) will be parsed directly, e.g.:

  $ref = XMLin('<opt username="bob" password="flurp" />');

An IO::Handle object

An IO::Handle object will be read to EOF and its contents parsed, e.g.:

  use IO::File;

  $fh = IO::File->new('/etc/params.xml');
  $ref = XMLin($fh);

OPTIONS

Currently, none of XML::Simple's many options are supported. Support for ContentKey, ForceContent, KeepRoot, SearchPath, and ValueAttr is planned for future releases.

DTD CONFIGURATION

XML::Simple::DTDReader is able to deal with inline and external DTDs. Inline DTDs take the form:

  <?xml version="1.0" encoding="UTF-8" ?>
  <!DOCTYPE greeting [
    <!ELEMENT greeting (#PCDATA)>
  ]>
  <greeting>Hello, world!</greeting>

External DTDs are either system DTDs or public DTDs. System DTDs are of the form:

  <?xml version="1.0"?>
  <!DOCTYPE greeting SYSTEM "hello.dtd">
  <greeting>Hello, world!</greeting> 

The path in the external system identifier (hello.dtd above) is resolved relative to the directory containing the XML file. If the XML does not come from a file, or the path to the file cannot be determined, it is resolved relative to the current working directory.
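
The resolution rule for system identifiers can be sketched in a few lines of Perl. This is only an illustration of the rule as described, not the module's actual code: resolve_dtd_path is a hypothetical helper, and the paths are made up for the example.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper, not part of XML::Simple::DTDReader's API.
# $system_id : the quoted path from <!DOCTYPE ... SYSTEM "...">
# $xml_path  : path of the XML file, or undef if the XML came from
#              a string or from a handle whose path is unknown
sub resolve_dtd_path {
    my ($system_id, $xml_path) = @_;

    # Known file: resolve against the directory holding the XML file.
    return File::Spec->rel2abs($system_id, dirname($xml_path))
        if defined $xml_path;

    # Otherwise fall back to the current working directory.
    return File::Spec->rel2abs($system_id);
}

print resolve_dtd_path('hello.dtd', '/srv/xml/data.xml'), "\n";
print resolve_dtd_path('hello.dtd'), "\n";
```

On a Unix-like system the first call yields a path inside /srv/xml, while the second depends on the directory the script is run from.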

Public DTDs take the form:

  <?xml version="1.0"?>
  <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
            "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
  <svg>
    <path d="M202,702l1,-3l7,-3l3,1l3,7l-1,3l-7,4l-3,-1l-3,-8z" />
  </svg>

Two properties of the DTD are used by XML::Simple::DTDReader when determining the final structure of the data: repeated elements and ID attributes. In the DTD, content specifications of the form element+ or element* cause the key element to map to an anonymous array. This is perhaps best illustrated with an example:

  <?xml version="1.0" encoding="iso-8859-1"?>
  <!DOCTYPE data [
    <!ELEMENT data (stuff+)>
    <!ELEMENT stuff (name,other*)>
    <!ELEMENT name  (#PCDATA)>
    <!ELEMENT other (#PCDATA)>
  ]>
  <data>
    <stuff>
      <name>Moose</name>
      <other>Value</other>
    </stuff>
    <stuff>
      <name>Thingy</name>
      <other>Value</other>
      <other>Value2</other>
    </stuff>
  </data>

...will map to the data structure:

  {
    stuff => [
              {
               name => "Moose",
               other => ["Value"],
              },
              {
               name => "Thingy",
               other => ["Value", "Value2"],
              }
             ]
  }
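
Because the array shape comes from the DTD rather than from the number of elements actually present, code that consumes the result can dereference without checking ref() first. A minimal sketch follows, with the structure above written out literally so that it runs without parsing any XML. The "|| []" guard is an assumption, added in case an element declared with * is absent from a record.

```perl
use strict;
use warnings;

# The structure shown above, as a literal.
my $ref = {
    stuff => [
        { name => "Moose",  other => ["Value"] },
        { name => "Thingy", other => ["Value", "Value2"] },
    ],
};

# stuff+ and other* map to array references even when only one
# element is present, so both levels can be dereferenced directly.
my @lines;
for my $stuff (@{ $ref->{stuff} }) {
    # Assumption: guard against a missing key when other* matched nothing.
    my @others = @{ $stuff->{other} || [] };
    push @lines, sprintf "%s: %s", $stuff->{name}, join(", ", @others);
}

print "$_\n" for @lines;
```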

The other property of the DTD that affects the data structure is ID attributes. In XML, ID values are unique across a document, which is a stronger guarantee than Perl's requirement that keys be unique within a hash. Hence, the presence of an attribute of type ID causes that layer of the data to be folded into a hash, keyed on the value of the ID attribute. This is, again, best illustrated by example:

  <?xml version="1.0" encoding="iso-8859-1"?>
  <!DOCTYPE data [
    <!ELEMENT data (stuff+)>
    <!ELEMENT stuff (name)>
    <!ATTLIST stuff attrib ID #REQUIRED>
    <!ELEMENT name  (#PCDATA)>
  ]>
  <data>
    <stuff attrib="first">
      <name>Moose</name>
    </stuff>
    <stuff attrib="second">
      <name>Thingy</name>
    </stuff>
  </data>

...will lead to the data structure:

  {
    stuff => {
              first => {
                        name => "Moose",
                        attrib => "first"
                       },
              second => {
                         name => "Thingy",
                         attrib => "second"
                        }
             }
  }
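
The folding step itself is easy to picture in plain Perl. The sketch below is a hypothetical illustration of the idea, not the module's own code: fold_by_id is an invented helper that keys a list of records on a named attribute, after which any record can be reached directly by its ID.

```perl
use strict;
use warnings;

# Hypothetical illustration of folding, not the module's own code:
# turn a list of records into a hash keyed on the ID attribute.
sub fold_by_id {
    my ($id_attr, @records) = @_;
    my %folded;
    for my $rec (@records) {
        my $id = $rec->{$id_attr};
        # XML ID values must be unique, so a repeat is an error.
        die "Duplicate ID '$id'\n" if exists $folded{$id};
        $folded{$id} = $rec;
    }
    return \%folded;
}

my $ref = {
    stuff => fold_by_id(
        'attrib',
        { name => "Moose",  attrib => "first"  },
        { name => "Thingy", attrib => "second" },
    ),
};

# Direct lookup by ID value instead of scanning an array.
print $ref->{stuff}{second}{name}, "\n";
print join(", ", sort keys %{ $ref->{stuff} }), "\n";
```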

XML::Simple::DTDReader recognizes most ELEMENT content types, with the exception of mixed content (#PCDATA intermixed with elements) and ANY. Attempting to parse a document whose DTD declares elements of these types results in an error.

ERROR HANDLING

XML::Simple::DTDReader is stricter than XML::Simple when parsing documents: not only must a document be well-formed, it must also conform to the DTD it specifies. XML::Simple::DTDReader will die with an appropriate message if it encounters a parsing or validation error.
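
Since errors are reported with die, callers that need to keep running should wrap the call in eval. A minimal sketch, assuming a file named data.xml: try_parse is a hypothetical wrapper rather than part of the module, and the exact text of the error message is not specified here.

```perl
use strict;
use warnings;

# Hypothetical wrapper: runs a parser callback and returns
# ($data, undef) on success or (undef, $error) if it died.
sub try_parse {
    my ($parser, $source) = @_;
    my $data;
    my $ok = eval { $data = $parser->($source); 1 };
    return $ok ? ($data, undef) : (undef, $@ || 'unknown error');
}

my ($data, $err) = try_parse(
    sub {
        # Loading inside the callback means a missing module is
        # reported the same way as a parsing or validation failure.
        require XML::Simple::DTDReader;
        return XML::Simple::DTDReader->new->XMLin($_[0]);
    },
    'data.xml',
);

if (defined $err) {
    warn "Could not load data.xml: $err";
} else {
    print "Loaded data.xml\n";
}
```

Setting a success flag as the last statement inside the eval, rather than testing the returned data, keeps a legitimately false or undefined result from being mistaken for a failure.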

EXAMPLES

See the t/ directory of the distribution for a number of example XML files and the Perl data structures they map to.

BUGS

None currently known, but I'm sure there are several.

AUTHOR

Contact Info

Alex Vandiver : alexmv@mit.edu

Copyright (C) 2003 Alex Vandiver. All rights reserved. This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.