What XML parser do you use for PHP?


Question

I like the XMLReader class for its simplicity and speed. But I like the xml_parse family of functions because they allow better error recovery. It would be nice if the XMLReader class threw exceptions for things like invalid entity references instead of just issuing a warning.
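One possible workaround, sketched below, is to switch libxml to internal error collection while XMLReader runs and then throw if anything was recorded. It assumes that errors such as an invalid entity reference land in the same libxml error list as other well-formedness errors (like a mismatched closing tag). `XmlParseException` and `readAllOrThrow()` are hypothetical names made up for this illustration.

```php
<?php
// Hypothetical exception type, defined here only for this sketch.
final class XmlParseException extends RuntimeException {}

// Reads the whole document with XMLReader and returns the element count,
// throwing XmlParseException if libxml recorded any error while reading.
function readAllOrThrow(string $xml): int
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml);
    try {
        $elements = 0;
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT) {
                $elements++;
            }
        }
        $errors = libxml_get_errors();
        if ($errors !== []) {
            $first = $errors[0];
            throw new XmlParseException(
                sprintf('line %d: %s', $first->line, trim($first->message))
            );
        }
        return $elements;
    } finally {
        // Close the reader and restore the caller's error-handling mode.
        $reader->close();
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}
```

The `finally` block restores the previous `libxml_use_internal_errors()` setting, so code elsewhere that relies on ordinary warnings is not affected.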

2008/09/16

Accepted Answer

I'd avoid SimpleXML if you can. Though it's tempting because it lets you skip a lot of "ugly" code, it's just what the name suggests: simple. For example, it doesn't handle mixed content (text interleaved with child elements) well, as in this document:

<p>
    Here is <strong>a very simple</strong> XML document.
</p>

Bite the bullet and go with the DOM functions. Their power far outweighs the little bit of extra complexity. If you're at all familiar with DOM manipulation in JavaScript, you'll feel right at home with this library.
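As a rough illustration, DOM exposes the `<p>` example above as a sequence of child nodes, so text and elements can be visited in document order. `describeChildren()` is a hypothetical helper written for this sketch, not part of PHP.

```php
<?php
// Hypothetical helper: lists the children of the root element in document
// order, labelling text nodes and element nodes separately.
function describeChildren(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $parts = [];
    foreach ($doc->documentElement->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Text between elements keeps its position in the sequence.
            $parts[] = 'text: ' . trim($child->nodeValue);
        } elseif ($child instanceof DOMElement) {
            $parts[] = $child->tagName . ': ' . $child->textContent;
        }
    }
    return $parts;
}
```

Called on the `<p>` document above, it returns `['text: Here is', 'strong: a very simple', 'text: XML document.']`, keeping the `<strong>` element in its place between the two text runs.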

2008/09/16


SimpleXML and DOM work seamlessly together, so you can work with the same XML through either the SimpleXML or the DOM interface.

For example:

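$simplexml = simplexml_load_string("<xml></xml>");
$simplexml->simple = "it is simple."; // add a child element through SimpleXML

// dom_import_simplexml() returns a DOMElement wrapping the same root node
$domxml = dom_import_simplexml($simplexml);
$node = $domxml->ownerDocument->createElement("dom", "yes, with DOM too.");
$domxml->appendChild($node); // add a child element through DOM

echo (string)$simplexml->dom; // the DOM change is visible through SimpleXML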

You will get the result:

"yes, with DOM too."

This works because importing the object (into either SimpleXML or DOM) doesn't copy it: both objects refer to the same underlying node, so a change made through one is visible through the other.
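The same sharing should hold in the other direction, via `simplexml_import_dom()`. The sketch below builds the document with DOM first and then wraps it with SimpleXML; `sharedNodeDemo()` is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical demo of the reverse import: build with DOM, then wrap the
// same document with simplexml_import_dom().
function sharedNodeDemo(): array
{
    $doc = new DOMDocument();
    $root = $doc->appendChild($doc->createElement('xml'));

    // Wraps the document element; the tree is shared, not copied.
    $sxe = simplexml_import_dom($doc);

    // A child added through SimpleXML...
    $sxe->simple = 'it is simple.';
    // ...and a child added through DOM.
    $root->appendChild($doc->createElement('dom', 'yes, with DOM too.'));

    // Each API sees the other's change.
    return [
        $doc->getElementsByTagName('simple')->item(0)->textContent,
        (string) $sxe->dom,
    ];
}
```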

I figured this out while trying to work around some of SimpleXML's shortcomings by extending/wrapping its object.

See http://code.google.com/p/blibrary/source/browse/trunk/classes/bXml.class.inc for examples.

This works well for small chunks of XML (under about 2 MB), since DOM and SimpleXML load the full document into memory with some additional overhead (think 2x or 3x the document size). For larger XML (over about 2 MB), you'll want XMLReader and XMLWriter, which read and write in a streaming fashion (similar to SAX) with low memory overhead. I've used 14 MB+ documents successfully with XMLReader/XMLWriter.
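A minimal sketch of the streaming style with XMLReader, assuming a hypothetical document whose interesting data sits in `<title>` elements; `collectTitles()` is an illustrative name, not an existing function. The reader advances one node at a time instead of building the whole tree.

```php
<?php
// Hypothetical helper: collects the text of every <title> element while
// streaming, without building the whole document tree in memory.
function collectTitles(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a file on disk, use $reader->open($path) instead

    $titles = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
            // readString() returns the text content of the current element.
            $titles[] = $reader->readString();
        }
    }
    $reader->close();
    return $titles;
}
```

For a large file, `open()` with a path lets XMLReader pull the input incrementally rather than holding the whole string in memory.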

2008/09/16

There are at least four options when using PHP5 to parse XML files. The best option depends on the complexity and size of the XML file.

There’s a very good 3-part article series titled ‘XML for PHP developers’ at IBM developerWorks.

“Parsing with the DOM, now fully compliant with the W3C standard, is a familiar option, and is your choice for complex but relatively small documents. SimpleXML is the way to go for basic and not-too-large XML documents, and XMLReader, easier and faster than SAX, is the stream parser of choice for large documents.”
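For comparison with the SAX option the quote mentions, PHP's event-based `xml_parser_*` functions look roughly like this. The element-counting task and `countElementsSax()` are illustrative assumptions. Note that `xml_parse()` returns a failure value on malformed input, and the error code and line can be queried afterwards.

```php
<?php
// Hypothetical helper: counts element names using PHP's event-based
// (SAX-style) xml_parser functions.
function countElementsSax(string $xml): array
{
    $counts = [];
    $parser = xml_parser_create();
    // Keep element names as written (case folding to upper case is on by default).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$counts): void {
            $counts[$name] = ($counts[$name] ?? 0) + 1;
        },
        function ($parser, string $name): void {
            // Nothing to do on end tags in this sketch.
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        $message = xml_error_string(xml_get_error_code($parser));
        $line = xml_get_current_line_number($parser);
        throw new RuntimeException("XML error on line $line: $message");
    }
    return $counts;
}
```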

2008/09/16

I mostly stick to SimpleXML, at least whenever PHP 5 is available to me.

http://www.php.net/simplexml
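For simple documents, SimpleXML reading looks roughly like the sketch below. The `<library>`/`<book>` layout and the `listBooks()` helper are hypothetical, chosen only to show element and attribute access.

```php
<?php
// Hypothetical helper: maps each <book> id attribute to its <title> text.
function listBooks(string $xml): array
{
    $library = simplexml_load_string($xml);
    if ($library === false) {
        throw new RuntimeException('Could not parse XML');
    }

    $books = [];
    foreach ($library->book as $book) {
        // Child elements are read as properties, attributes as array offsets.
        $books[(string) $book['id']] = (string) $book->title;
    }
    return $books;
}
```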

2008/09/16