test.ical.ly | getting the web by the balls

Mar/11

8

Simply iterate over XML with plain PHP using little memory and CPU

Working with Symfony2 gives you the freedom to use plain PHP again, instead of extending framework classes just to get things integrated.

One of the things I have been working on lately is a simple XML parser. The XML structure in my case is simple, though the same approach would handle a more complex one without much change. My solution is a powerful yet simple combination of XMLReader and PHP’s Iterator interface.

I started with an XML file that I needed to import into the database, and I had no control over its structure.

There can be hundreds of <item> elements, but the structure is stable and never gets deeper: each <item> always contains exactly the three elements <v0> to <v2>.
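To experiment with the approach without the original file, you can generate a test file of the same shape. This is a minimal sketch that assumes a root element named items (the real root name isn't given) and uses made-up text values. writeSampleXml() is a hypothetical helper. XMLWriter writes to disk as it goes, which mirrors the way XMLReader reads:

```php
<?php
/**
 * Hypothetical helper: write a test file shaped like the one described.
 * Assumption: the root element is called <items>; the post doesn't name it.
 * Each of the $count <item> elements gets exactly three children <v0>..<v2>.
 */
function writeSampleXml(string $file, int $count): void
{
    $writer = new XMLWriter();
    if (!$writer->openUri($file)) {
        throw new RuntimeException("Cannot write $file");
    }
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('items');
    for ($i = 0; $i < $count; $i++) {
        $writer->startElement('item');
        for ($v = 0; $v < 3; $v++) {
            // Made-up content, e.g. <v1>item 2, value 1</v1>
            $writer->writeElement("v$v", "item $i, value $v");
        }
        $writer->endElement(); // </item>
        if ($i % 100 === 99) {
            $writer->flush(); // push buffered output to the file every 100 items
        }
    }
    $writer->endElement(); // </items>
    $writer->endDocument();
    $writer->flush();
}
```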

To import them into the database, it seemed sensible to iterate over them. That way, using Doctrine 2.0, I could persist one entity in each loop iteration and flush them all in one go.
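Here is a minimal sketch of that loop. It assumes $em is a Doctrine 2 EntityManager, and only its persist() and flush() methods are used. importItems() and the $toEntity callback, which turns one parsed item into an entity object, are hypothetical names:

```php
<?php
/**
 * Hypothetical import loop: persist one entity per parsed item, flush once.
 * $em is assumed to be a Doctrine EntityManager (only persist()/flush() used).
 * $toEntity is a hypothetical callback building an entity from one item array.
 */
function importItems(iterable $items, object $em, callable $toEntity): int
{
    $count = 0;
    foreach ($items as $item) {
        $em->persist($toEntity($item)); // only schedules the INSERT
        $count++;
    }
    $em->flush(); // writes all scheduled entities in one go
    return $count;
}
```

Flushing once keeps every persisted entity in Doctrine's unit of work until the end. For large files, the batch-processing chapter of the Doctrine ORM documentation recommends calling flush() and clear() every N entities instead.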

As the structure is very straightforward and there is no need to traverse it back and forth, XMLReader was the obvious choice: it works on a stream and keeps nothing but the current node in memory.
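This minimal sketch shows that forward-only cursor using only documented XMLReader calls (open(), read(), the nodeType and name properties). countElements() is a hypothetical helper that counts the opening tags of one name without ever loading the whole document:

```php
<?php
/**
 * Hypothetical helper: count <$name> elements in an XML file.
 * XMLReader is a forward-only cursor: read() advances to the next node and
 * only the node under the cursor is exposed, so the file is never loaded whole.
 */
function countElements(string $uri, string $name): int
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Cannot open $uri");
    }
    $count = 0;
    while ($reader->read()) {
        // Count opening tags only; the matching END_ELEMENT nodes share the name.
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            $count++;
        }
    }
    $reader->close();
    return $count;
}
```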

The class extends XMLReader and implements the Iterator interface, which shows how easily the two can be combined. Its nested readItem() method is admittedly ugly and could probably be improved, but for this kind of structure it suffices. It returns a mapped array whose keys are far more meaningful than the v0, v1 and v2 field names.
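A minimal stand-in built on the same idea (not the original class) could look like this. ItemIterator and its $mapping argument are hypothetical names. One deliberate difference: it wraps an XMLReader instead of extending it. A class that extends XMLReader and implements Iterator would have to reconcile Iterator::next() with XMLReader::next(), which already exists with a different meaning.

```php
<?php
/**
 * Hypothetical ItemIterator: streams repeating elements such as <item> from an
 * XML file and yields each one as an associative array with mapped keys.
 * Wraps XMLReader (composition) so Iterator::next() can't clash with XMLReader::next().
 */
class ItemIterator implements Iterator
{
    private ?XMLReader $reader = null;
    private ?array $current = null;
    private int $position = 0;

    /**
     * @param string $uri     path or URI of the XML document
     * @param string $key     name of the repeating element, e.g. "item"
     * @param array  $mapping child element name => meaningful key, e.g. ['v0' => 'title']
     */
    public function __construct(
        private string $uri,
        private string $key,
        private array $mapping
    ) {
    }

    public function rewind(): void
    {
        $this->reader?->close();
        $this->reader = new XMLReader(); // fresh cursor at the start of the file
        if (!$this->reader->open($this->uri)) {
            throw new RuntimeException("Cannot open {$this->uri}");
        }
        $this->position = 0;
        $this->advance();
    }

    public function valid(): bool
    {
        return $this->current !== null;
    }

    public function current(): mixed
    {
        return $this->current;
    }

    public function key(): mixed
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
        $this->advance();
    }

    /** Move to the next <$key> element and map it; set null at end of file. */
    private function advance(): void
    {
        while ($this->reader->read()) {
            if ($this->reader->nodeType === XMLReader::ELEMENT && $this->reader->name === $this->key) {
                $this->current = $this->readItem();
                return;
            }
        }
        $this->current = null;
    }

    /** Collect the mapped children of the current <$key> element. */
    private function readItem(): array
    {
        $item = [];
        if ($this->reader->isEmptyElement) {
            return $item; // <item/> has no children and no closing tag
        }
        $depth = $this->reader->depth;
        while ($this->reader->read()) {
            if ($this->reader->nodeType === XMLReader::END_ELEMENT && $this->reader->depth === $depth) {
                break; // reached </item>
            }
            if ($this->reader->nodeType === XMLReader::ELEMENT && isset($this->mapping[$this->reader->name])) {
                // readString() returns the text content of the current element
                $item[$this->mapping[$this->reader->name]] = $this->reader->readString();
            }
        }
        return $item;
    }
}
```

You would use it as foreach (new ItemIterator('items.xml', 'item', ['v0' => 'title', 'v1' => 'author', 'v2' => 'price']) as $row), where each $row is an array keyed by title, author and price. The file name and key names here are made up for illustration. Child elements missing from the mapping are simply skipped.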

Using it is just as simple: the XML can now be iterated over with a plain foreach loop. This will probably also work with simple RSS feed XML, and with a little more code it can be adapted to more deeply nested structures.
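For deeper structures, one common approach is to keep XMLReader for the streaming and hand each repeating element to SimpleXML via XMLReader::expand(). Nested children then become reachable as $entry->author->name. This technique is swapped in here and isn't taken from the post. streamElements() and the feed/entry element names are hypothetical. next($name) jumps to the next sibling with that name and skips the subtree just processed:

```php
<?php
/**
 * Hypothetical generator: stream <$name> elements with XMLReader and yield each
 * one as a SimpleXMLElement, so nested children can be read with -> syntax.
 */
function streamElements(string $uri, string $name): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Cannot open $uri");
    }
    $doc = new DOMDocument(); // owner document for the imported copies

    // Advance to the first <$name> opening tag.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            $found = true;
            break;
        }
    }

    while ($found) {
        $node = $reader->expand(); // DOM node for this element's subtree only
        if ($node !== false) {
            yield simplexml_import_dom($doc->importNode($node, true));
        }
        $found = $reader->next($name); // skip to the next sibling <$name>
    }
    $reader->close();
}
```

Only the subtree of the current element is expanded into a DOM node at a time. Memory therefore stays roughly bounded by the size of one element rather than the whole file.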

I’ve tested this on XML files with about 10,000 <item> elements holding more than 40 value elements each. It does run for a while, but CPU usage doesn’t climb and memory usage stays low as well.

Maybe this can be of help to some of you?




  • http://blog.liip.ch Chregu

    Not sure if it helps your code, but maybe:
    With http://ch.php.net/manual/en/xmlreader.next.php you could jump to the next element on the same level with a certain name. So in your case you could use

    while ($this->reader->next($this->key))

    to jump to the next $this->key element once you’re on that level.

    As said, not sure if it helps much in your case, but could bring some performance improvements and often gets forgotten.

  • florian klein

    hi!

    What sf2 also permits is externalizing class dependencies into the DI container, so you can inject the XmlReader instance from the container.

    Even without the DI container, you can move the instantiation of XmlReader out of the class and inject it into the CustomXml constructor.

  • http://test.ical.ly Christian

    @Chregu that’s true, thanks for mentioning!

    @florian You are completely right of course but for the sake of a simple demonstration the DIC would only distract imho.

  • Richard

    You might take a look at the new Doctrine OXM project announced at http://www.doctrine-project.org/blog/doctrine-oxm-intro. Its aim is to provide something very similar.

  • Pingback: abcphp.com

  • http://test.ical.ly Christian

    @Richard yeah I read about this, but I assume it isn’t usable yet, right?

  • Richard

    @Christian not yet, but it’s getting there!

  • http://test.ical.ly Christian

    @Richard I am definitely following your progress!

  • Pingback: Christian Schaefer’s Blog: Simply iterate over XML with plain PHP using little memory and CPU | Development Blog With Code Updates : Developercast.com

  • Pingback: A semana no mundo PHP (11/03/2011) | raphael.dealmeida

  • http://www.naenius.com Mike van Riel

    Do you know how this compares to using XPath on a DOMDocument?

    For content selection and processing I currently use XPath queries to select data as efficiently as possible, both performance- and memory-wise.
    But I am very curious whether this approach is faster or less memory-intensive.

  • http://test.ical.ly Christian

    @Mike well, I haven’t done any benchmarks and don’t plan to. But even for XPath the whole document has to be loaded into memory as a DOM instance, while XMLReader works on a stream that only holds the immediate nodes in memory and “forgets” them when moving forward. So memory-wise I expect XMLReader to be much, much better.
    But in terms of flexibility, when you need to traverse the XML not only forward but also backwards and sideways, XPath seems the better option to me.
    XMLReader/XMLWriter will probably always outperform DOM-based XML technology, but the more complex your XML is, the more (!) code you will have to write.

  • peter

    Great piece of Code! I only had to add $this->key = $key in the constructor to make it work

  • caefer

    Thanks! I fixed the gist now. :)
