Parsing a large file

Submitted by rcarter on Mon, 2009-04-27 12:11

Hello,

I'm trying to parse XML files that are around 200 MB, and for each XML $record I want to run an SQL query.

At the moment I'm getting a maximum execution time error (30 seconds). For now I can't move to dedicated hosting, where I could raise the limit in php.ini.

I thought about splitting the XML file into smaller chunks, but my script is only good enough to split a small file into smaller files. On a large file it runs out of time.
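Splitting does not require loading the file into memory: XMLReader reads one node at a time, so a splitter can stream records straight into chunk files. A minimal sketch, assuming the records are <house> elements under a <houses> root (as in the sample later in this thread). splitXmlFile() and its parameters are hypothetical names. It still has to read the whole 200 MB file once, so with a 30-second limit it may only help if run somewhere without that limit, for example locally from the command line, where PHP's max_execution_time defaults to 0 (unlimited).

```php
<?php
// Hypothetical helper: stream $src with XMLReader and copy every $perChunk
// record elements into a separate, well-formed chunk file.
// Assumes records are <house> elements under a <houses> root;
// $recordName must be given in lower case.
// Returns the list of chunk file names written.
function splitXmlFile($src, $destPrefix, $perChunk, $recordName = 'house', $rootName = 'houses')
{
    $reader = new XMLReader();
    if (!$reader->open($src)) {
        throw new RuntimeException("Cannot open $src");
    }

    $files = array();
    $out = null;
    $count = 0;
    $ok = $reader->read();
    while ($ok) {
        if ($reader->nodeType == XMLReader::ELEMENT &&
            strtolower($reader->localName) == $recordName) {
            if ($count % $perChunk == 0) {
                // Finish the previous chunk (if any) and start a new one.
                if ($out) {
                    fwrite($out, "</$rootName>\n");
                    fclose($out);
                }
                $name = sprintf('%s%04d.xml', $destPrefix, count($files));
                $out = fopen($name, 'w');
                if ($out === false) {
                    throw new RuntimeException("Cannot write $name");
                }
                fwrite($out, "<?xml version=\"1.0\"?>\n<$rootName>\n");
                $files[] = $name;
            }
            // readOuterXml() returns the whole <house>...</house> element as a string.
            fwrite($out, $reader->readOuterXml() . "\n");
            $count++;
            $ok = $reader->next(); // jump past this record's children
        } else {
            $ok = $reader->read();
        }
    }
    if ($out) {
        fwrite($out, "</$rootName>\n");
        fclose($out);
    }
    $reader->close();
    return $files;
}
```

Each chunk file can then be parsed in a separate request that stays well under the time limit.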

I assume this must be fairly common? Any thoughts?

Could Magic Parser work on a per-$record basis?

I used something like this on an earlier XML parse. I don't want to keep using it, but it worked for me on the large files:

<?php
$file = 'house.xml';
$reader = new XMLReader();
$reader->open($file);
while ($reader->read())
{
    // are we in a house?
    if ($reader->nodeType == XMLReader::ELEMENT &&
        strtolower($reader->localName) == 'house')
    {
        $node = $reader->expand(); // expand the node into a DOMNode
        // Convert to SimpleXML via DOM: messy, but SimpleXML is so much nicer.
        $dom = new DomDocument();
        $dom->appendChild($dom->importNode($node, TRUE));
        $sxl = simplexml_import_dom($dom);
        // then do what we want to do.
        processProduct($sxl);
        unset($node, $dom, $sxl);
    }
}
$reader->close();
unset($reader, $file);
?>
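One way to stay under the 30-second limit without touching php.ini is to spread the work over several requests, remembering where the last batch stopped. A minimal sketch, assuming the same <house> records as above. processBatch() is a hypothetical name, the callback stands in for processProduct(), and the batch size and time budget are placeholder values to tune. It keeps the expand() / importNode() / simplexml_import_dom() conversion from the code above.

```php
<?php
// Sketch: handle at most $limit <house> records, starting at record number
// $offset, and stop early once $budget seconds have passed.
// Returns the offset to resume from next time, or -1 when the file is finished.
// processBatch() is a hypothetical name; $callback stands in for processProduct().
function processBatch($file, $offset, $limit, $budget, $callback)
{
    $start = microtime(true);
    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    $index = 0; // position of the current <house> in the file
    $done = 0;  // records handled in this batch
    $ok = $reader->read();
    while ($ok) {
        if ($reader->nodeType == XMLReader::ELEMENT &&
            strtolower($reader->localName) == 'house') {
            if ($index >= $offset) {
                if ($done >= $limit || microtime(true) - $start > $budget) {
                    $reader->close();
                    return $index; // resume from this record next time
                }
                // Same DOM to SimpleXML conversion as the original loop.
                $dom = new DOMDocument();
                $dom->appendChild($dom->importNode($reader->expand(), true));
                call_user_func($callback, simplexml_import_dom($dom));
                $done++;
            }
            $index++;
            $ok = $reader->next(); // jump past this record's children
        } else {
            $ok = $reader->read();
        }
    }
    $reader->close();
    return -1;
}
```

Each request could call processBatch('house.xml', $offset, 5000, 25, 'processProduct'), with $offset carried over from the previous run (in a query-string parameter or a small state file, say), and stop once it returns -1. Because XMLReader cannot seek, every batch still reads past the records already done; that skipping is usually far cheaper than the SQL queries, but it grows as the offset increases.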

Submitted by support on Mon, 2009-04-27 12:44

Hi,

If the above code can complete within the 30 seconds, Magic Parser should too, provided that you pass a Format String as the 3rd (optional) parameter to MagicParser_parse(). Without it, Magic Parser has to read the entire file twice: once to work out the format, and a second time to actually parse the records and hand each one to your myRecordHandler() function.

Are you currently using a Format String?

This is the string that describes which level of the XML you are interested in, for example:

xml|HOUSES/HOUSE/

...if your XML looked something like:

<houses>
  <house>
    ... house info ...
  </house>
  <house>
    ... house info ...
  </house>
  <house>
    ... house info ...
  </house>
</houses>
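To illustrate what "which level of the XML" means: the path picks out elements at one exact nesting depth, so a <house> nested somewhere else is not treated as a record. The sketch below is not Magic Parser's implementation. parseAtPath() is a hypothetical XMLReader-based function that tracks the path of open elements and hands each element matching HOUSES/HOUSE/ to a handler, roughly the way myRecordHandler() is called once per record. It passes each record as a SimpleXMLElement.

```php
<?php
// Hypothetical illustration, not Magic Parser's code: call $handler for every
// element whose full path matches the Format String, e.g. "xml|HOUSES/HOUSE/".
// Returns the number of records found.
function parseAtPath($file, $formatString, $handler)
{
    // Drop the "xml|" prefix, then split "HOUSES/HOUSE/" into element names.
    $pos = strpos($formatString, '|');
    $path = ($pos === false) ? $formatString : substr($formatString, $pos + 1);
    $target = explode('/', strtoupper(trim($path, '/')));

    $reader = new XMLReader();
    if (!$reader->open($file)) {
        throw new RuntimeException("Cannot open $file");
    }

    $stack = array(); // names of the currently open elements, outermost first
    $count = 0;
    $ok = $reader->read();
    while ($ok) {
        if ($reader->nodeType == XMLReader::ELEMENT) {
            $stack[] = strtoupper($reader->localName);
            if ($stack === $target) {
                // This element sits at exactly the requested level: it is a record.
                call_user_func($handler, simplexml_load_string($reader->readOuterXml()));
                $count++;
                array_pop($stack);
                $ok = $reader->next(); // skip its children, land on what follows
                continue;
            }
            if ($reader->isEmptyElement) {
                array_pop($stack); // <tag/> has no matching END_ELEMENT node
            }
        } elseif ($reader->nodeType == XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
        $ok = $reader->read();
    }
    $reader->close();
    return $count;
}
```

With the sample above, parseAtPath('houses.xml', 'xml|HOUSES/HOUSE/', 'myHandler') would call the (hypothetical) myHandler() three times, once per <house>.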

If you're not sure, could you email me a link to your XML? I'll work out the Format String for you.

Cheers,
David.