MagicParser runs for a very long time...

Submitted by tkambler on Mon, 2007-04-16 19:06

I have an XML file just over 300MB in size.

The script that I have set up calls MagicParser to read this file. For every entry in the file, it should add an entry to my MySQL database. The only problem is that I started the script roughly 2 hours ago, it is still running, and it has yet to add a single entry to my database.

I have tried running the same script on a very small XML file, and it works great.

Can anyone tell me what's going on here? Will MagicParser scan the entire 300MB XML file before adding the entries to my database?

Thanks,

Tim

Submitted by support on Mon, 2007-04-16 19:14

Hi Tim,

The parser should not have to read the entire file before calling your record handler function for each record. If your 300MB file is more or less the same as the smaller files that you have tried then your record handler function should be called just as quickly.

The test that I normally do in these situations is to make your record handler function return TRUE. This tells Magic Parser to stop reading any more records, so it can be used to check that records are being parsed correctly before letting your script run against the entire file. For example:

<?php
  function myRecordHandler($record)
  {
    // process record as normal

    // return TRUE to stop reading any more records
    return TRUE;
  }
?>

Assuming that your database code is working correctly, doing this should add exactly one record to your database. If the result is no different, this could indicate a problem with the formatting of the large XML file, in which case the script would need to read the entire file before finishing. Even then, I would not have expected it to take 2 hours to read 300MB of non-XML data.

Can you view the start of your large file to confirm that it is valid XML? There aren't many text editors that will easily load a 300MB file, but you might be able to look at the first few lines using the "head" or "more" commands on Linux, or "more" on Windows...
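If a command line is not to hand, the same check can be done from PHP by reading only the first kilobyte or so of the file. The following is a minimal sketch; peekFileStart() and looksLikeXmlStart() are hypothetical helper names (they are not part of Magic Parser), and the "starts with <" test is only a rough sanity check, not XML validation.

```php
<?php
// Hypothetical helper: return the first $bytes bytes of a file
// without loading the whole file into memory.
function peekFileStart($path, $bytes = 1024)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    $data = fread($fp, $bytes);
    fclose($fp);
    return $data;
}

// Hypothetical helper: rough check that the text begins like XML,
// i.e. the first non-whitespace character (after an optional UTF-8
// byte order mark) is "<". This does not validate the document.
function looksLikeXmlStart($text)
{
    if (substr($text, 0, 3) === "\xEF\xBB\xBF") {
        $text = substr($text, 3);
    }
    $text = ltrim($text);
    return $text !== '' && $text[0] === '<';
}
```

Printing the result of peekFileStart() for the large file shows the XML declaration and the opening tags of the first record, if they are there.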

Hope this helps,
Cheers,
David.

Submitted by tkambler on Mon, 2007-04-16 19:27

I have noticed one problem with the 300MB XML file. It uses absolutely no line breaks. It is 300MB of data all on one line. Could this cause a problem?

Submitted by tkambler on Mon, 2007-04-16 19:34

I have tried your suggestion of having my function "return TRUE", and it continues to run indefinitely.

Submitted by support on Mon, 2007-04-16 19:40

Hi,

There's no problem with it all being on one line - the parser reads the file in chunks, not lines. If the single record test makes no difference this does indicate a formatting problem in the XML.
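To illustrate why line breaks do not matter to a chunk-based reader, here is a small sketch that feeds a file to PHP's built-in XML extension in fixed-size chunks and counts closing tags. This is not Magic Parser's own code, only an illustration of the general technique; countElementsChunked() is a hypothetical name, and the sketch assumes the standard xml extension is enabled.

```php
<?php
// Hypothetical helper (not Magic Parser code): count how many times
// </$elementName> is closed, reading the file $chunkSize bytes at a time.
// Returns the count, or false if the file cannot be opened or the XML
// is malformed.
function countElementsChunked($path, $elementName, $chunkSize = 8192)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }

    $count = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) {
            // opening tags are not needed for this count
        },
        function ($p, $name) use (&$count, $elementName) {
            // case folding is on by default, so names arrive upper-cased
            if ($name === strtoupper($elementName)) {
                $count++;
            }
        }
    );

    while (!feof($fp)) {
        $chunk = (string) fread($fp, $chunkSize);
        // the third argument marks the final chunk of the document
        if (!xml_parse($parser, $chunk, feof($fp))) {
            fclose($fp);
            return false;
        }
    }

    fclose($fp);
    return $count;
}
```

Because the data is handed over in byte chunks, a 300MB file on a single line is processed in the same way as one with a line break after every tag.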

This could happen if the first record is not closed for some reason - in other words the parser thinks that the entire file is contributing to the first record rather than finding the closing tags.
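One rough way to test for this is to scan the file for the first occurrence of the record's closing tag and see how far into the file it is. The sketch below does that in chunks, so the 300MB file is never held in memory; findFirstOffset() is a hypothetical helper, and a tag name such as "</RECORD>" is only a placeholder for whatever your records are actually called.

```php
<?php
// Hypothetical helper: return the byte offset of the first occurrence
// of $needle in the file, or false if it is not found. The file is read
// in chunks, keeping a short overlap so that a match spanning two
// chunks is not missed.
function findFirstOffset($path, $needle, $chunkSize = 1048576)
{
    if ($needle === '') {
        return false;
    }
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }

    $overlap = strlen($needle) - 1;
    $tail = '';   // last few bytes carried over from the previous chunk
    $base = 0;    // file offset of the first byte of $tail . $chunk

    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer = $tail . $chunk;
        $pos = strpos($buffer, $needle);
        if ($pos !== false) {
            fclose($fp);
            return $base + $pos;
        }
        $tail = $overlap > 0 ? substr($buffer, -$overlap) : '';
        $base += strlen($buffer) - strlen($tail);
    }

    fclose($fp);
    return false;
}
```

If a call such as findFirstOffset($file, "</RECORD>") returns false, or an offset hundreds of megabytes into the file, that would be consistent with the first record never being closed.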

You might be able to verify if this is the case by studying the file with one of the command line tools mentioned above. Alternatively, would it be possible for you to email me a URL from where I can download your 300MB file and have a look for you? I will then download the file to my dev server and run Magic Parser against it to see what is going on. If you're able to do this, replying to your reg code or forum registration email is the easiest way to get me...

Cheers,
David.