

memory exhausted to continuously parse multiple xml files

Submitted by lang2000 on Tue, 2008-04-29 17:46

Hi David:

I am trying to parse 30 XML files (each between 10KB and 4MB) from the same folder in one go on the server. Most of the files in the folder parse fine, but a few always fail with the following error message:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 6260901 bytes) in
/home/xxxx/public_html/feed/ldfparser.php on line 71

any idea how to solve the problem please?

Thanks a lot.

Regards
Lin

Submitted by support on Tue, 2008-04-29 21:31

Hi Lin,

The error indicates that it is triggered on line 71 of your script - can you post that line so that I can see what might be causing it? (Most text editors show the line number in the status bar.) If you're not sure, feel free to email me the exact script that is causing this error and I'll check it out for you (replying to your reg code or forum registration email is the easiest way to reach me)...

Cheers,
David.

Submitted by lang2000 on Wed, 2008-04-30 11:13

Hi David:

You can see the error on :

http://www.jowjow.co.uk/feed/events_ldf.php

I think I know why this error occurred. When I looked at the XML file to be parsed:

http://www.jowjow.co.uk/feed/uploads/Dennis_Publishing_Feed_02042008/Dennis_MUSEUM_EVENT_20080402.xml

...it actually indicates there is an error in this XML file, although I am not clear what the error is (any chance you can figure out what's wrong with this XML file?).

This XML file should have the same structure as the following file, which Magic Parser handles without any problem:

http://www.jowjow.co.uk/feed/uploads/Dennis_Publishing_Feed_02042008/Dennis_ART_EVENT_20080402.xml

You can see the results of parsing that correctly structured XML file at:

http://www.jowjow.co.uk/feed/events_ldf_art.php

Because I am trying to parse several XML files in one go, is there any way to detect a badly formatted XML file, print an error message saying that file could not be parsed, and then skip on to the next file?

The php file that I use to parse the XML file is:

<?php
  require_once('Connections/ldfxml.php');
  require("MagicParser.php");
  mysql_select_db("linj1601_ldfxml") or die(mysql_error());

  // global array to hold the titles
  $titles = array();

  // convert a dd/mm/yyyy date into yyyy-mm-dd for MySQL
  function changeDate($title_date_origin) {
    $title_date_array = explode("/", $title_date_origin);
    krsort($title_date_array);
    return implode("-", $title_date_array);
  }

  // record handler to build the above array from the XML
  function myTitleRecordHandler($title)
  {
    global $titles;
    $titles[] = $title;
    //print_r($title);
  }

  // global array to hold venues and a mapping array to associate
  // titles with venues
  $venues = array();
  $title2venue = array();

  // record handler to build the above arrays from the XML
  function myVenueRecordHandler($venue)
  {
    // create array of venues
    global $venues;
    $venues[$venue["VENUE-VENUE_ID"]] = $venue["VENUE-VENUE_NAME"];
    // now create a mapping array to match venues with titles
    // to do this, we go through the entire record looking for
    // TITLE_ID fields (they will be differentiated with @1, @2..
    // but this can be ignored for now
    global $title2venue;
    foreach($venue as $k => $v)
    {
      if (strpos($k,"TITLE_ID"))
      {
        $title2venue[$venue[$k]] = $venue["VENUE-VENUE_ID"];
      }
    }
  }

  // load the XML into a variable so that we don't hit the remote server twice!
  $xml = "";
  $filename = "uploads/Dennis_Publishing_Feed_02042008/Dennis_ART_EVENT_20080402.xml";
  $fp = fopen($filename,"r");
  if ($fp)
  {
    while(!feof($fp)) $xml .= fread($fp,1024);
    fclose($fp);
  }
  else
  {
    print "Error opening ".$filename;
    exit();
  }
  print "Bytes Received: ".strlen($xml);

  // first parse to load all titles into the global array $titles
  MagicParser_parse("string://".$xml,"myTitleRecordHandler","xml|LISTINGS/POI/VENUE/TITLES/TITLE/");

  // second parse to generate title > venue mapping array
  MagicParser_parse("string://".$xml,"myVenueRecordHandler","xml|LISTINGS/POI/VENUE/");

  // finally we can handle the $titles array using foreach() exactly as the array would have been
  // handled within myRecordHandler, using $title to access the XML elements.
  // the code below shows how to extract the multiple events by using a counter
  // and looking for the way Magic Parser has resolved the duplicate names using @1, @2, etc..
  foreach($titles as $title)
  {
    $title_start_date = changeDate($title["PERFORMANCE/START_DATE"]);
    $title_end_date = changeDate($title["PERFORMANCE/END_DATE"]);

    /* $title_start_date_origin = $title["PERFORMANCE/START_DATE"];
    $title_start_date_array = explode("/", $title_start_date_origin);
    krsort($title_start_date_array);
    $title_start_date = implode("-",$title_start_date_array); */

    $sql =
    "REPLACE INTO event
    (
    event_id,
    event_title,
    event_venue_id,
    event_description,
    event_start_date,
    event_end_date
    )
    VALUES
    (
    '".mysql_real_escape_string($title["TITLE-TITLE_ID"])."',
    '".mysql_real_escape_string($title["TITLE-TITLE_NAME"])."',
    '".mysql_real_escape_string($title2venue[$title["TITLE-TITLE_ID"]])."',
    '".mysql_real_escape_string($title["PERFORMANCE/PERFORMANCE_DESCRIPTION"])."',
    '".mysql_real_escape_string($title_start_date)."',
    '".mysql_real_escape_string($title_end_date)."'
    )
    ";
    if (!mysql_query($sql))
    {
      // SQL failed, print error message and abort
      print mysql_error();
      exit();
    }
    print "<br/>".$sql;
    print "<h2>".$title["TITLE-TITLE_NAME"]."<br/>".$title["TITLE-TITLE_ID"]."</h2>";
    print "<h3>Venue:".$venues[$title2venue[$title["TITLE-TITLE_ID"]]]."<br/>Venue ID: ".$title2venue[$title["TITLE-TITLE_ID"]]."</h3>";
    print "<blockquote>";
    print "<h4>Performances: ".$title["PERFORMANCE/PERFORMANCE_DESCRIPTION"]."</h4>";
    print "<h4>Start Date: ".$title["PERFORMANCE/START_DATE"]."</h4>";
    print "<h4>End Date: ".$title["PERFORMANCE/END_DATE"]."</h4>";
    print "<ul>";
    $postfix = "";
    $i = 0;
    while(1) {
      if ($i) $postfix = "@".$i;
      if (!$title["EVENTS/EVENT".$postfix."-EVENT_ID"]) break;
      $event_id = $title["EVENTS/EVENT".$postfix."-EVENT_ID"];
      $event_start_date = $title["EVENTS/EVENT".$postfix."-EVENT_START_DATE"];
      $event_end_date = $title["EVENTS/EVENT".$postfix."-EVENT_END_DATE"];
      $event_start_time = $title["EVENTS/EVENT".$postfix."-EVENT_START_TIME"];
      print "<li>".$event_start_date." at ".$event_start_time."</li>";
      $i++;
    }
    print "</ul>";
    print "</blockquote>";
  }
?>

Thanks
Lin

Submitted by support on Wed, 2008-04-30 12:20

Hello Lin,

The tricky part is that it is not really possible to tell that XML is badly formatted until it has been parsed to the point at which the error occurs, by which time the memory limit will already have been exceeded. What is probably happening is that the corrupted XML is causing the parser to build up a very long string (effectively an extremely long value in one of the fields) because a closing tag is missing; so I'll look at the script and consider options for putting a "stop" in for you that would abandon the parse if the size of a single field exceeds a certain amount.

However, looking at the error message, the memory allocation actually fails on the following line (71) of the main script, not within MagicParser.php, so it may be that the entire XML has been read without causing a memory error, and it is this code that then takes the script over the memory limit when it comes to sorting:

  krsort($title_start_date_array);

...but I notice that this is now commented out. Did commenting out this section remove the error?

One option I think would be to check $title for validity before attempting to process / sort...
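Another option, just as a rough sketch (this uses PHP's built-in expat functions rather than Magic Parser itself, and the folder name is only an example), would be to pre-check each file for well-formedness and simply skip any file that fails, which would also cover your requirement of reporting the bad file and moving on to the next one:

<?php
  // sketch only: check a file for XML well-formedness with PHP's built-in
  // expat functions before handing it to Magic Parser; print a message and
  // skip the file if the check fails
  function isWellFormedXML($filename)
  {
    $parser = xml_parser_create();
    $ok = xml_parse($parser,file_get_contents($filename),true);
    if (!$ok)
    {
      print "Skipping ".$filename.": ".
        xml_error_string(xml_get_error_code($parser)).
        " at line ".xml_get_current_line_number($parser)."<br/>";
    }
    xml_parser_free($parser);
    return $ok;
  }

  // example usage over a folder of feeds
  foreach(glob("uploads/*.xml") as $filename)
  {
    if (!isWellFormedXML($filename)) continue;
    // ...parse $filename with MagicParser_parse() as normal...
  }
?>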

Cheers,
David.

Submitted by lang2000 on Wed, 2008-04-30 14:29

Hi David:

I have removed all the code I wrote (sorting the array, inserting into the database, etc.) and left only the code you previously suggested in:

http://www.magicparser.com/node/745
(I should have posted this into http://www.magicparser.com/node/745, as I think the two threads are related.)

The XML to parse:

http://www.jowjow.co.uk/feed/uploads/Dennis_Publishing_Feed_02042008/Dennis_MUSEUM_EVENT_20080402.xml
It can be downloaded at http://www.jowjow.co.uk/feed/Dennis_MUSEUM_EVENT_20080402.xml.zip

The result of parsing:

http://www.jowjow.co.uk/feed/events_ldf_jowjow.php

It still shows the memory problem.

The code for events_ldf_jowjow.php is:

<?php
  require("MagicParser.php");

  // global array to hold the titles
  $titles = array();

  // record handler to build the above array from the XML
  function myTitleRecordHandler($title)
  {
    global $titles;
    $titles[] = $title;
  }

  // global array to hold venues and a mapping array to associate
  // titles with venues
  $venues = array();
  $title2venue = array();

  // record handler to build the above arrays from the XML
  function myVenueRecordHandler($venue)
  {
    // create array of venues
    global $venues;
    $venues[$venue["VENUE-VENUE_ID"]] = $venue["VENUE-VENUE_NAME"];
    // now create a mapping array to match venues with titles
    // to do this, we go through the entire record looking for
    // TITLE_ID fields (they will be differentiated with @1, @2..
    // but this can be ignored for now
    global $title2venue;
    foreach($venue as $k => $v)
    {
      if (strpos($k,"TITLE_ID"))
      {
        $title2venue[$venue[$k]] = $venue["VENUE-VENUE_ID"];
      }
    }
  }

  // load the XML into a variable so that we don't hit the remote server twice!
  $xml = "";
  $url = "http://www.jowjow.co.uk/feed/uploads/Dennis_Publishing_Feed_02042008/Dennis_MUSEUM_EVENT_20080402.xml";
  $fp = fopen($url,"r");
  while(!feof($fp)) $xml .= fread($fp,1024);
  fclose($fp);
  print "Bytes Received: ".strlen($xml);

  // first parse to load all titles into the global array $titles
  MagicParser_parse("string://".$xml,"myTitleRecordHandler","xml|LISTINGS/POI/VENUE/TITLES/TITLE/");
  print_r($venues);

  // second parse to generate title > venue mapping array
  MagicParser_parse("string://".$xml,"myVenueRecordHandler","xml|LISTINGS/POI/VENUE/");

  // finally we can handle the $titles array using foreach() exactly as the array would have been
  // handled within myRecordHandler, using $title to access the XML elements.
  // the code below shows how to extract the multiple events by using a counter
  // and looking for the way Magic Parser has resolved the duplicate names using @1, @2, etc..
  foreach($titles as $title)
  {
    print "<h2>".$title["TITLE-TITLE_NAME"]."</h2>";
    print "<h3>Venue:".$venues[$title2venue[$title["TITLE-TITLE_ID"]]]."</h3>";
    print "<blockquote>";
    print "<h4>Performances</h4>";
    print "<ul>";
    $postfix = "";
    $i = 0;
    while(1) {
      if ($i) $postfix = "@".$i;
      if (!$title["EVENTS/EVENT".$postfix."-EVENT_ID"]) break;
      $event_id = $title["EVENTS/EVENT".$postfix."-EVENT_ID"];
      $event_start_date = $title["EVENTS/EVENT".$postfix."-EVENT_START_DATE"];
      $event_end_date = $title["EVENTS/EVENT".$postfix."-EVENT_END_DATE"];
      $event_start_time = $title["EVENTS/EVENT".$postfix."-EVENT_START_TIME"];
      print "<li>".$event_start_date." at ".$event_start_time."</li>";
      $i++;
    }
    print "</ul>";
    print "</blockquote>";
  }
?>

Thanks

Lin

Submitted by support on Wed, 2008-04-30 15:46

Hello Lin,

The XML actually looks fine - I think the problem is simply the sheer number of EVENT records within that particular feed; as you are not (currently) parsing at the EVENT level, they are all being added to the result array, which in turn exceeds the maximum allowed memory limit on your server.

The first thing to try is to see if you are allowed to increase the memory limit for your scripts using the following code at the very top:

  ini_set("memory_limit","128M");
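
A quick way to confirm whether the override has actually taken effect (some hosts prevent scripts from changing it) is to print the value back, together with the current usage, immediately after the ini_set() call, for example:

  print "memory_limit now: ".ini_get("memory_limit");
  print ", current usage: ".memory_get_usage()." bytes";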

If that doesn't make any difference, it is always worth a quick word with your hosting company to see if they are happy to increase the memory limit on your server; although ultimately I think a different approach would be required.

I'll study the XML and see what alternative strategy would work on this size feed with your 32M memory limitation...
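
One such alternative, as a sketch only (untested against your feed), would be to handle each TITLE record directly inside the record handler instead of collecting them all into the global $titles array first, so that only one record is held in memory at a time; the venue parse would simply need to run first so the mapping already exists:

<?php
  // sketch only: process each TITLE as it is parsed rather than building up
  // a global $titles array; assumes $xml, $venues, $title2venue and
  // myVenueRecordHandler() exist exactly as in the script above
  function myTitleRecordHandler($title)
  {
    global $venues, $title2venue;
    print "<h2>".$title["TITLE-TITLE_NAME"]."</h2>";
    print "<h3>Venue:".$venues[$title2venue[$title["TITLE-TITLE_ID"]]]."</h3>";
    // ...any database INSERT / REPLACE for this title would also go here...
  }

  // run the venue parse first so the title > venue mapping already exists
  MagicParser_parse("string://".$xml,"myVenueRecordHandler","xml|LISTINGS/POI/VENUE/");
  MagicParser_parse("string://".$xml,"myTitleRecordHandler","xml|LISTINGS/POI/VENUE/TITLES/TITLE/");
?>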

Cheers,
David.

Submitted by lang2000 on Wed, 2008-04-30 16:01

Hi David:

What if I give up parsing the EVENT level and just keep the VENUE and TITLES levels? Would that help?

I have tried commenting out this code:

<?php
/*  $postfix = "";
    $i = 0;
    while(1) {
      if ($i) $postfix = "@".$i;
      if (!$title["EVENTS/EVENT".$postfix."-EVENT_ID"]) break;
      $event_id = $title["EVENTS/EVENT".$postfix."-EVENT_ID"];
      $event_start_date = $title["EVENTS/EVENT".$postfix."-EVENT_START_DATE"];
      $event_end_date = $title["EVENTS/EVENT".$postfix."-EVENT_END_DATE"];
      $event_start_time = $title["EVENTS/EVENT".$postfix."-EVENT_START_TIME"];
      print "<li>".$event_start_date." at ".$event_start_time."</li>";
      $i++;
    }*/
?>

And I have inserted the following code at the top of the PHP file:

<?php
  ini_set("memory_limit","128M");
?>

Still no luck as you can see:

http://www.jowjow.co.uk/feed/events_ldf_jowjow.php

Thanks

Lin

Submitted by support on Wed, 2008-04-30 16:44

Hello Lin,

Unfortunately, when you try to parse at a higher level, everything below that level is included in the parse; hence memory is exhausted when every event is loaded into the record. I'll investigate whether it would be feasible to modify Magic Parser to return only elements at the current level (ignoring duplicate child records).
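
If you do still need the event data itself, one possible workaround (a sketch only; the parse path and field names below are my assumption based on the EVENTS/EVENT... keys shown in your script) would be a third parse directly at the EVENT level, so that each event arrives as its own small record rather than being embedded inside its parent TITLE record:

<?php
  // sketch only: parse directly at the EVENT level so each event is passed
  // to the handler individually; path and field names are assumptions based
  // on the keys seen in the script above
  function myEventRecordHandler($event)
  {
    print "<li>".$event["EVENT-EVENT_START_DATE"]." at ".$event["EVENT-EVENT_START_TIME"]."</li>";
  }
  MagicParser_parse("string://".$xml,"myEventRecordHandler","xml|LISTINGS/POI/VENUE/TITLES/TITLE/EVENTS/EVENT/");
?>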

In the meantime, if as a result of this you decide that Magic Parser is not suitable for your application, please let me know and I will of course refund your purchase...

Cheers,
David.