I want to import the following data into my MySQL database:
http://www.100ideeen.be/EzWeb/MSN/NewsML.asp
Afterwards I want to use the data to create HTML pages.
The parser, however, generates tons of records - one for each HTML tag or line.
How can I avoid that?
Hello nans,
Because this particular XML has HTML content that is not escaped within CDATA sections, it is not easy to parse in its current form.
Assuming that contacting the publisher to ask them to correct this is not an option, the only way I can think of to handle it is to load the feed into a string and then add the CDATA markers around the BODY.CONTENT tags manually using str_replace().
This will at least mean that you can extract the body content from each record using:
$record["NEWSCOMPONENT/NEWSCOMPONENT/NEWSCOMPONENT/CONTENTITEM/DATACONTENT/NITF/BODY/BODY.CONTENT/"];
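To show the idea in isolation, here is a minimal sketch of the CDATA wrap applied to a tiny sample string rather than the live feed. Once the HTML is inside a CDATA section, the parser treats it as plain text instead of splitting it into a record per tag:

```php
<?php
// Minimal sketch: wrap un-escaped HTML inside body.content in CDATA
// so an XML parser treats it as character data, not markup.
// The sample string here is made up for illustration.
$xml = "<body.content><p>Hello <b>world</b></p></body.content>";
$xml = str_replace("<body.content>","<body.content><![CDATA[",$xml);
$xml = str_replace("</body.content>","]]></body.content>",$xml);
echo $xml;
// prints: <body.content><![CDATA[<p>Hello <b>world</b></p>]]></body.content>
?>
```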
Here is an example, using print_r() to dump the first record, and Content-Type: text/plain for clarity...
<?php
header("Content-Type: text/plain");
require("MagicParser.php");
function myRecordHandler($record)
{
// dump the first record and stop, so we can inspect the field names
print_r($record);
exit();
}
$url = "http://www.100ideeen.be/EzWeb/MSN/NewsML.asp";
// load the feed into a string
$xml = "";
$fp = fopen($url,"r");
while(!feof($fp)) $xml .= fread($fp,1024);
fclose($fp);
// wrap the un-escaped HTML in CDATA so the parser treats it as text
$xml = str_replace("<body.content>","<body.content><![CDATA[",$xml);
$xml = str_replace("</body.content>","]]></body.content>",$xml);
// parse the modified string, one record per NEWSITEM
MagicParser_parse("string://".$xml,"myRecordHandler","xml|NEWSML/NEWSITEM/");
?>
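Since the goal is to create HTML pages from the data, the record handler could then wrap each record's body content in a page template and write it to disk. A minimal sketch - the buildPage() helper, the shell markup, and the file-naming scheme are illustrative, not part of Magic Parser:

```php
<?php
// Sketch: wrap a record's body content in a minimal HTML shell.
// $body would come from the BODY.CONTENT key shown above.
function buildPage($body)
{
return "<html><head><title>News</title></head><body>".$body."</body></html>";
}
// Inside myRecordHandler() you might then save each page, e.g.:
// file_put_contents("news-".$i.".html", buildPage($body));
echo buildPage("<p>Hello</p>");
?>
```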
Hope this helps!
Cheers,
David.