I want to import the following data into my MySQL database:
http://www.100ideeen.be/EzWeb/MSN/NewsML.asp
Afterwards I want to use the data to create HTML pages.
The parser, however, generates tons of records - one for each HTML tag or line.
How can I avoid that?
Hello nans,
Because this particular XML has HTML content that is not escaped within CDATA sections, it is not easy to parse in its current form.
Assuming that contacting the publisher to ask them to correct this is not an option, the only way I can think of to handle it is to load the feed into a string and then add the CDATA markers around the BODY.CONTENT tags manually using str_replace().
This will at least mean that you can extract the body content from each record using:
$record["NEWSCOMPONENT/NEWSCOMPONENT/NEWSCOMPONENT/CONTENTITEM/DATACONTENT/NITF/BODY/BODY.CONTENT/"];
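To show the idea in isolation, here is a minimal sketch of the CDATA wrap applied to a tiny sample string rather than the live feed. Once the HTML is inside a CDATA section, the parser treats it as plain text instead of splitting it into a record per tag:

```php
<?php
// Minimal sketch: wrap un-escaped HTML inside body.content in CDATA
// so an XML parser treats it as character data, not markup.
// The sample string here is made up for illustration.
$xml = "<body.content><p>Hello <b>world</b></p></body.content>";
$xml = str_replace("<body.content>","<body.content><![CDATA[",$xml);
$xml = str_replace("</body.content>","]]></body.content>",$xml);
echo $xml;
// prints: <body.content><![CDATA[<p>Hello <b>world</b></p>]]></body.content>
?>
```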
Here is an example, using print_r() to dump the first record, and Content-Type: text/plain for clarity...
<?php
header("Content-Type: text/plain");
require("MagicParser.php");
function myRecordHandler($record)
{
// dump the first record and stop, so we can inspect the field names
print_r($record);
exit();
}
$url = "http://www.100ideeen.be/EzWeb/MSN/NewsML.asp";
// load the feed into a string
$xml = "";
$fp = fopen($url,"r");
while(!feof($fp)) $xml .= fread($fp,1024);
fclose($fp);
// wrap the un-escaped HTML in CDATA so the parser treats it as text
$xml = str_replace("<body.content>","<body.content><![CDATA[",$xml);
$xml = str_replace("</body.content>","]]></body.content>",$xml);
// parse the modified string, one record per NEWSITEM
MagicParser_parse("string://".$xml,"myRecordHandler","xml|NEWSML/NEWSITEM/");
?>
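Since the goal is to create HTML pages from the data, the record handler could then wrap each record's body content in a page template and write it to disk. A minimal sketch - the buildPage() helper, the shell markup, and the file-naming scheme are illustrative, not part of Magic Parser:

```php
<?php
// Sketch: wrap a record's body content in a minimal HTML shell.
// $body would come from the BODY.CONTENT key shown above.
function buildPage($body)
{
return "<html><head><title>News</title></head><body>".$body."</body></html>";
}
// Inside myRecordHandler() you might then save each page, e.g.:
// file_put_contents("news-".$i.".html", buildPage($body));
echo buildPage("<p>Hello</p>");
?>
```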
Hope this helps!
Cheers,
David.