I have used MagicParser many times to process XML files. However, this new file is not processing past the first record. I have spent a few hours trying to get this resolved and it is probably something simple, but for now I just cannot see the problem.
Here is the PHP code I use to test new files with:
<?php
header("Content-Type: text/html;charset=utf-8");
require("MagicParser.php");
$url = 'C:\Web\Properties\import\test.xml';
$result = MagicParser_parse($url,"myRecordHandler","xml|properties/property/");
if (!$result)
{
print "ERRORS!!!!!!!"."\n";
$err = MagicParser_getErrorMessage();
print $err;
}
function myRecordHandler($record)
{
print_r($record);
}
?>
And here are the first two records from the large XML file that I am trying to read:
{code saved}
Any help would be gratefully received.
Ted
Hi David,
Thanks for the reply, I will test it out. I do know that there are characters in the XML file that will probably upset MagicParser and the file is very big. Unfortunately I don't have access to the source to clean it up beforehand. My customer has the XML feed and wants me to move the hosting as the guy who manages it has gone AWOL, so there is no way of getting it corrected. If you have a version of Magic Parser that will help me here then I would really appreciate you sending it to me.
Regards,
Ted
Hi Ted,
Could you drop me an email so that I can pick up your regular address? I only have your Google Checkout address on file. Then I'll send the cleansing version of Magic Parser to you...
Cheers,
David.
Hi David,
I dropped you an email last Friday but have heard nothing since. I know you are probably very busy but I just wanted to confirm that at least my email got through to you. I don't mind waiting if that is the case, but if my email didn't reach you then I can try again.
Regards,
Ted
Hello Ted,
I'm really sorry about that; I'm not sure why I didn't receive your email.
If you would like to post your regular address in a reply to this thread I'll remove it before publishing and update your address, then I'll forward the files again for you...
Apologies for any inconvenience,
All the best,
David.
Hi David,
My email address is {saved}
Thanks, and look forward to getting the clean-up version.
Regards,
Ted
Hi Ted,
This is almost certainly down to a character encoding mismatch.
As the XML declares itself as UTF-8, any character sequence in the file that is not valid UTF-8 will cause the parser to abort. Different PHP installations (depending on the versions of the underlying XML libraries) handle encoding anomalies in slightly different ways.
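If it helps to confirm that this is what is happening, a rough check along the following lines could list the lines of the file that are not valid UTF-8. This is only a sketch: findInvalidUtf8Lines() is a made-up helper name, not part of Magic Parser, and it assumes the file has ordinary line breaks so that it can be read one line at a time.

```php
<?php
// Hypothetical helper (not part of Magic Parser): returns the 1-based line
// numbers of the lines in $path that are not valid UTF-8, stopping after
// $limit hits. Returns null if the file could not be opened.
function findInvalidUtf8Lines($path, $limit = 10)
{
    $handle = fopen($path, "rb");
    if ($handle === false) {
        return null;
    }
    $bad = array();
    $lineNumber = 0;
    while (($line = fgets($handle)) !== false) {
        $lineNumber++;
        // With the /u modifier, preg_match() returns false (rather than
        // 0 or 1) when the subject string is not valid UTF-8.
        if (preg_match('//u', $line) === false) {
            $bad[] = $lineNumber;
            if (count($bad) >= $limit) {
                break;
            }
        }
    }
    fclose($handle);
    return $bad;
}
```

Calling print_r(findInvalidUtf8Lines($url)); would then print the offending line numbers; an empty array suggests the problem lies somewhere other than invalid UTF-8.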
If you have control over the XML source, the first thing I would try is simply removing the XML version and encoding declaration at line 1. If that's not an option, cleansing the XML should do the trick. The following requires that your XML is not too large to be read entirely into memory; if it is, let me know and I'll send you a version of Magic Parser that does the character encoding cleansing internally...
<?php
header("Content-Type: text/html;charset=utf-8");
require("MagicParser.php");
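$url = 'C:\Web\Properties\import\test.xml';
// Read the whole file into memory (only suitable for files that fit in RAM)
$xml = file_get_contents($url);
// utf8_encode() treats its input as ISO-8859-1 and converts it to UTF-8
// (note: the function is deprecated as of PHP 8.2)
$xml = utf8_encode($xml);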
$result = MagicParser_parse("string://".$xml,"myRecordHandler","xml|properties/property/");
if (!$result)
{
print "ERRORS!!!!!!!"."\n";
$err = MagicParser_getErrorMessage();
print $err;
}
function myRecordHandler($record)
{
print_r($record);
}
?>
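For a file that is too large to read into memory, one possible workaround is to cleanse it line by line into a second file and parse that instead. The sketch below is not the cleansing version of Magic Parser mentioned above. cleanseToUtf8() is a made-up helper, and it assumes that any line which is not valid UTF-8 is really ISO-8859-1. Unlike the script above, it leaves lines that are already valid UTF-8 untouched, although a line that mixes both encodings would still come out partly double-encoded.

```php
<?php
// Hypothetical helper (not part of Magic Parser): copies $src to $dst one
// line at a time. Lines that are already valid UTF-8 are written unchanged;
// any other line is converted from $assumedCharset to UTF-8.
// Returns the number of lines converted, or false if a file cannot be opened.
function cleanseToUtf8($src, $dst, $assumedCharset = "ISO-8859-1")
{
    $in = fopen($src, "rb");
    if ($in === false) {
        return false;
    }
    $out = fopen($dst, "wb");
    if ($out === false) {
        fclose($in);
        return false;
    }
    $converted = 0;
    while (($line = fgets($in)) !== false) {
        // preg_match() with /u returns false when $line is not valid UTF-8
        if (preg_match('//u', $line) === false) {
            $fixed = iconv($assumedCharset, "UTF-8", $line);
            if ($fixed !== false) {
                $line = $fixed;
                $converted++;
            }
            // If iconv() fails, the line is written through unchanged
        }
        fwrite($out, $line);
    }
    fclose($in);
    fclose($out);
    return $converted;
}
```

The cleansed copy could then be passed to MagicParser_parse() by filename in place of $url, with the same "myRecordHandler" and "xml|properties/property/" arguments as before.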
Hope this helps!
Cheers,
David
--
MagicParser.com