Stops reading after first record

Submitted by tsnik on Fri, 2011-01-21 14:10

I have used MagicParser many times to process XML files. However, this new file is not processing past the first record. I have spent a few hours trying to get this resolved and it is probably something simple, but for now I just cannot see the problem.

Here is the PHP code I use to test new files with:

<?php
header("Content-Type: text/html;charset=utf-8");
require("MagicParser.php");
// Local path to the XML feed under test
$url = 'C:\Web\Properties\import\test.xml';
// myRecordHandler is called once for each properties/property record
$result = MagicParser_parse($url,"myRecordHandler","xml|properties/property/");
if (!$result)
{
    print "ERRORS!!!!!!!"."\n";
    $err = MagicParser_getErrorMessage();
    print $err;
}
function myRecordHandler($record)
{
    print_r($record);
}
?>

And here are the first two records from the large XML file that I am trying to read:

{code saved}

Any help would be gratefully received.

Ted

Submitted by support on Fri, 2011-01-21 14:17

Hi Ted,

This is almost certainly down to a character encoding mismatch.

As the XML declares itself as UTF-8, any character sequence in the file that is not valid UTF-8 will cause the parser to abort. Different PHP installations (depending on the versions of the underlying XML libraries) handle encoding anomalies in slightly different ways.
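To confirm that this is what is happening, it can help to locate the first line of the file that is not valid UTF-8. The following is only a minimal sketch: `findFirstInvalidUtf8Line` is a hypothetical helper, not part of Magic Parser, and it relies on `preg_match` with the `u` modifier failing on invalid UTF-8 input.

```php
<?php
// Hypothetical helper (not part of Magic Parser): returns the 1-based number
// of the first line that is not valid UTF-8, 0 if every line is valid,
// or -1 if the file could not be opened.
function findFirstInvalidUtf8Line($path)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return -1;
    }
    $lineNumber = 0;
    // Reading line by line keeps memory use low for large feeds
    while (($line = fgets($handle)) !== false) {
        $lineNumber++;
        // With the "u" modifier, preg_match only returns 1 for valid UTF-8
        if (preg_match('//u', $line) !== 1) {
            fclose($handle);
            return $lineNumber;
        }
    }
    fclose($handle);
    return 0;
}
```

If the function reports a line number, opening the file at that line should show the character that the parser is stopping on.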

If you have control over the XML source, the first thing I would try is simply to remove the XML version and encoding declaration on line 1. If that's not an option, cleansing the XML should do the trick. The following requires that your XML is not too large to be read entirely into memory; if it is, let me know and I'll send you a version of Magic Parser that does the character encoding cleansing internally...

<?php
header("Content-Type: text/html;charset=utf-8");
require("MagicParser.php");
$url = 'C:\Web\Properties\import\test.xml';
// Read the whole file into memory
$xml = file_get_contents($url);
// utf8_encode() treats its input as ISO-8859-1 and converts it to UTF-8
$xml = utf8_encode($xml);
// Parse the cleansed XML from the string rather than from the file
$result = MagicParser_parse("string://".$xml,"myRecordHandler","xml|properties/property/");
if (!$result)
{
    print "ERRORS!!!!!!!"."\n";
    $err = MagicParser_getErrorMessage();
    print $err;
}
function myRecordHandler($record)
{
    print_r($record);
}
?>
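For a file that is too large to read into memory, one possible workaround is to cleanse it line by line into a second file and parse that instead. This is only a sketch and is not the cleansing version of Magic Parser: `cleanseToUtf8` is a hypothetical helper, and it assumes that any line which is not valid UTF-8 is really ISO-8859-1. A line that mixes valid UTF-8 with stray ISO-8859-1 bytes would have its valid sequences encoded a second time, so treat the result with care.

```php
<?php
// Hypothetical helper: copies $source to $target one line at a time.
// Lines that are already valid UTF-8 are copied unchanged; any other line is
// assumed to be ISO-8859-1 and each high byte is re-encoded as UTF-8.
function cleanseToUtf8($source, $target)
{
    $in = @fopen($source, 'rb');
    if ($in === false) {
        return false;
    }
    $out = @fopen($target, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }
    while (($line = fgets($in)) !== false) {
        if (preg_match('//u', $line) !== 1) {
            // Byte-wise ISO-8859-1 to UTF-8: 0x80-0xFF become two-byte sequences
            $line = preg_replace_callback('/[\x80-\xFF]/', function ($m) {
                $b = ord($m[0]);
                return chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
            }, $line);
        }
        fwrite($out, $line);
    }
    fclose($in);
    fclose($out);
    return true;
}
```

The cleansed file can then be passed to MagicParser_parse by filename, exactly as in the original test script, so only one line is held in memory at a time during cleansing.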

Hope this helps!
Cheers,
David
--
MagicParser.com

Submitted by tsnik on Fri, 2011-01-21 14:24

Hi David,

Thanks for the reply, I will test it out. I do know that there are characters in the XML file that will probably upset MagicParser and the file is very big. Unfortunately I don't have access to the source to clean it up beforehand. My customer has the XML feed and wants me to move the hosting as the guy who manages it has gone AWOL, so there is no way of getting it corrected. If you have a version of Magic Parser that will help me here then I would really appreciate you sending it to me.

Regards,

Ted

Submitted by support on Fri, 2011-01-21 14:30

Hi Ted,

Could you drop me an email so that I can pick up your regular address? I only have your Google Checkout address on file. Then I'll send the cleansing version of Magic Parser to you...

Cheers,
David.

Submitted by tsnik on Mon, 2011-01-24 16:05

Hi David,

I dropped you an email last Friday but have heard nothing since. I know you are probably very busy but I just wanted to confirm that at least my email got through to you. I don't mind waiting if that is the case, but if my email didn't reach you then I can try again.

Regards,

Ted

Submitted by support on Mon, 2011-01-24 16:30

Hello Ted,

I'm really sorry about that; I'm not sure why I didn't receive your email.

If you would like to post your regular address in a reply to this thread I'll remove it before publishing and update your address, then I'll forward the files again for you...

Apologies for any inconvenience,
All the best,
David.

Submitted by tsnik on Mon, 2011-01-24 22:09

Hi David,

My email address is {saved}

Thanks, and look forward to getting the clean-up version.

Regards,

Ted

Submitted by support on Mon, 2011-01-24 22:26

Thanks, Ted - Email sent.