Script bails out without finishing parsing

Submitted by Huggie on Wed, 2009-02-25 10:47

Hi David,

This isn't a MagicParser specific problem, but I hoped you might be able to help as you may have seen this before.

I've just got a new XML feed that I'm parsing with MagicParser and it gets a little way through the feed and stops. PHP doesn't report any errors.

The feed is slightly larger than usual at approx 5.7MB, but this shouldn't be too much of a problem. The memory_limit of PHP is set to 128MB and the max_execution_time is set to 180.

I've tried this on a local WAMP server and it parses the entire file fine, so this is obviously some restriction on the remote server rather than a problem with the parser. Incidentally, my local server has the same memory_limit set but a lower max_execution_time set, so I'm certain that's not it. I'm parsing the file locally, rather than from a URL.

Do you have any pointers?

Regards
Rich

Submitted by support on Wed, 2009-02-25 11:04

Hello Rich,

Some versions of the lower-level PHP XML library that Magic Parser uses are more tolerant of character encoding issues in the data than others. I'll email you a test version of the script that cleanses data as it is read...

Cheers,
David.
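
When a parse stops silently like this, PHP's XML extension can usually report where and why it stopped. The following is a minimal sketch, not Magic Parser's own code (which is not shown in this thread): `check_xml` is a hypothetical helper name, and it relies only on the standard `xml_*` functions. The exact error message depends on the underlying XML library, so none is shown here.

```php
<?php
// Hypothetical helper: parse an XML string and report where parsing stopped.
function check_xml(string $xml): array
{
    $parser = xml_parser_create();

    // xml_parse() returns 1 on success and 0 on failure.
    if (xml_parse($parser, $xml, true) === 1) {
        return ['ok' => true];
    }

    // On PHP 7 and earlier, also call xml_parser_free($parser) when finished.
    return [
        'ok'      => false,
        'message' => xml_error_string(xml_get_error_code($parser)),
        'line'    => xml_get_current_line_number($parser),
        'byte'    => xml_get_current_byte_index($parser),
    ];
}

// A lone 0xE9 byte is not valid UTF-8, and this document declares no encoding.
$result = check_xml("<feed><name>Caf\xE9</name></feed>");
if (!$result['ok']) {
    printf(
        "XML error at line %d, byte %d: %s\n",
        $result['line'],
        $result['byte'],
        $result['message']
    );
}
```

Running the same check locally and on the remote server is one way to confirm that the two environments treat the same bytes differently.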

Submitted by Huggie on Wed, 2009-02-25 11:47

David,

Excellent, that did the job. Now, can you provide me with a version of the ASCII script that cleanses the data and also incorporates the FASTCSV option from the other script you provided?

Regards
Rich

Submitted by support on Wed, 2009-02-25 11:56

On its way...!

Cheers,
David.

Submitted by Huggie on Wed, 2009-02-25 12:33

Thanks,

Am I right in thinking that the new script now checks every single character in the file and only includes it if it is an ASCII character in the range 32 - 127, which is why the file now takes longer to parse?

Regards
Rich

Submitted by support on Wed, 2009-02-25 12:35

Hi Rich,

That's right, so yes, it will slow the parse down slightly. The fact that this does "fix" the problem means that the XML that requires this fix is not actually valid - because it contains a character encoding error. It may be worth making the provider of the feed aware of that, as they may not know about the problem...

Cheers,
David.
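
The ASCII-only cleansing discussed above can be sketched as follows. This is an illustration under stated assumptions, not the script David sent: `strip_non_ascii` is a hypothetical name, the sketch also keeps tab, line feed and carriage return so the document layout survives, and it stops at 126 rather than 127 because 127 is the DEL control character.

```php
<?php
// Hypothetical helper: keep tab, line feed, carriage return and printable
// ASCII (32-126); drop every other byte.
function strip_non_ascii(string $data): string
{
    return preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $data);
}

// The 0xE9 byte (e-acute in ISO-8859-1) and the NUL byte are both removed.
$dirty = "Caf\xE9 \x00menu\n";
echo strip_non_ascii($dirty);
```

Note that this approach discards accented characters rather than converting them, which is why it hides an encoding problem instead of solving it.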

Submitted by Huggie on Wed, 2009-02-25 12:46

Thanks David,

I'll get onto them and let them know.

Rich

Submitted by Huggie on Wed, 2009-02-25 13:57

David,

You said this in your previous post:

"The fact that this does "fix" the problem, means that the XML that requires this fix is not actually valid - because it contains a character encoding error".

The encoding in the feed is specified as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>

Although they are outside the ASCII range of 32 - 127, the characters are in the extended range of 160 - 255 within the ISO-8859-1 character set. Does this mean that the feed is in fact valid? If so, how can I parse these characters?

Rich

Submitted by support on Wed, 2009-02-25 14:01

Hi Rich,

I'll send you another version to try that uses PHP's encoding functions to cleanse the data into valid ISO-8859-1...

Cheers,
David.

Submitted by Huggie on Wed, 2009-02-25 14:10

OK, I'm just reading a bit more about xml_parse() and character encoding.

Thanks again David.

Rich

Submitted by Huggie on Wed, 2009-02-25 14:47

David,

utf8_decode() didn't work; it got further through the file but still wasn't happy. I read the docs, and it would seem that utf8_encode() is better suited, as we have a valid ISO-8859-1 encoded XML file as the input.

I changed it to utf8_encode() and it parsed without any problems.

Regards
Rich
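
For readers on current PHP versions, the conversion Rich describes can be sketched like this. It is only an illustration: `latin1_xml_to_utf8` is a hypothetical name, `mb_convert_encoding()` is used because `utf8_encode()` is deprecated as of PHP 8.2, and the sketch assumes the mbstring extension is available. The thread does not say whether the XML declaration was changed; the sketch updates it so that the declared encoding matches the converted bytes.

```php
<?php
// Hypothetical helper: convert an ISO-8859-1 XML document to UTF-8 and
// update the encoding named in its XML declaration to match.
function latin1_xml_to_utf8(string $xml): string
{
    // Same effect as utf8_encode(): every ISO-8859-1 byte becomes UTF-8.
    $utf8 = mb_convert_encoding($xml, 'UTF-8', 'ISO-8859-1');

    // Assumption: the declaration, if present, starts the document.
    return preg_replace(
        '/^(<\?xml[^>]*encoding=)(["\'])ISO-8859-1\2/i',
        '${1}${2}UTF-8${2}',
        $utf8,
        1
    );
}

$latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<name>Caf\xE9</name>";
$utf8   = latin1_xml_to_utf8($latin1);
var_dump(mb_check_encoding($utf8, 'UTF-8'));
```

This direction of conversion matches Rich's finding: `utf8_decode()` goes from UTF-8 to ISO-8859-1, which is the wrong way round for input that is already ISO-8859-1.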

Submitted by RodThePlod on Fri, 2009-03-20 21:24

David,

I seem to be coming across a similar problem and I'm wondering if you can supply me with your cleansing code?

Background:
I'm developing an iPhone application which interacts with data held on my website.
I'm using Magic Parser to help me grab a couple of XML feeds which I then build into one XML feed and serve from the site. The iPhone app connects and reads this file.

However, the file sometimes contains extended characters which then makes my app fall over.

I am trying to make the code as robust as possible, but at the same time, I would like some way of ensuring that the data written into the XML file in the first place is properly 'cleansed' and utf-8 compliant.

Can you help?

Cheers,

Rod.

Submitted by support on Fri, 2009-03-20 21:39

Hi Rod,

Sure - I'll email you the utf8 cleansed version...

Cheers,
David.
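
The cleansed version itself is not reproduced in this thread. As a rough sketch of the kind of cleansing Rod asks about, text can be normalised before it is written into an outgoing XML file. `xml_safe_text` is a hypothetical name, the fallback to ISO-8859-1 for anything that is not valid UTF-8 is an assumption about the source feeds, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: return text that is valid UTF-8, contains only
// characters permitted by XML 1.0, and is escaped for element content.
function xml_safe_text(string $text): string
{
    // Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    }

    // Remove code points that XML 1.0 does not allow (most control characters).
    $text = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    // Escape &, <, > and quotes.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo '<name>' . xml_safe_text("Caf\xE9 & \x01bar") . "</name>\n";
```

Applying a step like this to every value as the combined feed is built keeps the encoding decision in one place, so the consuming app only ever sees UTF-8.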

Submitted by RodThePlod on Thu, 2009-03-26 22:28

Just thought I'd post an update here. Thanks for the very quick reply David - your utf8 cleansed version did the trick! I can now handle the data from the XML feeds without my app crashing or doing funny things ;o)

Thanks very much for your help - and for a fantastic product in Magic Parser.

Best regards,

Rod.