Parsing Stops on Records with Strange Characters

Submitted by virtualimpressions on Fri, 2013-11-08 03:45

I've read through this forum looking for similar issues and have tried all three downloads of the script, but I'm still stuck. The script stops parsing when it encounters an entry with odd characters. The issue seems to be with this line: é and any other line with à or Â.

I'm working to parse the listing data into a database, which I've done successfully before, but the data provider changed their format and now the parsing stops at these odd characters.

Here is the file I am working with:
{link saved}

I noticed even the demo section on this site would only display the first two records when the third record included a à in one of the fields.

How do I get around this? I have no control over the contents of the xml file they send me.

Submitted by support on Fri, 2013-11-08 10:11


I enabled PHP's XML error reporting for the source document, and the parser was actually aborting because of an undeclared entity. That is, it encountered an &entity; that is not one of the standard predefined XML entities and that was not inside a CDATA section.
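For anyone who wants to reproduce this kind of diagnosis, here is a minimal sketch using PHP's libxml error handling. It is not necessarily how the script itself reports errors, and the sample XML string is made up for illustration (the real feed is not shown in this thread).

```php
<?php
// Hypothetical sample: &eacute; is an HTML entity, not one of the five
// predefined XML entities (&amp; &lt; &gt; &quot; &apos;).
$xml = '<listings><listing><city>Montr&eacute;al</city></listing></listings>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Returns false when the document is not well-formed.
$doc = simplexml_load_string($xml);

// Each entry is a LibXMLError object with line, column and message.
$errors = libxml_get_errors();
libxml_clear_errors();

if ($doc === false) {
    foreach ($errors as $error) {
        printf(
            "Line %d, column %d: %s\n",
            $error->line,
            $error->column,
            trim($error->message)
        );
    }
}
```

The reported line and column point at the first place the parser gave up, which is usually enough to find the offending entity or tag in the source file.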

I do have a workaround for this that I will email you shortly. Unfortunately, it only increased the number of records parsed slightly before the parser encountered an unmatched element: a tag that was opened but never closed. In that situation there really is no workaround, I'm afraid, and it would be necessary to contact the provider of the file to let them know that it is invalid XML.

I'll forward the workaround version now anyway, which may help you to make further progress...
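The emailed workaround is not reproduced in this thread. As a rough illustration of one common approach to undeclared entities (an assumption, not the fix that was sent), the feed can be pre-processed so that named HTML entities are replaced with the characters they stand for before parsing. The function name replaceHtmlEntities is hypothetical, and the sketch assumes the feed is UTF-8. Note that it also rewrites entity-like text inside CDATA sections, and it cannot repair an unclosed tag.

```php
<?php
/**
 * Replace named HTML entities (e.g. &eacute;) with literal UTF-8 characters,
 * leaving the five predefined XML entities untouched.
 */
function replaceHtmlEntities(string $xml): string
{
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m): string {
            // Predefined XML entities are already valid; keep them as-is.
            if (in_array($m[1], ['amp', 'lt', 'gt', 'quot', 'apos'], true)) {
                return $m[0];
            }
            // Known HTML entities decode to their character; unknown names
            // come back unchanged.
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Re-escape any XML-special characters. For an unknown name this
            // turns "&foo;" into "&amp;foo;", so the parser sees plain text.
            return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $xml
    );
}

// Usage with a made-up fragment:
libxml_use_internal_errors(true);
$cleaned = replaceHtmlEntities('<city>Montr&eacute;al &amp; Qu&eacute;bec</city>');
$doc = simplexml_load_string($cleaned);
```

Cleaning the whole file in memory like this is only practical for feeds that fit comfortably in RAM; a larger file would need the same replacement applied chunk by chunk as it is read.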