Entities seem to cause Magic Parser to stop parsing a feed

Submitted by damer on Tue, 2007-07-03 18:30 in Magic Parser

Hi, I am not sure whether this problem has been covered or fixed in a newer version of the software, but I am noticing that if a feed contains illegal XML characters, such as HTML entities or something like '\t', the parser stops parsing the feed rather than just skipping the record. Is there any way to instruct the parser to skip the record instead?

Thanks.

Submitted by support on Tue, 2007-07-03 18:38

Hi,

Unfortunately, this is a limitation of the core PHP parser functions upon which Magic Parser is built. People have come across this before, and the solution is generally to process the XML beforehand to remove the illegal entities. Technically, a file containing these entities is not valid XML, so in the first instance it is always worth contacting the feed provider to see if they are aware of the problem.

I would like to look into this. Could you perhaps email me the feed (zipped if possible) or a link to it, and let me know which record number the problem occurs at? I'll see if there is anything that can be done easily. I appreciate that reading an entire feed into memory to remove entities is not really practical...
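
In the meantime, here is a rough sketch of the kind of pre-processing I mean, written so that the feed is cleaned one line at a time rather than being read into memory in one go. It is only an illustration under a few assumptions: cleanFeedLine() and cleanFeedFile() are hypothetical helper names (they are not part of Magic Parser), the feed is in an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, the mbstring extension is available, and the feed actually contains line breaks. It also does not treat CDATA sections specially. Named HTML entities are turned into numeric character references, which XML accepts, and control characters that XML 1.0 does not allow are removed.

```php
<?php
// Hypothetical helper (not part of Magic Parser): clean a single line of a feed.
function cleanFeedLine(string $line): string
{
    // Remove control characters that XML 1.0 does not allow.
    // Tab (0x09), line feed (0x0A) and carriage return (0x0D) are legal and kept.
    $line = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $line);

    // Rewrite named entities, leaving the five that XML predefines untouched.
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m): string {
            if (in_array($m[1], ['amp', 'lt', 'gt', 'quot', 'apos'], true)) {
                return $m[0];
            }
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($decoded === $m[0]) {
                // Not a known HTML entity: escape the ampersand so it is plain text.
                return '&amp;' . $m[1] . ';';
            }
            // Known HTML entity: emit a numeric character reference instead.
            return '&#' . mb_ord($decoded, 'UTF-8') . ';';
        },
        $line
    );
}

// Hypothetical helper: copy $source to $destination, cleaning each line on the way.
function cleanFeedFile(string $source, string $destination): void
{
    $in = fopen($source, 'rb');
    $out = fopen($destination, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open the feed or the output file');
    }
    // fgets() returns one line per call, so memory use is bounded by line length.
    while (($line = fgets($in)) !== false) {
        fwrite($out, cleanFeedLine($line));
    }
    fclose($in);
    fclose($out);
}
```

You would call something like cleanFeedFile('feed.xml', 'feed.clean.xml') (both filenames are just examples) and then point Magic Parser at the cleaned copy as usual. Note that this repairs the offending records rather than skipping them, and it will not help with a feed delivered as one enormous line.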

Cheers,
David.