detecting "illegal" characters in XML

Submitted by Bud Hovell on Sun, 2010-07-04 18:52

Hi, David ...

Can you recommend a quick and dirty method by which we can check an entire record for the presence of special characters in both tags and contents (even when some contents are enclosed in CDATA sections), so that we can simply report the record as invalid for that reason and then cleanly move along to evaluating and parsing the next record?

That is, we don't want to bust the entire feed -- just reliably mark and remove those records having non-compliant contents.

Thanks!

Submitted by support on Mon, 2010-07-05 08:10

Hello Bud,

Unfortunately it's not that straightforward: to detect invalid characters you effectively have to parse the file. If you're having trouble with the XML libraries on your server being intolerant of XML encoding errors, let me know the situation and perhaps one of the cleansing versions of Magic Parser will help...

All the best,
David.
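
To illustrate the point that detection effectively means parsing, here is a minimal sketch of a per-record check. It is not part of Magic Parser; it is written in Python purely for illustration, and the names `split_valid_records` and `record_tag`, as well as the `<record>` element, are hypothetical. It assumes each record sits in its own element and that the record's opening and closing tags are themselves intact, since a plain regular expression is used to cut the feed into chunks.

```python
import re
import xml.etree.ElementTree as ET


def split_valid_records(feed_text, record_tag="record"):
    """Return (valid, invalid) lists of raw record strings.

    Assumption: records are delimited by <record_tag>...</record_tag>
    and those delimiters are not themselves damaged.
    """
    pattern = re.compile(
        r"<{0}\b[^>]*>.*?</{0}>".format(re.escape(record_tag)),
        re.DOTALL,
    )
    valid, invalid = [], []
    for match in pattern.finditer(feed_text):
        chunk = match.group(0)
        try:
            # Parsing the chunk alone is the actual validity test.
            ET.fromstring(chunk)
        except ET.ParseError:
            invalid.append(chunk)
        else:
            valid.append(chunk)
    return valid, invalid
```

Each chunk is parsed in isolation, so one bad record is reported without breaking the rest of the feed. Content inside a CDATA section is still subject to the XML character rules, so a stray control character there is caught as well.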

Submitted by Bud Hovell on Wed, 2010-07-21 02:52

Hi, David ...

Yeah, we get people inputting word-processor trash, which promptly breaks the parser. If these characters could be replaced rather than have the file rejected, that would surely be preferable. Have you a version that does that?
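
For reference, replacing rather than rejecting can be sketched as below. This is an assumption about one possible approach, not a description of how the cleansing versions of Magic Parser work. It is written in Python for illustration, and `cleanse` and `replacement` are hypothetical names. The sketch assumes the feed should be UTF-8 and that bytes which fail to decode came from a Windows-1252 word processor (smart quotes and the like); it then swaps any character outside the XML 1.0 `Char` range for a space.

```python
import re

# Anything outside the XML 1.0 "Char" production is illegal in a document.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def cleanse(raw_bytes, replacement=" "):
    """Decode a feed and replace characters XML 1.0 does not allow."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Windows-1252 text pasted from
        # a word processor; undefined bytes become U+FFFD.
        text = raw_bytes.decode("cp1252", errors="replace")
    return _ILLEGAL_XML_CHARS.sub(replacement, text)
```

Note that this only handles illegal characters and mis-encoded bytes. Markup-level problems such as an unescaped `&` would need separate treatment.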

Submitted by support on Thu, 2010-07-22 08:12

Hello Bud,

Sure - I'll email you the cleansing versions of MagicParser.php to try.

Cheers,
David.