Cannot parse special characters (symbols) like Trademark, Copyright and Registered

Submitted by bayuobie on Tue, 2011-06-28 21:46 in Magic Parser

I have used Magic Parser to read XML files, but it failed to output anything because one of my files contained special characters. Below is a sample XML item with the special characters.

<product>
  <id>1591</id>
    <description><![CDATA[
MODELLO PRODOTTO:         MONITOR ASUS LCD 23` MS236H FULL HD
                          Power cord
                          Power adapter
                          Quick start guide
                          HDMI-to-DVI cable
                          warranty card
Regulation Approval       Energy Star®, UL/cUL, CB, CE, FCC, CCC, BSMI, Gost-R, C-Tick, MEPS, VCCI, PSE, J-MOSS,
                          PSB,China Energy Label Level 1, RoHS, WEEE, Windows Vista WHQL]]></description>
  </product>

Hello Bayoubie, I notice

Submitted by support on Wed, 2011-06-29 08:13

Hello Bayoubie,

I notice that the data is correctly delimited using CDATA tags, so this is almost certainly down to a character encoding error or, less likely, a mismatch between the declared encoding of the document and the actual data itself.
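To illustrate the kind of mismatch described above, here is a minimal Python sketch (not Magic Parser code; the helper name `decodes_as` is hypothetical). It assumes the file is stored as ISO-8859-1, where the ® symbol is the single byte 0xAE, and shows that the same bytes are rejected when read as UTF-8.

```python
# A fragment of the sample description, encoded the way an
# ISO-8859-1 file would store it (assumption for this sketch).
raw = "Energy Star®, UL/cUL".encode("iso-8859-1")

# In ISO-8859-1 the registered sign is the single byte 0xAE.
assert b"\xae" in raw


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xAE byte is a UTF-8 continuation byte with no lead byte,
# so a strict UTF-8 decoder rejects it.
print(decodes_as(raw, "utf-8"))       # False
print(decodes_as(raw, "iso-8859-1"))  # True
```

A parser that assumes UTF-8 can stop at that byte, which would be consistent with getting no output at all.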

To work around this, I have cleansing versions of the script that I will send to you to use in place of the standard version. Please check your email at the address that you registered on the forum with.

Cheers,
David
--
MagicParser.com

Hey David, The cleansing

Submitted by bayuobie on Wed, 2011-07-06 14:02

Hey David,

The cleansing version worked well for me, but I wanted to add that the UTF-8 version did not work because the XML source file had its header content type set to ISO-8859-1. So I switched to the cleansing version for ISO-8859-1.

Thanks.
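For readers who want to check which encoding a feed declares before choosing a version, the following Python sketch may help. It is not part of Magic Parser; it assumes the "header" mentioned above is the XML declaration at the top of the file, and `declared_encoding` and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def declared_encoding(xml_bytes: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = re.match(
        rb"""<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']""", xml_bytes
    )
    return match.group(1).decode("ascii") if match else default


# Hypothetical feed: a cut-down version of the product above,
# stored as ISO-8859-1 and declaring that encoding.
sample = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<product><id>1591</id>"
    "<description><![CDATA[Energy Star®, UL/cUL]]></description></product>"
).encode("iso-8859-1")

print(declared_encoding(sample))

# Python's parser honours the declared encoding when given raw bytes,
# so the registered sign survives parsing.
root = ET.fromstring(sample)
description = root.findtext("description")
print(description)
```

If the declared encoding and the actual bytes disagree, parsing in this sketch would raise an error instead, which is the less likely case mentioned earlier in the thread.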