

problem with special characters

Submitted by hoeksms on Thu, 2008-03-27 13:54

Hi,

I have a problem with some special characters.

My XML contains:

<FullDescription><![CDATA[Content with ‘quotes’ and – dashes]]></FullDescription>

I'm testing with a parser function that currently looks like this:

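function processXML($order) {
  // Value of <FullDescription> as returned by the parser
  $fullDescription = $order["FULLDESCRIPTION"];
  echo $fullDescription . '<br>';
  // The same text typed directly into the script, for comparison
  echo 'Content with ‘quotes’ and – dashes';
}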

When I parse the file and output the result, I get:

Content with ?quotes? and ? dashes
Content with ‘quotes’ and – dashes

The second line shows that it isn't simply a problem with displaying the characters in HTML.
It seems to me that the parser cannot handle the special characters and produces the question marks.
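One way to test that hypothesis is to dump the bytes of the parsed value. A literal "?" is the single byte 3f, while in UTF-8 the curly quote ‘ is e2 80 98 and the en dash – is e2 80 93. The helper name `hex_dump` is hypothetical; `bin2hex()` and `str_split()` are standard PHP functions.

```php
<?php
// Hypothetical helper: show each byte of a string as two hex digits,
// separated by spaces, so a literal "?" (3f) can be told apart from
// a multibyte UTF-8 sequence.
function hex_dump(string $s): string
{
    if ($s === '') {
        return ''; // str_split('') behaves differently across PHP versions
    }
    return implode(' ', str_split(bin2hex($s), 2));
}

// Inside the record handler it could be used like this:
// echo hex_dump($order["FULLDESCRIPTION"]) . '<br>';

echo hex_dump("\u{2018}") . "\n"; // e2 80 98
echo hex_dump("?") . "\n";        // 3f
```

If the parsed value shows 3f where the quotes should be, the question marks really are in the string; if it shows multibyte sequences, the data is intact and the question marks come from how the page's character set is declared.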

Please advise.

Kind regards,

Sybrand Hoeksma
The Netherlands

Submitted by support on Thu, 2008-03-27 13:57

Hello Sybrand,

This happens when the characters you are outputting (from the feed) do not match the character set that the web browser is displaying the page in.

To fix this, send a Content-Type header so that you control the character set. At the top of your script (it must come before ANY output has been generated), add:

  header("Content-Type: text/html;charset=utf-8");

UTF-8 is the most likely, but if that doesn't work, try:

  header("Content-Type: text/html;charset=iso-8859-1");
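A note on the iso-8859-1 option: the curly quotes and en dash in this feed are not part of ISO-8859-1 at all (Windows-1252 has them at 0x91, 0x92 and 0x96), so whichever header is used has to describe the bytes actually being sent. The sketch below, assuming the mbstring extension is loaded, converts the sample text to both charsets; characters missing from the target charset are replaced with mbstring's default substitute character, "?", which is one way question marks like these can appear.

```php
<?php
// Assumes the mbstring extension is loaded.
$utf8 = "\u{2018}quotes\u{2019} and \u{2013} dashes";

// ISO-8859-1 has no curly quotes or en dash: each one is replaced
// with mbstring's default substitute character, "?".
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Windows-1252 does contain them (0x91, 0x92 and 0x96).
$cp1252 = mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');

echo $latin1 . "\n"; // ?quotes? and ? dashes
echo bin2hex($cp1252) . "\n";
```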

That should fix it!
Cheers,
David.

Submitted by hoeksms on Thu, 2008-03-27 15:00

I'm afraid it's not so simple. I'm now using a test script that looks like this:

header("Content-Type: text/html;charset=utf-8");
require("MagicParser.php");

// Called by MagicParser once for each record matched by the format string
function processXML($order) {
  $fullDescription = $order["FULLDESCRIPTION"];
  echo $fullDescription . '<br>';
}

// The callback name is passed as a string, not as a bare identifier
MagicParser_parse('/export/www/ebooks/test/test.xml', "processXML", "xml|INVENTORY/CONTENT/");

With this XML code:

<?xml version="1.0" encoding="UTF-16"?>
<Inventory>
<Content>
<FullDescription>Content with ‘quotes’ and – dashes</FullDescription>
</Content>
</Inventory>
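The XML declaration is only a label: an editor may well have saved this file as UTF-8 even though the prolog says UTF-16. A minimal sketch for checking what is actually on disk, looking for a byte-order mark (BOM) or, failing that, for "<?" encoded as UTF-16 (the autodetection approach described in Appendix F of the XML 1.0 specification). The function name `sniff_xml_encoding` is hypothetical, and a null result only means the file is probably ASCII-compatible, not that it is definitely UTF-8.

```php
<?php
// Hypothetical helper: guess the encoding of an XML document from its
// first bytes. Returns null when nothing UTF-16-like is detected.
function sniff_xml_encoding(string $bytes): ?string
{
    // Byte-order marks
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE'; // a UTF-32LE BOM also starts this way
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    // No BOM: "<?" encoded as UTF-16
    if (strncmp($bytes, "<\x00?\x00", 4) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($bytes, "\x00<\x00?", 4) === 0) {
        return 'UTF-16BE';
    }
    return null; // probably ASCII-compatible (UTF-8, ISO-8859-1, ...)
}

// Usage against the test file from this thread:
// $head = file_get_contents('/export/www/ebooks/test/test.xml', false, null, 0, 4);
// var_dump(sniff_xml_encoding($head));
```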

You can see the result for yourself here:
http://ebooks.ndcvbk.nl/test/

I've also tested the XML in your online demo, and there it works fine.
Any chance that I'm using an old version of MagicParser?

Sybrand

Submitted by support on Thu, 2008-03-27 15:03

Hi,

I notice that your XML declares UTF-16. Have you tried this as the header line of your script?

header("Content-Type: text/html;charset=utf-16");

If that makes no difference, could you email me a copy of test.xml? I'll check it out for you. MagicParser hasn't been updated in any way that would affect this, so an old version isn't the problem. Use the email address on this page to reach me.

Cheers,
David.

Submitted by hoeksms on Fri, 2008-03-28 08:42

A header with utf-16 only makes it worse.

I also tested test.xml on your demo page (option 2) by submitting its full URL on my server (http://ebooks.ndcvbk.nl/test/test.xml). Your demo page uses UTF-8, and the output looks fine (after I changed the format string).
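Since the demo page displays correctly when it outputs UTF-8, one option, assuming the parsed values for this file really do arrive UTF-16 encoded (which would need confirming first), is to keep the utf-8 header and convert each value to UTF-8 before echoing it. This is a sketch using `mb_convert_encoding()` from the mbstring extension; `html_utf8` is a hypothetical helper, and 'UTF-16LE' is only an example of the source encoding.

```php
<?php
// Assumes the mbstring extension is loaded, and that the page is sent with
// header("Content-Type: text/html; charset=utf-8") before any output.

// Hypothetical helper: re-encode one parsed value as UTF-8 and escape it
// for HTML. $from is whatever encoding the value actually arrives in.
function html_utf8(string $value, string $from): string
{
    $utf8 = mb_convert_encoding($value, 'UTF-8', $from);
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

// Record handler from the test script, with the conversion added
function processXML($order)
{
    echo html_utf8($order["FULLDESCRIPTION"], 'UTF-16LE') . '<br>';
}
```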

I've just emailed you the xml and my script.

Hope you can help.

Kind regards,

Sybrand

Submitted by pl_harish on Sat, 2008-05-17 01:39

I'm having a similar problem.
I'd appreciate help resolving it.

What I am using:
> I have a feed file in CSV format, with a header row and a few columns of data that may contain special characters.
> A PHP script run from cron uses Magic Parser to parse the feeds and load them into a database after some processing.
> I get Nubuck~Caf?? instead of the string Nubuck~CafòÍ (I hope the difference shows up in this forum message).
> I would like to receive the characters as-is, if possible, and do my own UTF-8 decoding in PHP; if that isn't possible, it would be good if Magic Parser could parse them properly.
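For the "receive the characters as-is and decode them myself" route, one approach, assuming the mbstring extension is loaded, is to check whether each field is already valid UTF-8 and, if not, convert it. Treating non-UTF-8 input as Windows-1252 is an assumption (it is common for CSV exports, but the feed's real encoding should be confirmed); `normalise_field` is a hypothetical helper, and "Café" is only an illustrative value.

```php
<?php
// Hypothetical helper: return $field as UTF-8. Valid UTF-8 is passed
// through untouched; anything else is assumed to be Windows-1252.
// Assumes the mbstring extension is loaded.
function normalise_field(string $field): string
{
    if (mb_check_encoding($field, 'UTF-8')) {
        return $field;
    }
    return mb_convert_encoding($field, 'UTF-8', 'Windows-1252');
}

// Example: "Café" stored as Windows-1252 (é = 0xE9) becomes UTF-8 (c3 a9).
echo bin2hex(normalise_field("Caf\xE9")) . "\n"; // 436166c3a9
```

Each field would be passed through a function like this in the record handler, before the value is inserted into the database.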

Any suggestions?

Thanks,
Harish

Submitted by support on Sat, 2008-05-17 08:35

Hello Harish,

Magic Parser will always return exactly the characters that are in the feed - any problems to do with special characters are almost always related to how the data is subsequently used / displayed.

The question mark character "?" in place of special characters is normally the way in which web browsers display characters that are not in the declared character set of the data; so in this case if you are viewing the data through a web browser make sure that you are sending the correct header, e.g.:

  header("Content-Type: text/html; charset=utf-8");

~or~

  header("Content-Type: text/html; charset=iso-8859-1");

However, you mention that you are importing into a database. The next thing to check is that the character set of the database (and of the field) is correct, as a mismatch there could also cause the characters to be imported incorrectly.
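The connection character set matters as well as the column's. As a sketch, assuming MySQL accessed through PDO, the connection charset can be given in the DSN. The host, database name, table name and the `utf8mb4` choice (MySQL's full UTF-8 character set) are placeholder assumptions, and `mysql_dsn` is a hypothetical helper.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that sets the connection
// character set, so the server knows which encoding the script is sending.
function mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// The table/columns should use the same charset, e.g. (placeholder table name):
//   ALTER TABLE products CONVERT TO CHARACTER SET utf8mb4;
//
// $pdo = new PDO(mysql_dsn('localhost', 'feeds'), $user, $password);
```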

If you're still not sure where the problem lies, posting a little more of the PHP you have written would help. If you don't want to post it to the forum, feel free to email me anything you would like me to look at; replying to your registration code or forum registration email is the easiest way to reach me.

Cheers,
David.