Hello!
I am getting this error when parsing an XML file:
XML Parsing Error: not well-formed
Location: http://localhost/magicparser/test.php
Line Number 16, Column 63: [BUY_LINK] => http://www.avantlink.com/click.php?p=2295&pw=3211&pt=3&pri=2594&tt=df&url=http%3A%2F%2Fwww.paragonsports.com%2FParagon%2FShop%3FDSP%3D40000%26PCR%3D1%3A100%3A1001%3A10030%3A101028%26IID%3D1055-16484109%26campaign%3Davantlink
--------------------------------------------------------------^
The arrow points to the '=' in pw=3211.
I am trying to create an RSS feed out of this. A sample of the original XML is:
<?xml version="1.0" encoding="us-ascii"?>
<Products>
<Product>
<SKU>5258-40407</SKU>
<Manufacturer_Id>5258</Manufacturer_Id>
<Brand_Name>Eagle Creek</Brand_Name>
<Product_Name>Eagle Creek 5-Piece Adapter Set</Product_Name>
<Long_Description></Long_Description>
<Short_Description>Ungrounded adapter plug for 2 pinned appliances such as hair dryers, phone chargers, etc..., and voltage converters.</Short_Description>
<Category>Gear</Category>
<SubCategory>Travel/Luggage</SubCategory>
<Product_Group>Accessories</Product_Group>
<Thumb_URL>http://www.paragonsports.com/Paragon/images/small/5258-40407_charcoal</Thumb_URL>
<Image_URL>http://www.paragonsports.com/Paragon/images/medium/5258-40407_charcoal_pd.jpg</Image_URL>
<Buy_Link>http://www.avantlink.com/click.php?p=2295&amp;pw=3211&amp;pt=3&amp;pri=8764&amp;tt=df&amp;url=http%3A%2F%2Fwww.paragonsports.com%2FParagon%2FShop%3FDSP%3D40000%26PCR%3D1%3A101%3A1052%3A10405%26IID%3D5258-40407%26campaign%3Davantlink</Buy_Link>
<Keywords></Keywords>
<Reviews></Reviews>
<Retail_Price>12.50</Retail_Price>
<Sale_Price>12.50</Sale_Price>
<Brand_Page_Link>http://www.avantlink.com/click.php?p=2295&amp;pw=3211&amp;pt=3&amp;pri=8764&amp;tt=df&amp;url=http%3A%2F%2Fwww.paragonsports.com%2FParagon%2FShop%3FDSP%3D16000%26brandspage%3DEagle+Creek%26brand%3DEagle+Creek</Brand_Page_Link>
<Brand_Logo_Image></Brand_Logo_Image>
<Product_Page_View_Tracking>&lt;img src="http://tracking.avantlink.com/dfpv.php?p=2295&amp;pri=8764" width=0 height=0&gt;</Product_Page_View_Tracking>
</Product>
</Products>
If I replace the &amp; with & then it works fine, but that is strange, as I have read that it is supposed to be the other way around. Can you please help?
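For reference, this behaviour is expected: an XML parser turns &amp; in the file back into a plain & when it reads the value, so anything taken from the feed has to be escaped again before it is written into new XML. A minimal sketch using PHP's SimpleXML, with a shortened, illustrative URL rather than the real feed:

```php
<?php
// Show that a parser decodes &amp; to "&", and that a bare "&" is rejected.
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

$escaped = '<Buy_Link>http://www.example.com/click.php?p=2295&amp;pw=3211</Buy_Link>';
$raw     = '<Buy_Link>http://www.example.com/click.php?p=2295&pw=3211</Buy_Link>';

// Well-formed input: the value comes back with a plain "&".
$ok = simplexml_load_string($escaped);
$decoded = (string) $ok;
echo $decoded, "\n"; // http://www.example.com/click.php?p=2295&pw=3211

// Not well-formed: a bare "&" must start an entity reference ending in ";",
// so the parser stops at "&pw=" (the same spot the error arrow points to).
$bad = simplexml_load_string($raw);
echo $bad === false ? "raw & rejected\n" : "raw & accepted\n";
libxml_clear_errors();
```

So the decoded value is correct for use in PHP, but printing it straight into the RSS output reintroduces a bare &.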
David, thanks for such a prompt response!
I am new to RSS formatting, so that might be the culprit; I'd appreciate your opinion! My code is:
<?php
require("MagicParser.php");
// set the output content-type to text/xml
header("Content-Type: text/xml");
// print the RSS header
print "<rss version='2.0'>";
print "<channel>";
print "<title>Some Title</title>";
function myRecordHandler($record)
{
print "<item>";
print "<title>".$record["SKU"]."</title>";
print "</item>";
}
MagicParser_parse("test_paragon.xml","myRecordHandler","xml|PRODUCTS/PRODUCT/");
print "</channel>";
print "</rss>";
?>
If I just print the SKU, it works well, because it does not contain any &.
If I print another field instead, say $record["BUY_LINK"], I get the "XML Parsing Error: not well-formed" error.
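One way to avoid hand-escaping altogether is to build the feed with PHP's XMLWriter, which escapes text passed to writeElement(). This is a swapped-in technique, not MagicParser's own approach; $records below is a hypothetical stand-in for the records MagicParser hands to myRecordHandler(), with the same keys used in this thread:

```php
<?php
// Hypothetical records, standing in for what MagicParser would pass in.
$records = [
    ['SKU' => '5258-40407',
     'BUY_LINK' => 'http://www.example.com/click.php?p=2295&pw=3211&pt=3'],
];

$w = new XMLWriter();
$w->openMemory();
$w->startDocument('1.0', 'UTF-8');
$w->startElement('rss');
$w->writeAttribute('version', '2.0');
$w->startElement('channel');
$w->writeElement('title', 'Some Title');
foreach ($records as $record) {
    $w->startElement('item');
    $w->writeElement('title', $record['SKU']);
    $w->writeElement('link', $record['BUY_LINK']); // "&" is written as "&amp;"
    $w->endElement(); // item
}
$w->endElement(); // channel
$w->endElement(); // rss
$w->endDocument();
$rss = $w->outputMemory();
echo $rss;
```

With MagicParser, the writer would have to be reachable from myRecordHandler() (for example as a global), since the callback only receives $record.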
Hi Chris,
The way I usually do this is with an xmlentities() function that works in the same way as
PHP's htmlentities() function, but makes the string safe for inclusion in XML instead of
HTML. Have a go with the following version of your test script:
<?php
require("MagicParser.php");
// set the output content-type to text/xml
header("Content-Type: text/xml");
// function to make strings safe for outputting in XML
function xmlentities($text)
{
$search = array('&','<','>','"','\'');
$replace = array('&amp;','&lt;','&gt;','&quot;','&apos;');
$text = str_replace($search,$replace,$text);
return $text;
}
// print the RSS header
print "<rss version='2.0'>";
print "<channel>";
print "<title>Some Title</title>";
function myRecordHandler($record)
{
print "<item>";
print "<title>".xmlentities($record["SKU"])."</title>";
print "<link>".xmlentities($record["BUY_LINK"])."</link>";
print "</item>";
}
MagicParser_parse("test_paragon.xml","myRecordHandler","xml|PRODUCTS/PRODUCT/");
print "</channel>";
print "</rss>";
?>
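PHP also has a built-in that does the same job as xmlentities(): htmlspecialchars() with the ENT_XML1 flag (PHP 5.4 and later). A sketch, where xml_escape is just an illustrative name:

```php
<?php
// Escape &, <, >, " and ' for XML output.
// ENT_QUOTES escapes both quote types; ENT_XML1 makes ' come out as &apos;.
function xml_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xml_escape('p=2295&pw=3211 "a" \'b\' <c>'), "\n";
// p=2295&amp;pw=3211 &quot;a&quot; &apos;b&apos; &lt;c&gt;
```

Note that htmlspecialchars() returns an empty string if the input is not valid in the given charset, unless ENT_SUBSTITUTE is also passed; the sample feed here is us-ascii, so that should not be an issue.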
Hope this helps!
Cheers,
David.
David, this is great! It works perfectly! Thank you so much for your support, and for your excellent product!
Hello Christopher,
As the source XML looks fine (and parses correctly), can you post the section
of code that you are using to generate this particular line of your RSS feed?
As you say, &amp; is the correct way to encode the ampersand within the output
that you are generating (exactly as it is within the original XML), so it should
work fine.
An alternative to entity encoding is to use CDATA tags, so in your output you
could generate something like this:
<buy_link><![CDATA[http://www.example.com/script.asp?foo=123&bar=456]]></buy_link>
In this case, & does not need to be encoded, because everything between the
CDATA delimiters is treated as plain character data (the only sequence that cannot
appear inside is ]]>).
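If you go the CDATA route, a small helper can guard against the one sequence a CDATA section cannot contain. This is a sketch; cdata() is a hypothetical name, and the split works by ending one CDATA section and starting another in the middle of any "]]>":

```php
<?php
// Wrap a value in a CDATA section. A section ends at the first "]]>",
// so each "]]>" in the value is split across two adjacent sections.
function cdata($text)
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

$link = 'http://www.example.com/script.asp?foo=123&bar=456';
$xml  = '<buy_link>' . cdata($link) . '</buy_link>';
echo $xml, "\n";
// <buy_link><![CDATA[http://www.example.com/script.asp?foo=123&bar=456]]></buy_link>
```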
Cheers,
David.