XML Parsing Error: not well-formed

Submitted by christopher skauss on Fri, 2008-03-21 10:43

Hello!
I am getting this error when parsing an XML file:

XML Parsing Error: not well-formed
Location: http://localhost/magicparser/test.php
Line Number 16, Column 63: [BUY_LINK] => http://www.avantlink.com/click.php?p=2295&pw=3211&pt=3&pri=2594&tt=df&url=http%3A%2F%2Fwww.paragonsports.com%2FParagon%2FShop%3FDSP%3D40000%26PCR%3D1%3A100%3A1001%3A10030%3A101028%26IID%3D1055-16484109%26campaign%3Davantlink
--------------------------------------------------------------^

The arrow points to the '=' in pw=3211.
I am trying to create an RSS feed out of this file. A sample of the original XML is:

<?xml version="1.0" encoding="us-ascii"?>
<Products>
<Product>
<SKU>5258-40407</SKU>
<Manufacturer_Id>5258</Manufacturer_Id>
<Brand_Name>Eagle Creek</Brand_Name>
<Product_Name>Eagle Creek 5-Piece Adapter Set</Product_Name>
<Long_Description></Long_Description>
<Short_Description>Ungrounded adapter plug for 2 pinned appliances such as hair dryers, phone chargers, etc..., and voltage converters.</Short_Description>
<Category>Gear</Category>
<SubCategory>Travel/Luggage</SubCategory>
<Product_Group>Accessories</Product_Group>
<Thumb_URL>http://www.paragonsports.com/Paragon/images/small/5258-40407_charcoal</Thumb_URL>
<Image_URL>http://www.paragonsports.com/Paragon/images/medium/5258-40407_charcoal_pd.jpg</Image_URL>
<Buy_Link>http://www.avantlink.com/click.php?p=2295&amp;pw=3211&amp;pt=3&amp;pri=8764&amp;tt=df&amp;url=http%3A%2F%2Fwww.paragonsports.com%2FParagon%2FShop%3FDSP%3D40000%26PCR%3D1%3A101%3A1052%3A10405%26IID%3D5258-40407%26campaign%3Davantlink</Buy_Link>
<Keywords></Keywords>
<Reviews></Reviews>
<Retail_Price>12.50</Retail_Price>
<Sale_Price>12.50</Sale_Price>
<Brand_Page_Link>http://www.avantlink.com/click.php?p=2295&amp;pw=3211&amp;pt=3&amp;pri=8764&amp;tt=df&amp;url=http%3A%2F%2Fwww.paragonsports.com%2FParagon%2FShop%3FDSP%3D16000%26brandspage%3DEagle+Creek%26brand%3DEagle+Creek</Brand_Page_Link>
<Brand_Logo_Image></Brand_Logo_Image>
<Product_Page_View_Tracking>&lt;img src=&quot;http://tracking.avantlink.com/dfpv.php?p=2295&amp;pri=8764&quot; width=0 height=0&gt;</Product_Page_View_Tracking>
</Product>
</Products>

If I replace the &amp; with & then it works fine, which is strange, as I have read that it is supposed to be the other way around. Can you please help?

Submitted by support on Fri, 2008-03-21 10:59

Hello Christopher,

As the source XML looks fine (and parses correctly), can you post the section
of code that you are using to generate this particular line of your RSS feed?

As you say, &amp; is the correct way to encode the ampersand within the output
that you are generating (exactly as it is within the original XML) so it should
work fine.

An alternative to entity encoding is to use CDATA tags, so in your output you
could generate something like this:

<buy_link><![CDATA[http://www.example.com/script.asp?foo=123&bar=456]]></buy_link>

In this case, & does not need to be encoded because everything between the CDATA
tags is treated as literal text. The only sequence that cannot appear there is ]]>,
which ends the section.
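
As a minimal sketch of the CDATA approach, the helper below wraps a value in a CDATA section and splits any ]]> in the value across two sections so that it cannot close the section early. The name cdataWrap() is hypothetical and is not part of Magic Parser or PHP.

```php
<?php
// Hypothetical helper (not part of Magic Parser): wrap a value in a CDATA section.
function cdataWrap($text)
{
    // "]]>" would end the section early, so split it across two adjacent sections.
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $text);
    return '<![CDATA[' . $safe . ']]>';
}

// The ampersand is left unencoded because it sits inside the CDATA section.
$link = '<buy_link>' . cdataWrap('http://www.example.com/script.asp?foo=123&bar=456') . '</buy_link>';
```

In a record handler this would be used as print "<link>".cdataWrap($record["BUY_LINK"])."</link>";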

Cheers,
David.

Submitted by christopher skauss on Fri, 2008-03-21 11:39

David, thanks for such a prompt response!
I am new to RSS formatting, so that might be the culprit. I'd appreciate your opinion! My code is:

<?php
  require("MagicParser.php");

  // set the output content-type to text/xml
  header("Content-Type: text/xml");

  // print the RSS header
  print "<rss version='2.0'>";
  print "<channel>";
  print "<title>Some Title</title>";

  // called once for each PRODUCT record in the source file
  function myRecordHandler($record)
  {
    print "<item>";
    print "<title>".$record["SKU"]."</title>";
    print "</item>";
  }

  MagicParser_parse("test_paragon.xml","myRecordHandler","xml|PRODUCTS/PRODUCT/");

  // close the channel and the feed
  print "</channel>";
  print "</rss>";
?>

If I just print the SKU, it works well, because it does not contain any &.
If I print another field, say print "".$record["BUY_LINK"]."";
I get the "XML Parsing Error: not well-formed" error.
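
The symptom above is consistent with the parser decoding &amp; back to a plain & as it reads the source file, so the value has to be encoded again before it is printed. The sketch below illustrates that round trip using PHP's SimpleXML extension as a stand-in. That is an assumption made purely for illustration: it is not Magic Parser, and the URL is a made-up example.

```php
<?php
// Assumption: SimpleXML stands in for the parser here; this is not Magic Parser.
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings

$source = '<Product><Buy_Link>http://www.example.com/click.php?p=1&amp;pw=2</Buy_Link></Product>';
$product = simplexml_load_string($source);

// The parsed value has the entity decoded back to a bare ampersand.
$buyLink = (string) $product->Buy_Link;

// Printing the decoded value unescaped gives XML that is not well-formed.
$raw = "<link>" . $buyLink . "</link>";
$rawParses = simplexml_load_string($raw) !== false;

// Encoding the ampersand again on output makes it well-formed.
$escaped = "<link>" . str_replace('&', '&amp;', $buyLink) . "</link>";
$escapedParses = simplexml_load_string($escaped) !== false;
```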

Submitted by support on Fri, 2008-03-21 11:47

Hi Chris,

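Entities in the source file are decoded as it is parsed, so $record["BUY_LINK"] contains
a plain & that has to be encoded again on output. The way I usually do this is with an
xmlentities() function that works in the same way as PHP's htmlentities() function, but
makes the string safe for inclusion in XML instead of HTML. Have a go with the following
version of your test script: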

<?php
  require("MagicParser.php");

  // set the output content-type to text/xml
  header("Content-Type: text/xml");

  // function to make strings safe for outputting in XML
  function xmlentities($text)
  {
    // '&' comes first so that the entities added below are not encoded again
    $search = array('&','<','>','"','\'');
    $replace = array('&amp;','&lt;','&gt;','&quot;','&apos;');
    $text = str_replace($search,$replace,$text);
    return $text;
  }

  // print the RSS header
  print "<rss version='2.0'>";
  print "<channel>";
  print "<title>Some Title</title>";

  // called once for each PRODUCT record in the source file
  function myRecordHandler($record)
  {
    print "<item>";
    print "<title>".xmlentities($record["SKU"])."</title>";
    print "<link>".xmlentities($record["BUY_LINK"])."</link>";
    print "</item>";
  }

  MagicParser_parse("test_paragon.xml","myRecordHandler","xml|PRODUCTS/PRODUCT/");

  // close the channel and the feed
  print "</channel>";
  print "</rss>";
?>
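
For comparison, on PHP 5.4 or later the built-in htmlspecialchars() function should give the same result when it is called with the ENT_QUOTES and ENT_XML1 flags. The sketch below repeats the xmlentities() function so that it runs on its own. The sample value is made up, and the PHP version requirement is an assumption about your environment.

```php
<?php
// The same five replacements as xmlentities(), repeated so the snippet is self-contained.
function xmlentities($text)
{
    $search  = array('&', '<', '>', '"', '\'');
    $replace = array('&amp;', '&lt;', '&gt;', '&quot;', '&apos;');
    return str_replace($search, $replace, $text);
}

// Made-up sample value containing all five special characters.
$value = 'click.php?p=2295&pw=3211 <"it\'s">';

$manual = xmlentities($value);

// Assumes PHP 5.4 or later, where the ENT_XML1 flag is available.
$builtin = htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
```

Either version can be used inside myRecordHandler(); the hand-written function has the advantage of working on older PHP installations.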

Hope this helps!
Cheers,
David.

Submitted by christopher skauss on Fri, 2008-03-21 12:18

David, this is great! It works perfectly! Thank you so much for your support, and for your excellent product!