Can't seem to parse xml feed (even using your demo)

Submitted by jimpannell on Sun, 2009-12-06 22:15

Hi David

I'm trying to parse the xml in the following url but am having problems: http://my-url.com/file.xml

The format string should be: xml|PROPERTIES/PROPERTY/ but it isn't showing any results (I know there are loads in there).

Any idea what might be wrong?

Kind regards

Jim

Submitted by support on Mon, 2009-12-07 09:00

Hello Jim,

What's happened here is that an <agent> element appears at the top of the output ahead of the <properties> element, but without an all-encompassing document element, so unfortunately the XML is not well-formed. In fact, if you browse to the XML URL directly, a parsing error should be displayed; Firefox displays "XML Parsing Error: junk after document element .... Line Number 10, Column 1:".

Sometimes it's relatively easy to fix up the XML using string functions, and that should be the case here. Try something like this, which discards everything ahead of <properties>, displays the first record and exits:

<?php
  require("MagicParser.php");

  function myRecordHandler($record)
  {
    print_r($record);
    // returning TRUE stops the parse after the first record
    return TRUE;
  }

  $xml = file_get_contents("http://burgundy4u.com/content/root/files/my-french-house.xml");
  // discard everything ahead of <properties> so that it becomes the document element
  $xml = substr($xml,strpos($xml,"<properties>"));
  MagicParser_parse("string://".$xml,"myRecordHandler","xml|PROPERTIES/PROPERTY/");
?>
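
If you want to confirm whether a feed is well-formed before handing it to the parser, a minimal sketch along these lines may help. It uses PHP's own SimpleXML and libxml functions rather than Magic Parser, and the function name xmlWellFormedErrors and the sample strings are made up for illustration:

```php
<?php
// Hypothetical helper: returns a list of libxml error messages.
// An empty array means the string parsed as well-formed XML.
function xmlWellFormedErrors($xml)
{
  // collect parse errors internally instead of emitting PHP warnings
  $previous = libxml_use_internal_errors(true);
  libxml_clear_errors();
  simplexml_load_string($xml);
  $messages = array();
  foreach (libxml_get_errors() as $error)
  {
    $messages[] = "Line ".$error->line.", column ".$error->column.": ".trim($error->message);
  }
  libxml_clear_errors();
  // restore whatever error handling mode was active before
  libxml_use_internal_errors($previous);
  return $messages;
}

// two top-level elements and no enclosing document element, as in the feed above
$broken = "<agent><agentName>agent info here</agentName></agent>\n"
        . "<properties><property>details</property></properties>";
// the same data wrapped in a single document element
$fixed = "<document>".$broken."</document>";

print_r(xmlWellFormedErrors($broken));
print_r(xmlWellFormedErrors($fixed));
```

The first call should report at least one error and the second should return an empty array; the exact wording of the messages depends on the libxml version installed.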

Hope this helps!
Cheers,
David.

Submitted by jimpannell on Mon, 2009-12-07 09:39

Oh dear... I actually supplied the xml format to the feed author myself.

Would this be better then?

<?xml version="1.0" encoding="UTF-8"?>
<document>
<agent>
  <agentName>agent info here</agentName>
</agent>
<properties>
  <property> details </property>
  <property> details </property>
  <property> details </property>
</properties>
</document>

Submitted by support on Mon, 2009-12-07 09:41

Hi Jim,

Yes - that should be fine, and then you would use the Format String:

xml|DOCUMENT/PROPERTIES/PROPERTY/
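
As a quick sanity check on the proposed layout, the sketch below loads it with PHP's SimpleXML (not Magic Parser) and counts the records found under document/properties/property, which mirrors the Format String above. The sample data is the placeholder feed from your post:

```php
<?php
// the proposed feed layout, with a single <document> element enclosing everything
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<document>
<agent>
  <agentName>agent info here</agentName>
</agent>
<properties>
  <property> details </property>
  <property> details </property>
  <property> details </property>
</properties>
</document>
XML;

// $document represents the <document> element itself
$document = simplexml_load_string($xml);

// count the <property> elements inside <properties>
$count = count($document->properties->property);
print "Found ".$count." property records\n";
```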

Cheers,
David.

Submitted by jimpannell on Mon, 2009-12-07 10:14

Many thanks, as always.

Submitted by jimpannell on Mon, 2009-12-07 10:55

Hi David

For some reason I can only get MagicParser to parse 12 items in the feed. The test code I'm using is as follows:

<?php
  require("MagicParser.php");
  $counter = 0;
  function myRecordHandler($record)
  {
    global $counter;
    print "$counter<br />";
    print $record["REFERENCE"]."<br />";
    print $record["TITLE"]."<br /><br />";
    $counter++;
    // if ($counter == 50) return TRUE;
  }
  $xml = file_get_contents("http://burgundy4u.com/content/root/files/my-french-house.xml");
  $xml = substr($xml,strpos($xml,"<properties>"));
  MagicParser_parse("string://".$xml,"myRecordHandler","xml|PROPERTIES/PROPERTY/");
?>

Any ideas why this might be happening?

Cheers

Jim

Submitted by support on Mon, 2009-12-07 11:05

Hi Jim,

It's a formatting error in the 13th record:

<caption>Garden & Views</caption>

& within XML must be entity encoded as:

<caption>Garden &amp; Views</caption>

In general, when constructing XML, any field that is not enclosed in CDATA tags should be checked for any characters that should be encoded by one of the 5 pre-defined XML entities. These are:

& - &amp;
" - &quot;
' - &apos;
< - &lt;
> - &gt;

A simple PHP function to sanitise a string is as follows:

function xmlentities($text)
{
  $search = array("&","\"","'","<",">");
  $replace = array("&amp;","&quot;","&apos;","&lt;","&gt;");
  return str_replace($search,$replace,$text);
}

...and then to use in construction, for example:

  print "<caption>".xmlentities($caption)."</caption>";
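
As an alternative, PHP's built-in htmlspecialchars() covers the same 5 entities when called with the ENT_QUOTES and ENT_XML1 flags (ENT_XML1 needs PHP 5.4 or later). A minimal sketch, where the wrapper name xmlEscape and the sample caption are made up for illustration:

```php
<?php
// Hypothetical wrapper around PHP's built-in htmlspecialchars().
// ENT_QUOTES encodes both quote characters; ENT_XML1 makes ' come out as &apos;
function xmlEscape($text)
{
  return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, "UTF-8");
}

$caption = "Garden & \"Views\" <it's sunny>";
$escaped = xmlEscape($caption);
print "<caption>".$escaped."</caption>\n";
```

Whichever version you use, apply it exactly once to the raw text. Running xmlentities() over a string that is already encoded will encode the & of each existing entity a second time, which is also why & has to be first in the $search array.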

Hope this helps!
Cheers,
David.

Submitted by jimpannell on Mon, 2009-12-07 11:08

Mmm... <caption> is one of the fields that I'm not even processing as it isn't needed currently.

Is there a way to get MagicParser to ignore certain fields?

Submitted by support on Mon, 2009-12-07 11:14

Hi Jim,

Unfortunately the parse is being aborted by the lower-level PHP XML parser, not at the Magic Parser level, but what can be done is to cleanse that field before parsing. Assuming that an un-encoded & is always followed by a space, the following could be used immediately before your call to MagicParser_parse():

  $xml = str_replace("& ","&amp; ",$xml);
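
If the space assumption doesn't hold (for example "B&B"), a regular expression can be used instead of the replacement above to encode any & that doesn't already begin an entity reference. This is only a sketch: the function name escapeBareAmpersands is made up, text that merely looks like an entity reference is left alone, and any & inside a CDATA section would be changed as well, so it is only suitable for feeds that don't use CDATA:

```php
<?php
// Hypothetical helper: encodes every "&" that is not the start of a
// named (&amp;), decimal (&#169;) or hexadecimal (&#xA9;) reference
function escapeBareAmpersands($xml)
{
  return preg_replace('/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $xml);
}

$xml = "<caption>Garden & Views</caption>"
     . "<caption>B&B</caption>"
     . "<caption>Pool &amp; Spa &#169;</caption>";

$cleaned = escapeBareAmpersands($xml);
print $cleaned."\n";
```

The first two captions come out with &amp; in place of the bare &, and the third is left untouched because its references are already valid.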

Cheers,
David.

Submitted by jimpannell on Mon, 2009-12-07 11:18

Yep - that did it. Many, many thanks!