Hi David
I'm trying to parse the xml in the following url but am having problems: http://my-url.com/file.xml
The format string should be: xml|PROPERTIES/PROPERTY/ but it isn't showing any results (I know there are loads in there).
Any idea what might be wrong?
Kind regards
Jim
Oh dear... I actually supplied the xml format to the feed author myself.
Would this be better then?
<?xml version="1.0" encoding="UTF-8"?>
<document>
<agent>
<agentName>agent info here</agentName>
</agent>
<properties>
<property> details </property>
<property> details </property>
<property> details </property>
</properties>
</document>
Hi Jim,
Yes - that should be fine, and then you would use the Format String:
xml|DOCUMENT/PROPERTIES/PROPERTY/
Cheers,
David.
Hi David
For some reason I can only get MagicParser to parse 12 items in the feed. The test code I'm using is as follows:
<?php
require("MagicParser.php");
$counter = 0;
function myRecordHandler($record)
{
global $counter;
print "$counter<br />";
print $record["REFERENCE"]."<br />";
print $record["TITLE"]."<br /><br />";
$counter++;
// if ($counter == 50) return TRUE;
}
$xml = file_get_contents("http://burgundy4u.com/content/root/files/my-french-house.xml");
$xml = substr($xml,strpos($xml,"<properties>"));
MagicParser_parse("string://".$xml,"myRecordHandler","xml|PROPERTIES/PROPERTY/");
?>
Any ideas why this might be happening?
Cheers
Jim
Hi Jim,
It's a formatting error in the 13th record:
<caption>Garden & Views</caption>
& within XML must be entity encoded as:
<caption>Garden &amp; Views</caption>
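One way to locate this kind of error yourself is to run the feed through PHP's libxml-based parser with internal error collection switched on, then print the line and column of each problem. The following is only a sketch: the function name findXmlErrors and the sample feed are made up for illustration and are not part of Magic Parser.

```php
<?php
// Hypothetical helper (not part of Magic Parser): returns one
// "line:column message" string per well-formedness error found in $xml.
function findXmlErrors($xml)
{
  // Collect libxml errors internally instead of emitting PHP warnings
  $previous = libxml_use_internal_errors(TRUE);
  libxml_clear_errors();
  simplexml_load_string($xml);
  $errors = array();
  foreach (libxml_get_errors() as $error)
  {
    $errors[] = $error->line.":".$error->column." ".trim($error->message);
  }
  // Restore the previous error handling state
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  return $errors;
}
// Made-up sample containing the same un-encoded & as the 13th record
$sample = "<properties>\n<property><caption>Garden & Views</caption></property>\n</properties>";
foreach (findXmlErrors($sample) as $line)
{
  print $line."\n";
}
```

An empty result means the lower level parser found nothing to complain about, so any remaining problem would be in the Format String rather than the XML itself.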
In general, when constructing XML, any field that is not enclosed in CDATA tags should be checked for any characters that should be encoded by one of the 5 pre-defined XML entities. These are:
& - &amp;
" - &quot;
' - &apos;
< - &lt;
> - &gt;
A simple PHP function to sanitise a string is as follows:
function xmlentities($text)
{
$search = array("&","\"","'","<",">");
$replace = array("&amp;","&quot;","&apos;","&lt;","&gt;");
return str_replace($search,$replace,$text);
}
...and then to use in construction, for example:
print "<caption>".xmlentities($caption)."</caption>";
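On PHP 5.4 or later, the built-in htmlspecialchars() can produce the same five entities when given the ENT_QUOTES and ENT_XML1 flags, so a hand-written replacement table is not strictly needed. A minimal sketch, in which the wrapper name xmlentities_builtin is made up for illustration and the input is assumed to be UTF-8:

```php
<?php
// Hypothetical wrapper around the built-in function; assumes UTF-8 input.
function xmlentities_builtin($text)
{
  // ENT_QUOTES encodes both " and '; ENT_XML1 makes ' come out as &apos;
  return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, "UTF-8");
}
print "<caption>".xmlentities_builtin("Garden & Views")."</caption>\n";
// prints <caption>Garden &amp; Views</caption>
```

Either approach should only be applied to field values during construction, never to a whole XML document, as that would encode the tags as well.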
Hope this helps!
Cheers,
David.
Mmm... <caption> is one of the fields that I'm not even processing as it isn't needed currently.
Is there a way to get MagicParser to ignore certain fields?
Hi Jim,
Unfortunately the parse is being aborted by the lower level PHP XML parser, not at the Magic Parser level, so individual fields can't be skipped. What can be done is to cleanse that field before parsing. Assuming that an un-encoded & is always followed by a space, the following could be used immediately before your call to MagicParser_parse():
$xml = str_replace("& ","&amp; ",$xml);
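If the assumption about the trailing space does not hold, a regular expression could instead target every & that does not already begin one of the 5 pre-defined entities or a numeric character reference. This is only a sketch of that alternative, not something Magic Parser provides; the function name encodeBareAmpersands is made up, and note that it would also alter any & inside a CDATA section, where it is legal.

```php
<?php
// Hypothetical helper: encodes only those & characters that are not
// already the start of &amp; &lt; &gt; &quot; &apos; or &#NNN; / &#xHHH;
function encodeBareAmpersands($xml)
{
  // The negative lookahead leaves existing entities untouched
  return preg_replace('/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $xml);
}
print encodeBareAmpersands("<caption>Garden & Views</caption>")."\n";
// prints <caption>Garden &amp; Views</caption>
print encodeBareAmpersands("<caption>Bed &amp; Breakfast</caption>")."\n";
// prints <caption>Bed &amp; Breakfast</caption>
```

It would be called in the same place, immediately before MagicParser_parse(), as $xml = encodeBareAmpersands($xml);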
Cheers,
David.
Hello Jim,
What's happened here is that an <agent> element appears at the top of the output ahead of the <properties> but without any all-encompassing document element, so unfortunately the XML is not well-formed. In fact, if you browse to the XML URL directly a parsing error should be displayed - Firefox displays "XML Parsing Error: junk after document element .... Line Number 10, Column 1:".
Sometimes it's relatively easy to fix up the XML using string functions, and that should be the case here. Try something like this, which discards everything before the <properties> element, then displays the first record and exits:
<?php
require("MagicParser.php");
function myRecordHandler($record)
{
print_r($record);
return TRUE;
}
$xml = file_get_contents("http://burgundy4u.com/content/root/files/my-french-house.xml");
$xml = substr($xml,strpos($xml,"<properties>"));
MagicParser_parse("string://".$xml,"myRecordHandler","xml|PROPERTIES/PROPERTY/");
?>
Hope this helps!
Cheers,
David.