Problem parsing chefmoz rdf file

Submitted by hap on Thu, 2007-07-26 17:03

I can't seem to get through the whole file.

Here's the code I'm using:

<?php
  set_time_limit(0);
  error_reporting(E_ALL);
  ini_set('display_errors','1');
  $i=0;
  require("MagicParser.php");
  function myRecordHandler($record)
  {
    print $record["LOCATION"] . "\n";
    print $record["D:TITLE"] . "\n";
    print_r($record);
    global $i;
    $i++;
  }
  // http://www.restaurants.vc/chefmoz/chefmoz.rest.rdf
  // gets through 583 RESTAURANT records
  //  $result = MagicParser_parse("../chefmoz.rest.rdf","myRecordHandler");
  // gets through 583 RESTAURANT records
  //  $result = MagicParser_parse("../chefmoz.rest.rdf","myRecordHandler","xml|RDF/Restaurant/");
  // returns 0 records
  //  $result = MagicParser_parse("../chefmoz.rest.rdf","myRecordHandler","xml|rdf:RDF/Restaurant/");
  // returns 0 records
  $result = MagicParser_parse("../chefmoz.rest.rdf","myRecordHandler","xml|RDF:RDF/Restaurant/");
  if (!$result)
  {
    print MagicParser_getErrorMessage();
  }
  echo $i;
?>

Submitted by support on Thu, 2007-07-26 17:08

Hi,

I'll have to take a look at the XML to see what's going on here. Can you email me a link from which I can download your chefmoz.rest.rdf file? The easiest way to reach me is to reply to your reg code or forum registration email. If you could also tell me in the email how many records you expect to be in the file, I'll study the XML and run it against your code to see why you are not getting all the records.

Cheers,
David.

Submitted by hap on Thu, 2007-07-26 20:42

The URL to the data is:

{LINK SAVED}

Submitted by support on Thu, 2007-07-26 20:44

Thanks - I've taken a copy of the link and I'll take a look for you.

Cheers,
David.

Submitted by hap on Thu, 2007-07-26 20:47

I'm not sure how many records there are, but it may be 100,000. It just gets through the first county. I just need California.

Submitted by support on Fri, 2007-07-27 06:14

Hi,

I've been able to study the XML, and it stops after 583 records because the 584th Restaurant record is not correctly formatted: it contains characters that are not entity encoded. This breaks the XML at that point, so the rest of the file cannot be parsed.

The field that breaks the XML is as follows (line breaks added for clarity):

<Link r:resource="http://www.citysearch.com.au/servlet/Satellite?action=viewContent&
actionParams=referrer&actionValues=searchResults&c=Page&cid=1119945819685&
city=sydney&cityName=Sydney&content=SContent&contentid=1137394259335
&pageid=1119945819685"/>

...and more specifically the & character, which is not entity encoded (in valid XML it would be written as &amp;).
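To illustrate why a single bare & stops the parse, here is a minimal sketch using PHP's own SimpleXML extension rather than Magic Parser. The two fragments are simplified stand-ins for the Link element above (shortened example.com URL, no r: prefix); they are assumptions for illustration, not lines taken from the feed.

```php
<?php
// Simplified, made-up fragments: the first has a bare &, the second encodes it.
$broken = '<Link resource="http://example.com/?action=viewContent&c=Page"/>';
$fixed  = '<Link resource="http://example.com/?action=viewContent&amp;c=Page"/>';

// Collect libxml errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// simplexml_load_string() returns false when the input is not well-formed XML.
$brokenParses = (simplexml_load_string($broken) !== false);
$fixedParses  = (simplexml_load_string($fixed) !== false);

var_dump($brokenParses); // the bare & is rejected
var_dump($fixedParses);  // the &amp; version is accepted
```

Any conforming XML parser should behave the same way, which is why everything after the 584th record is unreachable.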

If you are able to contact the feed provider, you should let them know that the LINK element within the Restaurant element is not entity encoded, and the file is therefore not valid XML.

When processing this file, the correct Magic Parser format string to use is:

xml|RDF/RESTAURANT/

...so your call to Magic Parser should read:

$result = MagicParser_parse("../chefmoz.rest.rdf","myRecordHandler","xml|RDF/RESTAURANT/");

At the moment this will only read the first 583 records as discussed, but once the XML is corrected you will be able to access all records. I will carry on looking at this in case there is an easy work-around that you could use in the meantime, but ultimately it is worth contacting the feed provider in the hope that the XML can be made valid.
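One possible work-around, sketched here under assumptions and not a Magic Parser feature, is to write a cleaned copy of the feed in which every bare & is encoded, and then parse the copy. The function names escapeBareAmpersands and cleanFeed and the output filename are hypothetical. The pattern leaves anything that already looks like an entity or character reference untouched, so it would not repair other kinds of damage in the file.

```php
<?php
// Encode any & that does not already start an entity (&name;) or a
// character reference (&#38; or &#x26;).
function escapeBareAmpersands($text)
{
    return preg_replace(
        '/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $text
    );
}

// Copy $source to $target one line at a time, so a large feed is never
// loaded into memory in one piece.
function cleanFeed($source, $target)
{
    $in = fopen($source, 'r');
    if (!$in) {
        return false;
    }
    $out = fopen($target, 'w');
    if (!$out) {
        fclose($in);
        return false;
    }
    while (($line = fgets($in)) !== false) {
        fwrite($out, escapeBareAmpersands($line));
    }
    fclose($in);
    fclose($out);
    return true;
}

// Hypothetical usage; "../chefmoz.rest.clean.rdf" is an assumed output name.
// if (cleanFeed("../chefmoz.rest.rdf", "../chefmoz.rest.clean.rdf")) {
//     $result = MagicParser_parse("../chefmoz.rest.clean.rdf","myRecordHandler","xml|RDF/RESTAURANT/");
// }
```

Working line by line is safe for this particular fix because an entity reference cannot span a line break.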

Cheers,
David.