Extracting XML Meta data

Submitted by 1earthling on Wed, 2006-09-13 13:02

Google returns useful information in the feed header, before the records. Example data is below.
How would you retrieve the value of <openSearch:totalResults>222010</openSearch:totalResults>?
Knowing this value makes it possible to work out how many requests are needed to fetch all the data from the feed.

<feed>
<id>http://www.google.com/base/feeds/snippets</id>
<updated>2006-09-13T12:25:27.614Z</updated>
<title type="text">Items matching query: canon digital camera</title>
<link rel="alternate" type="text/html" href="http://base.google.com"/>
<link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://www.google.com/base/feeds/snippets"/>
<link rel="self" type="application/atom+xml" href="http://www.google.com/base/feeds/snippets?key=your_key_here&amp;bq=canon+digital+camera"/>
<link rel="next" type="application/atom+xml" href="http://www.google.com/base/feeds/snippets?start-index=26&amp;max-results=25&amp;key=your_key_here&amp;bq=canon+digital+camera"/>
<generator version="1.0" uri="http://base.google.com">GoogleBase</generator>
<openSearch:totalResults>222010</openSearch:totalResults>
<openSearch:startIndex>1</openSearch:startIndex>
<openSearch:itemsPerPage>25</openSearch:itemsPerPage>
<entry>
<id>
http://www.google.com/base/feeds/snippets/16337844538183614302
</id>
<published>2006-08-24T01:43:53.000Z</published>
<updated>2006-09-13T02:52:17.000Z</updated>
<category scheme="http://base.google.com/categories/itemtypes" term="products"/>
<title type="text">
FOR CANON DIGITAL CAMERA BATTERY EOS-1D MARK II {NP-E3}
</title>
<content type="html">
Everydaysource-Universal Accessories/DV Store FOR CANON DIGITAL CAMERA BATTERY EOS-1D MARK II {NP-E3} New Page 2 100% new high quality generic (non-OEM) NP-E3 Ni-MH Battery Quantity: 1 Never runs out of battery power when you're just about to capture the perfect moment! Time to get extra power for your digital video camera/camcorder.This is a high capacity, rechargeable Ni-MH 2200mAh battery with premium Japanese cell.Voltage: 12vColor: blackWeight: 12.6 oz.Note: best replacement for the
</content>
<link rel="alternate" type="text/html" href="http://adfarm.mediaplex.com/ad/ck/711-5256-8196-2?loc=http%3A%2F%2Fcgi.ebay.com%2Fws%2FeBayISAPI.dll%3FViewItem%26item%3D180018429244%26category%3D48516"/>
<link rel="self" type="application/atom+xml" href="http://www.google.com/base/feeds/snippets/16337844538183614302"/>
<author>
<name>eBay</name>
</author>
<g:label type="text">Products</g:label>
<g:expiration_date type="dateTime">2006-09-23T01:43:53.000Z</g:expiration_date>
<g:target_country type="text">US</g:target_country>
<g:item_type type="text">Products</g:item_type>
<g:customer_id type="int">11729</g:customer_id>
<g:id type="text">180018429244</g:id>
<g:product_review type="float">5.0</g:product_review>
<g:image_link type="url">http://thumbs.ebaystatic.com/pict/180018429244.jpg</g:image_link>
<g:product_num_reviews type="int">1</g:product_num_reviews>
<g:price type="floatUnit">13.97 usd</g:price>
<g:brand type="text">Digital Camera Battery</g:brand>
<g:item_language type="text">EN</g:item_language>
</entry>
<!-- more records here -->
</feed>

Submitted by support on Wed, 2006-09-13 13:18

Hi,

To extract the meta data, you first need to parse the feed using a different format string, one that points at the top-level element. The meta data in your example will then be present in the first (and only) record.

However, because of the way Magic Parser works, every other record will also be resolved into uniquely named keys within that same record, so memory could be an issue if the feed has hundreds of records. If it doesn't, there's no problem.
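If memory does turn out to be a problem, one alternative is to stream only the header and stop at the first entry. The following is a minimal sketch using PHP's built-in XMLReader extension rather than Magic Parser. The function name readFeedMeta and the xmlns declarations in the sample string are assumptions for illustration (the data posted above does not show the namespace declarations), so adjust them to match the real feed.

```php
<?php
// Hypothetical helper (not part of Magic Parser): collect the openSearch:*
// header values and stop reading as soon as the first <entry> is reached,
// so the records themselves are never loaded.
function readFeedMeta($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $meta = array();
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue; // only element start tags are of interest
        }
        if ($reader->localName === 'entry') {
            break; // header is finished; skip all records
        }
        // Matching on the prefix is a simplification; matching on the
        // namespace URI would be more robust.
        if ($reader->prefix === 'openSearch') {
            $meta[$reader->localName] = (int) $reader->readString();
        }
    }
    $reader->close();
    return $meta;
}

// Cut-down sample; the namespace URIs here are assumed, not taken from the post.
$sample = '<feed xmlns="http://www.w3.org/2005/Atom"'
        . ' xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/">'
        . '<openSearch:totalResults>222010</openSearch:totalResults>'
        . '<openSearch:startIndex>1</openSearch:startIndex>'
        . '<openSearch:itemsPerPage>25</openSearch:itemsPerPage>'
        . '<entry><id>example</id></entry>'
        . '</feed>';

$meta = readFeedMeta($sample);
print "Total Results: " . $meta['totalResults'] . "\n";
```

For a real feed you would pass in the contents of the file (for example via file_get_contents) instead of the sample string.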

For example, with Magic Parser (using gbase.xml as the file containing the above XML):

<?php
  require("MagicParser.php");

  function myMetaDataHandler($record)
  {
    print "<p>Total Results: ".$record["OPENSEARCH:TOTALRESULTS"]."</p>";
  }

  function myRecordHandler($record)
  {
    print_r($record);
  }

  // extract meta data
  MagicParser_parse("gbase.xml","myMetaDataHandler","xml|FEED/");

  // extract records
  MagicParser_parse("gbase.xml","myRecordHandler","xml|FEED/ENTRY/");
?>

That should do the trick.
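Once you have the total, working out the number of requests is simple arithmetic. Here is a rough sketch, assuming the page size is the openSearch:itemsPerPage value (25 in the sample) and that start-index is 1-based, as the "next" link in the sample suggests. The function names requestsNeeded and startIndexes are made up for illustration.

```php
<?php
// Hypothetical helper: number of requests needed to page through the feed.
function requestsNeeded($totalResults, $itemsPerPage)
{
    if ($itemsPerPage < 1) {
        return 0; // avoid division by zero on a malformed header
    }
    return (int) ceil($totalResults / $itemsPerPage);
}

// Hypothetical helper: the start-index value to use for each request.
function startIndexes($totalResults, $itemsPerPage)
{
    $indexes = array();
    $pages = requestsNeeded($totalResults, $itemsPerPage);
    for ($page = 0; $page < $pages; $page++) {
        $indexes[] = 1 + $page * $itemsPerPage;
    }
    return $indexes;
}

// Values taken from the sample feed header above.
print "Requests needed: " . requestsNeeded(222010, 25) . "\n"; // prints 8881
```

With the sample values, the second request starts at index 26, which matches the start-index in the feed's "next" link.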

Cheers,
David.