

Doesn't parse part of the XML

Submitted by champ on Sat, 2008-06-14 22:03

Hi,
When I try to parse a feed using Magic Parser, part of it gets lost.

This is the feed that I am trying to parse:

<QueryResponse xmlns="">
   <categoryResponse matched="2" included="1">
      <category id="499" parent_id="4">
         <name>MP3 &amp; Media Players</name>
         <relevance>0.999999463558197</relevance>
         <URL>http://something.com</URL>
      </category>
      <category id="46015" parent_id="4">
         <name>MP3 Player Accessories</name>
         <relevance>5.40855637609639e-07</relevance>
         <URL>http://something.com</URL>
      </category>
   </categoryResponse>
   <productResponse requested="20" matched="2000" included="20" start="1">
      <product category_id="499" id="626957925">
         <name>Apple iPod Classic 80 GB - Black</name>
         <relevance>5289607168000</relevance>
         <URL>http://something.com/ipod.html</URL>
         <imageURL_small>http://image.something.com/resize?sq=60&amp;uid=626957925</imageURL_small>
         <imageURL_med>http://image.something.com/resize?sq=100&amp;uid=626957925</imageURL_med>
         <imageURL_medlarge>http://image.something.com/resize?sq=160&amp;uid=626957925</imageURL_medlarge>
         <imageURL_large>http://image.something.com/resize?sq=400&amp;uid=626957925</imageURL_large>
         <maxRawImageSize/>
         <desc_short>Holds Up to 20,000 Songs, 25,000 Photos, or 100 hrs of Video - 2.5 in Display - Battery Life: Up to 30 hrs of Audio/7 hrs of Video</desc_short>
         <desc_long>Holds Up to 20,000 Songs, 25,000 Photos, or 100 hrs of Video - 2.5 in Display - Battery Life: Up to 30 hrs of Audio/7 hrs of Video</desc_long>
         <prodRating URLref="prodRate-4">4</prodRating>
         <prodScore>4.00</prodScore>
         <numReviews>18</numReviews>
         <reviewsURL>http://www.something.com/mp3_mediaplayers/apple-ipod-classic-80-gb-black--pid626957925/reviews__af_assettype_id--10__af_creative_id--6__af_id--1001__af_placement_id--1__keyword--ipods__rf--af1.html</reviewsURL>
         <minPrice>175.99</minPrice>
         <maxPrice>249.99</maxPrice>
         <numMerchants>6</numMerchants>
      </product>
      <product category_id="499" id="626986930">
         <name>Apple iPod Nano 4 GB - Silver</name>
         <relevance>4019571654656</relevance>
         <URL>http://www.something.com/mp3_mediaplayers/apple-ipod-nano-4-gb-silver--pid626986930/compareprices__af_assettype_id--10__af_creative_id--6__af_id--1001__af_placement_id--1__keyword--ipods__rf--af1.html</URL>
         <imageURL_small>http://image.something.com/resize?sq=60&amp;uid=626986930</imageURL_small>
         <imageURL_med>http://image.something.com/resize?sq=100&amp;uid=626986930</imageURL_med>
         <imageURL_medlarge>http://image.something.com/resize?sq=160&amp;uid=626986930</imageURL_medlarge>
         <imageURL_large>http://image.something.com/resize?sq=400&amp;uid=626986930</imageURL_large>
         <maxRawImageSize/>
         <desc_short>Holds Up to 1,000 Songs, 3,500 Photos, or 4 hrs of Video - 2 in. Display - Battery Life: Up to 24 hrs of Audio/5 hrs of Video</desc_short>
         <desc_long>Holds Up to 1,000 Songs, 3,500 Photos, or 4 hrs of Video - 2 in. Display - Battery Life: Up to 24 hrs of Audio/5 hrs of Video</desc_long>
         <prodRating URLref="prodRate-4.5">4.5</prodRating>
         <prodScore>0.00</prodScore>
         <numReviews>12</numReviews>
         <reviewsURL>http://www.something.com/mp3_mediaplayers/apple-ipod-nano-4-gb-silver--pid626986930/reviews__af_assettype_id--10__af_creative_id--6__af_id--1001__af_placement_id--1__keyword--ipods__rf--af1.html</reviewsURL>
         <minPrice>104.99</minPrice>
         <maxPrice>149.99</maxPrice>
         <numMerchants>4</numMerchants>
      </product>
   </productResponse>
   <otherResponse>
      <totalProducts>22899973</totalProducts>
      <totalStores>109024</totalStores>
      <imageURL>
         <URL id="prodRate-3.5">http://image.something.com/site/rating_3_and_half_star_80x13.gif</URL>
         <URL id="prodRate-4">http://image.something.com/site/rating_4_star_80x13.gif</URL>
         <URL id="prodRate-4.5">http://image.something.com/site/rating_4_and_half_star_80x13.gif</URL>
         <URL id="prodRate-5">http://image.something.com/site/rating_5_star_80x13.gif</URL>
      </imageURL>
      <otherURL>
         <URL id="search">http://www.something.com/search__keyword--ipods__rf--af1.html</URL>
      </otherURL>
      <trackingPixel>http://adserve.something.com/img/publisherID-1001/assetID-6/placementID-1/</trackingPixel>
   </otherResponse>
</QueryResponse>

This is the PHP code generated by http://www.magicparser.com/demo:

<?php
  require("MagicParser.php");
  function myRecordHandler($record)
  {
    print $record["PRODUCT"];
    print $record["PRODUCT-CATEGORY_ID"];
    print $record["PRODUCT-ID"];
    print $record["NAME"];
    print $record["RELEVANCE"];
    print $record["URL"];
    print $record["IMAGEURL_SMALL"];
    print $record["IMAGEURL_MED"];
    print $record["IMAGEURL_MEDLARGE"];
    print $record["IMAGEURL_LARGE"];
    print $record["MAXRAWIMAGESIZE"];
    print $record["DESC_SHORT"];
    print $record["DESC_LONG"];
    print $record["PRODRATING"];
    print $record["PRODRATING-URLREF"];
    print $record["PRODSCORE"];
    print $record["NUMREVIEWS"];
    print $record["REVIEWSURL"];
    print $record["MINPRICE"];
    print $record["MAXPRICE"];
    print $record["NUMMERCHANTS"];
  }
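  $xml = "_YOUR_XML_HERE_"; // string variable containing the data to parse
  MagicParser_parse("string://".$xml,"myRecordHandler","xml|QUERYRESPONSE/PRODUCTRESPONSE/PRODUCT/");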
?>

As you can see, only productResponse has been parsed;
categoryResponse and otherResponse are simply ignored.

Could you please help me find a solution to this issue?

Thank you!

Submitted by support on Sun, 2008-06-15 08:28

Hi,

This happened because Magic Parser auto-detected the product records, and is returning only those
to your myRecordHandler function.

In order to access the category (and other) records, you need to parse the document again for each
record type, each time with a different format string and its own record handler function. Here's what you need to do:

<?php
  header("Content-Type: text/plain");
  require("MagicParser.php");
  function myCategoryRecordHandler($record)
  {
    print_r($record);
  }
  function myProductRecordHandler($record)
  {
    print_r($record);
  }
  function myOtherRecordHandler($record)
  {
    print_r($record);
  }
  $xml = "_YOUR_XML_HERE_"; // string variable containing the data to parse
  MagicParser_parse("string://".$xml,"myProductRecordHandler","xml|QUERYRESPONSE/PRODUCTRESPONSE/PRODUCT/");
  MagicParser_parse("string://".$xml,"myCategoryRecordHandler","xml|QUERYRESPONSE/CATEGORYRESPONSE/CATEGORY/");
  MagicParser_parse("string://".$xml,"myOtherRecordHandler","xml|QUERYRESPONSE/OTHERRESPONSE/");
?>

Note that I have added header("Content-Type: text/plain"); at the top of this demo script so that
you can easily see the values in each $record array. Simply remove it once you start making the
script generate the HTML output that you require (if that is what you are doing).

If you are parsing a URL, it's best not to pass the URL to each MagicParser_parse call, as each call will
result in a separate request to the remote server. To avoid this, instead of:

  $xml = "_YOUR_XML_HERE_"; // string variable containing the data to parse

Use something like:

<?php
  $xml = "";
  $url = "http://www.example.com/path/to/xml";
  if ($fp = fopen($url,"r"))
  {
    while(!feof($fp)) $xml .= fread($fp,1024);
    fclose($fp);
  }
?>
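For reference, PHP's built-in file_get_contents() can replace the fopen()/fread() loop with a single call (for remote URLs this assumes allow_url_fopen is enabled). The sketch below wraps it in a hypothetical helper, fetchFeed(), that throws on failure instead of silently leaving $xml empty; only the commented usage lines call Magic Parser, with the format strings from this reply.

```php
<?php
// Sketch (hypothetical helper, not part of Magic Parser): fetch the feed
// ONCE into a string, then reuse it for every MagicParser_parse() call.
function fetchFeed(string $url): string
{
    // file_get_contents() reads the whole file/URL and returns false on failure.
    $xml = file_get_contents($url);
    if ($xml === false) {
        throw new RuntimeException("Could not read feed: $url");
    }
    return $xml;
}

// Usage (placeholder URL):
// $xml = fetchFeed("http://www.example.com/path/to/xml");
// MagicParser_parse("string://".$xml,"myProductRecordHandler","xml|QUERYRESPONSE/PRODUCTRESPONSE/PRODUCT/");
// MagicParser_parse("string://".$xml,"myCategoryRecordHandler","xml|QUERYRESPONSE/CATEGORYRESPONSE/CATEGORY/");
```

Throwing on failure makes a network problem visible immediately, rather than showing up later as records that silently never reach the handlers.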

Hope this helps!
Cheers,
David.

Submitted by champ on Sat, 2008-06-21 06:21

Hi David,
Thank you very much for the quick and useful response. It worked perfectly!

Submitted by champ on Mon, 2008-06-23 21:29

Hi David,
I have another slight problem.
I need to get two attributes from productResponse: the number of products "matched" and the number "included".

<productResponse requested="20" matched="2000" included="20" start="1">

Thanks in advance :)

Submitted by support on Tue, 2008-06-24 08:12

Hi,

Whilst you can access this data using Magic Parser, it is not ideal because the script is
designed for accessing repeating records, not specific elements or attributes of a large
XML document.

What you need to do is parse the document at the top level element, using (in this case)
the format string:

xml|QUERYRESPONSE/

As the entire document will then be passed to your record handler function, you can access
the number of products matched and included through the following variables:

$record["PRODUCTRESPONSE-MATCHED"]
$record["PRODUCTRESPONSE-INCLUDED"]

For example (based on the code above):

<?php
  require("MagicParser.php");
  function myTopLevelRecordHandler($record)
  {
    // you probably just want to copy these into global variables
    print $record["PRODUCTRESPONSE-MATCHED"];
    print $record["PRODUCTRESPONSE-INCLUDED"];
  }
  $xml = "_YOUR_XML_HERE_"; // string variable containing the data to parse
  MagicParser_parse("string://".$xml,"myTopLevelRecordHandler","xml|QUERYRESPONSE/");
?>
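Because Magic Parser is built for repeating records, reading two attributes like these can also be done with PHP's built-in SimpleXML extension. This is a swapped-in technique rather than Magic Parser; productResponseCounts() is a hypothetical helper name, and the sample string is a trimmed copy of the feed posted above.

```php
<?php
// Sketch using SimpleXML (not Magic Parser): read single attributes of
// <productResponse> directly from the raw XML string.
function productResponseCounts(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false || !isset($doc->productResponse)) {
        throw new InvalidArgumentException("No productResponse element found");
    }
    $pr = $doc->productResponse;
    return [
        'matched'  => (int) $pr['matched'],   // attribute access uses array syntax
        'included' => (int) $pr['included'],
    ];
}

$xml = '<QueryResponse><productResponse requested="20" matched="2000" included="20" start="1"/></QueryResponse>';
$counts = productResponseCounts($xml);
print $counts['matched'] . "\n";   // 2000
print $counts['included'] . "\n";  // 20
```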

Hope this helps!
Cheers,
David.

Submitted by champ on Fri, 2008-06-27 03:05

Hi David,
Thanks for the reply. It did work, but not consistently :(
Maybe I did something wrong.

  function top($record)
  {
    echo $record["PRODUCTRESPONSE-INCLUDED"];
  }

INCLUDED is the total number of products.
For example, one time it displays 139 products; after a refresh it displays a total of 217 included products. The same thing happens when I go to another page. It happens about 50% of the time.

Please help.

Submitted by support on Fri, 2008-06-27 11:02

Hi,

Could you perhaps fetch the XML a few times and check whether the value in the feed itself is changing? Since a number is being displayed, it is almost certainly exactly the value contained in the XML (otherwise it would be empty), so we really need to rule out the XML source first.
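One way to do that check is to bypass the parsing code entirely and read the raw attribute from several fetches of the feed. The sketch below uses SimpleXML rather than Magic Parser; includedFromXml() and sampleIncluded() are hypothetical helper names, and the URL you pass in stands for your own feed. If the returned values differ between fetches, the variation comes from the feed, not from the record handler.

```php
<?php
// Hypothetical diagnostic helper (not part of Magic Parser): read the
// "included" attribute of <productResponse> straight from the raw XML.
function includedFromXml(string $xml): ?string
{
    $doc = simplexml_load_string($xml);
    if ($doc === false || !isset($doc->productResponse)) {
        return null; // not well-formed, or no productResponse element
    }
    return (string) $doc->productResponse['included'];
}

// Fetch the feed $times times and collect the raw value from each fetch.
// A failed fetch is recorded as null.
function sampleIncluded(string $url, int $times = 5): array
{
    $values = [];
    for ($i = 0; $i < $times; $i++) {
        $xml = file_get_contents($url);
        $values[] = ($xml === false) ? null : includedFromXml($xml);
    }
    return $values;
}

// Usage: print_r(sampleIncluded("http://www.example.com/path/to/xml"));
```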

Cheers,
David.

Submitted by champ on Wed, 2008-07-09 20:31

David,
another small issue:

This is part of xml file

<attrResponse requested="0" included="1">
  <attr id="259818">
    <name>Price Range</name>
    <URL>
    </URL>
    <values requested="" included="7">
      <attrValue id="030630507324">
        <name>&lt; 450</name>
        <numProductsMatched>791</numProductsMatched>
        <URL></URL>
      </attrValue>
      <attrValue id="030594261506502705">
        <name>450 - 620</name>
        <numProductsMatched>785</numProductsMatched>
        <URL></URL>
      </attrValue>
......

This is the PHP code that I use to pull the data:

$fname = array();
function filter_name($record)
{
  global $fname;
  $fname[] = $record;
}
$brandID = array();
function filterID($record)
{
  global $brandID;
  $brandID[] = $record;
}
$brand_filter = array();
function filter($record)
{
  global $brand_filter;
  $brand_filter[] = $record;
}
........
........
........
MagicParser_parse("string://".$xml,"filter_name","xml|QUERYRESPONSE/ATTRRESPONSE/ATTR/");
MagicParser_parse("string://".$xml,"filterID","xml|QUERYRESPONSE/ATTRRESPONSE/ATTR/VALUES/");
MagicParser_parse("string://".$xml,"filter","xml|QUERYRESPONSE/ATTRRESPONSE/ATTR/VALUES/ATTRVALUE/");

Here are a few issues that I have:
1) For some reason the filterID array is not fully populated. Only the first record is shown; when I try to pull other parameters, it shows that there is no other data in the array.
2) Is there another way to write this code? It gets kind of messy when I need to create an array for each parameter.

Submitted by support on Thu, 2008-07-10 11:31

Hi,

For the format string you are using for the filterID record handler...

xml|QUERYRESPONSE/ATTRRESPONSE/ATTR/VALUES/

...there will only be one record, and it will contain all the ATTRVALUE records
using the @1,@2... notation for differentiating duplicate fields.

What values are you trying to extract into the filterID array? In other words,
how do you want to use it later on in the code? That will help me see what code
you need to populate it.

With regard to the coding style, as this is not record-based XML, this is really
the only suitable way to handle it using Magic Parser; ordinarily you would use
a DOM parser with XML of this style (which is much more complicated!).
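For comparison, here is a minimal sketch of the DOM approach, using PHP's built-in DOMDocument and DOMXPath classes rather than Magic Parser. It assumes the element names from the attrResponse excerpt earlier in this thread; attrValues() is a hypothetical helper name. It builds a single array of attrValue entries, each holding its id, name and numProductsMatched, instead of one global array per field.

```php
<?php
// Sketch: collect every <attrValue> from an attrResponse into ONE array,
// using PHP's built-in DOM extension (not Magic Parser).
function attrValues(string $xml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new InvalidArgumentException("XML is not well-formed");
    }
    $xpath = new DOMXPath($dom);
    $rows = [];
    // One <attr> per filter type (e.g. "Price Range").
    foreach ($xpath->query('/QueryResponse/attrResponse/attr') as $attr) {
        $attrName = $xpath->evaluate('string(name)', $attr);
        // Each <attrValue> under <values> becomes one row.
        foreach ($xpath->query('values/attrValue', $attr) as $value) {
            $rows[] = [
                'attr'    => $attrName,
                'id'      => $value->getAttribute('id'), // kept as a string
                'name'    => $xpath->evaluate('string(name)', $value),
                'matched' => (int) $xpath->evaluate('string(numProductsMatched)', $value),
            ];
        }
    }
    return $rows;
}

// Usage: foreach (attrValues($xml) as $f) print $f['name'] . " (" . $f['matched'] . ")\n";
```

Keeping the id as a string preserves leading zeros such as 030630507324, which an integer cast would drop.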

Cheers,
David.