You are here:  » Newbie: Feed with multiple levels and sublevels


Newbie: Feed with multiple levels and sublevels

Submitted by tekneck on Thu, 2009-08-27 00:51 in

I have been studying the many posts on parsing xml into mysql databases but despite messing around with some cut 'n paste, I find myself still lost in code.

I have an xml feed that contains ads from many dealers. Each ad has all the necessary details but then breaks down into multiple image urls/details, and multiple options... not all ads have all the data present (ie: some don't have photos, others don't have many options listed).

1. What level should I be using to parse?
2. Do I need to loop through each level/record then photos then options or can the parser make sense of all this on a per-record basis?

I know how to insert, update etc in mysql but just don't get how to bring the two together, looping where necessary. The main ad details will go into one table, then I need to put the image urls/details into another table (one record per row, relating it using the adid) and the extras/options go into a third table.

Here is the sample url of a feed I am using for learning and testing:

MagicParser_parse("{link saved}","myRecordHandler","xml|IDF/DEALERS/DEALER/LISTINGS/LISTING/");

If I go up one level to xml|IDF/DEALERS/DEALER/LISTINGS/ I get all the listings and the sub details at once but don't imagine this where I want to be.

Anyway, any help you can provide to get me on track would be great since I will put it use on the real feeds and future feeds since they really don't get any more complicated than this one.

Thank you :)

Submitted by support on Thu, 2009-08-27 07:44

Hi,

Firstly,

xml|IDF/DEALERS/DEALER/LISTINGS/LISTING/

...is the correct Format String to be using in this case. This as you probably are already aware will call your myRecordHandler function for every LISTING record; and you can immediately access the main (non-repeating) fields through the variables such as

$record["ID"]
$record["URL"]
$record["STOCK_NUMBER"]

...so you would use these variables to generate an SQL statement to INSERT into your main ad details table.

Now, where you are creating a 1-to-many structure in your database; let's say you have a table for images and a table for features; the best structure to use in the case of this particular XML would be a foreach() loop, working through $record and inspecting each key for the type of field you are looking for, and then using the value if it matches the conditions.

As always, the easiest way to explain is by example; so consider the following code which will extract and create an INSERT SQL statement (may not be exactly correct as per your scenario of course) for every image and feature: (code to go within your myRecordHandler() function)

foreach($record as $key => $value)
{
  if (strpos($key,"IMGS")!==FALSE && strpos($key,"PATH")!==FALSE)
  {
    $sql = "INSERT INTO `images` SET
              ad_id='".mysql_escape_string($record["ID"])."',
              path='".mysql_escape_string($value)."'
              ";
     mysql_query($sql);
  }
  if (strpos($key,"FEATURES")!==FALSE && strpos($key,"TEXT")!==FALSE)
  {
    $sql = "INSERT INTO `features` SET
              ad_id='".mysql_escape_string($record["ID"])."',
              text='".mysql_escape_string($value)."'
              ";
     mysql_query($sql);
  }
}

Hope this helps!
Cheers,
David.

Submitted by tekneck on Sat, 2009-08-29 20:09

That makes sense. Now, I get the repeating records but how would I get the records one level up so I can know which 'dealer id' etc the listings relate to? The feed will have listings for many dealers so I need to be able to show their 'ID' and details as well as the individual listing's 'ID' (same tag name).

I have tried $record["DEALERS/DEALER/ID"] but know that this is not working. how do I walk up and down the structure?

Let me know if you need the sample xml again.

Thanks.... getting there :)

Submitted by support on Sun, 2009-08-30 08:20

Hi,

Yes please if you could let me know the URL again...

Hopefully if there is a listing ID field the process will be to parse the file twice, firstly at the DEALERS/DEALER level and build up a look-up table of listing ID > dealer ID, and then at the LISTING level where the dealer ID can be referenced from the lookup table.

Even if you are parsing a remote URL, there need not be any performance hit as the file can be read into a string using file_get_contents() and then the string parse using "string://"...

Let me know the URL again and I'll look into it for you...

Cheers,
David.

Submitted by tekneck on Sun, 2009-08-30 20:14

{link saved}

Thanks...

Submitted by tekneck on Mon, 2009-08-31 04:03

Here is my code (please don't post it unless you have to... it is kinda sloppy)

I already replied asking how I would show the dealer's id in each of these records (need this to relate the ad to the dealer).

I have added notes to the code and am having some success here.

1. I don't understand what the number in the brackets beside each while() statement means... and am not sure if I should be incrementing them.

2. I am trying to do a while inside another while to show the category name, then all the text that belongs in the category... not working for me... missing something...

{code saved}

Submitted by support on Mon, 2009-08-31 08:50

Hi,

Thanks for the code - i've copied it to file for future reference if required.

Firstly, regarding the requirement to capture the advertiser (dealer) ID from higher up the XML; the trick here is to parse the file twice, and in the first parse build up an array of advertiser IDs => dealer IDs so that you can look this up on the second parse when parsing at the LISTING level. Here's an outline example showing what I mean:

<?php
  
// the first parse could be very slow so set time limit to zero (unlimited)
  
set_time_limit(0);
  require(
"MagicParser.php");
  
// create an array ready to hold listing IDs => dealer IDs
  
$dealerID = array();
  
// first parse record handler function to build $delalerID array
  
function myDealerRecordHandler($record)
  {
    global 
$dealerID;
    
// same technique as described before - check every value in $record and
    // if it is an advertiser ID, use it and add to the $dealerID array
    
foreach($record as $key => $value)
    {
      if ((
strpos($key,"LISTINGS")!==FALSE) && (strpos($key,"/ID")!==FALSE))
      {
        
$dealerID[$value] = $record["ID"];
      }
    }
  }
  
// second parse record handler function, the bulk of your code from your previous
  // post will now go here (instead of myRecordHandler) and you can set, and then
  // use $fk_advertiser_id as shown
  
function myListingRecordHandler($record)
  {
    global 
$dealerID;
    
$fk_advertiser_id $dealerID[$record["ID"]];
  }
  
$url "{link saved}";
  
$xml file_get_contents($url);
  
MagicParser_parse("string://".$xml,"myDealerRecordHandler","xml|IDF/DEALERS/DEALER/");
  
MagicParser_parse("string://".$xml,"myListingRecordHandler","xml|IDF/DEALERS/DEALER/LISTINGS/LISTING/");
?>

> 1. I don't understand what the number in the brackets beside each while()
> statement means... and am not sure if I should be incrementing them.

No need to increment - it just needs to evaluate to true, so while(1) is normally used. This means "loop forever"; but in this case, the loop is ended by "break" statement.

> 2. I am trying to do a while inside another while to show the category name,
> then all the text that belongs in the category... not working for me...
> missing something...

As your objective is to build up a single $description string, I would suggest the same method shown above (using strpos rather than isset) and then add to the $description anything that contains DETAIL and NAME or DESC; for example:

$description = "";
foreach($record as $key => $value)
{
  if (
       (strpos($key,"DETAIL")!==FALSE)
     )
     &&
     (
       (strpos($key,"NAME")!==FALSE)
       ||
       (strpos($key,"DESC")!==FALSE)
     )
   {
     $description .= $value . " "; // append value to description with trailing space
   }
}
trim($description); // remove the unnecessary trailing space

Hope this helps!
Cheers,
David.

Submitted by tekneck on Mon, 2009-08-31 19:03

Thanks, I will try this and see if I can get it to work.

As it appears that I have light at the end of the tunnel now, what would be the best method to actually download the images that are referenced in the feed? I need to collect them from their urls, save them on my server and then I will write the new url location to my db. Any suggestions for the automated downloading of the image? I have a little script that I call to make thumbnails and watermark the images but I need to get them to me first... a little off topic but you might have a jewel of wisdom to make this part of my process.

Submitted by tekneck on Mon, 2009-08-31 19:51

Ok.. applied the example it it does loop through and show all the details I need but i need to group the results by their category name...

So a cat/name would be first, followed by the details until the next cat/name, followed by its details, so on... I am getting it all lumped together now...

Here is what I want to try to get...

DETAIL/CAT
DETAIL/CAT/NAME Cockpit Features
DETAIL/CAT/OPT
DETAIL/CAT/OPT/NAME Cup Holders, Jumbo
DETAIL/CAT/OPT/DESC Stainless Steel
DETAIL/CAT/OPT@1
DETAIL/CAT/OPT/NAME@1 Dash Console, Fiberglass
DETAIL/CAT/OPT/DESC@1 (Custom) - Integrated Port Console

DETAIL/CAT/NAME@1 Control Station
DETAIL/CAT/OPT@16
DETAIL/CAT/OPT/NAME@16 Controls, Side Mount
DETAIL/CAT/OPT/DESC@11 w/ Trim on Handle
DETAIL/CAT/OPT@17
DETAIL/CAT/OPT/NAME@17 Horn
DETAIL/CAT/OPT/DESC@12 Flush-Mounted
DETAIL/CAT/OPT@18
DETAIL/CAT/OPT/NAME@18 Hour Meter
DETAIL/CAT/OPT/DESC@13 Integrated into Tachometer

I know once I can get my head around this it will all make sense... I need to loop while in the loop to show the subs

I am trying to format the description before I put it in the database since I don't map each value to a DB column (way too many to be manageable).

So I am trying to get the specs in as

  • spec
  • then followed by catname, followed by

  • cat option name & desc
  • then repeat for the next catname... This way I can add some css to display it properly on the site... is this a valid way to do this?

    Submitted by support on Tue, 2009-09-01 08:31

    > Any suggestions for the automated downloading of the image?

    Sure - since your server is capable of fopen()'ing a URL, you can simply use the URL as the $source filename in a call to PHP's copy() function. For example; consider the remote image URL in $imageURL, and and listing ID in $listingID, and let's say you have a folder in the current directory called "images" and you want to save the remote image into that folder with the filename [listingID].jpg, something like this would do the trick:

    <?php
      $dest 
    "images/".$listingID.".jpg";
      
    copy($imageURL,$dest);
      
    $newImageURL "http://www.example.com/images/".$listingID.".jpg";
    ?>

    For this to work, images/ must be writable by PHP. The easiest way to do this is normally via your FTP program. In the remote window, right-click on the images folder and then look for "Permissions..." or perhaps "Properties..." and then Permissions. Then give write access to all users - owner / group / world.

    > This way I can add some css to display it properly on the site... is this a valid way to do this?

    Yes - that makes sense - so using nested <ul> with a class name to indicate a spec or an option... then you could style the lists as required using CSS referring to the class names, e.g.

    .spec li {
      // CSS for spec list items
    }
    .opt li {
      // CSS for opt list items
    }

    Based on that structure, have a look at the following extension to the above example;

    $currentCatName = "";
    $currentCatOptName = "";
    $currentCatOptDesc = "";
    $description = "<ul class='spec'>";
    foreach($record as $key => $value)
    {
      if ((strpos($key,"DETAIL")!==FALSE)) && ((strpos($key,"OPT")!==FALSE)) && ((strpos($key,"DESC")!==FALSE))
      {
         $currentCatOptDesc = $value;
      }
      elseif ((strpos($key,"DETAIL")!==FALSE)) && ((strpos($key,"OPT")!==FALSE)) && ((strpos($key,"NAME")!==FALSE))
      {
         $currentCatOptName = $value;
      }
      if ((strpos($key,"DETAIL")!==FALSE)) && ((strpos($key,"NAME")!==FALSE))
      {
         if ($currentCatName) $description .= "</ul>";
         $currentCatName = $value;
         $description .= "<li>".$currentCatName."</li>";
         $description .= "<ul class='opt'>";
      }
      if ($currentCatOptName && $currentCatOptDesc)
      {
        $description .= "<li>".$currentCatOptName." ".$currentCatOptDesc."</li>";
        $currentCatOptName = "";
        $currentCatOptDesc = "";
      }
    }
    $description = "</ul>"; // opt
    $description = "</ul>"; // spec

    Hope this helps!
    Cheers,
    David.

    Submitted by tekneck on Wed, 2009-09-02 12:45

    Thanks for the image advice... so simple.

    Ok.. I can see the light at end of the tunnel. (sorry to be so high maintenance... I just have not got it through my head about what the strpos is checking for and how duplicate string name components affect the loop).

    Here is my code now with your new stuff...

    When it runs, it ALMOST produces the desired CAT NAME and

    • Cat opt Name (and if exists the cat opt desc)
    • but not quite.. can you check and see where I am going wrong?

      The output is pasted below the code... in case you get different results:

      I put the H3 in so it is more evident where the cat name formatting is being called and output...

      Thanks for the all the support, way beyond the software price... as I said, I would be happy to shoot you a few dollars/pounds to simply get this one perfect for me and I can reverse engineer it for the other feeds I am being tasked with...

      require("MagicParser.php");
        function myRecordHandler($record)
        {
          // print_r($record);
          ## Declare a couple variables
          $postfix = '';
      $postfix1 = '';
      $postfix2 = '';
      $postfix3 = '';
      $postfix4 = '';
      $postfix6 = '';
      $description = '';
      ## Test for values on a few variables that are optional
      if (isset($record["HOURS"])) { $engine_hours=mysql_escape_string($record["HOURS"]); } else { $engine_hours = ''; }
      if (isset($record["CITY"])) { $city=mysql_escape_string($record["CITY"]); } else { $city = ''; }
      if (isset($record["STATE"])) { $state=mysql_escape_string($record["STATE"]); } else { $state = ''; }
      if (isset($record["COUNTRY"])) { $country=mysql_escape_string($record["COUNTRY"]); } else { $country = ''; }
      if (isset($record["HULL_MATERIAL"])) { $hull_material=mysql_escape_string($record["HULL_MATERIAL"]); } else { $hull_material = ''; }
      if (isset($record["ENGINE_NUM"]) && $record["ENGINE_NUM"] >0) { $engine_num=mysql_escape_string($record["ENGINE_NUM"]); } else { $engine_num = ''; }
      ## Get the length as feet and inches from inches alone
      if (isset($record["LENGTH"])) {
      $length_ft = (ceil($record["LENGTH"] / 12));
      $length_in = (ceil($record["LENGTH"] % 12));} else
      {$length_ft = '';$length_in = '';}
      ## Starte looping through the array
      $currentCatName = "";
      $currentCatOptName = "";
      $currentCatOptDesc = "";
      $description .= "<ul class='opt'>";
      foreach($record as $key => $value)
      {
      /*
      ## Find all the features and add them to the $description variable
      if (strpos($key,"FEATURES")!==FALSE && strpos($key,"TEXT")!==FALSE)
      {
      $i = 0;
      while(1)
      {
      if ($i) $postfix = "@".$i;
      if (!isset($record["FEATURES/FEATURE".$postfix])) break;
      $description .= "FEATURE: ".mysql_escape_string($record["FEATURES/FEATURE/TEXT".$postfix])."<br/>";
      $i++;
      }
      }
      ## Find all the specs and add them to the $description variable
      if (strpos($key,"SPECS")!==FALSE)
      {
      $j = 0;
      while(1)
      {
      if ($j) $postfix1 = "@".$j;
      if (!isset($record["SPECS/SPEC".$postfix1])) break;
      $description .= "SPECS: ".mysql_escape_string($record["SPECS/SPEC/NAME".$postfix1]).": ".mysql_escape_string($record["SPECS/SPEC/VALUE".$postfix1])."<br/>";
      $j++;
      }
      } */
      ## Find all the detail categories that have values and add them to the $description variable
      /* if ( (strpos($key,"DETAIL")!==FALSE) && (strpos($key,"NAME")!==FALSE) || (strpos($key,"DESC")!==FALSE) )
      {
      $description .= $value . " "; // append value to description with trailing space
      } */
      if ((strpos($key,"DETAIL")!==FALSE) && (strpos($key,"OPT")!==FALSE) && (strpos($key,"DESC")!==FALSE))
      {
      $currentCatOptDesc = $value;
      }
      elseif ((strpos($key,"DETAIL")!==FALSE) && (strpos($key,"OPT")!==FALSE) && (strpos($key,"NAME")!==FALSE))
      {
      $currentCatOptName = $value;
      }
      if ( (strpos($key,"DETAIL")!==FALSE) && (strpos($key,"NAME")!==FALSE))
      {
      if ($currentCatName) $description .= "</ul>";
      $currentCatName = $value;
      $description .= "<h3>".$currentCatName."</h3>";
      $description .= "<ul class='opt'>";
      }
      if ($currentCatOptName && $currentCatOptDesc)
      {
      $description .= "<li>".$currentCatOptName." -> ".$currentCatOptDesc."</li>";
      $currentCatOptName = "";
      $currentCatOptDesc = "";
      }
      $description .= "</ul>"; // opt
      $description .= "</ul>"; // spec
      ## Now add the photos for each item to the photos table
      if (strpos($key,"IMGS")!==FALSE)
      {
      $m = 0;
      while(5)
      {
      if ($m) $postfix4 = "@".$m;
      if (!isset($record["IMGS/IMG/SEQ".$postfix4])) break;
      $sql = "INSERT INTO `photos` SET
      fk_ad_id='".mysql_escape_string($record["ID"])."',
      order='".mysql_escape_string($record["IMGS/IMG/SEQ".$postfix4])."',
      url='".mysql_escape_string($record["IMGS/IMG/FILE"])."'
      ";
      //mysql_query($sql);
      print $sql."<br/>";
      $m++;
      }
      }
      }
      ## Make the insert statement
      $sql = "INSERT INTO `mytable` SET
      fk_advertiser_id='STATIC SINCE I DO NOT NO HOW TO GO UP AND GET THIS',
      feed_item_id='".mysql_escape_string($record["ID"])."',
                  stock_num='".mysql_escape_string($record["STOCK_NUMBER"])."',
                  length_ft='".$length_ft."',
      length_in='".$length_in."',
      ad_type='".mysql_escape_string($record["TYPE"])."',
      status='".mysql_escape_string($record["STATUS"])."',
      year='".mysql_escape_string($record["YEAR"])."',
      model='".mysql_escape_string($record["MODEL"])."',
      engine_hours='".$engine_hours."',
      location_city='".$city."',
      location_prov='".$state."',
      location_country='".$country."',
      hull_material='".$hull_material."',
      engine_num='".$engine_num."',
      desc='".$description."',
      created='".mysql_escape_string($record["DATE_LISTED"])."',
      updated='".mysql_escape_string($record["UPDATED"])."'";
      //mysql_query($sql);
      print "<br/>".$sql."<hr>";
        }
      // MagicParser_parse("{link saved}","myRecordHandler","xml|IDF/DEALERS/DEALER/LISTINGS/LISTING/");
        MagicParser_parse("saved feed url","myRecordHandler","xml|IDF/DEALERS/DEALER/LISTINGS/LISTING/");
        print MagicParser_getErrorMessage();

      ******** THIS IS SOME OUTPUT **********

      INSERT INTO `mytable` SET fk_advertiser_id='STATIC SINCE I DO NOT NO HOW TO GO UP AND GET THIS', feed_item_id='1383845', stock_num='HP12183', length_ft='20', length_in='2', ad_type='N', status='In Stock', year='2007', model='192 Islander', engine_hours='0', location_city='Va Beach', location_prov='VA', location_country='US', hull_material='Fiberglass', engine_num='', desc='

      Cockpit Features <--- THIS IS A CAT NAME
      Cup Holders, Jumbo <--- THIS IS BEING REPEATED IN FORMATTING AS IF IT WERE ALSO A CAT NAME
      Cup Holders, Jumbo -> Stainless Steel <--- HERE IS THE OPT NAME SHOWING UP RIGHT (also the cat name?)
      Dash Console, Fiberglass <--- HERE IS ANOTHER OPT NAME OK...
      Dash Console, Fiberglass -> (Custom) - Integrated Port Console <---- HERE IS THE PREVIOUS OPT NAME
      Drain, Aft
      Floor Liner, Fiberglass
      Floor Liner, Fiberglass -> w/Textured Surface - Integral w/Deck
      Lights, Courtesy
      Receptacle, 12V
      Seating, Bucket
      Seating, Bucket -> w/Slider - Helm
      Seating, Bucket
      Seating, Bucket -> w/Swivel & Slider - Port
      Seating, Rear Bench
      Seating, Rear Bench -> w/Molded Fiberglass Footrests & Storage Below
      Stereo, CD System
      Stereo, CD System -> w/2 Speakers
      Storage
      Storage -> under Bow Seats
      Storage, In-Floor
      Storage, Ski
      Storage, Ski -> under Rear Bench
      Stringer System, Fully Bonded
      Stringer System, Fully Bonded -> Fiberglass Inner Liner Floor
      Sundeck, Padded
      Sundeck, Padded -> w/Manual Lift & Release
      Upholstery, Vinyl (34 oz.)
      Control Station
      Controls, Side Mount
      Controls, Side Mount -> w/ Trim on Handle
      Horn
      Horn -> Flush-Mounted

    Submitted by support on Wed, 2009-09-02 13:44

    Hi,

    As the script is quite long could you perhaps email me with it as an attachment and I'll check it out on my test server...

    Thanks,
    David.

    Submitted by tekneck on Wed, 2009-09-02 15:02

    Done... sent you script (with notes inside) and the feed as an xml file...