You are here:  » multiple level xml want to export to csv


multiple level xml want to export to csv

Submitted by Johnny N on Tue, 2009-04-07 13:22 in

just got this product, I think it is a great product.

however, I cannot seem to find a built in way to dump a multi level xml file to csv.

for example a snippet of my xml is

<HtSearchRq>
  <Success>True</Success>
  <LangID>EN</LangID>
  <Country Code="BG" Name="Bulgaria" ID="16">
    <Destination name="Golden Sands" ID="132">
    <ISO_Codes Code_1="VAR" Code_2="" Code_3=""/>
    <Resort ID="533">
      <Resort_Name>Golden Sands</Resort_Name>
        <Hotel>
          <Hotel_ID>4274</Hotel_ID>
          <Hotel_Name>.Com Albena</Hotel_Name>
          <Mapping>
            <Latitude>43.37067</Latitude>
            <Longitude>28.07669</Longitude>
          </Mapping>
       </Hotel>
       <Hotel>
         <Hotel_ID>3247</Hotel_ID>
         <Hotel_Name>Ambassador Hotel</Hotel_Name>
         <Mapping>
           <Latitude>43.27864</Latitude>
           <Longitude>28.04192</Longitude>
         </Mapping>
       </Hotel>
     </Resort>
   </Destination>
   <Destination name="Sunny Beach" ID="131">
     <ISO_Codes Code_1="BOJ" Code_2="" Code_3=""/>
     <Resort ID="535">
       <Resort_Name>Nessebar</Resort_Name>
       <Hotel>
         <Hotel_ID>4219</Hotel_ID>
         <Hotel_Name>Iberostar Festa Panorama and Spa</Hotel_Name>
         <Mapping>
           <Latitude>42.6645</Latitude>
           <Longitude>27.7171</Longitude>
         </Mapping>
       </Hotel>
     </Resort>
   </Destination>
 </Country>
 <Country Code="ES" Name="Canaries" ID="5">
   <Destination name="Fuerteventura" ID="37">
     <ISO_Codes Code_1="FUE" Code_2="" Code_3=""/>
     <Resort ID="138">
       <Resort_Name>Antigua</Resort_Name>
         <Hotel>
           <Hotel_ID>2234</Hotel_ID>
           <Hotel_Name>La Piramide Apartments</Hotel_Name>
           <Mapping>
             <Latitude>28.41465</Latitude>
             <Longitude>-13.85263</Longitude>
           </Mapping>
         </Hotel>
       </Resort>
   </Destination>
 </Country>
</HtSearchRq>

I want to be able to dump it into csv kinda like the following.

countryname, countryId, countryCode, destinationName, destinationId, resortId, resortName, hotelId, hotelName, hotelLat, hotelLong

as you can see I have left out some items like the whole iso tag.
also I may want to omit some items, so I need to customize the output to make a smaller csv like the following. (on a different script)

countryName, destinationName, resortName, hotelName

Is there a built in way to do this, can I nest the magic_parser call to do something like this?
or is there some other manual way to do it with arrays?

Submitted by support on Tue, 2009-04-07 13:51

Hi Johnny,

There's a convenient method to achieve this, based on parsing
your example XML at the COUNTRY level, using the format string
xml|HTSEARCHRQ/COUNTRY/. The trick is to loop
through the values in $record using PHP's foreach() construct,
and then study the value of the key to see if it matches one of
the fields you're interested in. If it is, set a variable with
the value of that field. This needs to be done in reverse
hierarchical order so that, for example, the hotel name doesn't
get overwritten by a resort or destination name.

Then, once all values have been found; output the line of CSV
and look for the next set of complete values. Here's an example
based on your example XML (saved on my server in a file called
mlexport.xml, but you can of course change that to whatever
you're using):

<?php
  header
("Content-Type: text/plain;");
  
// uncomment next 2 lines to force download rather than display within browser
  // header("Content-Type: application/octet-stream");
  // header("Content-Disposition: attachment; filename=hotels.csv");
  
require("MagicParser.php");
  function 
myRecordHandler($record)
  {
    
// countryname, countryId, countryCode, destinationName, destinationId, resortId, resortName, hotelId, hotelName, hotelLat, hotelLong
    
$countryName $record["COUNTRY-NAME"];
    
$countryId $record["COUNTRY-ID"];
    
$countryCode $record["COUNTRY-CODE"];
    foreach(
$record as $k => $v)
    {
      
$v str_replace(",","",$v);
      if (
strpos($k,"HOTEL")!==FALSE && strpos($k,"NAME")!==FALSE$hotelName $v;
      elseif (
strpos($k,"HOTEL")!==FALSE && strpos($k,"ID")!==FALSE$hotelId $v;
      elseif (
strpos($k,"HOTEL")!==FALSE && strpos($k,"LATITUDE")!==FALSE$hotelLat $v;
      elseif (
strpos($k,"HOTEL")!==FALSE && strpos($k,"LONGITUDE")!==FALSE$hotelLong $v;
      elseif (
strpos($k,"RESORT")!==FALSE && strpos($k,"NAME")!==FALSE$resortName $v;
      elseif (
strpos($k,"RESORT")!==FALSE && strpos($k,"ID")!==FALSE$resortId $v;
      elseif (
strpos($k,"DESTINATION")!==FALSE && strpos($k,"NAME")!==FALSE$destinationName $v;
      elseif (
strpos($k,"DESTINATION")!==FALSE && strpos($k,"ID")!==FALSE$destinationId $v;
      if (
$hotelId && $hotelName && $hotelLat && $hotelLong)
      {
        print 
$countryName.","
             
.$countryId.","
             
.$countryCode.","
             
.$destinationName.","
             
.$destinationId.","
             
.$resortId.","
             
.$resortName.","
             
.$hotelId.","
             
.$hotelName.","
             
.$hotelLat.","
             
.$hotelLong."\n";
        
$hotelId "";$hotelName "";$hotelLat "";$hotelLong "";
      }
    }
  }
  
MagicParser_parse("mlexport.xml","myRecordHandler","xml|HTSEARCHRQ/COUNTRY/");
?>

Here's the output running on this server:

http://www.magicparser.com/examples/mlexport.php

You'll see a couple of lines commented out - you can uncomment these if
you want to force a download rather than view the CSV in the browser...

Hope this helps!
Cheers,
David.

Submitted by Johnny N on Tue, 2009-04-07 14:52

thanks David,
that worked (had to turn off error notices tho)

I would like to understand what magic_parser returns a bit better,
if I paste the xml snippet in this example into the demo page of this site, I get 2 records returned.
I guess it is returning 2 country records.

the confusion comes in this solution where we have told MagicParser_parse to look at HTSEARCHRQ/COUNTRY level, which in my mind would logically return 3 RESORT records,
but in fact we get 4 hotel records out of this.
Is this because we are also parsing thru the hotel children of RESORT as well?

Submitted by Johnny N on Tue, 2009-04-07 15:02

another question,
I added a print line inside the callback function (1st line of function) and it printed a line before every country's records.

am i correct to think that the callback function is called every time the parser encounters a new top level element in the format string?

Submitted by support on Tue, 2009-04-07 15:04

Hi Johnny,

Ordinarily, Magic Parser is designed to parse lots of repeating elements (that themselves contain child elements) at the same level of the hierarchy. I refer to them as a "record", and for every complete record that is parsed out it is handed to your myRecordHandler function for processing.

When you upload or paste content into the demo tool, the parser uses it's auto-detection mechanism to work out which part of your XML constitutes a record. However, when writing code against a known format, you can specify the XPATH to the level you're interested in as part of the "Format String" - the 3rd (optional) parameter to MagicParser_parse(). You should always use a format string where known otherwise the parser has to read the entire file twice - once to work out the format string, and secondly to extract and return the records.

In the case of your XML, there are multiple repeating elements - multiple countries, each with multiple destinations, each with multiple hotels.

So, by parsing at the "COUNTRY" level, the myRecordHandler function is going to receive an array of all hotels in that country, and the sample code above shows how to extract each hotel within that country record.

In the case of the sample posted above, the first call contains all the Bulgaria hotels, and the second call contains all (one) of the Spanish (ES), resulting in a total of 4 hotels being output in the CSV.

Hope this helps!

Cheers,
David.

Edit - just saw your second question - I think the above covers that, but basically yes - the record handler function is called for each "record" found in the feed, so in the case of your sample XML, and we parsing at the COUNTRY level it is called twice - once for each country in the feed...

Submitted by Johnny N on Tue, 2009-04-07 15:12

okey dokey, sounds good thanks for the explanation David.