You are here:  » parsing XML with varying numbers of fields with the same name (or picking one)


parsing XML with varying numbers of fields with the same name (or picking one)

Submitted by Lera on Fri, 2012-11-23 03:03 in

Here's the XML from one of the records I need to parse:

<world_list>
   <world charid="433791809063" id="101" timestamp="1166678729"/>
   <world charid="1722282139364" id="401" timestamp="1167974556"/>
   <world charid="515396532057" id="120" timestamp="1166684999"/>
   <world charid="442381981937" id="103" timestamp="1166668648"/>
   <world charid="867583624010" id="202" timestamp="1166664011"/>
   <world charid="450971742731" id="105" timestamp="1166679061"/>
   <world charid="884763891095" id="206" timestamp="1166666676"/>
   <world charid="463856600900" id="108" timestamp="1166682286"/>
   <world charid="446677077069" id="104" timestamp="1166841027"/>
   <world charid="1297080703110" id="302" timestamp="1167029363"/>
   <world charid="871878783455" id="203" timestamp="1166660895"/>
</world_list>

The problem is that each record might have a different number of elements, and they all have the same name. What I really need to do is pick the one with the earliest timestamp that isn't a particular id (not entirely sure which one yet, but, for example, 206). Any way to do that?

I have the import files I was sent from this site. Thanks for those - they work great!

Submitted by support on Fri, 2012-11-23 10:21

Hi Lera,

Here's an example using the standard pattern I suggest for resolving the @n postfix codes that Magic Parser uses to resolve duplicate keys; and for each one an array is loaded if the id is not the one you wish to ignore; the array can then be sorted by key (timestamp) so that the earliest one can be picked.. Have a go with something like this - within your myRecordHandler function:

  $wordlist = array();
  $i = 0;
  $p = "";
  while(1)
  {
    if ($i) $p = "@".$i;
    if (!isset($record["WORLD_LIST/WORLD".$p])) break;
    $id = $record["WORLD_LIST/WORLD".$p."-ID"];
    $charid = $record["WORLD_LIST/WORLD".$p."-CHARID"];
    $timestamp = $record["WORLD_LIST/WORLD".$p."-TIMESTAMP"];
    if ($id == "206") continue;
    $wordlist[$timestamp] = $charid;
    $i++;
  }
  ksort($wordlist);
  $charid = array_shift($wordlist);
  // $charid is now the one with the earliest timestamp, excluding the id to ignore...

Hope this helps!
Cheers,
David.
--
MagicParser.com

Submitted by Lera on Sun, 2012-11-25 02:59

Would I then use

field = '".database_safe($charid[timestamp])."',

to add the value to my database? The timestamp's the value I need to keep.

Submitted by support on Sun, 2012-11-25 11:02

Hi Lera,

Nearly - just use $charid on its own, e.g.

field = '".database_safe($charid)."',

Cheers,
David
--
MagicParser.com

Submitted by Lera on Sun, 2012-11-25 16:04

I'm getting an error where the script runs for quite a while and doesn't complete. Could the while(1) and the continue command have anything to do with that? Not all records will have the id I need to skip, and my XML file has numerous records - I added echo $charid; and it's printing the first record, but nothing after that.

Here's the output:

Content-type: text/html
Done. Table created.Content-type: text/html
871878783455
/bin/sh: line 1: 24004 Killed php /path/script.php

Submitted by support on Sun, 2012-11-25 18:09

Hello Lera,

This line should be bailing from the while(1) at the end (or if there are no) WORD_LIST/WORLD elements;

if (!isset($record["WORLD_LIST/WORLD".$p])) break;

To double check it might be worth commenting out that section and running the script again with an echo of one of the fields that does exist in every record; however if you could post the full script you are currently remove any sensitive information of course) I'll check it out in full...

Cheers,
David.

Submitted by Lera on Sun, 2012-11-25 21:51

<?php
  set_time_limit(0);
  require("database.inc.php");
  require("MagicParser.php");
  $count = 0;
  function myRecordHandler($record)
  {
    global $count;
    $wordlist = array();
    $i = 0;
    $p = "";
    while(1)
    {
       if ($i) $p = "@".$i;
       if (!isset($record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p])) break;
       $id = $record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p."-ID"];
       $charid = $record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p."-CHARID"];
       $timestamp = $record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p."-TIMESTAMP"];
       if ($id == "100") continue;
       $wordlist[$timestamp] = $charid;
       $i++;
    }
    ksort($wordlist);
    $charid = array_shift($wordlist);
    // $charid is now the one with the earliest timestamp, excluding the id to ignore...
    echo $charid."\n";
    $sql = "
      INSERT INTO `test` SET
        field1 = '".database_safe($record["_EXTENDED/DISCOVERED-TIMESTAMP"])."',
        field2 = '".database_safe($charid)."'
        ";
    database_queryModify($sql,$result);
    $count++;
  }
  $sql = "TRUNCATE `test`";
  database_queryModify($sql,$result);
  $filename = "{link saved}";
  if (!MagicParser_parse($filename,"myRecordHandler","xml|ITEM_LIST/ITEM/"))
  {
    die("Could not open ".$filename);
  }
  print "Done. ".$count." records imported.";
?>

Submitted by support on Mon, 2012-11-26 11:29

Thanks Lera,

I spotted the problem - it is to do with the order of $i++ and the continue; statement if the ID is the one you wish to ignore - they need to be reversed. Have a go with:

<?php
  set_time_limit(0);
  require("database.inc.php");
  require("MagicParser.php");
  $count = 0;
  function myRecordHandler($record)
  {
    global $count;
    $wordlist = array();
    $i = 0;
    $p = "";
    while(1)
    {
       if ($i) $p = "@".$i;
       if (!isset($record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p])) break;
       $id = $record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p."-ID"];
       $charid = $record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p."-CHARID"];
       $timestamp = $record["_EXTENDED/DISCOVERED/WORLD_LIST/WORLD".$p."-TIMESTAMP"];
       $i++;
       if ($id == "100") continue;
       $wordlist[$timestamp] = $charid;
    }
    ksort($wordlist);
    $charid = array_shift($wordlist);
    // $charid is now the one with the earliest timestamp, excluding the id to ignore...
    echo $charid."\n";
    $sql = "
      INSERT INTO `test` SET
        field1 = '".database_safe($record["_EXTENDED/DISCOVERED-TIMESTAMP"])."',
        field2 = '".database_safe($charid)."'
        ";
    database_queryModify($sql,$result);
    $count++;
  }
  $sql = "TRUNCATE `test`";
  database_queryModify($sql,$result);
  $filename = "{link saved}";
  if (!MagicParser_parse($filename,"myRecordHandler","xml|ITEM_LIST/ITEM/"))
  {
    die("Could not open ".$filename);
  }
  print "Done. ".$count." records imported.";
?>

(replace {link saved} with your feed URL)

All the best,
David
--
MagicParser.php

Submitted by Lera on Mon, 2012-11-26 19:09

Got it working! Thanks!