You are here:  » Extract data from very large xml


Extract data from very large xml

Submitted by loet on Mon, 2007-10-01 09:38 in

Hello David,

I am working now the script and get almost everything done to transform my xml files to a mysql database. But now I am stuck with a very large xml file, size 520mb. I hope you can help me with this.

The problem is that it takes about 6 houres to get all the data from the xml in the database.
So that is no option to do every day.

Is there a way to use the xml for this so I can show the data on a page only for a given HouseCode in a quick way?

I have setup this code to extract the data in the database:

<?php
 set_time_limit
(0);
$file "demoplanning-v2.xml";
$username "root";
$password "";
$db "huisjes";
require(
"MagicParser.php");
$connection mysql_connect("localhost",$username$password) or die(mysql_error());
$db mysql_select_db($db $connection) or die ("Couldn't select database.");
//1- TABLE LEEGMAKEN EN INDEX VERWIJDEREN
$sql1 "TRUNCATE TABLE demoplanning";
mysql_query($sql1);
print 
mysql_error();
//2- INSERT NEW DATA
    
function planning($record)
    {
    
$i 0;
    while(
1) {
      if (
$i$postfix "@".$i;
      if (!isset(
$record["PLANNINGRECORDS/PLANNING".$postfix]))break;
      
$period $record["PLANNINGRECORDS/PLANNING/PERIOD".$postfix];
      
$arrivaldate $record["PLANNINGRECORDS/PLANNING/ARRIVALDATE".$postfix];
      
$arrivaltimefrom $record["PLANNINGRECORDS/PLANNING/ARRIVALTIMEFROM".$postfix];
      
$arrivaltimeuntil $record["PLANNINGRECORDS/PLANNING/ARRIVALTIMEUNTIL".$postfix];
      
$departuredate $record["PLANNINGRECORDS/PLANNING/DEPARTUREDATE".$postfix];
      
$departuretime $record["PLANNINGRECORDS/PLANNING/DEPARTURETIME".$postfix];
      
$bookingonrequest $record["PLANNINGRECORDS/PLANNING/BOOKINGONREQUEST".$postfix];
      
$rentprice $record["PLANNINGRECORDS/PLANNING/RENTPRICE".$postfix];
      
$rentpriceexcldiscount $record["PLANNINGRECORDS/PLANNING/RENTPRICEEXCLDISCOUNT".$postfix];
      
$cs $record["PLANNINGRECORDS/PLANNING/CS".$postfix];
      
$housecode $record["HOUSECODE"];
      print 
$housecode;
      print 
"<br />";
    
$sql "INSERT INTO demoplanning (HouseCode, Period, ArrivalDate, ArrivalTimeFrom, ArrivalTimeUntil, DepartureDate, DepartureTime, BookingOnRequest, RentPrice, RentPriceExclDiscount, CS) VALUES ('".$housecode."', '".$period."', '".$arrivaldate."', '".$arrivaltimefrom."', '".$arrivaltimeuntil."', '".$departuredate."', '".$departuretime."', '".$bookingonrequest."', '".$rentprice."', '".$rentpriceexcldiscount."', '".$cs."')";
    
mysql_query($sql);
    print 
mysql_error();
    
$i++;
            }
    }
MagicParser_parse("demoplanning-v2.xml""planning""xml|HOUSES/HOUSE/");
print 
"<p>FINISHED</p>";
?>

I have a demofile (1,3mb) that I use here: Demofile.

Loet

Submitted by support on Mon, 2007-10-01 09:53

Hello Loet,

That is one big file! It is unlikely to be practical to scan the XML in real time in order to extract a particular record, because XML is a serial storage format.

What it is worth doing first of all is creating a test script without any of the database code to see how long it takes to actually read the entire file without doing anything else. I think you will find however that it is going to of a similar duration, but if it does turn out to be acceptable then it would be possible to create a real-time search script...

Cheers,
David.

Submitted by loet on Thu, 2007-10-04 11:26

Hi David,

I have found a solution for this file.
I can request the data needed realtime with XML-RPC for this.
When I recieve that data from the server I can use Magicparser to do the rest.

Loet