You are here:  » Check/test feed before importing into mySQL


Check/test feed before importing into mySQL

Submitted by formmailer on Mon, 2009-03-16 12:30 in

Hi David,

Since I use MagicParser to import feeds automatically into an SQL database, I would like to check the feed before putting it in the database.
The idea is to check the fields first, to see if the feed format hasn't changed and secondly to check for a minimum number of rows/file size.
Do you have a smart solution to do this as quick as possible with the least amount of resources being used.

All suggestions are welcome!

//Jasper

Submitted by support on Mon, 2009-03-16 13:54

Hi Jasper,

Regarding the format having changed, you can generally assume that if your myRecordHandler function has been called, then the format is the same.

However, if i'm reading your requirements correctly; i'm guessing that the first stage of your import process (before the parse) is a DELETE or TRUNCATE etc., so you'd like to know that the latest feed is good before doing that.

In which case, it's probably best to setup a preliminary record handler function, say myTestRecordHandler() which you use to count the records before proceeding; for example:

<?php
  
global $numRecords;
  
$numRecords 0;
  function 
myTestRecordHandler($record)
  {
    global 
$numRecords;
    
$numRecords++;
  }
  
MagicParser_parse("filename.xml","myTestRecordHandler","xml|FORMAT/STRING/");
  if (
$numRecords<100) exit(); // abort if feed less than 100 records
  // **** continue with normal import script here ****
?>

However, I note your comment regarding the least amount of resources. As an alternative to the above, you could perhaps use the filesize() function in conjuction with reading a single record. For example:

<?php
  
global $feedOK;
  
$feedOK FALSE;
  function 
myTestRecordHandler($record)
  {
    global 
$feedOK;
    
$feedOK TRUE;
    return 
TRUE// stop reading any more records
  
}
  
// check filesize
  
if (filesize("filename.xml") < 1024) exit(); // abort if file less than 1K
  // check format
  
MagicParser_parse("filename.xml","myTestRecordHandler","xml|FORMAT/STRING/");
  if (!
$feedOK) exit(); // abort if format changed
  // **** continue with normal import script here ****
?>

Hope this helps!
Cheers,
David.

Submitted by formmailer on Mon, 2009-03-16 14:16

Hi David,

You are right about the DELETE. :-)

Regarding the format changes: a major concern for me is that the feed provider changes fieldnames or removes fields from the feed. This would cause incomplete data in the database, but I think I can check this by simply testing if all required fields have a value.

Regarding the filesize of the feed, does this work for remotely hosted feeds? I haven't downloaded the feeds locally.
(as I am thinking about it: would this improve performance if I downloaded the feeds first? In that case I, maybe I could cache the file first)

//Jasper

Submitted by support on Mon, 2009-03-16 16:54

Hi Jasper,

You could certainly download and cache the files - i've posted my preferred method for doing this in the following thread:

http://www.magicparser.com/node/136

With this in place, and the cacheFetch function included in the code below, you could then do:

<?php
  
global $feedOK;
  
$feedOK FALSE;
  function 
myTestRecordHandler($record)
  {
    global 
$feedOK;
    
$feedOK TRUE;
    return 
TRUE// stop reading any more records
  
}
  
// check filesize
  
$filename cacheFetch("http://www.example.com/feed.xml",86400); // 1 day cache
  
if (filesize($filename) < 1024) exit(); // abort if file less than 1K
  // check format
  
MagicParser_parse($filename,"myTestRecordHandler","xml|FORMAT/STRING/");
  if (!
$feedOK) exit(); // abort if format changed
  // **** continue with normal import script here ****
?>

(PHP tags included for clarity, but of course you would have more code than the above with the cacheFetch function etc.)

And then of course you can check the required fields have a value in myTestRecordHandler exactly as you described.

Cheers,
David.