You are here:  » Parsing Multiple RSS with Different Schemas


Parsing Multiple RSS with Different Schemas

Submitted by Mokono on Sun, 2010-02-21 01:54 in

I have searched as best that I could for an answer on these forums and could not find this question answered.

I am trying to parse multiple rss feeds and insert them into a MySQL database, I can do this with all of the feeds individually, but i would like to create a page that will insert everything at once instead of multiple pages for the different feeds. You mention there is a way to do this but can you give an example of how?

<?php
  
require("MagicParser.php");
  function 
myRecordHandler($record)
  {
$query "INSERT INTO deals_raw set
            deal_title = '"
.$record["TITLE"]."',
            deal_desc = '"
.$record["DESCRIPTION"]."',
            deal_link = '"
.$record["LINK"]."'
            "
;
          
$result mysql_query($query) or die('This deal did not post... ');
}
  
MagicParser_parse("<<my link>>","myRecordHandler","xml|RSS/CHANNEL/ITEM/");
?>

One other unrelated question:
Is there a way to take the following code and split the img from the text?

<description>
 <![CDATA[
   <img src="http://images.tigerdirect.com/SKUimages/small/S222-5518-main02-am.jpg"/><br />
  ]]>
SAVE $1100 - FREE Shipping - Samsung 55" LED HDTV UN55B6000 w/ Samsung BD-P1600 Blu-Ray Player Bundle $1899.99 - after COUPON CODE: VCN853
</description>

Submitted by support on Sun, 2010-02-21 05:32

Hi Mokono,

The basic approach would simply be to have one record handler for each different type of feed, and then multiple calls to MagicParser_parse() for each one (feeds with the same format can share the same record handler function). For example:

<?php
  
require("MagicParser.php");
  function 
myRSSRecordHandler($record)
  {
    
$query "INSERT INTO deals_raw set
            deal_title = '"
.$record["TITLE"]."',
            deal_desc = '"
.$record["DESCRIPTION"]."',
            deal_link = '"
.$record["LINK"]."'
            "
;
    
$result mysql_query($query) or die('This deal did not post... ');
  }
  function 
myAtomRecordHandler($record)
  {
    
$query "INSERT INTO deals_raw set
            deal_title = '"
.$record["TITLE"]."',
            deal_desc = '"
.$record["CONTENT"]."',
            deal_link = '"
.$record["LINK-HREF"]."'
            "
;
    
$result mysql_query($query) or die('This deal did not post... ');
  }
  
MagicParser_parse("<<my link1>>","myRSSRecordHandler","xml|RSS/CHANNEL/ITEM/");
  
MagicParser_parse("<<my link2>>","myRSSRecordHandler","xml|RSS/CHANNEL/ITEM/");
  
MagicParser_parse("<<my link3>>","myAtomRecordHandler","xml|FEED/ENTRY/");
?>

However, it might be better in practice to only have one section of code where you actually perform the database work, limiting the amount of repeated code. This way; you would load records from the different formats into your own global array with a fixed format (representing your database fields perhaps) and then load the database from the global array after parsing all feeds. For example:

<?php
  
require("MagicParser.php");
  
$allRecords = array();
  function 
myRSSRecordHandler($record)
  {
    global 
$allRecords;
    
$temp = array();
    
$temp["deal_title"] = $record["TITLE"];
    
$temp["deal_desc"] = $record["DESCRIPTION"];
    
$temp["deal_link"] = $record["LINK"];
    
$allRecords[] = $temp;
  }
  function 
myAtomRecordHandler($record)
  {
    global 
$allRecords;
    
$temp = array();
    
$temp["deal_title"] = $record["TITLE"];
    
$temp["deal_desc"] = $record["CONTENT"];
    
$temp["deal_link"] = $record["LINK-HREF"];
    
$allRecords[] = $temp;
  }
  
MagicParser_parse("<<my link1>>","myRSSRecordHandler","xml|RSS/CHANNEL/ITEM/");
  
MagicParser_parse("<<my link2>>","myRSSRecordHandler","xml|RSS/CHANNEL/ITEM/");
  
MagicParser_parse("<<my link3>>","myAtomRecordHandler","xml|FEED/ENTRY/");
  foreach(
$allRecords as $record)
  {
    
$query "INSERT INTO deals_raw set
            deal_title = '"
.mysql_escape_string($record["deal_title"])."',
            deal_desc = '"
.mysql_escape_string($record["deal_desc"])."',
            deal_link = '"
.mysql_escape_string($record["deal_link"])."'
            "
;
    
$result mysql_query($query) or die('This deal did not post... ');
  }
?>

In the second example, I have added mysql_escape_string to the construction of your $query which is important otherwise content in the feeds could break the structure of the SQL.

Regarding extracting the image src URL from your example data; which when using Magic Parser would be within the $record["DESCRIPTION"] field; the best approach is to use regular expressions. Regular expressions themselves are a huge subject; but for example, to extract the URL between quotes from $record["DESCRIPTION"] you could use this;

preg_match('/(?<=")(.*?)(?=")/',$record["DESCRIPTION"],$matches);
$src = $matches[0];

The image URL will be in $src after the above code.

Hope this helps!
Cheers,
David.