Getting data from the xmlns

Submitted by sidabm on Wed, 2013-01-16 21:25

I have a feed I want to parse, but the feed is provided one page at a time, so I need to call each page in turn. My issue is that the xmlns part of the first page returned tells you how many pages are available. Two questions:

1. How do I extract the data in the xmlns elements?
2. How do I loop through each RSS page?

Thanks in advance

Sidabm

Submitted by support on Thu, 2013-01-17 12:27

Hi Sidabm,

Please could you copy an example of the header of the first page of your XML source into a post and I'll check it out for you!

Cheers,
David.

Submitted by sidabm on Fri, 2013-01-18 17:05

David,

Below is the header and first record

{code saved}

Submitted by support on Sat, 2013-01-19 11:07

Hi Sidabm,

Thanks! The Next / Last links are in the header as you say, but not explicitly indicated by element name, so it's necessary to inspect each LINK/REL item, look for "next" and then extract the associated URL. If a next link exists, the script can then refresh and use that URL.
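To illustrate that lookup on its own, here is a minimal sketch run against a hand-written array rather than a real feed. The helper name findNextUrl is hypothetical, the URLs are made up, and the key names (LINK-REL, LINK-HREF, LINK@1-REL and so on) are an assumption about how the LINK attributes appear in $record; print_r($record) inside your own handler will show the actual keys.

```php
<?php
// Hypothetical helper: returns the HREF paired with the REL whose value is "next",
// or "" if the record contains no such link.
function findNextUrl($record)
{
  foreach($record as $k => $v)
  {
    if ($v=="next")
    {
      // assumed key naming: LINK@1-REL pairs with LINK@1-HREF
      $hrefKey = str_replace("REL","HREF",$k);
      return (isset($record[$hrefKey])?$record[$hrefKey]:"");
    }
  }
  return "";
}

// made-up header record for a feed currently on page 1
$record = array(
  "LINK-REL"    => "self",
  "LINK-HREF"   => "http://www.example.com/feed.asp?page=1",
  "LINK@1-REL"  => "next",
  "LINK@1-HREF" => "http://www.example.com/feed.asp?page=2",
  "LINK@2-REL"  => "last",
  "LINK@2-HREF" => "http://www.example.com/feed.asp?page=3000"
);

print findNextUrl($record)."\n";
```

With the sample array this prints the page 2 URL; on the last page, where no REL has the value "next", it returns an empty string and the process can stop.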

Note that there appear to be several thousand pages, so I'll demonstrate how to step through each one, but with a 2 second delay (see the $sleep variable) between each refresh so that you can test / stop the process!

Consider the following example:

<?php
  require("MagicParser.php");

  // URL of the page to process, passed base64 encoded between refreshes
  $url = (isset($_GET["url"])?base64_decode($_GET["url"]):"");

  // delay in seconds between each refresh
  $sleep = 2;

  $page_1_url = "http://www.example.com/feed.asp?page=1";

  // set by myHeaderRecordHandler() if the current page has a "next" link
  $next_url = "";

  function myRecordHandler($record)
  {
    // process $record as required
  }

  function myHeaderRecordHandler($record)
  {
    global $next_url;
    $next_url = "";
    foreach($record as $k => $v)
    {
      if ($v=="next")
      {
        // the URL is in the HREF attribute of the same LINK element
        $k = str_replace("REL","HREF",$k);
        $next_url = $record[$k];
        break;
      }
    }
  }

  // no URL passed in means this is the first request, so start at page 1
  if (!$url) $url = $page_1_url;

  $xml = file_get_contents($url);

  // parse to process data
  MagicParser_parse("string://".$xml,"myRecordHandler","xml|FEED/ENTRY/");

  // parse to get next URL
  MagicParser_parse("string://".$xml,"myHeaderRecordHandler","xml|FEED/");

  if ($next_url)
  {
    print "<p>Continuing at ".$next_url." in ".$sleep." seconds...</p>";
    // urlencode() so that any + / = in the base64 string survives the query string
    print "<meta http-equiv='refresh' content='".$sleep.";url=?url=".urlencode(base64_encode($next_url))."' />";
  }
  else
  {
    print "<p>Done.</p>";
  }
?>

Don't forget to replace the value of $page_1_url with your actual page 1 URL, and that should be close!
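If you would rather run the whole thing from the command line, where a meta refresh isn't available, the same next-link logic can sit inside a while loop. The following is only a sketch: it uses PHP's built-in SimpleXML in place of Magic Parser purely so that it runs on its own, the function name crawlPages and the $maxPages safety limit are hypothetical, and the two page feed is made up and held in memory so no network is needed.

```php
<?php
// Hypothetical helper: walks every page, starting at $url, until no "next" link is found.
// $fetch is any callable that returns the XML for a URL (for example file_get_contents).
function crawlPages($url, $fetch, $maxPages = 10000)
{
  $visited = array();
  while($url && count($visited) < $maxPages)
  {
    $visited[] = $url;
    $feed = simplexml_load_string(call_user_func($fetch, $url));
    if ($feed === false) break; // stop on unparseable XML
    // process each $feed->entry here as required
    $url = "";
    foreach($feed->link as $link)
    {
      if ((string)$link["rel"] == "next")
      {
        $url = (string)$link["href"];
        break;
      }
    }
  }
  return $visited;
}

// made-up two page feed, keyed by URL
$pages = array(
  "page1" => "<feed><link rel='self' href='page1'/><link rel='next' href='page2'/><entry/></feed>",
  "page2" => "<feed><link rel='self' href='page2'/><entry/></feed>"
);
$fetchFromArray = function($url) use ($pages) { return $pages[$url]; };

print implode(", ", crawlPages("page1", $fetchFromArray))."\n";
```

For a live feed you would pass "file_get_contents" as $fetch and your page 1 URL as $url, and probably add a sleep() inside the loop to keep the same polite delay between requests.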

Cheers,
David
--
MagicParser.com

Submitted by sidabm on Mon, 2013-01-21 21:03

David,

Brilliant, you are a star! Works a dream.

Sidabm