I am trying to extract the 100 terms supplied by the Google Trends RSS feed:
http://www.google.com/trends/hottrends/atom/hourly
However, I am having trouble getting down to the actual terms. It seems that the path starts:
xml|FEED/ENTRY/CONTENT/
And then the terms are wrapped in <ol><li><span><a>
So there are then 100 terms wrapped in tags. I'd be very grateful if you could help with the correct format file so that my record handler will iterate over each term. I have tried various combinations such as
xml|FEED/ENTRY/CONTENT/OL/LI/SPAN/A
but I suspect I am missing a point here! It seems there is some HTML wrapped in an XML block, but I would have thought I could still dig down into it?
I was wondering if an alternative would be to extract the HTML part first, then parse it again? Seems a bit clumsy though!
My ultimate goal is to write each of the 100 terms to a database table.
Many thanks,
Jez.
Thanks for the reply, David. That looks very helpful and should get me going again :)
Hello Jez,
You're quite correct, it is HTML embedded within an XML field, and since it is CDATA delimited it is not possible to parse down into it as such. However, since the format is uniform, it is straightforward to extract each query with a bit of PHP trickery.
I've just tested it out using explode() on "<li>" to break the CONTENT value down into individual lines, and then using strpos() to find the constant strings that surround the query on each line. Here's the output running on this server:
http://www.magicparser.com/examples/trends.php
Here's the source:
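<?php
  header("Content-Type: text/plain");
  require("MagicParser.php");
  function myRecordHandler($record)
  {
    // Break the embedded HTML into one chunk per list item
    $trends = explode("<li>",$record["CONTENT"]);
    foreach($trends as $trend)
    {
      // The constant string sa=X"> precedes each query
      $posA = strpos($trend,"sa=X\">");
      // strpos() returns false when nothing is found, so compare strictly
      if ($posA !== false)
      {
        // Skip past the 6 characters of sa=X">
        $posA = $posA + 6;
        // The query ends at the closing anchor tag
        $posB = strpos($trend,"</a>",$posA);
        if ($posB !== false)
        {
          $query = substr($trend,$posA,($posB-$posA));
          print $query;
          print "\n";
        }
      }
    }
  }
  $url = "http://www.google.com/trends/hottrends/atom/hourly";
  MagicParser_parse($url,"myRecordHandler","xml|FEED/ENTRY/CONTENT/");
?>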
Of course you can then add to the code above to add each $query value to the database.
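As a rough sketch of that database step, something along these lines could work. Everything here is an assumption rather than part of the tested script above: it uses PDO with an in-memory SQLite database so that it runs on its own, and the table name "trends", the column name "term" and the helper saveTrend() are all hypothetical, so substitute your own connection details and schema.

```php
<?php
// Assumed setup: an in-memory SQLite database so the sketch is self-contained.
// Replace the DSN with your own (for example a MySQL DSN plus username/password).
$pdo = new PDO("sqlite::memory:");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table: one row per trend term.
$pdo->exec("CREATE TABLE trends (id INTEGER PRIMARY KEY, term TEXT NOT NULL)");

// Prepare the statement once and execute it for each term;
// the bound parameter means the value needs no manual escaping.
$insert = $pdo->prepare("INSERT INTO trends (term) VALUES (:term)");

// Hypothetical helper that writes a single term to the table.
function saveTrend(PDOStatement $insert, $term)
{
  $insert->execute(array(":term" => $term));
}

// Stand-in values for illustration only; these are not real feed data.
foreach (array("first term", "second term") as $query)
{
  saveTrend($insert, $query);
}

// Confirm how many rows were written.
$count = $pdo->query("SELECT COUNT(*) FROM trends")->fetchColumn();
print $count . "\n";
?>
```

Inside myRecordHandler() you would then call saveTrend($insert, $query) at the point where the script prints $query, after bringing the statement into scope with a line such as global $insert; at the top of the function.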
Hope this helps!
Cheers,
David.