Support Forum

Cache Integration

Submitted by ukdave on Sat, 2006-07-22 20:31 in Magic Parser

Hi Dave, I assume that your "WGET" caching method will only work on a Linux server. Is that right? I usually test on Windows but upload to a Linux-based host.

Everything is working great but I don't know how to integrate this (your cache script):

<?php
  function cacheFetch($url,$age)
  {
    // directory in which to store cached files
    $cacheDir = "cache/";
    // cache filename constructed from MD5 hash of URL
    $filename = $cacheDir.md5($url);
    // default to fetch the file
    $fetch = true;
    // but if the file exists, don't fetch if it is recent enough
    if (file_exists($filename))
    {
      $fetch = (filemtime($filename) < (time()-$age));
    }
    // fetch the file if required
    if ($fetch)
    {
      // shell to wget to fetch the file
      exec("wget -N -O ".$filename." \"".$url."\"");
      // update timestamp to now
      exec("touch ".$filename);
    }
    // return the cache filename
    return $filename;
  }
?>

<?php
  // fetch (if required)
  $filename = cacheFetch("http://www.example.com/feed.xml",86400);
  // parse
  MagicParser_parse($filename,"myRecordHandler");
?>

Into this Kelkoo live data feed script (most of which is your doing!):

<?php
  header("Content-Type: text/html;charset=utf-8");
  require("MagicParser.php");
  // get $page from the URL
  $page = $_GET["page"];
  if (!$page) $page = 1;
  $nbresult = 5;
  $offset = ((($page-1) * $nbresult)+1);
  $url = "http://export.kelkoo.co.uk/ctl/exportSearch?partner=tradedoubler&partnerId=96906467&nbresult=".$nbresult."&offset=".$offset."&siteSearchQuery=shoes&catId=100164013";
  $numshops = 0;
  $numtotalresults = 0;
  function myHeaderRecordHandler($record)
  {
    global $numshops;
    global $numtotalresults;
    $numshops = $record["HEADER/NUMSHOPS"];
    $numtotalresults = $record["HEADER/NUMTOTALRESULTS"];
  }
  MagicParser_parse($url,"myHeaderRecordHandler","xml|PRODUCTSEARCH/");
  // print "<p>Num Shops: ".$numshops."</p>";
  // print "<p>Num Total Records: ".$numtotalresults."</p>";
  function myRecordHandler($item)
  {
    print "<p>".$item["OFFERTITLE"]."</p>";
  }
  // View the URL for testing
  // echo "$url";
  // fetch the response and parse the results
  MagicParser_parse($url,"myRecordHandler","xml|PRODUCTSEARCH/RESULTLIST/RESULT/");
  print "<p>";
  if ($page > 1) print "<a href='?page=".($page-1)."'>Prev</a>&nbsp;&nbsp;";
  if (($page * $nbresult) < $numtotalresults) print "<a href='?page=".($page+1)."'>Next</a>";
  print "</p>";
  echo "$offset";
?>

Can you help please?

Submitted by support on Sun, 2006-07-23 07:46

Hi Dave,

The cache script should work on Windows if you install wget and make sure that it is in your "path". You can get wget for Windows here:

http://gnuwin32.sourceforge.net/packages/wget.htm
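
To confirm that PHP can actually find wget on the path, a quick check along these lines might help. It is only a sketch: wgetAvailable is a made-up helper name (not part of the cache script), and it assumes exec() is enabled in your PHP configuration.

```php
<?php
  // Hypothetical helper (not part of the cache script): returns true if a
  // "wget" executable can be started from PHP using the current path
  function wgetAvailable()
  {
    $output = array();
    $status = 1;
    // --version prints version information and exits without fetching anything
    @exec("wget --version", $output, $status);
    // a zero exit status means the shell found and ran wget
    return ($status === 0);
  }
  print wgetAvailable() ? "wget found" : "wget not found - check your path";
?>
```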

The cache script requires a sub-directory called "cache" within the directory in which the script is running, and that sub-directory must be writable by the user that PHP runs as. On your remote Linux server, you should be able to create the directory and make it writable using your FTP program. Find the option to create a remote directory (try right-clicking in the remote tree view), and then set the permissions on the new directory to make it "World Writable". Again, try right-clicking on the new directory to see if a permissions menu is displayed.

Then, to integrate the script, simply copy the cacheFetch function into the top of your script, fetch the URL as required and use the filename returned by the cache function in your calls to MagicParser_parse.

Important: To avoid filling your disk space, you should set up a process to automatically empty the cache directory every so often, perhaps daily.

<?php
  header("Content-Type: text/html;charset=utf-8");
  require("MagicParser.php");
  function cacheFetch($url,$age)
  {
    // directory in which to store cached files
    $cacheDir = "cache/";
    // cache filename constructed from MD5 hash of URL
    $filename = $cacheDir.md5($url);
    // default to fetch the file
    $fetch = true;
    // but if the file exists, don't fetch if it is recent enough
    if (file_exists($filename))
    {
      $fetch = (filemtime($filename) < (time()-$age));
    }
    // fetch the file if required
    if ($fetch)
    {
      // shell to wget to fetch the file
      exec("wget -N -O ".$filename." \"".$url."\"");
      // update timestamp to now
      exec("touch ".$filename);
    }
    // return the cache filename
    return $filename;
  }
  // get $page from the URL
  $page = $_GET["page"];
  if (!$page) $page = 1;
  $nbresult = 5;
  $offset = ((($page-1) * $nbresult)+1);
  $url = "http://export.kelkoo.co.uk/ctl/exportSearch?partner=tradedoubler&partnerId=96906467&nbresult=".$nbresult."&offset=".$offset."&siteSearchQuery=shoes&catId=100164013";
  // fetch $url using cache with 1 hour age limit
  $filename = cacheFetch($url,86400);
  $numshops = 0;
  $numtotalresults = 0;
  function myHeaderRecordHandler($record)
  {
    global $numshops;
    global $numtotalresults;
    $numshops = $record["HEADER/NUMSHOPS"];
    $numtotalresults = $record["HEADER/NUMTOTALRESULTS"];
  }
  MagicParser_parse($filename,"myHeaderRecordHandler","xml|PRODUCTSEARCH/");
  // print "<p>Num Shops: ".$numshops."</p>";
  // print "<p>Num Total Records: ".$numtotalresults."</p>";
  function myRecordHandler($item)
  {
    print "<p>".$item["OFFERTITLE"]."</p>";
  }
  // View the URL for testing
  // echo "$url";
  // fetch the response and parse the results
  MagicParser_parse($filename,"myRecordHandler","xml|PRODUCTSEARCH/RESULTLIST/RESULT/");
  print "<p>";
  if ($page > 1) print "<a href='?page=".($page-1)."'>Prev</a>&nbsp;&nbsp;";
  if (($page * $nbresult) < $numtotalresults) print "<a href='?page=".($page+1)."'>Next</a>";
  print "</p>";
  echo "$offset";
?>

Now, if you experience problems, the first thing to do is debug the return value from cacheFetch, for example:

<?php
  // fetch $url using cache with 1 hour age limit
  $filename = cacheFetch($url,86400);
  print $filename;
  exit();
?>

That will display the local file generated by the cache function, which you can then look at in the cache directory. If the file does not exist, that would indicate that PHP does not have WRITE access to the cache directory. If the file exists but it is empty, that indicates a wget problem.
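
Those checks could also be scripted. The following is a rough sketch only; cacheDiagnose is a hypothetical helper name, and the "wget did not run" message is an inference for the case where the directory is writable but no file appears.

```php
<?php
  // Hypothetical helper: describes the state of a file returned by cacheFetch
  function cacheDiagnose($filename)
  {
    // discard PHP's cached file status so the checks see the current state
    clearstatcache();
    $dir = dirname($filename);
    if (!is_dir($dir)) return "cache directory missing";
    if (!is_writable($dir)) return "cache directory not writable by PHP";
    if (!file_exists($filename)) return "file missing - wget probably did not run";
    if (filesize($filename) == 0) return "file empty - wget problem";
    return "file looks OK";
  }
?>
```

You would call it as print cacheDiagnose($filename); in place of print $filename; in the debug code above.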

Another modification that may be required on Windows is to remove the touch command and to stop wget from setting the file's timestamp to match that of the remote file (the -N option). The changes within the cacheFetch function are as follows:

    if ($fetch)
    {
      // shell to wget to fetch the file, -N removed for Windows
      exec("wget -O ".$filename." \"".$url."\"");
      // update timestamp to now - commented out for Windows
      // exec("touch ".$filename);
    }
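
If you still want the timestamp reset on Windows, PHP's own touch() function can do it without shelling out. A minimal sketch, assuming a hypothetical helper name markFetched that you would call after the wget line:

```php
<?php
  // Sketch: reset a cache file's modification time without using the shell.
  // touch() here is PHP's built-in function, not the Unix command.
  function markFetched($filename)
  {
    // sets the modification time to now (and creates the file if it is missing)
    $ok = touch($filename);
    // discard PHP's cached file status so a later filemtime() sees the new value
    clearstatcache();
    return $ok;
  }
?>
```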

Hope this helps.

Cheers,
David.

Submitted by ukdave on Sun, 2006-07-23 10:33

Excellent, I'll see if I can get it going and figure out some sort of cron job to empty the cache on a schedule. Thanks, Dave.

Submitted by ukdave on Thu, 2006-08-10 23:44

You wrote:

<?php
  // fetch $url using cache with 1 hour age limit
  $filename = cacheFetch($url,86400);
  print $filename;
  exit();
?>

Are you sure you meant "1 hr age limit"? I ask because there are 86400 seconds in 24 hrs, so I think you probably meant "1 day age limit". Is that right?

Submitted by support on Fri, 2006-08-11 07:31

You're correct - 86400 is a 1 day age limit, 3600 would give 1 hour. My mistake...!
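
One way to avoid this kind of slip is to spell the arithmetic out. The constant names below are purely illustrative; the cache function itself just takes a number of seconds.

```php
<?php
  // Illustrative names only, not part of the cache script
  define("ONE_HOUR", 60 * 60);       // 3600 seconds
  define("ONE_DAY", 24 * ONE_HOUR);  // 86400 seconds
  // e.g. $filename = cacheFetch($url,ONE_DAY);
?>
```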

Cheers,
David.

Submitted by ukdave on Mon, 2006-08-14 16:49

I've also added a Mirago XML search feed to the bottom of the above script and wanted to use a cache here as well so I copied the cache function and renamed some of the variables to get it working. These are the changes:

<?php
  function cacheFetch2($url2,$age)
  {
    // directory in which to store cached files
    $cacheDir = "cache/";
    // cache filename constructed from MD5 hash of URL
    $filename = $cacheDir.md5($url2);
    // default to fetch the file
    $fetch = true;
    // but if the file exists, don't fetch if it is recent enough
    if (file_exists($filename))
    {
      $fetch = (filemtime($filename) < (time()-$age));
    }
    // fetch the file if required
    if ($fetch)
    {
      // shell to wget to fetch the file
      exec("wget -N -O ".$filename." \"".$url2."\"");
      // update timestamp to now
      exec("touch ".$filename);
    }
    // return the cache filename
    return $filename;
  }
  // fetch $url2 using cache with 1 day age limit
  $filename = cacheFetch2($url2,86400);
?>

Mirago cache files are being created but my statistics are still showing a fresh call to the XML server for duplicate searches. It is as if the script is ignoring the cache and instead makes a fresh call to the XML server. I expected the number of searches in my stats to drop dramatically with a cache so am wondering if I have perhaps done something wrong. Do I need to rename some of the other variables so they don't appear twice in the same script?

Thanks, Dave.

Submitted by support on Mon, 2006-08-14 19:10

You shouldn't need to make any other changes; but I would add some debug code to make it print something out when the real file is requested, for example:

<?php
    // fetch the file if required
    if ($fetch)
    {
      print "<p>CACHE MISS! Fetching File</p>";
      // shell to wget to fetch the file
      exec("wget -N -O ".$filename." \"".$url2."\"");
      // update timestamp to now
      exec("touch ".$filename);
    }
?>

Now, if you then see CACHE MISS! on every page view for the same keywords then something else is wrong. The first place to look is whether the file timestamping is working properly on your server, as it implies that this test is not working:

$fetch = (filemtime($filename) < (time()-$age));

That would be more complicated to debug; but perhaps another check would be to debug the test for the existing file; so also make the following change:

<?php
    // but if the file exists, don't fetch if it is recent enough
    if (file_exists($filename))
    {
      print "<p>FILE EXISTS! Checking Timestamp</p>";
      $fetch = (filemtime($filename) < (time()-$age));
    }
?>
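
If the timestamp test itself looks suspect, printing the numbers behind it may narrow things down. This is a sketch under assumptions: cacheAgeReport is a hypothetical helper, and it applies the same comparison as the cache function (stale when the file is older than $age seconds).

```php
<?php
  // Hypothetical helper: reports how old a cache file is against the age limit
  function cacheAgeReport($filename,$age)
  {
    // discard PHP's cached file status so filemtime() is current
    clearstatcache();
    if (!file_exists($filename)) return "no cache file";
    // age of the file in seconds according to PHP's clock
    $fileAge = time() - filemtime($filename);
    $state = ($fileAge > $age) ? "STALE" : "FRESH";
    return $state.": file is ".$fileAge."s old, limit is ".$age."s";
  }
?>
```

A file that is reported as STALE immediately after a fetch, or one with a negative age, would suggest that the file timestamps and PHP's clock disagree.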

Cheers,
David.

Submitted by ukdave on Mon, 2006-08-14 22:04

A very helpful mod, David, thanks.

I made the changes and everything seemed to be working properly, so I took another look at the online stats and noticed that today is showing as Monday 13th when in reality it is the 14th. It looks like the stats may be running 24hrs behind, but I won't know for sure until tomorrow.

I feel a lot better knowing for certain that the cache is functioning correctly.

Submitted by crounauer on Thu, 2006-08-17 11:10

Hi David,

I have just finished (but still in development) a site utilising Laterooms XML feed Virtual Hotels. I still need to implement URL re-writes in a .htaccess file and was wondering whether this would make any difference when implementing this cache script, or whether there is anything I should be aware of?

Thanks,
Simon.

Submitted by support on Thu, 2006-08-17 11:58

Hi Simon,

Re-writes shouldn't have any effect on the way this cache script works - PHP has no knowledge of the re-write in effect; as far as the script is concerned it is still running in the "real" directory so it will be able to find the cache directory.
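
If you would rather not depend on the working directory at all, the cache path can be built from the script's own location. This is an optional variation rather than something the cache script needs; cachePath is a hypothetical name, and it assumes the cache directory sits alongside the PHP file.

```php
<?php
  // Sketch: build the cache filename from the location of this PHP file
  // rather than from the current working directory
  function cachePath($url)
  {
    // dirname(__FILE__) is the directory containing this PHP file
    $cacheDir = dirname(__FILE__)."/cache/";
    // same naming scheme as the cache script: MD5 hash of the URL
    return $cacheDir.md5($url);
  }
?>
```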

Cheers,
David.

Submitted by barthes on Wed, 2007-06-27 15:45

Just to reopen an old thread: I'm having trouble coming up with a way to empty my cache folder. I guess that I need a script that is triggered by cron so that it runs automatically, but the big question is how this script should be written, code-wise?

Submitted by support on Wed, 2007-06-27 17:22

Hi,

If you're able to set up a cron job (which is the best way to do it) then, rather than writing a PHP script to do the job, I would recommend simply entering an rm (remove) command directly as the cron command. Something like this should do the trick:

rm -f /path/to/cachedir/*
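
If your host only lets cron call a PHP script, or you would prefer to remove only the files that have gone stale, a PHP equivalent might look something like the sketch below. cachePrune is a hypothetical name, and the age limit is whatever you pass to the cache function.

```php
<?php
  // Hypothetical alternative to the rm command: delete cache files
  // that were last modified more than $age seconds ago
  function cachePrune($cacheDir,$age)
  {
    $removed = 0;
    // list everything in the cache directory
    $files = glob(rtrim($cacheDir,"/")."/*");
    if (!$files) return 0;
    foreach($files as $file)
    {
      // same staleness test as the cache function
      if (is_file($file) && (filemtime($file) < (time()-$age)))
      {
        if (unlink($file)) $removed++;
      }
    }
    // number of files actually deleted
    return $removed;
  }
  // e.g. cachePrune("cache/",86400);
?>
```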

Hope this helps!
Cheers,
David.

Submitted by barthes on Thu, 2007-06-28 10:42

Good stuff David - works very well! Thank you!