

Caching

Submitted by Demarco on Wed, 2009-07-29 17:13

I got your cache script from here:
http://www.magicparser.com/node/136

You said to:
$ mkdir cache
$ chmod a+w cache

1. Because I don't really understand SSH, I created the cache directory manually from my FTP software. What does "chmod a+w" mean? What number do I need to enter for the chmod? I've set it to "755".

2. Your original step declares $filename first, but because I will be using dynamic keyword insertion for multiple feeds, I thought I couldn't use that, so I call cacheFetch directly. Is that okay? From my testing it looks OK.

<?php
  // fetch (if required)
  $filename = cacheFetch("http://www.example.com/feed.xml", 86400);

  // parse
  MagicParser_parse($filename, "myRecordHandler");
?>

Here's what my scratch method will look like:

-----------------------------------------------------------

<?php
  require("MagicParser.php");

  function cacheFetch($url, $age)
  {
    // directory in which to store cached files
    $cacheDir = "cache/";
    // cache filename constructed from MD5 hash of URL
    $filename = $cacheDir . md5($url);
    // default to fetch the file
    $fetch = true;
    // but if the file exists, don't fetch if it is recent enough
    if (file_exists($filename))
    {
      $fetch = (filemtime($filename) < (time() - $age));
    }
    // fetch the file if required
    if ($fetch)
    {
      // shell to wget to fetch the file
      exec("wget -N -O " . $filename . " \"" . $url . "\"");
      // update timestamp to now
      exec("touch " . $filename);
    }
    // return the cache filename
    return $filename;
  }

  function myRecordHandler($record)
  {
    // This is where you write your code to process each record, such as loading a database
    // You can display the record contents using PHP's internal print_r() function:
    print_r($record);
    // The following code will print out each field in your sample data:
    print $record["ITEM"];
    print $record["TITLE"];
  }

  // PHP does not allow the same function to be declared twice, so
  // myRecordHandler() is defined once and shared by both parse calls:
  MagicParser_parse(cacheFetch("http://news.google.com/news?pz=1&ned=us&hl=en&topic=h&num=3&output=rss&s=[Example+Dynamic+keyword+here]", 86400), "myRecordHandler", "xml|RSS/CHANNEL/ITEM/");

  MagicParser_parse(cacheFetch("http://rss.news.yahoo.com/rss/topstories/?s=[Example+Dynamic+keyword+here]", 86400), "myRecordHandler", "xml|RSS/CHANNEL/ITEM/");
?>

-----------------------------------------------------

Submitted by support on Wed, 2009-07-29 17:35

Hi Demarco,

2) is fine, but for 1) the mode should be 777, otherwise PHP is not able to write to the directory - that should be all it is!

This is not a security risk because file system security should not be related to HTTP (web access) security.
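For reference, the two commands from the original instructions can be run over SSH like this (a minimal sketch, assuming the cache directory sits alongside the PHP script):

```shell
# create the cache directory and make it world-writable, so that the
# web server process running PHP can create files inside it
mkdir -p cache
chmod 777 cache
```

In an FTP client, setting the directory's permission number to 777 achieves the same result as applying "a+w" on top of the default 755.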

Cheers,
David.

Submitted by Demarco on Mon, 2009-08-03 15:43

Hi Dave,

This caching script is great, but why does it often show a blank page?

I guess it's because of the timeout. I've increased the timeout to 75 seconds, but it still often shows a blank page.

What should I change so the script re-fetches the page if the cache file is 0 KB / blank?

And what should I enter after the URL to make it cache for an unlimited time (forever)?

Thanks

Submitted by support on Mon, 2009-08-03 15:49

Hi Demarco,

I would not recommend automatically re-fetching, in case the problem persists. For an unlimited cache, you could simply use a value of something like 10 years rather than 1 day - so in place of 86400 use 315360000.
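To make the arithmetic explicit, a hypothetical "forever" setup would just compute the larger age value and pass it where 86400 was used (the URL here is a placeholder):

```php
<?php
// 10 years expressed in seconds - effectively a permanent cache
$tenYears = 10 * 365 * 24 * 60 * 60;
print $tenYears; // 315360000

// then, in place of 86400:
//   $filename = cacheFetch("http://www.example.com/feed.xml", $tenYears);
```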

Cheers,
David.

Submitted by Demarco on Mon, 2009-08-03 16:05

Why is that?

I think it's because when I submit my ads to Yahoo, the Yahoo bot crawls all my dynamic keyword landing pages at the same time, and that's why some pages come out blank, even though I'm using a VPS. Any other ideas?

Thanks

Submitted by support on Mon, 2009-08-03 16:10

Hi,

I think that if the page hasn't loaded because of a problem on the remote server, the problem would likely persist if you retried - which could potentially end up as an infinite loop. Whilst you could use a retry counter to make sure that you didn't retry indefinitely, it's not a robust way to present your pages.

I would contact the feed provider if there is a reliability issue, as it shouldn't be the case that something you are trying to parse is so regularly unavailable that it is worth going to significant lengths to deal with an occasional outage...
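For completeness, a bounded-retry wrapper of the kind mentioned above might look like this hypothetical sketch (cacheFetchRetry is not part of the original script, and the caution about retries still applies):

```php
<?php
// Hypothetical wrapper around the thread's cacheFetch(): if the fetched
// file is empty, discard it and retry, but give up after $maxTries
// attempts instead of looping forever.
function cacheFetchRetry($url, $age, $maxTries = 3)
{
  $filename = "";
  for ($i = 0; $i < $maxTries; $i++)
  {
    $filename = cacheFetch($url, $age);
    clearstatcache();
    if (file_exists($filename) && filesize($filename) > 0)
    {
      return $filename; // got a non-empty copy
    }
    if (file_exists($filename))
    {
      unlink($filename); // discard the empty copy before retrying
    }
  }
  return $filename; // may still be empty after $maxTries attempts
}
```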

Cheers,
David.

Submitted by Demarco on Mon, 2009-08-03 16:30

I don't think it's because of the feed provider.

I think it's because I submit hundreds of ads (each with its own dynamic landing page) at the same time, so the Yahoo bot crawls them all at once - which gives me a great QS and low bids because everything is relevant.

I'm sure that's the cause, because some pages give me results and some don't. When I delete all the cache files and reload a previously blank page, it shows results.

Is there any way to make it re-check a blank cache file just once?

So the current cacheFetch works like this: it checks whether there's a cache file for the URL; if there isn't, it fetches the page, and if there is, it serves the cached feed. That's how cacheFetch currently works, right?

How about changing it so it checks both whether there's a cache file for the URL and the size of that file: if there's no cache file, or its size is zero, it fetches the URL, then continues as normal. Isn't that just a matter of adding a size check? Sorry if I'm mistaken - PHP is not my area. :(
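One hypothetical way to add that size check (an untested sketch; the function name cacheIsStale is made up here) is a small helper that treats a missing, empty, or expired cache file as stale:

```php
<?php
// Hypothetical helper: a cache file is "stale" (needs re-fetching) if it
// is missing, empty (0 bytes), or older than $age seconds.
function cacheIsStale($filename, $age)
{
  clearstatcache(); // don't trust cached file-stat results
  if (!file_exists($filename)) return true;
  if (filesize($filename) == 0) return true;
  return (filemtime($filename) < (time() - $age));
}
```

In cacheFetch(), the existing file_exists() block would then collapse to a single line: $fetch = cacheIsStale($filename, $age);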

Submitted by Demarco on Mon, 2009-08-03 16:31

Just to add: every landing page uses the same feed source.