Wikipedia

Submitted by globalguide on Mon, 2008-08-11 10:59

I want to parse a Wikipedia page. You can see this on the MagicParser demo pages:

Try inputting this URL:
http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black
and using the Demo.

Works fine.

However, copy the PHP over and you get this message:

could not open "http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black"

Has anybody got an idea what is going wrong?

Submitted by support on Mon, 2008-08-11 11:02

Hi,

This sounds like your server is not able to fopen() files by URL... There's some info on how to enable this (or what to ask your host) in this thread...

http://www.magicparser.com/node/189

That should be all it is - let me know if you need any more help or need to look at other ways of retrieving the remote document...

Cheers,
David.

Submitted by globalguide on Mon, 2008-08-11 11:24

Thanks David,

That fopen page might as well be in Martian! I have no idea where to start with that. I do have access to my php.ini file, if that is useful, but I don't know what to put into it.

I would only use the Wikipedia Special:Export function here. Maybe it would help if I knew what

http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black

actually resolves to as a filename.

http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black.xml

Or something. Perhaps that would help, getting round the wrapper problem?

all the best

Scott

Submitted by support on Mon, 2008-08-11 11:34

Hi Scott,

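Since you have access to php.ini, you may also be able to set PHP configuration directives within your script itself, although on PHP 4.3.4 and later allow_url_fopen can normally only be changed in php.ini or the server configuration, so this may have no effect. As a first experiment, try adding the following line right at the top of your script (after the opening PHP tag):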

ini_set("allow_url_fopen","1");

If that still doesn't work, then try the same setting by editing your php.ini. First search for "allow_url_fopen" (without the quotes) in the file, and if you find it, see what it is currently set to. If it is 0, "Off" or "FALSE", change the setting to "1". If the setting is not in the file at all, add the following line at the end:

allow_url_fopen = 1

Don't forget that you will need to restart PHP (or Apache if PHP is running as a module) before changes to php.ini take effect...

Cheers,
David.
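
Editing php.ini only helps if PHP actually picks up the change, so it can be worth checking the value the running script sees. The following is a minimal sketch of such a check, not part of MagicParser; the helper name ini_flag_is_on() is made up for illustration.

```php
<?php
// Hypothetical helper (not part of MagicParser): interpret a raw ini
// value as an on/off flag. FILTER_VALIDATE_BOOLEAN treats "1", "on",
// "true" and "yes" as true, and everything else as false.
function ini_flag_is_on($raw)
{
  return filter_var((string) $raw, FILTER_VALIDATE_BOOLEAN);
}

// Report the value that PHP is actually using for this request.
$raw = ini_get("allow_url_fopen");
if (ini_flag_is_on($raw))
{
  print "allow_url_fopen is enabled";
}
else
{
  print "allow_url_fopen is disabled";
}
```

If this reports "disabled" after php.ini has been edited, the change has probably not been loaded yet, or a different php.ini is in use.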

Submitted by globalguide on Mon, 2008-08-11 11:39

allow_url_fopen = On
allow_url_include = On
session.use_only_cookies = 1
session.use_trans_sid = 0

Dear David,

The above is what php.ini had on this subject. I buy space on a remote server, and they suggest changing php.ini to make this or that happen (with no need for rebooting etc.). Unfortunately they do not offer support on "programming issues".

So I added:

allow_url_include = 1
allow_url_fopen = 1

But that didn't change anything.

all the best

Scott

Submitted by support on Mon, 2008-08-11 11:45

Hi Scott,

Could you try this test script to confirm for sure whether it is a URL wrappers problem? This simulates exactly what MagicParser.php does when trying to open a URL...

<?php
  $url = "http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml";
  $fp = fopen($url,"r");
  if ($fp)
  {
    print "Success";
  }
  else
  {
    print "fopen() Failed";
  }
?>

Cheers,
David.
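
The test script above only reports success or failure. As a rough sketch, and assuming PHP 7 or later (for error_clear_last()), the same test can also report why fopen() failed. The function name why_fopen_failed() is hypothetical and is not something MagicParser provides.

```php
<?php
// Hypothetical helper: try to open $target for reading.
// Returns null on success, or PHP's own error message on failure.
function why_fopen_failed($target)
{
  error_clear_last();            // forget any earlier error (PHP 7+)
  $fp = @fopen($target,"r");     // @ keeps the warning off the page
  if ($fp)
  {
    fclose($fp);
    return null;
  }
  $error = error_get_last();     // the suppressed warning is still recorded
  return $error ? $error["message"] : "unknown error";
}
```

Called with a URL, this returns null if the URL opened, and otherwise the text of the warning PHP raised, which for an HTTP URL normally includes the status line returned by the remote server.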

Submitted by globalguide on Mon, 2008-08-11 11:57

Hi David,

We got "Success"!

all the best

Scott

Submitted by support on Mon, 2008-08-11 12:07

Hi Scott,

That's interesting, as it implies that URL wrappers are working fine, so that's not the problem. OK, the next test is to try exactly the same code, but with the Wikipedia URL...

<?php
  $url = "http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black";
  $fp = fopen($url,"r");
  if ($fp)
  {
    print "Success";
  }
  else
  {
    print "fopen() Failed";
  }
?>

Cheers,
David.

Submitted by globalguide on Mon, 2008-08-11 12:18

Warning: fopen(http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black) [function.fopen]: failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden in /home/globalgu/public_html/temp2.html on line 5

Submitted by support on Mon, 2008-08-11 12:24

Hi Scott,

As you can see from that error message, the web server refused the request coming from your PHP script (403 Forbidden).

Now, this could be because of the user-agent, if the remote server is trying to prevent fetching via PHP scripts. Try the following test script, which adds a line at the top to set the user agent. This identifies who you are, which is considered polite when making automated requests such as this:

<?php
  ini_set("user_agent","GlobalGuide/1.0");
  $url = "http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black";
  $fp = fopen($url,"r");
  if ($fp)
  {
    print "Success";
  }
  else
  {
    print "fopen() Failed";
  }
?>
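
The ini_set() line above changes the user agent for every request the script makes. As an alternative sketch, the user agent can be set for a single request by passing a stream context. The two function names and the 10 second timeout below are assumptions for illustration only.

```php
<?php
// Hypothetical helper: build HTTP stream context options that carry
// a User-Agent header for one request only.
function user_agent_context_options($userAgent)
{
  return array(
    "http" => array(
      "method"     => "GET",
      "user_agent" => $userAgent,
      "timeout"    => 10,   // seconds; an assumed value
    ),
  );
}

// Hypothetical helper: fetch $url with the given user agent.
// Returns the response body as a string, or false on failure.
function fetch_with_user_agent($url, $userAgent)
{
  $context = stream_context_create(user_agent_context_options($userAgent));
  return @file_get_contents($url, false, $context);
}
```

For the page in this thread, the call would be fetch_with_user_agent("http://en.wikipedia.org/wiki/Special:Export/Jennifer_Black", "GlobalGuide/1.0"). This only fetches the document as a string; it does not change how MagicParser itself opens URLs.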

Submitted by globalguide on Mon, 2008-08-11 12:56

Hi David,

That worked like a dream - that was all it was!!!!!

Thanks very much

Scott