

Question

Submitted by stebbo on Sat, 2007-07-07 05:07

Hello All,

I am new to the world of PHP, although I have been programming for many years, including some Unix scripting a long time ago.

I need to parse XML files structured like this one:
http://www.puntingace.net/aap/race_20070705_161301.xml

and to create
- separate files for each story (there are 3 in that link)
- a headlines file for inclusion on a home page

I have tried other parsers, but they all seem to fail because of the multi-level structure: the actual story sits one level down, inside the BODYTEXT child node, as separate P tags.

Having read through the "looping through child objects" topic a few posts down, and having run my file through the demo page, it looks as though this parser should be able to do it.

Question 1. Will it do what I want it to do?

Question 2. I need to be able to put the output from that XML file into separate html files for my website, which conform to the look and feel. I'm assuming I can do this with header and footer pages?

Question 3. Is there an easier way than header and footer files? Will a php include directive achieve what I need?

Question 4 (and last). Is there an easier way to do what I want? The XML files are pushed onto my web server at various times throughout the day. I need to parse them, extract the headlines and update the page that lists them. Am I on the right track here?

Much thanks for any assistance.

Cheers,
Chris.

Submitted by support on Sat, 2007-07-07 07:15

Hello Chris,

Thank you for your interest in Magic Parser.

I think it will do what you want, with a little bit of extra PHP to create the files, your story index and so on (which isn't parsing-related functionality). It's quite straightforward, so I've written some examples to help get you started. There are of course plenty of other ways to achieve this (as I'm sure you'd appreciate, being a Unix programmer!), such as loading the stories into a database; however, here's a basic method using PHP's file handling functions to do what (I think!) you want to do...

Firstly, regarding the basic parsing of your XML file: yes, Magic Parser will work fine with this. The key thing you need to know is that the format string to use with your XML is as follows:

xml|BULLETIN/STORY/
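As I understand it, this format string tells the parser to treat every STORY element under the BULLETIN root as one record. For comparison only, here is a rough equivalent of walking that same structure with PHP's built-in SimpleXML extension instead of Magic Parser. It is a sketch: the inline XML is a hypothetical miniature of the feed, using only the element names that appear in the scripts in this thread (BULLETIN, STORY, STORYNAME, HEADLINE, BODYTEXT, P), and renderStories is a made-up function name.

```php
<?php
// Hypothetical sample mirroring the BULLETIN/STORY layout; the real feed
// may contain more elements and attributes than shown here.
$xml = <<<XML
<BULLETIN>
  <STORY>
    <STORYNAME>story1</STORYNAME>
    <HEADLINE>First headline</HEADLINE>
    <BODYTEXT>
      <P>Paragraph one.</P>
      <P>Paragraph two.</P>
    </BODYTEXT>
  </STORY>
  <STORY>
    <STORYNAME>story2</STORYNAME>
    <HEADLINE>Second headline</HEADLINE>
    <BODYTEXT>
      <P>Only paragraph.</P>
    </BODYTEXT>
  </STORY>
</BULLETIN>
XML;

// Render each STORY element as an <h1> headline followed by its paragraphs.
function renderStories(string $xml): string
{
    $bulletin = simplexml_load_string($xml); // returns the BULLETIN root element
    if ($bulletin === false) {
        return "";
    }
    $html = "";
    foreach ($bulletin->STORY as $story) {
        $html .= "<h1>" . htmlspecialchars((string) $story->HEADLINE) . "</h1>\n";
        // Each <P> child of BODYTEXT is one paragraph of the story.
        foreach ($story->BODYTEXT->P as $p) {
            $html .= "<p>" . htmlspecialchars((string) $p) . "</p>\n";
        }
    }
    return $html;
}

echo renderStories($xml);
```

The nesting is handled by ordinary object access here, whereas Magic Parser flattens each STORY into a single associative array, which is what the scripts below work with.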

Using this format string, together with the "looping through child objects" technique that you have already discovered, here is a very basic script to parse your XML and print out each story:

Output:
http://www.magicparser.com/examples/storyboard/storyboard.php

storyboard.php

<?php
  require("MagicParser.php");
  function myRecordHandler($record)
  {
    print "<h1>".$record["HEADLINE"]."</h1>";
    // the first paragraph is BODYTEXT/P, later ones are BODYTEXT/P@1, BODYTEXT/P@2 ...
    $i = 0;
    while(1) {
      $postfix = ($i ? "@".$i : "");
      if (!isset($record["BODYTEXT/P".$postfix])) break;
      print "<p>".$record["BODYTEXT/P".$postfix]."</p>";
      $i++;
    }
  }
  MagicParser_parse("race_20070705_161301.xml","myRecordHandler","xml|BULLETIN/STORY/");
?>
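The while loop above depends on how repeated elements are keyed in the record: judging from the script, the first paragraph is BODYTEXT/P and the following ones are BODYTEXT/P@1, BODYTEXT/P@2 and so on. That logic can be pulled out into a small helper and tried without the parser at all. The function name getParagraphs and the sample record below are hypothetical.

```php
<?php
// Hypothetical helper: gather BODYTEXT/P, BODYTEXT/P@1, BODYTEXT/P@2, ...
// from a flattened record array into a plain list, stopping at the first gap.
function getParagraphs(array $record, string $base = "BODYTEXT/P"): array
{
    $paragraphs = [];
    for ($i = 0; ; $i++) {
        // The first occurrence has no suffix; later ones are "@1", "@2", ...
        $key = $base . ($i ? "@" . $i : "");
        if (!isset($record[$key])) {
            break;
        }
        $paragraphs[] = $record[$key];
    }
    return $paragraphs;
}

// Hypothetical record, shaped like the ones passed to myRecordHandler().
$record = [
    "HEADLINE" => "Sample headline",
    "BODYTEXT/P" => "First paragraph.",
    "BODYTEXT/P@1" => "Second paragraph.",
    "BODYTEXT/P@2" => "Third paragraph.",
];

foreach (getParagraphs($record) as $p) {
    print "<p>" . $p . "</p>\n";
}
```

Returning an array rather than printing keeps the key convention in one place, so the same helper could serve both the display script and the file-writing script.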

Now, I have given this example its own sub-directory because, within that directory, I have created a sub-directory called "stories" and given the web server process write access to it. This lets us extend the above script to extract each story and write it into an HTML file in the stories directory. In this example, I've chosen STORYNAME as the base filename, as this seems to be some kind of ID field. The script first checks whether the file already exists, and creates the new story file only if it does not.

Finally, it appends a link to the story to an index file called "storyindex.php", so that you can include the story index within the index page for the directory, which I'll come on to in a moment...

makefiles.php

<?php
  require("MagicParser.php");
  function myRecordHandler($record)
  {
    $filename = "stories/".$record["STORYNAME"].".html";
    // see if we have already extracted this story, abandon if so
    if (file_exists($filename)) return;
    // create and open the file for write access (give up if that fails)
    $fp = fopen($filename,"w");
    if (!$fp) return;
    // write the story file - here you could print additional header HTML
    fwrite($fp,"<h1>".$record["HEADLINE"]."</h1>");
    // loop through the paragraphs and write each to the file
    $i = 0;
    while(1) {
      $postfix = ($i ? "@".$i : "");
      if (!isset($record["BODYTEXT/P".$postfix])) break;
      fwrite($fp,"<p>".$record["BODYTEXT/P".$postfix]."</p>");
      $i++;
    }
    // close the file, but before this you could print additional footer HTML
    fclose($fp);
    // open storyindex.php for append/write access
    $fp = fopen("storyindex.php","a");
    if (!$fp) return;
    // write the link to this file, using the headline as the anchor text
    $link = "<p><a href='".$filename."'>".$record["HEADLINE"]."</a></p>";
    fwrite($fp,$link);
    // close the index file
    fclose($fp);
  }
  MagicParser_parse("race_20070705_161301.xml","myRecordHandler","xml|BULLETIN/STORY/");
?>
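The file-handling half of makefiles.php doesn't depend on the parser, so it can be exercised on its own. The sketch below is a variation, not the script above: it uses file_put_contents() with LOCK_EX so that two overlapping runs cannot interleave their writes to storyindex.php, and it escapes text with htmlspecialchars(), which the original doesn't do (drop that if the feed's text is meant to carry markup). The function name saveStory, the sample record and the temporary directory are all hypothetical.

```php
<?php
// Hypothetical helper mirroring makefiles.php. $baseDir stands in for the
// storyboard directory and must contain a writable "stories" sub-directory.
function saveStory(array $record, string $baseDir): bool
{
    $relative = "stories/" . $record["STORYNAME"] . ".html";
    $filename = $baseDir . "/" . $relative;
    if (file_exists($filename)) {
        return false; // already extracted on an earlier run
    }

    $html = "<h1>" . htmlspecialchars($record["HEADLINE"]) . "</h1>\n";
    // Same paragraph keys as makefiles.php: BODYTEXT/P, BODYTEXT/P@1, ...
    for ($i = 0; ; $i++) {
        $key = "BODYTEXT/P" . ($i ? "@" . $i : "");
        if (!isset($record[$key])) {
            break;
        }
        $html .= "<p>" . htmlspecialchars($record[$key]) . "</p>\n";
    }
    if (file_put_contents($filename, $html, LOCK_EX) === false) {
        return false; // e.g. the stories directory is not writable
    }

    // FILE_APPEND | LOCK_EX stops two overlapping runs from interleaving
    // their writes to the shared index file.
    $link = "<p><a href='" . htmlspecialchars($relative, ENT_QUOTES) . "'>"
          . htmlspecialchars($record["HEADLINE"]) . "</a></p>\n";
    file_put_contents($baseDir . "/storyindex.php", $link, FILE_APPEND | LOCK_EX);
    return true;
}

// Usage with a throwaway directory and a hypothetical record.
$baseDir = sys_get_temp_dir() . "/storyboard_demo_" . getmypid();
if (!is_dir($baseDir . "/stories")) {
    mkdir($baseDir . "/stories", 0777, true);
}
$record = [
    "STORYNAME" => "story1",
    "HEADLINE" => "Sample headline",
    "BODYTEXT/P" => "First paragraph.",
    "BODYTEXT/P@1" => "Second paragraph.",
];
var_dump(saveStory($record, $baseDir)); // first call writes both files
var_dump(saveStory($record, $baseDir)); // second call finds the story and skips it
```

Building the whole page in a string and writing it once also means a half-written story file is less likely to be left behind if the script stops part-way through.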

Finally, you need a master index page that includes your storyindex.php file to display links to the extracted stories. What better file to use than index.php, the default index for a directory?

index.php

<?php
  print "<html>";
  print "<body>";
  print "<h1>Choose a story...</h1>";
  require("storyindex.php");
  print "</body>";
  print "</html>";
?>

Here's the link to the example directory to see the results in action:
http://www.magicparser.com/examples/storyboard/

Notice that this example doesn't use header or footer files; the comments in makefiles.php show where you could bring in a standard header and footer for each story. As before, there are plenty of other ways to do this.
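On questions 2 and 3: one other way is to write each story as a .php page that pulls in a shared header and footer with PHP's include statement when it is viewed, so a later change to the header or footer shows up on every existing story. A minimal sketch, assuming hypothetical files named header.php and footer.php and a made-up function storyPage:

```php
<?php
// Hypothetical: build the contents of a story page that includes a shared
// header and footer at view time rather than copying them into the file.
function storyPage(string $bodyHtml, string $includeDir): string
{
    // var_export() turns each path into a safely quoted PHP string literal.
    $header = var_export($includeDir . "/header.php", true);
    $footer = var_export($includeDir . "/footer.php", true);
    return "<?php include " . $header . "; ?>\n"
         . $bodyHtml . "\n"
         . "<?php include " . $footer . "; ?>\n";
}

// Demo in a throwaway directory with hypothetical header/footer files.
$dir = sys_get_temp_dir() . "/story_include_demo_" . getmypid();
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
file_put_contents($dir . "/header.php", "<html><body>\n");
file_put_contents($dir . "/footer.php", "</body></html>\n");
file_put_contents($dir . "/story1.php", storyPage("<h1>Sample headline</h1>", $dir));

// Render the page the way the web server would.
include $dir . "/story1.php";
```

The sketch bakes absolute paths into each page for simplicity; on a live site, something like __DIR__ . '/../header.php' inside the generated page may be more portable. The alternative is to read the header and footer with file_get_contents() and copy them into each story file when it is written, which gives plain static HTML but means old stories keep the old look.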

As for scheduling, you would need to use something like cron to run your makefiles script periodically, so that it picks up any new XML file that has been pushed to you.
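The "look for a new XML file" step can be as simple as remembering which files have already been handled. A minimal sketch, assuming the feed files all land in one directory and using a plain-text log named processed.txt (hypothetical, as are the function names and sample file names) to record what has been parsed:

```php
<?php
// Hypothetical: list the *.xml files in $dir that are not yet recorded in
// $logFile, oldest first (assuming timestamped names such as
// race_20070705_161301.xml, which sort chronologically).
function newXmlFiles(string $dir, string $logFile): array
{
    $seen = is_file($logFile)
        ? (file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [])
        : [];
    $names = array_map("basename", glob($dir . "/*.xml") ?: []);
    sort($names);
    return array_values(array_diff($names, $seen));
}

// Record a file only after it has been handled, so a failed run is retried.
function markProcessed(string $name, string $logFile): void
{
    file_put_contents($logFile, $name . "\n", FILE_APPEND | LOCK_EX);
}

// Demo with a throwaway directory standing in for the upload area.
$dir = sys_get_temp_dir() . "/feed_demo_" . getmypid();
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
touch($dir . "/race_20070705_161301.xml");
touch($dir . "/race_20070705_120000.xml");
$log = $dir . "/processed.txt";

foreach (newXmlFiles($dir, $log) as $name) {
    // In makefiles.php, the MagicParser_parse() call for $dir . "/" . $name would go here.
    echo "processing " . $name . "\n";
    markProcessed($name, $log);
}
```

Run from cron, a script like this would be started with the PHP command-line binary at whatever interval suits the feed; because processed files are logged, running it more often than new files arrive does no harm.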

Hope this helps!
Cheers,
David
--
MagicParser.com

Submitted by stebbo on Sat, 2007-07-07 07:45

Hi David,

thanks very much for your response. It looks to be exactly what I'm after.

Cheers,
Chris.