
Parse multiple (20-40) very large files

Submitted by Fatihpk on Sat, 2015-05-16 21:10

I am trying to parse several large XML files (each around 15MB).

What is the best way to do this?

Submitted by support on Mon, 2015-05-18 08:18

Hello Fatihpk and welcome to the forum!

Magic Parser itself will have no problem with feeds of that size - in fact, that's exactly what it is designed for: scenarios where it is not possible to process the feed entirely in RAM.

However, parsing combined with whatever processing you require is likely to exceed PHP's usual default maximum execution time when the script is requested over HTTP, e.g. through a web browser. When working with large feeds it is therefore much better to execute your script(s) from the command line, accessing your server / VPS using an SSH client such as the popular PuTTY program:

http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

Either way, always start your scripts with:

  set_time_limit(0);

In terms of set-up, if you are processing different formats of feed I would recommend creating individual scripts for each format, e.g.

type1.php
type2.php
type3.php

Execution from the command line, once logged in, is a case of changing directory to the folder containing your scripts and running them using the command line version of PHP, e.g.

cd public_html/scripts/
php type1.php

etc.

Once everything is set up and working, if you then want a single shell script that executes each one in turn, perhaps with a view to running it as a CRON job, create a .sh script as follows:

all.sh

cd /home/username/public_html/scripts/
php type1.php
php type2.php
php type3.php

Use the pwd (print working directory) command in the scripts/ folder to get the value to use in the cd (change directory) command on the first line of your shell script - this is so that it can be executed as a CRON job regardless of which directory the job is started from.

After creating your shell script, don't forget to mark it as executable using

chmod +x all.sh

And to execute manually, simply enter

./all.sh
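
If you plan to run this unattended, a slightly more defensive version of the shell script may be useful. The following is only a sketch under a few assumptions: the function name run_feeds and the PHP_BIN variable are made up for this example, and the crontab schedule and log path in the comments are placeholders. It stops if the scripts folder cannot be entered, carries on to the next feed if one script fails, and reports the failure through its exit status.

```shell
#!/bin/sh
# Sketch of a more defensive all.sh (an illustration, not the script from the post above).
# run_feeds and PHP_BIN are hypothetical names introduced for this example.

# run_feeds DIR SCRIPT...
# Runs each SCRIPT from inside DIR with the command line version of PHP.
# Exit status: 0 = all scripts succeeded, 1 = at least one failed,
#              2 = DIR could not be entered.
# The body is a subshell ( ... ), so the cd does not affect the caller.
run_feeds() (
  dir=$1
  shift
  cd "$dir" || exit 2
  status=0
  for script in "$@"; do
    echo "Running $script"
    if ! "${PHP_BIN:-php}" "$script"; then
      echo "$script failed" >&2
      status=1   # remember the failure but carry on with the next feed
    fi
  done
  exit "$status"
)

# To use this as all.sh, uncomment the call below (path as in the example above):
# run_feeds /home/username/public_html/scripts type1.php type2.php type3.php

# Example crontab entry (assumed schedule and log path, adjust to suit):
# 0 3 * * * /home/username/public_html/scripts/all.sh >> /home/username/feeds.log 2>&1
```

Continuing after a failed script mirrors what the plain list of php commands does; the only differences are the check on cd and the non-zero exit status, which CRON or a wrapper can pick up.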

Hope this helps!
Cheers,
David
--
MagicParser.com