How to parse a 2GB+ XML file

Submitted by mastereyes on Sun, 2009-09-06 18:47

Hi,

I have a few XML files that are very large, 2GB or more. My hosting server allows me to use 512M of memory. Files smaller than 512M work fine, but the larger files give me an out of memory error and ask me to increase the limit. Is there any way to parse the larger files without increasing the memory limit?

Regards,

Matt

Submitted by support on Sun, 2009-09-06 18:57

Hello Matt,

Are you using a Format String in your call to MagicParser_parse()? Although it is an optional parameter, it is essential with such large files; otherwise the auto-detection mechanism has to read the entire file, which is almost certainly what is causing the out of memory error.
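
To illustrate why knowing the record path up front keeps memory use low, here is a minimal sketch of record-by-record streaming. It uses PHP's built-in XMLReader rather than Magic Parser, so it is not how Magic Parser works internally; the function name streamRecords() and the upper-cased record keys are assumptions made for the example only.

```php
<?php
// Illustration only: PHP's built-in XMLReader, not Magic Parser itself.
// streamRecords() is a hypothetical helper name used for this sketch.
// It hands each record element to $handler one at a time, so only a single
// record is held in memory regardless of the size of the file.
function streamRecords($filename, $recordName, $handler)
{
    $reader = new XMLReader();
    if (!$reader->open($filename)) {
        return false;
    }
    $count = 0;
    // Skip forward to the first record element
    while ($reader->read() && $reader->name !== $recordName) {
    }
    while ($reader->name === $recordName) {
        // Expand just this one record into a small tree
        $node = new SimpleXMLElement($reader->readOuterXml());
        $record = array();
        foreach ($node->children() as $child) {
            // Upper-cased keys are an assumption of this sketch
            $record[strtoupper($child->getName())] = (string)$child;
        }
        call_user_func($handler, $record);
        $count++;
        // Jump to the next sibling record, skipping the one just handled
        $reader->next($recordName);
    }
    $reader->close();
    return $count;
}

// Usage, guarded so the script does nothing if the file is absent
if (file_exists("file1.xml")) {
    streamRecords("file1.xml", "product", function ($record) {
        print $record["NAME"] . "\n";
    });
}
```

The point is that the record element name is supplied by the caller, so nothing has to scan the whole file to work out where the records are.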

Details of the Format String can be found at:

http://www.magicparser.com/node/22

If you're not sure what is required for your large XML files, would you be able to post an example containing everything up to and including the first record? Alternatively, you could email me a link to the file, and I'll download just the first few hundred KB and work out the Format String for you...

Hope this helps!
Cheers,
David.

Submitted by mastereyes on Sun, 2009-09-06 20:08

Hi David,

Thank you for your quick reply.

Here is the XML file content:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE product_catalog SYSTEM "http://www.domainname.com/downloads/tech/dtd/product_catalog_1_1.dtd">
<catalog>
<product>
<programname>Books Magazine</programname>
<programurl>http://www.domainname.com</programurl>
<catalogname>Books Product Catalog</catalogname>
<lastupdated>09/05/2009</lastupdated>
<name>Complete 58-Volume Softcover Books Set</name>
<keywords>Books, magazines, kids, children, gifts, babies, toddler, animals, poster, dvd, zoobooks, zootles, teacher, education, classroom, parents, grandparents, birthday, holiday, girls, boys, tween, school, elementary, reading, science, toys, games, free, wildlife, nature</keywords>
<description>Like an animal encyclopedia, this set is ready and waiting to answer virtually every creature-question your child can come up with - using simple terms and striking illustrations that are conducive to good grades.</description>
<sku>5258</sku>
<currency>USD</currency>
<saleprice>0.00</saleprice>
<price>184.99</price>
<retailprice>0.00</retailprice>
<buyurl>http://www.domainname.com/click-34637879-1053445?url=http%3A%2F%2Fwww.domainname.com%2Fstore%2FComplete-58-Volume--Softcover-Books-Set-P2C1.aspx</buyurl>
<impressionurl>http://www.domainname.net/image-3463879-10539514</impressionurl>
<imageurl>http://www.domainname.com/store/images/sets.jpg</imageurl>
<instock>YES</instock>
</product>
<product>
<programname>Books Magazine</programname>
<programurl>http://www.domainname.com</programurl>
<catalogname>Books Product Catalog</catalogname>
<lastupdated>09/05/2009</lastupdated>
<name>Alligators &amp; Crocodiles</name>
<keywords>Books, magazines, kids, children, gifts, babies, toddler, animals, poster, dvd, zoobooks, zootles, teacher, education, classroom, parents, grandparents, birthday, holiday, girls, boys, tween, school, elementary, reading, science, toys, games, free, wildlife, nature</keywords>
<description>The Crocodile Hunter has made them famous - now is your chance to fill kids in on their ancient past, their danger to humans, and their well-deserved reputation as doting parents. Available in English and Spanish.</description>
<sku>25-9</sku>
<currency>USD</currency>
<saleprice>0.00</saleprice>
<price>3.99</price>
<retailprice>0.00</retailprice>
<buyurl>http://domainname.com%2Fstore%2FAlligators-%26-Crocodiles-P4C1.aspx</buyurl>
<impressionurl>http://www.domainname.net/image-3463879-10539514</impressionurl>
<imageurl>http://www.domainname.com/store/images/alligat.jpg</imageurl>
<instock>YES</instock>
</product>

I'm using the Magic Parser example parameters, i.e., MagicParser_parse("file1.xml","myRecordHandler");

I would really appreciate it if you could give me an example of how to modify this line of code.

Regards,

Matt

Submitted by support on Mon, 2009-09-07 08:11

Hello Matt,

Based on the example, the call to MagicParser_parse() should be as follows:

MagicParser_parse("file1.xml","myRecordHandler","xml|CATALOG/PRODUCT/");
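
As a rough sketch of how that call might sit in a complete script: the example below assumes MagicParser.php is in the same directory and that each record arrives as an array keyed by the upper-cased element names (for example "NAME" and "PRICE"). Both points are assumptions for illustration, so adjust them to match your installation and feed. The $recordCount variable is also just for the example.

```php
<?php
// Sketch only. Assumes MagicParser.php sits next to this script and that
// record keys are the upper-cased element names, e.g. "NAME" and "PRICE".

$recordCount = 0; // illustrative counter, not part of Magic Parser

function myRecordHandler($record)
{
    global $recordCount;
    $recordCount++;
    // Handle one record and let it go. Collecting every record in an array
    // here would make memory use grow with the size of the file again.
    $name  = isset($record["NAME"])  ? $record["NAME"]  : "";
    $price = isset($record["PRICE"]) ? $record["PRICE"] : "";
    print $name . " (" . $price . ")\n";
}

// Guarded so the script still runs if the library file is not present
if (file_exists("MagicParser.php")) {
    require("MagicParser.php");
    MagicParser_parse("file1.xml", "myRecordHandler", "xml|CATALOG/PRODUCT/");
    print $recordCount . " records processed\n";
}
```

With the Format String supplied, the thing to watch on a 512M limit is the handler itself: write each record to the database or output as it arrives rather than storing it.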

Hope this helps!
Cheers,
David.

Submitted by mastereyes on Mon, 2009-09-07 12:04

Hi David,

I think "|" is missing in MagicParser_parse("file1.xml","myRecordHandler","xmlCATALOG/PRODUCT/");

I have changed it to MagicParser_parse("file1.xml","myRecordHandler","xml|CATALOG/PRODUCT/");

Now it works fine. Thank you.

Regards,

Matt

Submitted by support on Mon, 2009-09-07 12:10

Hi Matt,

My apologies, yes, it was missing. I've corrected my post above.

Cheers,
David.