[jdom-interest] How to manipulate a very large XML file? Any suggestions?

Tue Feb 10 09:22:30 PST 2004

Elliotte Rusty Harold wrote:

> At 11:06 AM -0800 2/9/04, Jason Hunter wrote:
> 
>> I use XQuery.  The language lets you do amazing things with XML and if 
>> you can get the right implementation it can be extremely fast. In my 
>> day job I'm able to quickly extract data from gigs of XML to produce 
>> reports.  This of course relies on an indexed engine (not just reading 
>> off the filesystem).
> 
> The *language* isn't that incredible. It doesn't let you do anything 
> that couldn't be done with XSLT. 

We could start a long thread on that topic, but that would be more 
appropriate on the mailing list at xquery.com.  I'll say that for me, 
the major difference beyond technical capabilities is that XQuery -- 
because it's *not* written in XML -- makes it easier to write shorter, 
more robust programs, and not just file format conversions.

> What you're really suggesting is using
> a file-backed data store for situations where the document size exceeds 
> available memory. 

Actually, an optimized native XML repository.

> However, this could also be done with SQL and a 
> relational database or a custom file format written in Java. 

Shredding XML into tables is not an enjoyable task and does not make for 
efficient queries.  You're trying to fit a jagged peg into a square hole.

And writing a custom file format in Java?  Let's compare techniques 
sometime, Rusty.  You'll do what you propose, and I'll use a real 
XQuery engine.  We'll throw in about 10 gigs of data and see who can 
make the data dance.  :-)

-jh-