[jdom-interest] Couple of issues I need help on, (memory, sax vs dom, etc)

Dennis Sosnoski dms at sosnoski.com
Thu Jul 18 14:02:56 PDT 2002

The in-memory representation of a document is (essentially) always going 
to be larger than the text document. This is mainly because the 
representation constructs a large number of objects corresponding to the 
structure of the XML document, though even plain text is generally at 
least doubled in size (because Java uses 2-byte characters for 
representing text).

The actual "memory multiplier" value that applies to your documents 
depends on how they're structured, but 8X is not out of line with what 
other people have seen in the past. There's some variation in memory 
efficiency between DOM implementations, JDOM, and dom4j. It's not enough to 
dramatically change the end result, though - you might be able to load a 
10MB file instead of an 8MB one, but you'd still run out of memory on 
anything larger.
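
To make the arithmetic concrete, here is a rough sketch in Java. The class 
and method names are hypothetical, and the 8X multiplier is only the 
ballpark figure mentioned above, not a measured value for any particular 
document model. With that assumption, an 8MB file comes out at about 64MB 
of tree, which is already the whole of a 64MB heap.

```java
// Hypothetical helper, for illustration only: estimates whether a parsed
// document tree is likely to fit in the heap, given an assumed multiplier.
final class HeapEstimate {

    /** Estimated in-memory size of the parsed tree, in bytes. */
    static long estimateTreeBytes(long fileBytes, double multiplier) {
        return (long) (fileBytes * multiplier);
    }

    /** True if the estimated tree is strictly smaller than the heap limit. */
    static boolean fitsInHeap(long fileBytes, double multiplier, long maxHeapBytes) {
        return estimateTreeBytes(fileBytes, multiplier) < maxHeapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // The heap ceiling of the running JVM (what -Xmx controls).
        long maxHeap = Runtime.getRuntime().maxMemory();
        double multiplier = 8.0; // assumption: the rough 8X figure

        for (long sizeMb : new long[] {1, 8, 50}) {
            long estimate = estimateTreeBytes(sizeMb * mb, multiplier);
            System.out.println(sizeMb + "MB file -> ~" + (estimate / mb)
                    + "MB in memory, fits in current heap: "
                    + fitsInHeap(sizeMb * mb, multiplier, maxHeap));
        }
    }
}
```

The estimate ignores everything else the application keeps on the heap, so 
in practice the usable file size is smaller than this suggests.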

There are really only two ways to handle very large document sizes. If 
your processing can be done a piece at a time, where each piece of the 
file is always going to be small enough to fit easily in memory, you can 
use dom4j's ElementHandler interface, as Bob suggested. Jason has also 
mentioned "spanner" code in jdom-contrib which can be used to do similar 
processing with JDOM, though I haven't looked at this myself. If this 
doesn't help, the only other real alternative is to go with a parse 
event stream interface (SAX/SAX2, XMLPull, or custom), collecting 
whatever information you need as you go through the document (possibly 
writing it out to a temporary file or such).
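
As a minimal sketch of the event stream approach, the following uses the 
SAX API through the JDK's JAXP classes (not JDOM or dom4j). The class name 
ElementCounter and its helper methods are made up for illustration, and 
counting elements by name is just a stand-in for "whatever information you 
need". The point is that nothing is retained per element beyond the running 
counts, so memory use does not grow with the size of the document.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tallies element names as parse events arrive,
// without ever building a document tree.
final class ElementCounter extends DefaultHandler {

    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Called once per start tag; keep only the running total.
        counts.merge(qName, 1, Integer::sum);
    }

    /** Streams the input through a SAX parser and returns the tallies. */
    static Map<String, Integer> count(InputSource in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter handler = new ElementCounter();
        parser.parse(in, handler);
        return handler.counts;
    }

    /** Convenience wrapper for small in-memory test documents. */
    static Map<String, Integer> countString(String xml) throws Exception {
        return count(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><header/><body><item/><item/></body></doc>";
        System.out.println(countString(xml));
    }
}
```

For a real file you would pass an InputSource built from a file stream 
instead of a string; the handler itself stays the same.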

  - Dennis

Duffey, Kevin wrote:

>2) java.lang.OutOfMemory:
>   This was one I just found last night. Scared me quite a bit. The app
>needs to allow multiple XML selections. Some XML files may be quite
>large, > 10MB in size, even up to 50MB or more. Now, for the most part
>this will rarely happen, but it is a potential scenario the app must be
>able to handle. When I start the JVM, I am not specifying any memory
>parameters. When I select an 8MB xml file, during the "validation"
>method I use (which I described above), it throws the out of memory
>exception. For an 8MB file??!! That does not make sense to me. JVMs
>start up with 64MB RAM usage. How does an 8MB XML file translate to out
>of memory? Now, the process I do is loop through all selected files. On
>each iteration, I create a new SAXBuilder object, and a new Document
>object. I would assume since at the end of each iteration I am done with
>the objects, they get GC'd at some point. So the next step of the app,
>which then parses the xml for "header" data, also creates a new
>SAXBuilder and Document object and discards it at the end, and so on.
>The final and 3rd step is to parse the XML again, getting the "body" of
>the xml data. The error is occurring during the first step. If I select
>a single XML 8MB or larger, or multiple XML that equal 8MB or more, I
>get the out of memory error. When I select 7 1MB files, it's fine. As
>soon as I approach the 8MB in total size of all selected files, I am out
>of memory. It would seem to me that it is in my code, but since I loop
>through and create then discard the JDOM objects, I am at a loss as to
>why this is happening. It would also seem that a single file may end up
>using way more memory than is available, but again, this is not the case
>because selecting several smaller files that add up to over 8MB ends up
>doing the same thing! Lastly, is there a reason why the JVM does NOT use
>virtual memory when it runs out of its allocated memory? The main reason
>I ask this is that our client machines that this app will be installed
>on may only have 32MB of physical memory, and we can not have them
>upgrade. So, while I am sure the OS will "swap" memory with the JVM
>since the JVM starts up using 64MB RAM, why can the JVM itself not swap
>memory out to allow more than its startup? Or should I just start it up
>specifying 1GB of memory max, 64MB min, and leave it at that? Still,
>this is NOT resolving the issue. I don't want to just throw more memory
>at the problem, I want to know why it is not working and fix it.
