[jdom-interest] Memory usage when processing a large file

Mon Oct 8 13:22:48 PDT 2001

Hi Dennis

Thanks for getting back to me. I have found that the Document object
returned by SAXBuilder is the "bottleneck": for an XML file of 750K, the
Document object, when serialised and converted into a byte array, is around
7MB (i.e. ~7,000,000 bytes). Also, looking at the garbage collector output
(using -verbose:gc), I saw that it reclaims roughly 10 times the size of the
file when invoked.
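
For reference, a measurement of this kind can be sketched as below. This is
only a minimal illustration, assuming the object being measured is
Serializable (as the serialisation of the Document above implies); the class
and method names (SerializedSize, serializedSize) are made up for this
example, and a plain list of strings stands in for the JDOM Document so that
the snippet runs with the JDK alone. Note that the serialised size is at best
a rough proxy for what the object actually occupies on the heap.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical helper, not part of JDOM.
class SerializedSize {

    // Serialises the object into an in-memory buffer and returns the byte count.
    static int serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a list of short strings stands in for the real Document.
        ArrayList<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row-" + i);
        }
        System.out.println("Serialised size: " + serializedSize(rows) + " bytes");
    }
}
```

In the real application the argument would be the Document returned by
SAXBuilder rather than the stand-in list.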

The file is fairly flat:
<!ELEMENT colours (row*)>
  <!ELEMENT row (aID,blue,green,yellow,red,bID?,cID?,grey?)>
    <!ELEMENT aID    (#PCDATA)>
    <!ELEMENT blue   (#PCDATA)>
    <!ELEMENT green  (#PCDATA)>
    <!ELEMENT yellow (#PCDATA)>
    <!ELEMENT red    (#PCDATA)>
    <!ELEMENT bID    (#PCDATA)>
    <!ELEMENT cID    (#PCDATA)>
    <!ELEMENT grey   (#PCDATA)>

It has a lot of "row" elements, but none of the elements is big; that is,
their #PCDATA content is small (a couple of hundred characters at most).
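
One way to check that description against an actual file is a single
streaming pass that counts the "row" elements and records the longest text
content of any leaf element. The following is a rough sketch using the SAX
parser bundled with the JDK rather than JDOM; the class name RowStats and the
sample data are invented for illustration, and only the element names come
from the DTD above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts <row> elements and tracks the longest leaf text.
class RowStats extends DefaultHandler {
    int rows = 0;
    int maxTextLength = 0;
    private boolean inLeaf = false;   // true until a child element or end tag is seen
    private int currentLength = 0;    // characters seen since the last start tag

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("row".equals(qName)) {
            rows++;
        }
        inLeaf = true;
        currentLength = 0;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so accumulate.
        currentLength += length;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only elements without child elements count as leaves.
        if (inLeaf) {
            maxTextLength = Math.max(maxTextLength, currentLength);
        }
        inLeaf = false;
    }

    static RowStats scan(InputStream in) throws Exception {
        RowStats stats = new RowStats();
        SAXParserFactory.newInstance().newSAXParser().parse(in, stats);
        return stats;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data following the element names in the DTD.
        String xml = "<colours>"
                + "<row><aID>1</aID><blue>sky</blue><green>g</green>"
                + "<yellow>y</yellow><red>r</red></row>"
                + "<row><aID>2</aID><blue>navy blue</blue><green>g</green>"
                + "<yellow>y</yellow><red>r</red><grey>ash</grey></row>"
                + "</colours>";
        RowStats stats = scan(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("rows=" + stats.rows + ", longest text=" + stats.maxTextLength);
    }
}
```

For a real feed, a FileInputStream on the file would replace the in-memory
sample.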

Best regards

Benjamin

> -----Original Message-----
> From: Dennis Sosnoski [mailto:dms at sosnoski.com]
> Sent: 08 October 2001 03:35
> To: Benjamin Kopic
> Cc: jdom-interest at jdom.org
> Subject: Re: [jdom-interest] Memory usage when processing a large file
>
>
>
> Hi Benjamin,
>
> I've run some tests using documents up to just over 1MB (nt.xml, the New
> Testament marked up with element wrappers for the text). The JDOM document
> took a little over 3MB of the Java heap, though I didn't look at total
> usage by the JVM (as seen by the system).
>
> Have you looked at how your memory usage scales for smaller documents? You
> might also try pausing the program at various points and see when your
> memory usage goes offscale. I'd personally suspect the database interface
> code more than JDOM, though, unless your documents are very unusual (lots
> of entities that expand to huge amounts of text, for instance).
>
>   - Dennis
>
> Benjamin Kopic wrote:
>
> > Hi
> >
> > We have an application that processes a data feed and loads it into a
> > database. It builds a JDOM Document using SAXBuilder and Xerces, and then
> > it uses Jaxen XPath to retrieve the data needed.
> >
> > The problem is that when we parse a 7MB feed, the memory usage by Java
> > jumps to 110MB. Has anyone else used JDOM to process relatively large
> > data feeds?
> >
> > Best regards
> >
> > Benjamin Kopic
> > E: ben at kopic.org
> > W: www.kopic.org
> > T: +44 (0)20 7794 3090
> > M: +44 (0)78 0154 7643
> >
> > _______________________________________________
> > To control your jdom-interest membership:
> >
> > http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com