[jdom-interest] Re: [XOM-interest] Possible ArrayList optimization

Bradley S. Huffman hip at a.cs.okstate.edu
Fri May 9 18:10:35 PDT 2003


(X-posted to jdom-interest since the idea I the end could apply to both)

Elliotte Rusty Harold writes:

> Internally XOM uses ArrayLists to hold both children and attributes. 
> While debugging an unrelated test failure (I hate EBCDIC!) I noticed 
> that these array lists are created by default with a capacity of 10. 
> That means each Element object has at least 80 bytes of overhead that a 
> lot of the time it doesn't need. I'm going to experiment with setting 
> the initial capacity of both the elements and the attributes lists to 0 
> and 1 and see if that cuts down on memory usage noticeably.

5 for content is currently what JDOM uses (and IIRC dom4j), and seems to
be a decent number.

> Right now my memory tests are very rough. I have a couple of programs 
> that run out of memory, and I want to see if i can get them to actually 
> run without dumping the VM.
> 
> I am a little worried that dropping the capacity might impact speed, so 
> if anyone knows more about array list opitmization, please pipe up!

I'm assuming you mean spped when building from a SAX source. First in XOM,
get rid of the parent stack, it's not needed, you can call getParent() in
endElement(...).  Also I think you can add attributes and namespace
declarations in one loop instead of two.

Now obviously, what would be ideal is to know the number of children at
element creation, but that's not going to happen with SAX. So here's a
wild idea, have a large buffer to hold newly created nodes.  In all node
creation methods (startElement, comment, flush*(), etc.) buffer the newly
created node but **don't** add it to the current parent (as JDOM and XOM 
currently do), instead just increment the child count of the current parent
(either find some way to use the parent to hold this info, or stack it).
Finally in endElement(...) create the children list in the current
parent according to the current child count and move nodes from the buffer
to the parent. This should eliminate most unnecessary reallocation and 
copying of nodes. Hmmm, guess it would also eliminate the wasted overhead :)

Con side is you would need to make somethings things "protected" instead of 
"private", and/or add a couple protected methods.  But it would be interesting
to see if it would speed things up enough to warrent the change.

Brad



More information about the jdom-interest mailing list