[jdom-interest] Huge slowdown when reading > 15 xml files

philip.nelson at omniresources.com philip.nelson at omniresources.com
Mon Oct 22 18:19:01 PDT 2001

> I have a program that reads in xml files from a directory and builds them
> into jdom documents.
> As it does .build(filename), the system gets slower and slower.  After
> about 18 xml files it basically stops working and hangs.

I wanted to at least let you know I had taken a look at this and confirmed
what you are experiencing.  I have a couple of loose ends to follow up on,
but it appears these documents hit a truly worst-case scenario for JDOM
because of how Xerces slices them up.  Yes, you have all the < and >
characters escaped, but Xerces still splits the text into massive numbers
of calls to characters(char[], int, int) in the SAX ContentHandler.
That is unfortunate, but there is nothing JDOM can do about it, at least
that I am aware of.  What is even more unfortunate is that each of these
calls puts a new String into an ArrayList, and most of those Strings are 1
to 5 characters long.  As you can imagine, dividing a 100K document into
individual 1-5 character strings is not the most efficient way to build it
=8^(.  This we can improve on, though, and I think we should.
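
If you want to see the slicing for yourself, a small stand-alone handler
along these lines should do it.  This is only a sketch and not JDOM code:
the class name CharactersCallCounter and its methods are made up for
illustration, and it uses whatever SAX parser the JDK's SAXParserFactory
returns.  How many calls you get depends on the parser and on the document,
so I am not quoting a number here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of JDOM): counts how many times the parser
// calls characters() and collects the text it delivers.
public class CharactersCallCounter extends DefaultHandler {
    private int calls = 0;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        calls++;                         // one more chunk delivered
        text.append(ch, start, length);  // keep the text so it can be checked
    }

    public int getCalls() { return calls; }

    public String getText() { return text.toString(); }

    // Parses an XML string with the default SAX parser and returns the
    // handler so the caller can inspect the counters.
    public static CharactersCallCounter parse(String xml) {
        CharactersCallCounter handler = new CharactersCallCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        CharactersCallCounter h =
            parse("<doc>&lt;b&gt;bold&lt;/b&gt; text</doc>");
        System.out.println(h.getCalls() + " calls to characters() for "
            + h.getText().length() + " characters of text");
    }
}
```

Running it against one of the problem files instead of the inline string
should show how far the call count is out of proportion to the text size.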

Would you be willing to do a little coding for JDOM?  I have a plan but not
much time, and other things to do for JDOM that probably affect more users
than this.  In short, my plan would be to slightly modify the API so that
adjacent string content is appended to the last String content node, rather
than producing a new entry in the list.  If other mixed content is added,
the next call to addContent(String) would start a new node.
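
To make the idea concrete, here is a rough sketch of the append-or-start-new
logic.  It is not the JDOM implementation: CoalescingContent, TextRun and
addChild are hypothetical names, and a real change would live inside the
existing content handling rather than in a separate class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an element's mixed content list.
public class CoalescingContent {

    // Private wrapper for a run of adjacent text, so that text added through
    // addContent(String) can be told apart from any other child object.
    private static final class TextRun {
        final StringBuilder buffer = new StringBuilder();
    }

    private final List<Object> content = new ArrayList<>();

    // Appends to the last node if it is text; otherwise starts a new node.
    public void addContent(String text) {
        int last = content.size() - 1;
        if (last >= 0 && content.get(last) instanceof TextRun) {
            ((TextRun) content.get(last)).buffer.append(text);
        } else {
            TextRun run = new TextRun();
            run.buffer.append(text);
            content.add(run);
        }
    }

    // Adds any non-text child (element, comment, ...). The next call to
    // addContent(String) will therefore start a fresh text node.
    public void addChild(Object node) {
        content.add(node);
    }

    public int size() {
        return content.size();
    }

    // Returns the node at the given index; text runs come back as String.
    public Object get(int index) {
        Object node = content.get(index);
        return (node instanceof TextRun)
            ? ((TextRun) node).buffer.toString()
            : node;
    }
}
```

With this, three calls such as addContent("a"), addContent("b") and
addContent("c") leave a single node holding "abc", and text added after an
addChild(...) call lands in a new node of its own.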
