[jdom-interest] Huge slowdown when reading > 15 xml files
Bradley S. Huffman
hip at a.cs.okstate.edu
Tue Oct 23 10:39:29 PDT 2001
philip.nelson at omniresources.com writes:
> I wanted to at least let you know I had taken a look at this and confirmed
> what you are experiencing. I have a couple of loose ends to follow up on
> but it appears these documents pick on a truly worst case scenario for JDOM
> because of how Xerces slices up these documents. Yes, you have all the <
> and > characters escaped, but Xerces is still slicing this up into massive
> numbers of calls to characters(char[], int, int) in the SAX ContentHandler.
> That is unfortunate but there is nothing JDOM can do about it, at least that
> I am aware of. What is even more unfortunate is that each of these calls
> puts a new String into an ArrayList, most of which are 1 to 5 characters in
> length. As you can imagine, dividing a 100K document into 1-5 character
> long individual strings is not the most efficient way to build it =8^(.
> This we can improve on though and I think we should.
The problem is that Element.addContent(String) concatenates adjacent String
elements. On c_large_123k_1.xml that came out to 24,770 String appends.
Commenting out all code in org.jdom.Element.addContent(String)
except the last line, "content.add(data)", yielded a dramatic speedup, on
par with the a_small_*.xml test cases, but at the cost of
content.size() > 24,000.
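
A middle ground between the two extremes above (one String append per
characters() call, or one list entry per call) might be to buffer the chunks
and build a single String only when the run of text ends. What follows is
only a rough sketch of that idea, not JDOM code: TextAccumulator and its
methods are hypothetical stand-ins for the content list inside Element, and
the assumption is that the builder can call flushText() before adding any
non-text child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an element's content list; not part of JDOM.
class TextAccumulator {
    private final List<Object> content = new ArrayList<>();
    private final StringBuilder pending = new StringBuilder();

    // Called once per SAX characters() chunk: only appends to the buffer,
    // so no intermediate String is created for each 1-5 character slice.
    void addText(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }

    // Turns the buffered chunks into one String entry, if there are any.
    void flushText() {
        if (pending.length() > 0) {
            content.add(pending.toString());
            pending.setLength(0);
        }
    }

    // Any non-text child ends the current run of text.
    void addChild(Object child) {
        flushText();
        content.add(child);
    }

    // Flushes before exposing the list so callers never miss trailing text.
    List<Object> getContent() {
        flushText();
        return content;
    }
}
```

With something like this, the number of list entries stays at one per run of
text, and the work per characters() call is a buffer append rather than a
copy of everything accumulated so far.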
For specialized cases like this, I like using the power of extension and
would extend DefaultJDOMFactory and Element (redefining addContent) to
fit my needs.
Hmmm, or maybe a specialized JDOMFactory with a method
setLargeTextContent( String element_name), along with other methods to set
which elements need specialized handling.
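
To make the two ideas above a bit more concrete, here is a minimal sketch
using stand-in classes rather than the real org.jdom ones. SketchElement,
LargeTextElement and SketchFactory are all hypothetical names, and the
stand-in element only holds text, so it ignores the ordering of text relative
to child elements that a real Element subclass would have to preserve. Only
setLargeTextContent(String) is taken from the suggestion above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for Element: merges adjacent Strings on every call.
class SketchElement {
    protected final String name;
    protected final List<Object> content = new ArrayList<>();

    SketchElement(String name) { this.name = name; }

    SketchElement addContent(String data) {
        int last = content.size() - 1;
        if (last >= 0 && content.get(last) instanceof String) {
            // Each merge copies the text accumulated so far.
            content.set(last, (String) content.get(last) + data);
        } else {
            content.add(data);
        }
        return this;
    }

    List<Object> getContent() { return content; }
}

// Redefines addContent for text-heavy elements: buffer now, build once later.
class LargeTextElement extends SketchElement {
    private final StringBuilder pending = new StringBuilder();

    LargeTextElement(String name) { super(name); }

    @Override
    SketchElement addContent(String data) {
        pending.append(data);
        return this;
    }

    @Override
    List<Object> getContent() {
        if (pending.length() > 0) {
            super.addContent(pending.toString());
            pending.setLength(0);
        }
        return content;
    }
}

// Stand-in for a specialized factory that picks the element class by name.
class SketchFactory {
    private final Set<String> largeText = new HashSet<>();

    void setLargeTextContent(String elementName) { largeText.add(elementName); }

    SketchElement element(String name) {
        return largeText.contains(name)
                ? new LargeTextElement(name)
                : new SketchElement(name);
    }
}
```

The appeal of doing it in the factory is that the default behavior stays
untouched for ordinary documents, and only the elements named through
setLargeTextContent pay for the extra buffering.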