[jdom-interest] JAXP performance problems

Klotz, Leigh Leigh.Klotz at xerox.com
Fri Sep 15 13:29:22 PDT 2006


I've recently found two JAXP performance problems.  I've filed a bug
with Sun on them.
I've seen these calls in the JDOM source and want to call attention to
them.
When used in conjunction with Tomcat, the problems are particularly
visible.

1. Everybody does
DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
and DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(x).

This is bad and slow.
An application should make only a very limited number of calls to
DocumentBuilderFactory.newInstance().newDocumentBuilder().
(See points #2 and #3 below.)

In JDK 1.5, the thread-safety warning on
DocumentBuilderFactory.newDocumentBuilder() and
DocumentBuilder.newDocument() has gone away, and they've added a reset
method, so once you have a DocumentBuilderFactory you should hold onto it.
For JDK 1.4 and earlier compatible code, it's probably necessary to
synchronize this call.
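
A minimal sketch of holding onto the factory, assuming a hypothetical
helper class named CachedDomFactory (not part of JDOM or JAXP). The
factory lookup happens once, and the newDocumentBuilder() call is
synchronized in line with the caution above for JDK 1.4 and earlier:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper: performs the expensive factory lookup only once.
final class CachedDomFactory {
    // DocumentBuilderFactory.newInstance() runs once, when this class loads.
    private static final DocumentBuilderFactory FACTORY =
            DocumentBuilderFactory.newInstance();

    private CachedDomFactory() {}

    // Creates a new builder from the cached factory. The call is
    // synchronized because the factory is shared between threads.
    static DocumentBuilder newBuilder() {
        try {
            synchronized (FACTORY) {
                return FACTORY.newDocumentBuilder();
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("cannot create DocumentBuilder", e);
        }
    }
}
```

Callers would then use CachedDomFactory.newBuilder() wherever they
previously wrote DocumentBuilderFactory.newInstance().newDocumentBuilder().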

If you have a way to associate a DocumentBuilder object (the result of
DocumentBuilderFactory.newInstance().newDocumentBuilder()) with a thread
(e.g. a web session) then you should re-use it and keep calling
.newDocument() and .parse() on it as long as your code isn't re-entrant in
the same thread.  If it is, you may need a pool or other strategies to
avoid performance problems.
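
One way to associate a DocumentBuilder with a thread is a ThreadLocal.
The following is only a sketch under that assumption; the class name
PerThreadDomBuilder and its methods are made up for illustration. It
relies on the reset() method added in JDK 1.5 and, as noted above, is
not safe if the code re-enters the builder on the same thread:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: one DocumentBuilder per thread, reused across calls.
final class PerThreadDomBuilder {
    private static final DocumentBuilderFactory FACTORY =
            DocumentBuilderFactory.newInstance();

    private static final ThreadLocal<DocumentBuilder> BUILDER =
            new ThreadLocal<DocumentBuilder>() {
                @Override
                protected DocumentBuilder initialValue() {
                    try {
                        synchronized (FACTORY) {
                            return FACTORY.newDocumentBuilder();
                        }
                    } catch (ParserConfigurationException e) {
                        throw new IllegalStateException(e);
                    }
                }
            };

    private PerThreadDomBuilder() {}

    // Returns this thread's builder, reset to its initial configuration.
    static DocumentBuilder get() {
        DocumentBuilder builder = BUILDER.get();
        builder.reset();
        return builder;
    }

    // Parses an XML string using this thread's builder.
    static Document parse(String xml) {
        try {
            return get().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("malformed XML", e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a container such as Tomcat, worker threads outlive a web application,
so a pool scoped to the application may be a better fit than a
ThreadLocal.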

The same goes for the JAXP SAX factories.
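
For the SAX side, a comparable sketch might look like this. The class
ReusableSaxParser and its countElements method are hypothetical and only
illustrate reusing one SAXParserFactory and one SAXParser (with the JDK
1.5 reset() method) across several parses:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: keeps one SAXParser and reuses it for each parse.
// An instance must not be shared between threads.
final class ReusableSaxParser {
    // SAXParserFactory.newInstance() runs once, when this class loads.
    private static final SAXParserFactory FACTORY =
            SAXParserFactory.newInstance();

    private final SAXParser parser = newParser();

    private static SAXParser newParser() {
        try {
            synchronized (FACTORY) {
                return FACTORY.newSAXParser();
            }
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the start-element events in the given XML string.
    int countElements(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            parser.reset(); // restore the parser's initial configuration
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            throw new IllegalArgumentException("malformed XML", e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }
}
```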

The case of SAXBuilder parse is more difficult because it's harder to
deal with the lifetime of the SAX objects.

So it may make sense to offer some kind of context or other object to
pass in to the DOMBuilder and SAXBuilder constructors to allow this kind
of re-use.  I'm not sure what to do exactly.

2. DocumentBuilderFactory.newInstance() is slow and bad.
It does an explicit walk of all JAR files applying three different
strategies to find a class to instantiate.
The code in the javax.xml.parsers package has internal debug routines
that access the classloaders in the context and call .toString() on
them, ignoring the result unless a debug flag is set (yes, it is even
documented in the JavaDoc to call System.err.println).

In Tomcat 4.1.x and 5.5, this operation is incredibly expensive, because
org.apache.catalina.loader.WebappClassLoader.toString() has a highly
detailed definition that produces about a 20K character string, which it
then copies over again using StringBuffer, resulting in a
tremendous amount of useless calculation and memory allocation.
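
The pattern can be illustrated with a small stand-alone sketch. This is
not the JDK or Tomcat source; EagerDebugDemo, VerboseLoader and dPrint
are invented names, and the loader simply fabricates a string of roughly
20K characters. It shows why building the debug message before checking
the flag costs a toString() call even when nothing is printed:

```java
// Hypothetical demonstration of eager versus guarded debug messages.
final class EagerDebugDemo {
    static boolean debug = false;   // stands in for the JAXP debug flag
    static int toStringCalls = 0;   // counts how often toString() ran

    // Stand-in for a class loader with a very detailed toString().
    static final class VerboseLoader {
        @Override
        public String toString() {
            toStringCalls++;
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 2000; i++) {
                sb.append("repository\n"); // 11 chars x 2000 = 22,000 chars
            }
            return sb.toString();
        }
    }

    // Prints only when the flag is set, but the caller has already
    // built the message string by the time this method runs.
    static void dPrint(String msg) {
        if (debug) {
            System.err.println("JAXP: " + msg);
        }
    }

    // Eager: the concatenation calls loader.toString() unconditionally.
    static void eager(Object loader) {
        dPrint("using class loader " + loader);
    }

    // Guarded: the flag is checked before the message is built.
    static void guarded(Object loader) {
        if (debug) {
            dPrint("using class loader " + loader);
        }
    }
}
```

With debug left false, eager() still pays for one toString() call per
invocation, while guarded() pays for none.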

3. documentBuilder.newDocument() also goes through this pathway and does
toString on the classloader as well.


Leigh.


