[jdom-interest] Still more Verification

Elliotte Rusty Harold elharo at metalab.unc.edu
Wed Aug 23 05:00:01 PDT 2000


At 8:28 PM -0700 8/22/00, Jason Hunter wrote:
>I'm curious what people think about this approach.  What Elliotte's code
>does is ensure that you absolutely cannot create a non-well-formed XML
>document using JDOM.  That's a cool feature!
>
>My concern is that every change to the JDOM document is going to be
>checked char by char by char, resulting in a noticeable performance
>decrease.  Elliotte says he saw a 20% slowdown (not sure on what test).
>It's probably really bad for documents that are mostly text.
>

Just to be clear, my initial, real-world tests showed a change that 
was down in the noise. I was eventually able to construct some test 
cases, designed to have worst-case behavior, that showed close to a 
20% slowdown, but in most normal cases the cost would be much less 
than this.

In particular, in any program where the actual XML construction is a 
relatively small fraction of the work, this probably wouldn't be 
significant. For instance, in the Fibonacci example I've used in 
several talks, almost all the time goes into calculating Fibonacci 
numbers. Very little is spent doing anything with JDOM. If you're 
working with a database, almost all your time will be spent waiting 
for the database to respond; almost none doing anything with JDOM. If 
you're reading a file from a network, most of your time is spent 
waiting for the network. Very little goes into actually constructing 
the document with JDOM. Even if you're writing a document to disk, 
it's still true that more of your time will be spent waiting for the 
disk than doing JDOM work.

The only thing I'm not sure of is what happens with the JIT off. I 
only tested on JDK 1.3 with the JIT on. This is exactly the sort of 
code that JITs extremely well: a simple loop with no I/O that repeats 
many times. I did not warm up the JIT before testing.
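
A minimal sketch of the kind of measurement discussed above, for anyone 
who wants to try it with and without a warm-up pass. It is not the test 
that produced the numbers quoted here. The class and method names 
(VerifyBench, isXmlCharacter, firstIllegalChar, timeNanos) are 
hypothetical, and the character ranges are the Char production from the 
XML 1.0 specification. It uses String.codePointAt, which is not 
available on JDK 1.3, so treat it as a modern-Java illustration only.

```java
public class VerifyBench {

    // XML 1.0 "Char" production: tab, LF, CR, and the legal Unicode ranges.
    static boolean isXmlCharacter(int c) {
        if (c == 0x9 || c == 0xA || c == 0xD) return true;
        if (c >= 0x20 && c <= 0xD7FF) return true;
        if (c >= 0xE000 && c <= 0xFFFD) return true;
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    // Returns the index of the first illegal character, or -1 if none.
    static int firstIllegalChar(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlCharacter(cp)) return i;
            i += Character.charCount(cp);
        }
        return -1;
    }

    // Times 'reps' full scans of 'text'; the text is expected to be legal.
    static long timeNanos(String text, int reps) {
        long start = System.nanoTime();
        int bad = 0;
        for (int r = 0; r < reps; r++) {
            if (firstIllegalChar(text) >= 0) bad++;
        }
        long elapsed = System.nanoTime() - start;
        if (bad != 0) throw new IllegalStateException("unexpected illegal text");
        return elapsed;
    }

    public static void main(String[] args) {
        // Build a document-sized string that is almost entirely text.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000; i++) sb.append("mostly text, ");
        String text = sb.toString();

        long cold = timeNanos(text, 100);  // first run: no warm-up yet
        timeNanos(text, 2_000);            // warm-up pass, result ignored
        long warm = timeNanos(text, 100);  // same work after warm-up

        System.out.println("cold: " + cold / 1_000_000.0 + " ms");
        System.out.println("warm: " + warm / 1_000_000.0 + " ms");
    }
}
```

The timings vary by machine and VM, so the sketch prints them rather 
than assuming any particular ratio. Running it with the VM's 
interpreter-only option (-Xint on HotSpot) gives a rough idea of the 
JIT-off case.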

>We could perhaps find a way for SAXBuilder to avoid the slowdown by
>using some special constructor.  Problem there is that since builders
>are and should be in a different package than the core (because people
>should have the ability to write their own builders), we're going to
>have to expose those special constructors to the public at large, and
>that eliminates the ability to say you cannot create a non-well-formed
>JDOM document, because with those constructors you can.
>

Actually, you can keep that guarantee. You subclass the relevant 
classes with non-public classes in the org.jdom.input package. These 
subclasses would 
override the relevant setData(), setValue(), etc. methods in the 
normal classes. We'd need to make sure all the constructors in the 
superclasses called setData()/setValue() rather than doing the checks 
directly, but that's not hard. In fact, I think that's how I've got 
most of them structured already.
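
A rough sketch of that subclassing idea, under stated assumptions. 
CheckedText and ParsedText are hypothetical stand-ins, not the real 
org.jdom classes, and nested classes stand in for the two packages so 
the example fits in one file. The protected field is also an 
assumption: the subclass needs some way to store the value without 
going through the check. The verification loop is deliberately 
simplified and does not validate surrogate pairs.

```java
public class BuilderSubclassSketch {

    // Stand-in for a core class (hypothetical, not the org.jdom API).
    static class CheckedText {
        protected String data;

        // The constructor routes through setData() instead of checking
        // directly, so a subclass override also applies at construction.
        CheckedText(String data) {
            setData(data);
        }

        // Verifies every character before storing the value.
        public void setData(String data) {
            for (int i = 0; i < data.length(); i++) {
                char c = data.charAt(i);
                boolean legal = c == 0x9 || c == 0xA || c == 0xD
                        || (c >= 0x20 && c != 0xFFFE && c != 0xFFFF);
                if (!legal) {
                    throw new IllegalArgumentException(
                            "illegal XML character at index " + i);
                }
            }
            this.data = data;
        }

        public String getData() {
            return data;
        }
    }

    // Stand-in for a non-public subclass in the builder's package.
    // It trusts that a parser has already checked the text.
    private static final class ParsedText extends CheckedText {
        ParsedText(String data) {
            super(data);
        }

        @Override
        public void setData(String data) {
            this.data = data; // no verification
        }
    }

    // What a builder would call; callers only ever see the base type.
    static CheckedText fromParser(String alreadyChecked) {
        return new ParsedText(alreadyChecked);
    }

    public static void main(String[] args) {
        System.out.println(fromParser("text from a parser").getData());
        try {
            new CheckedText("bad" + (char) 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the superclass constructor calls setData(), the override takes 
effect during construction as well, which is why the constructors 
should not do the checks directly.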

>Is it worth a 20% performance hit on all element construction to sanity
>check the text content?  The answer is probably sometimes yes, sometimes
>no.  But how would one differentiate between the two?
>
>We have a similar issue already for checking tag names, PI content, and
>so on.  If the content has already passed through a parser like Xerces,
>checking again only wastes CPU cycles.  We haven't worried about it for
>things like checking tag names because it's relatively fast, but when
>you have a document that could have large amounts of text, do you really
>want to check every character one at a time against a matrix of legal
>characters?
>

Again, special non-public builder subclasses could easily omit these 
checks. I'd prefer not to write them until performance testing proves 
they're necessary, though. Remember, in most programs less than 10% 
of your time will be spent on JDOM at all, so even a 20% slowdown in 
that part costs at most about 2% overall. Almost all programs have 
something much more significant they're actually doing most of the 
time. I'm not willing to sacrifice program correctness for a small 
amount of performance. If testing shows that this is a real issue, 
then I think there are some optimizations we can do to get better 
performance, but first I'd like to get it right, and then worry about 
making it fast.

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+


