[jdom-interest] Performance: JDOM2 and Saxon

Mon Oct 24 07:15:18 PDT 2011

Hi Michael, O'Neil

I simply have not looked into Saxon yet, so I have no frame of reference.
Bear with me on that; it will happen at some point...

There is issue #34 https://github.com/hunterhacker/jdom/issues/34 to track
XSLTransform which I created in response to your suggestions for Saxon...
and I do keep looking at it.

My overall plan has 'always' been to:
1. build a regression test system (junit testcases).
2. build a performance regression test system (PerfTest)
3. make changes for JDOM2 with confidence.

Having built the 'PerfTest' process I've nailed down some of the
performance regressions I introduced, and followed the 'thread' of changes
into some other areas. It's a little 'aimless', but the current 'theme' is
'performance'.
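
As a rough illustration of the kind of measurement involved (this is not
the actual PerfTest code; the class name, method name, and workload are
hypothetical), a timing loop that discards the first, cold run and
averages the rest could look something like this:

```java
public class WarmTimer {

    /**
     * Runs the task 'runs' times, ignores the timing of the first run
     * (which includes class loading and JIT warm-up), and returns the
     * mean of the remaining runs in milliseconds.
     */
    public static double averageMillisExcludingFirst(Runnable task, int runs) {
        if (runs < 2) {
            throw new IllegalArgumentException("need at least 2 runs");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (i > 0) {
                totalNanos += elapsed; // only the warm runs are counted
            }
        }
        return totalNanos / 1_000_000.0 / (runs - 1);
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would evaluate an XPath
        // expression or walk a document here instead.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        double avg = averageMillisExcludingFirst(workload, 51);
        System.out.printf("average over 50 timed runs: %.3f ms%n", avg);
    }
}
```

Any Runnable can be passed in, so the same loop can time different
engines against the same document.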

This is probably a mistake; I should be looking at 'structure' now that I
have the (restored) performance baseline... but the 'performance' thing is
always good, and I find it fun and challenging.

The code is now 'ripe' for looking at structural changes though.

Still, Saxon concerns me from a JDOM perspective because of the
dual-licensing with the 'restricted' free/open version, and the 'complete'
commercial version.

My personal feel for this sort of situation is that the solution from a
JDOM perspective is to keep the JDOM API open, and to make it possible/easy
to use Saxon, but not to include either version of Saxon as the 'default
engine'. Specifically, I don't see JDOM as being an advertising platform
for some commercial product. I know this sort of issue is
debatable/religious/etc. which is why it's important to understand that I
am willing to defer to Jason's judgment on this one. For what it's worth,
the company I work for would have to implement special protocol
handling for JDOM if it were to bundle the Saxon code.

On the other hand, I really do appreciate your taking the time to look
into the integration of Saxon and JDOM.

I have some comments/questions/suggestions:
1. I changed the 'implementation' API of the XPath code when I worked on
the jaxen bugs/issues. The intention was to make it easier (than before) to
have other engines (like Saxon). Did this change help you with your tests?
Could it be done better?
2. Is the integration 'glue' something that can be easily put in
org.jdom2.xpath.saxon ?
3. I implemented new iterator() back-ends for ContentList which are
significantly faster than before in change 41217056 (17th Oct):
https://github.com/hunterhacker/jdom/commit/412170566ebdf8449b442e44f12ed8712d447a19
Is your test based on JDOM2 from before that? Those changes should bring
the hamlet.getDescendants() down to about 3ms.
4. The 'missing' Text nodes are significant... I am surprised that they
are absent. What is the logic for skipping them?
5. Which leads to the question: How does the Saxon implementation fare on
the unit tests? Can you create a Saxon version of:
https://github.com/hunterhacker/jdom/blob/master/test/src/java/org/jdom2/test/cases/xpath/TestLocalJaxenXPath.java
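
To illustrate point 4, here is a minimal sketch of how whitespace-only
text nodes between elements inflate a node count. It uses the JDK's
built-in DOM parser as a stand-in for JDOM (an assumption, made only to
keep the example self-contained); the class name and the sample document
are hypothetical and have nothing to do with the real hamlet.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceNodeCount {

    /**
     * Counts the given node plus all of its descendants. When
     * skipWhitespaceText is true, text nodes containing only
     * whitespace are left out of the count.
     */
    public static int count(Node node, boolean skipWhitespaceText) {
        if (skipWhitespaceText
                && node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().trim().isEmpty()) {
            return 0;
        }
        int total = 1;
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            total += count(child, skipWhitespaceText);
        }
        return total;
    }

    /** Parses an XML string with the JDK's default DOM parser. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Indented sample document: the indentation becomes text nodes.
        String xml = "<PLAY>\n"
                + "  <TITLE>Hamlet</TITLE>\n"
                + "  <ACT>\n"
                + "    <SCENE>Elsinore</SCENE>\n"
                + "  </ACT>\n"
                + "</PLAY>";
        Document doc = parse(xml);
        System.out.println("all nodes: " + count(doc, false));
        System.out.println("without whitespace-only text: " + count(doc, true));
    }
}
```

Both counts include the document node itself. Whether an XPath result
for "//node()" should include the whitespace-only nodes is exactly the
question above.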

The 'snapshot' system I have started on the github pages is not very
useful for figuring out what's in the snapshot, and naming the snapshot. I
should fix that.

But, the 'current' snapshots should have the improved iterator:
http://hunterhacker.github.com/jdom/jdom2/snapshot/jdom-2.x-SNAPSHOT.jar

It would be better, though, if you just pulled the latest code,
because there are a couple of other changes that would improve performance
too.

Thanks again

Rolf

On Mon, 24 Oct 2011 13:29:22 +0100, Michael Kay <mike at saxonica.com> wrote:
> My colleague O'Neil Delpratt has been doing some performance experiments
> with JDOM1 and JDOM2. Here are the results he is getting.
> 
> 
> 
> Experiment: I ran a somewhat simplified test harness on the same two
> XPath expressions (i.e. "//@null" and "//node()") on the XML document
> hamlet.xml
> 
> Results
> Average time taken over 50 runs, excluding the first run.
> 
> JDOM1: 273.15ms
> JDOM2: 92.56ms
> Saxon (TinyTree treeModel): 2.8ms
> Saxon (JDOM treeModel): 10.36ms
> Saxon (JDOM2 treeModel): 10.82ms
> 
> The # of tree nodes:
> Saxon: 12097
> Standalone JDOM(-2): 19840
> 
> The difference in results was down to whitespace between elements 
> represented as text nodes in JDOM(-2).
> 
> So: JDOM2 is doing a good job relative to JDOM1, but the XPath engine is
> still very slow compared to Saxon's XPath engine.
> 
> The Saxon code for accessing JDOM2 uses the JDOM node.getDescendants() 
> method rather than making recursive use of getChildren() as we do with 
> JDOM1, and this benefits performance in that without this change, the 
> JDOM2 code ran in 12.28ms; but we're still getting slightly slower 
> results from JDOM2 despite this improvement.
> 
> I believe the way the measurements were done causes the XPath expression
> to be compiled once and executed repeatedly.
> 
> The differences we are seeing from these results are:
> 
> (a) The TinyTree is very fast when processing the descendant axis 
> (because the nodes are held in an array in document order)
> 
> (b) In the scenario where XPath compile time is amortized over many 
> executions (the only case we've measured), the Saxon XPath engine is 
> much faster than the one built in to JDOM.
> 
> (c) JDOM2 is fractionally slower than JDOM1 in its navigational APIs, 
> even though its XPath engine is now three times faster.
> 
> Michael Kay
> Saxonica