[jdom-interest] Performance: JDOM2 and Saxon

Tue Oct 25 03:46:39 PDT 2011

Hi Rolf,

The intention of doing these experiments was not to suggest that we can 
integrate Saxon with JDOM as a package; we recognize that this would 
create questions around the licensing. We were primarily interested in 
the performance of JDOM2 compared to JDOM1 using both the Saxon XPath 
engine and JDOM's embedded XPath engine. We wanted to check that we can 
support JDOM2 as well as JDOM1, and that the performance we get is 
acceptable. We thought we'd let you know the results as they seem to be 
interesting in the context of the JDOM2 project.

Answers to your questions/comments:
1) I don't think we're interfacing with JDOM at that level - we don't 
attempt to make Saxon available using JDOM's APIs, only using Saxon's APIs.

2) Potentially - but I think there could be some difficulties because of 
the need to establish a Saxon Configuration. Using Saxon for individual 
XPath requests without giving Saxon any context that's reused across 
requests would probably perform badly.

3) I confirm the tests on JDOM2 were done using a build from after the 
17th October, therefore including the changes made to the iterator() for 
ContentList. The tests confirm the results you had published at 
http://hunterhacker.github.com/jdom/jdom2/performance.html

4) Whitespace: not sure of the exact details here, but the general rule 
for XPath 1.0 is that all whitespace is preserved unless otherwise 
specified, whereas in XPath 2.0 it's DTD-sensitive - whitespace in 
element-only content gets removed. We could do a performance comparison 
that eliminated this potential source of differences, but I'm not sure 
we would learn much more from it.

5) I'm not sure this would be productive. Our focus is on running the 
W3C XSLT and XQuery test suites and making sure that the results when 
JDOM is used underneath match the expected results. (We've generally 
only done this for a subset of the tests, and there tend to be some 
differences in test results for different tree models, caused for 
example because some models don't label nodes as IDs or IDREFs, some 
don't expose unparsed entities, etc.)
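
To make points 2) and 4) concrete, here is a minimal sketch. It is not the harness used for the measurements quoted in this thread. To stay runnable without Saxon or JDOM on the classpath, it assumes the JDK's built-in javax.xml.xpath API over a W3C DOM; the class name, the tiny inline document standing in for hamlet.xml, and the helper methods are all hypothetical. It compiles the expression once and reuses it for every evaluation, averages the time while skipping the first (warm-up) run, and shows how whitespace-only text nodes between elements inflate the //node() count.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathReuseSketch {

    // Hypothetical stand-in for hamlet.xml; the indentation yields whitespace-only text nodes.
    static final String XML =
        "<PLAY>\n  <TITLE>Hamlet</TITLE>\n  <ACT>\n    <SCENE>Elsinore</SCENE>\n  </ACT>\n</PLAY>";

    // Parse with the JDK's default (non-validating) DOM parser, which keeps all whitespace.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Compile once; the returned expression is reused for every evaluation.
    static XPathExpression compile(String expr) throws Exception {
        return XPathFactory.newInstance().newXPath().compile(expr);
    }

    static NodeList select(XPathExpression expr, Document doc) throws Exception {
        return (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
    }

    // Number of nodes selected, whitespace-only text nodes included.
    static int countAll(XPathExpression expr, Document doc) throws Exception {
        return select(expr, doc).getLength();
    }

    // Same selection, but skipping text nodes that contain only whitespace.
    static int countNonWhitespace(XPathExpression expr, Document doc) throws Exception {
        NodeList nodes = select(expr, doc);
        int count = 0;
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            boolean blankText = n.getNodeType() == Node.TEXT_NODE
                    && n.getNodeValue().trim().isEmpty();
            if (!blankText) {
                count++;
            }
        }
        return count;
    }

    // Average evaluation time in milliseconds, excluding the first (warm-up) run.
    static double averageMillis(XPathExpression expr, Document doc, int runs) throws Exception {
        if (runs < 2) {
            throw new IllegalArgumentException("need at least 2 runs");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            select(expr, doc);
            long elapsed = System.nanoTime() - start;
            if (i > 0) {
                totalNanos += elapsed;
            }
        }
        return totalNanos / 1_000_000.0 / (runs - 1);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPathExpression allNodes = compile("//node()");
        System.out.println("all nodes: " + countAll(allNodes, doc));
        System.out.println("ignoring whitespace-only text: " + countNonWhitespace(allNodes, doc));
        System.out.println("average ms (50 runs, first excluded): "
                + averageMillis(allNodes, doc, 50));
    }
}
```

For this inline document the two counts are 11 and 6: the five extra nodes are the whitespace-only text nodes produced by the indentation, which is the same kind of gap as the one between the two node counts reported for hamlet.xml. The timing figure depends on the machine and JVM, so no value is suggested here.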

regards,

Mike and O'Neil

On 24/10/11 15:15, Rolf Lear wrote:
> Hi Michael, O'Neil
>
> I simply have not looked in to Saxon yet, so I have no frame of reference,
> and bear with me on that as it will happen at some point...
>
> There is issue #34 https://github.com/hunterhacker/jdom/issues/34 to track
> XSLTransform which I created in response to your suggestions for Saxon...
> and I do keep looking at it.
>
> My overall plan has 'always' been to:
> 1. build a regression test system (junit testcases).
> 2. build a performance regression test system (PerfTest)
> 3. make changes for JDOM2 with confidence.
>
> Having built the 'PerfTest' process I've nailed down some of the
> performance regressions I introduced, and followed the 'thread' of changes
> in to some other areas. It's a little 'aimless', but the current 'theme' is
> 'performance'.
>
> This is probably a mistake, I should be looking at 'structure' now that I
> have the (restored) performance baseline... but the 'performance' thing is
> always good, and I find it fun and challenging.
>
> The code is now 'ripe' for looking at structural changes though.
>
> Still, Saxon concerns me from a JDOM perspective because of the
> dual-licensing with the 'restricted' free/open version, and the 'complete'
> commercial version.
>
> My personal feel for this sort of situation is that the solution from a
> JDOM perspective is to keep the JDOM API open, and to make it possible/easy
> to use Saxon, but not to include either version of Saxon as the 'default
> engine'. Specifically, I don't see JDOM as being an advertising platform
> for some commercial product. I know this sort of issue is
> debatable/religious/etc. which is why it's important to understand that I
> am willing to defer to Jason's judgment on this one. For what it's worth
> the company I work for would have to implement special protocol
> handling for JDOM if it were to bundle the Saxon code.
>
> On the other hand, I really do appreciate your taking the time to look in
> to the integration of Saxon and JDOM.
>
> I have some comments/questions/suggestions:
> 1. I changed the 'implementation' API of the XPath code when I worked on
> the jaxen bugs/issues. The intention was to make it easier (than before) to
> have other engines (like Saxon). Did this change help you with your tests?
> Could it be done better?
> 2. Is the integration 'glue' something that can be easily put in
> org.jdom2.xpath.saxon ?
> 3. I implemented new iterator() back-ends for ContentList which are
> significantly faster than before in change 41217056 (17th Oct). Is your
> test based on JDOM2 from before that? :
> https://github.com/hunterhacker/jdom/commit/412170566ebdf8449b442e44f12ed8712d447a19
> Those changes should bring the hamlet.getDescendants() down to about 3ms
> 4. The 'missing' Text nodes are significant... I am surprised that they
> are absent. What is the logic for skipping them?
> 5. Which leads to the question: How does the Saxon implementation fare on
> the unit tests? Can you create a Saxon version of:
> https://github.com/hunterhacker/jdom/blob/master/test/src/java/org/jdom2/test/cases/xpath/TestLocalJaxenXPath.java
>
> The 'snapshot' system I have started on the github pages is not very
> useful for figuring out what's in the snapshot, and naming the snapshot. I
> should fix that.
>
> But, the 'current' snapshots should have the improved iterator:
> http://hunterhacker.github.com/jdom/jdom2/snapshot/jdom-2.x-SNAPSHOT.jar
>
> It would be better though if you just pulled the latest code
> because there are a couple of other changes that would improve performance
> too.
>
> Thanks again
>
> Rolf
>
> On Mon, 24 Oct 2011 13:29:22 +0100, Michael Kay <mike at saxonica.com> wrote:
>> My colleague O'Neil Delpratt has been doing some performance experiments
>> with JDOM1 and JDOM2. Here are the results he is getting.
>>
>>
>>
>> Experiment: I ran a somewhat simplified test harness on the same two
>> XPath expressions (i.e. "//@null" and "//node()") on the XML document
>> hamlet.xml
>>
>> Results
>> Average time taken over 50 runs, excluding the first run.
>>
>> JDOM1: 273.15ms
>> JDOM2: 92.56ms
>> Saxon (TinyTree treeModel): 2.8ms
>> Saxon (JDOM treeModel): 10.36ms
>> Saxon (JDOM2 treeModel): 10.82ms
>>
>> The # of tree nodes:
>> Saxon: 12097
>> Standalone JDOM(-2): 19840
>>
>> The difference in results was down to whitespace between elements
>> represented as text nodes in JDOM(-2).
>>
>> So: JDOM2 is doing a good job relative to JDOM1, but the XPath engine is
>> still very slow compared to Saxon's XPath engine.
>>
>> The Saxon code for accessing JDOM2 uses the JDOM node.getDescendants()
>> method rather than making recursive use of getChildren() as we do with
>> JDOM1. This benefits performance: without this change, the JDOM2 code
>> ran in 12.28ms. Even with this improvement, though, we're still getting
>> slightly slower results from JDOM2 than from JDOM1.
>>
>> I believe the way the measurements were done causes the XPath expression
>> to be compiled once and executed repeatedly.
>>
>> The differences we are seeing from these results are:
>>
>> (a) The TinyTree is very fast when processing the descendant axis
>> (because the nodes are held in an array in document order)
>>
>> (b) In the scenario where XPath compile time is amortized over many
>> executions (the only case we've measured), the Saxon XPath engine is
>> much faster than the one built in to JDOM.
>>
>> (c) JDOM2 is fractionally slower than JDOM1 in its navigational APIs,
>> even though its XPath engine is now three times faster.
>>
>> Michael Kay
>> Saxonica

-- 
O'Neil Delpratt
Software Developer, Saxonica Limited
Email: oneil at saxonica.com
Tel: +44 118 946 5894
Web: http://www.saxonica.com