[jdom-interest] Performance: JDOM2 and Saxon

Tue Oct 25 05:26:32 PDT 2011

Excellent. I can work with that, and the feedback is appreciated.

If nothing else, seeing the numbers creates something of a 'baseline'
against which we can set expectations.

Based on your response, and since you are the first 'users' of JDOM2 to
speak up (thanks), here are some follow-up comments:
1. The XPath area (org.jdom2.xpath.*) is still expected to be revised
(issues #42 and #45). This may impact your work.
2. Have you identified any areas of JDOM2 code that are underperforming?
You mention the "navigational APIs"; can you narrow that down to any
particular iterators/methods?
3. In general, is JDOM2 'better' to work with than JDOM1? Is it going in
the right direction? Do you even notice a difference?
4. Are there any other changes that would make your life easier
(API/etc.)?

Thanks

Rolf

On Tue, 25 Oct 2011 11:46:39 +0100, O'Neil Delpratt <oneil at saxonica.com>
wrote:
> Hi Rolf,
> 
> The intention of doing these experiments was not to suggest that we can 
> integrate Saxon with JDOM as a package, we recognize that this would 
> create questions around the licensing. We were primarily interested in 
> the performance of JDOM2 compared to JDOM1 using both the Saxon XPath 
> engine and JDOM's embedded XPath engine. We wanted to check that we can 
> do JDOM2 as well as JDOM1, and that the performance we get is 
> acceptable. We thought we'd let you know the results as they seem to be 
> interesting in the context of the JDOM2 project.
> 
> Answer to your questions/comments:
> 1) I don't think we're interfacing with JDOM at that level - we don't
> attempt to make Saxon available using JDOM's APIs, only using Saxon's
> APIs.
> 
> 2) Potentially - but I think there could be some difficulties because of
> the need to establish a Saxon Configuration. Using Saxon for individual
> XPath requests without giving Saxon any context that's reused across
> requests would probably perform badly.
> 
> 3) I confirm the tests on JDOM2 were done using a build from after 17th
> October, which includes the changes made to the iterator() for
> ContentList. The tests confirm the results you had published on
> http://hunterhacker.github.com/jdom/jdom2/performance.html
> 
> 4) Whitespace: not sure of the exact details here, but the general rule 
> for XPath 1.0 is that all whitespace is preserved unless otherwise 
> specified, whereas in XPath 2.0 it's DTD-sensitive - whitespace in 
> element-only content gets removed. We could do a performance comparison 
> that eliminated this potential source of differences, but I'm not sure 
> we would learn much more from it.
> 
> 5) I'm not sure this would be productive. Our focus is on running the 
> W3C XSLT and XQuery test suites and making sure that the results when 
> JDOM is used underneath match the expected results. (We've generally 
> only done this for a subset of the tests, and there tend to be some 
> differences in test results for different tree models, caused for 
> example because some models don't label nodes as IDs or IDREFs, some 
> don't expose unparsed entities, etc.)
> 
> regards,
> 
> Mike and O'Neil
> 
> 
> On 24/10/11 15:15, Rolf Lear wrote:
>> Hi Michael, O'Neil
>>
>> I simply have not looked into Saxon yet, so I have no frame of
>> reference; bear with me on that, as it will happen at some point...
>>
>> There is issue #34 (https://github.com/hunterhacker/jdom/issues/34) to
>> track XSLTransform, which I created in response to your suggestions for
>> Saxon... and I do keep looking at it.
>>
>> My overall plan has 'always' been to:
>> 1. build a regression test system (junit testcases).
>> 2. build a performance regression test system (PerfTest)
>> 3. make changes for JDOM2 with confidence.
>>
>> Having built the 'PerfTest' process, I've nailed down some of the
>> performance regressions I introduced, and followed the 'thread' of
>> changes into some other areas. It's a little 'aimless', but the current
>> 'theme' is 'performance'.
>>
>> This is probably a mistake; I should be looking at 'structure' now that I
>> have the (restored) performance baseline... but the 'performance' thing is
>> always good, and I find it fun and challenging.
>>
>> The code is now 'ripe' for looking at structural changes though.
>>
>> Still, Saxon concerns me from a JDOM perspective because of the
>> dual licensing, with a 'restricted' free/open version and a 'complete'
>> commercial version.
>>
>> My personal feel for this sort of situation is that the solution from a
>> JDOM perspective is to keep the JDOM API open, and to make it possible/easy
>> to use Saxon, but not to include either version of Saxon as the 'default
>> engine'. Specifically, I don't see JDOM as being an advertising platform
>> for some commercial product. I know this sort of issue is
>> debatable/religious/etc., which is why it's important to understand that I
>> am willing to defer to Jason's judgment on this one. For what it's worth,
>> the company I work for would have to implement special protocol
>> handling for JDOM if it were to bundle the Saxon code.
>>
>> On the other hand, I really do appreciate your taking the time to look
>> into the integration of Saxon and JDOM.
>>
>> I have some comments/questions/suggestions:
>> 1. I changed the 'implementation' API of the XPath code when I worked on
>> the jaxen bugs/issues. The intention was to make it easier (than before)
>> to plug in other engines (like Saxon). Did this change help you with
>> your tests? Could it be done better?
>> 2. Is the integration 'glue' something that could easily be put in
>> org.jdom2.xpath.saxon?
>> 3. I implemented new iterator() back-ends for ContentList, which are
>> significantly faster than before, in change 41217056 (17th Oct). Was
>> your test based on JDOM2 from before that?
>>
>> https://github.com/hunterhacker/jdom/commit/412170566ebdf8449b442e44f12ed8712d447a19
>> Those changes should bring hamlet.getDescendants() down to about 3ms.
>> 4. The 'missing' Text nodes are significant... I am surprised that they
>> are absent. What is the logic for skipping them?
>> 5. Which leads to the question: how does the Saxon implementation fare on
>> the unit tests? Can you create a Saxon version of:
>>
>> https://github.com/hunterhacker/jdom/blob/master/test/src/java/org/jdom2/test/cases/xpath/TestLocalJaxenXPath.java
>>
>> The 'snapshot' system I have started on the github pages is not very
>> useful for figuring out what's in a snapshot, or for naming snapshots. I
>> should fix that.
>>
>> But the 'current' snapshots should have the improved iterator:
>>
>> http://hunterhacker.github.com/jdom/jdom2/snapshot/jdom-2.x-SNAPSHOT.jar
>>
>> It would be better, though, if you just pulled the latest code, because
>> there are a couple of other changes that would improve performance too.
>>
>> Thanks again
>>
>> Rolf
>>
>> On Mon, 24 Oct 2011 13:29:22 +0100, Michael Kay <mike at saxonica.com>
>> wrote:
>>> My colleague O'Neil Delpratt has been doing some performance experiments
>>> with JDOM1 and JDOM2. Here are the results he is getting.
>>>
>>>
>>>
>>> Experiment: I ran a somewhat simplified test harness on the same two
>>> XPath expressions (i.e. "//@null" and "//node()") on the XML document
>>> hamlet.xml
>>>
>>> Results
>>> Average time taken over 50 runs, excluding the first run.
>>>
>>> JDOM1: 273.15ms
>>> JDOM2: 92.56ms
>>> Saxon (TinyTree treeModel): 2.8ms
>>> Saxon (JDOM treeModel): 10.36ms
>>> Saxon (JDOM2 treeModel): 10.82ms
>>>
>>> The # of tree nodes:
>>> Saxon: 12097
>>> Standalone JDOM(-2): 19840
>>>
>>> The difference in results was down to whitespace between elements
>>> represented as text nodes in JDOM(-2).
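The node-count gap above comes from whitespace-only text nodes between elements. As a minimal sketch of that effect, assuming the JDK's built-in DOM parser as a stand-in for JDOM (the class WhitespaceNodeCount and the sample XML are hypothetical), the following counts the nodes of a small indented document twice: once including every text node, and once skipping text nodes that contain only whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration using the JDK DOM API, not JDOM or Saxon.
public class WhitespaceNodeCount {

    // Counts 'node' and everything below it. When skipWhitespace is true,
    // text nodes consisting only of whitespace are left out of the count.
    static int count(Node node, boolean skipWhitespace) {
        boolean whitespaceText = node.getNodeType() == Node.TEXT_NODE
                && node.getNodeValue().trim().isEmpty();
        int n = (skipWhitespace && whitespaceText) ? 0 : 1;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            n += count(c, skipWhitespace);
        }
        return n;
    }

    // Parses a string with a default (non-validating) parser, which keeps
    // whitespace-only text nodes because there is no DTD to say otherwise.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: line breaks and indentation between elements
        // become three whitespace-only text nodes.
        Document doc = parse("<PLAY>\n  <TITLE>Hamlet</TITLE>\n  <ACT/>\n</PLAY>");
        System.out.println(count(doc, false)); // 8
        System.out.println(count(doc, true));  // 5
    }
}
```

The same mechanism, scaled up to an indented document the size of hamlet.xml, is what separates a whitespace-stripped node count from one that keeps every text node.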
>>>
>>> So: JDOM2 is doing a good job relative to JDOM1, but the XPath engine is
>>> still very slow compared to Saxon's XPath engine.
>>>
>>> The Saxon code for accessing JDOM2 uses the JDOM node.getDescendants()
>>> method rather than making recursive use of getChildren() as we do with
>>> JDOM1, and this benefits performance: without this change, the JDOM2
>>> code ran in 12.28ms. Even with this improvement, though, we're still
>>> getting slightly slower results from JDOM2.
>>>
>>> I believe the way the measurements were done causes the XPath expression
>>> to be compiled once and executed repeatedly.
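The measurement style described here (compile the expression once, evaluate it repeatedly, and average 50 runs after discarding the first) can be sketched as follows. This is an illustration under stated assumptions, not O'Neil's actual harness: it uses the JDK's javax.xml.xpath API over a DOM tree rather than JDOM or Saxon, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical compile-once timing harness built on the JDK XPath API.
public class CompileOnceTiming {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    static XPathExpression compile(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().compile(expr);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expr, e);
        }
    }

    // Evaluates an already-compiled expression and returns the node count.
    static int evaluateCount(XPathExpression compiled, Document doc) {
        try {
            NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static int countMatches(Document doc, String expr) {
        return evaluateCount(compile(expr), doc);
    }

    // Compiles 'expr' once, evaluates it (runs + 1) times, discards the first
    // (warm-up) run, and returns the mean of the rest in milliseconds.
    static double averageMillis(Document doc, String expr, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        XPathExpression compiled = compile(expr);
        long totalNanos = 0;
        for (int i = 0; i <= runs; i++) {
            long start = System.nanoTime();
            evaluateCount(compiled, doc);
            long elapsed = System.nanoTime() - start;
            if (i > 0) {
                totalNanos += elapsed;
            }
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }

    public static void main(String[] args) {
        Document doc = parse("<PLAY><TITLE>Hamlet</TITLE><ACT><SCENE/></ACT></PLAY>");
        for (String expr : new String[] {"//@null", "//node()"}) {
            System.out.printf("%-10s %d matches, %.3f ms avg over 50 runs%n",
                    expr, countMatches(doc, expr), averageMillis(doc, expr, 50));
        }
    }
}
```

Because compilation happens outside the timed loop, a harness like this measures only evaluation cost, which matches the amortized scenario described in point (b).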
>>>
>>> The differences we are seeing from these results are:
>>>
>>> (a) The TinyTree is very fast when processing the descendant axis
>>> (because the nodes are held in an array in document order)
>>>
>>> (b) In the scenario where XPath compile time is amortized over many
>>> executions (the only case we've measured), the Saxon XPath engine is
>>> much faster than the one built into JDOM.
>>>
>>> (c) JDOM2 is fractionally slower than JDOM1 in its navigational APIs,
>>> even though its XPath engine is now three times faster.
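Point (a) can be illustrated with a simplified array-based tree. This is a hypothetical sketch in the spirit of that description, not Saxon's actual TinyTree code: if nodes are stored in document order together with their depth, the descendants of a node are exactly the contiguous run of following entries with a greater depth. The descendant axis then becomes a single linear scan, with no recursion over child lists. The class name ArrayTreeSketch and the element names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical array-backed tree: one entry per node, in document order.
public class ArrayTreeSketch {

    private final List<String> names = new ArrayList<>();
    private final List<Integer> depths = new ArrayList<>();

    // Appends a node and returns its index; callers must add nodes in
    // document order (each node before its descendants and later siblings).
    int add(String name, int depth) {
        names.add(name);
        depths.add(depth);
        return names.size() - 1;
    }

    // The descendants of node 'index' are the entries that follow it until
    // the first entry whose depth is not greater than its own.
    List<String> descendants(int index) {
        List<String> result = new ArrayList<>();
        int base = depths.get(index);
        for (int i = index + 1; i < names.size() && depths.get(i) > base; i++) {
            result.add(names.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        ArrayTreeSketch tree = new ArrayTreeSketch();
        int play = tree.add("PLAY", 0);
        tree.add("TITLE", 1);
        int act = tree.add("ACT", 1);
        tree.add("SCENE", 2);
        tree.add("SPEECH", 3);
        tree.add("EPILOGUE", 1);
        System.out.println(tree.descendants(play)); // [TITLE, ACT, SCENE, SPEECH, EPILOGUE]
        System.out.println(tree.descendants(act));  // [SCENE, SPEECH]
    }
}
```

A linked tree such as JDOM's has to walk each element's content list (recursively, or with an explicit stack) to produce the same result, which is one plausible reason for the gap between the TinyTree and JDOM tree-model timings.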
>>>
>>> Michael Kay
>>> Saxonica