From jdom at tuis.net Sun Oct 2 18:36:30 2011
From: jdom at tuis.net (Rolf)
Date: Sun, 02 Oct 2011 21:36:30 -0400
Subject: [jdom-interest] JDOM2 Update.
In-Reply-To: <4E76BDD7.7050505@tuis.net>
References: <4E76BDD7.7050505@tuis.net>
Message-ID: <4E89119E.5010903@tuis.net>

Hi All

Another update.

This has been a busy spell since the last update.

JDOM 1.1.2
==========

This is ready to go, and is going through the final stages of publishing. You can see the change-log by inspecting either the Changes file or the issues list:

https://github.com/hunterhacker/jdom/blob/jdom-1.1.2/core/CHANGES.txt
https://github.com/hunterhacker/jdom/issues?labels=backport+1.1.2+done&sort=created&direction=desc&state=closed&page=1

In summary, there have been 14 bug fixes, and the Jar will be available on maven-central.

The JDOM 1.x branch will remain open for bug-fixes only.

If you want a sneak peek of the 1.1.2 code you can download the source at:
https://github.com/hunterhacker/jdom/zipball/jdom-1.1.2

JDOM 2
======

There are two major news items here, and an anticipated plan:
1. the jUnit testing has close to complete coverage, with only some unreachable code and some quirky exception cases being missed;
2. basic Generics changes have been made;
3. putting together a JDOM2 plan.

Unit Tests:
-----------

There are some failing tests, and some ignored tests. The failing tests relate to one of two things:
- There is a bug in JDOM where multiple consecutive Text content instances are not processed correctly; see
  https://github.com/hunterhacker/jdom/issues/31
  I have ignored some tests, and left one failing until this gets resolved.
- The Jaxen code has a bug with respect to the ordering of Attribute and Namespace nodes; see
  http://jira.codehaus.org/browse/JAXEN-215
  It means that for Attributes and Namespaces the XPath node set is not returned in Document order. The test cases expect Document order, so currently they fail. (I have a 'patched' version of Jaxen in my environment and the tests pass.)

Generics:
---------

I have done a first pass at Generics for the code. The intention was to clean up warnings and to get a baseline of a 'simple' JDOM that's 'neat'. See the conclusion for how to get the code.

I have made only one significant API change, which is to substantially extend the 'Filter' interface and its implementing classes. This has allowed for a completely 'clean' JDOM2 code base. The Filter API change is backward-compatible (unless you happen to have your own implementation of the Filter interface), and as a result you should be able to use the current JDOM2 code as a drop-in replacement in your existing code (except you will have to change all the org.jdom.* imports to org.jdom2.*).

At this point the code is as close as possible to being a 'minimum' JDOM2: it is JDOM with Generics, plus a minimum amount of spice on the Filter API to make the getContent(Filter) stuff work. See the 'Conclusion' for how to get the code.

Planning:
---------

At this point the code is 'ripe' for ideas. The regression test harness is comprehensive, the code is close to 'clean', and yet it is all still very familiar to anyone familiar with JDOM.

There is one concern I have with the code in its current state, and that is the serialization code, which is haphazard, incomplete, and inconsistent. I am not an expert on serialization, so I have left it unchanged.

If you ignore serialization issues in Eclipse, there are no longer any warnings at all. Running 'FindBugs' identifies only two issue types: Serialization, and some 'inefficient new Integer() calls'.

So, given the current state of the code, what comes next?

In the short term I intend to get some builds up on to the web-site so that people can play with the code. In addition, there will be some statistics related to code coverage and unit tests. This will give people an easy way to track progress, and to play with the code.
Then I intend to fix some of the more 'trivial' bugs that are still outstanding, like the TRIM_FULL_WHITE bugs, some Iterator problems, List problems, etc.

At the same time I plan on updating the wiki documentation for a bunch of things.

So, at the end of this week I expect to have a better idea of what the final result should look like.

Conclusion
----------

If you want to have a look at the code and get a feel for what it looks like, you can get it very easily. I have tagged the current code state with the tag 'jdom2-epoch'. Thus, you can reference that in github, and get a zipped package of the code base for that tag. Clicking on this link should start a download for you:
https://github.com/hunterhacker/jdom/zipball/jdom2-epoch

If you have suggestions or ideas for what you think the results should look like, now is the time to speak up.

Rolf

From jdom at tuis.net Sun Oct 2 21:15:39 2011
From: jdom at tuis.net (Rolf)
Date: Mon, 03 Oct 2011 00:15:39 -0400
Subject: [jdom-interest] JDOM2 Update.
In-Reply-To: <4E89119E.5010903@tuis.net>
References: <4E76BDD7.7050505@tuis.net> <4E89119E.5010903@tuis.net>
Message-ID: <4E8936EB.9030202@tuis.net>

Hi All.

And there it is ... short-term goal 1:

http://hunterhacker.github.com/jdom/jdom2/index.html

The page with the JDOM2 metrics. You can browse the jUnit and Cobertura reports, and see what's happening. The page is really 'spartan' right now and colours and style are 'lacking', but it has the base details.

Rolf

From jdom at tuis.net Thu Oct 6 07:42:19 2011
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 06 Oct 2011 10:42:19 -0400
Subject: [jdom-interest] JDOM2 Update.
In-Reply-To: <4E89119E.5010903@tuis.net>
References: <4E76BDD7.7050505@tuis.net> <4E89119E.5010903@tuis.net>
Message-ID: <4e064a94e3e02f037383a57403315aae@tuis.net>

Hi All.

I have decided to fix and backport issue #2 and issue #48 to 1.1.2 as well.

This is going to affect the release timeline of the 1.1.2 code. As a result, the current tag jdom-1.1.2 in github is going to be moved. Please don't rely on it.

Issue #48 ( https://github.com/hunterhacker/jdom/issues/48 ) was only recently found. It has a simple fix that is easy to back-port.

Issue #2 ( https://github.com/hunterhacker/jdom/issues/2 ) should always have been back-ported, but it somehow slipped through the cracks when I was looking for issues to backport. I think it's because it was on 'page 2' of the issue list, and it has a lot of comments that are somewhat complicated to get your head around.
I like the fix proposed by Brad, but even though the suggested fix is actually simpler than the current code, it 'reverses' the way JDOM thinks about the QName and localName values for both Element and Attribute names. As a result it is a little more complicated to back-port and test. It is compounded by the issue #1 fix (defaulted/fixed attributes in a Namespace), which makes that area of code more complex.

I am trying to put together a table of what to expect from a SAX parser when the three main configurations are used: not-namespace-aware, namespace-aware, and namespaces-with-prefixes.

JDOM does not support not-namespace-aware SAX parsers, but it should support the other two modes. The way I see it, issue #2 is actually caused by JDOM messing up the assumptions on what data is provided in the two different supported SAX modes. Further, technically SAX Parsers only need to 'optionally' support the namespaces-with-prefixes mode, and JDOM assumes that all parsers do.

The different modes of operation set different expectations on what values are passed in to the SAX 'startElement' event. In essence, JDOM sets the 'namespaces' feature, but expects the startElement() event to contain details only provided by the optional (and not set) 'namespace-prefixes' feature.

The combination of Brad's patch plus the issue #1 fix *should* mean that JDOM fully supports both SAX parse features "namespaces" and "namespace-prefixes" (see http://download.oracle.com/javase/6/docs/api/org/xml/sax/package-summary.html ), although if a namespace-aware parser does not provide prefix details then JDOM will generate 'implementation-specific' prefixes.

As a result I am also back-porting a number of the jUnit tests I have for JDOM2 to get some sense of reliability in the code.

Expect this to delay 1.1.2 until at least next week.

Rolf
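
The difference between the two supported SAX modes discussed in the 6 October message can be observed with the parser that ships with the JDK. The following is only a minimal sketch: the class name `SaxNamespaceModes`, the helper `firstStartElement`, and the one-line sample document are illustrative assumptions and are not taken from JDOM. It turns on the "namespaces" feature, toggles "namespace-prefixes", and records what the first startElement() call receives.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of JDOM.
class SaxNamespaceModes {
    static final String XML = "<p:root xmlns:p=\"urn:example\" a=\"1\"/>";

    /**
     * Parses XML with the "namespaces" feature on and returns what the first
     * startElement() call received: {uri, localName, qName, attribute count}.
     */
    static String[] firstStartElement(boolean namespacePrefixes) {
        final String[] seen = new String[4];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // turns on the "namespaces" feature
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/namespace-prefixes",
                    namespacePrefixes);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) {
                    if (seen[0] == null) {
                        seen[0] = uri;
                        seen[1] = localName;
                        seen[2] = qName; // only guaranteed with namespace-prefixes
                        seen[3] = String.valueOf(atts.getLength());
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        for (boolean prefixes : new boolean[] {false, true}) {
            String[] s = firstStartElement(prefixes);
            System.out.println("namespace-prefixes=" + prefixes
                    + ": uri=" + s[0] + " localName=" + s[1]
                    + " qName=" + s[2] + " attributes=" + s[3]);
        }
    }
}
```

With "namespace-prefixes" off, the xmlns:p declaration is not reported as an attribute and the SAX specification allows the qName to be empty; with it on, the declaration appears in the attribute list and the qName is required. Code that reads the qName while only the "namespaces" feature is set is relying on a value the parser is not obliged to supply.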

From jdom at tuis.net Thu Oct 13 17:27:17 2011
From: jdom at tuis.net (Rolf)
Date: Thu, 13 Oct 2011 20:27:17 -0400
Subject: [jdom-interest] JDOM2 and Performance.
Message-ID: <4E9781E5.1080609@tuis.net>

Hi all.

I have put together a 'simple' system for measuring the relative performance of JDOM2. The idea is that I need to know whether I am improving or breaking JDOM performance as the code evolves.

Currently the metric code is only useful if you compare apples to apples, and, in this case, it means processing a single (medium size) XML document on my laptop, yada-yada-yada. But, it should be useful as a tool to get a feel for what a code-change does.

Already I can see that I probably have an issue in the SAXHandler (possibly an issue in JDOM-1.1.2 actually) because 1.1.2 is 5-times faster in that area than JDOM2.

I have put together a results page here:

http://hunterhacker.github.com/jdom/jdom2/performance.html

It also describes what each test does.
If you are interested in seeing the code and what it does, have a look here (it is not well documented and it is perhaps still evolving):

https://github.com/hunterhacker/jdom/commit/8b719c86913398ace8e197b6de145b33d9d300bb

Rolf

From mj-lists at expertsystems.se Fri Oct 14 01:29:23 2011
From: mj-lists at expertsystems.se (Mattias Jiderhamn)
Date: Fri, 14 Oct 2011 10:29:23 +0200
Subject: [jdom-interest] JDOM2 and Performance.
Message-ID: <4E97F2E3.9010406@expertsystems.se>

Tip of the day: http://code.google.com/p/caliper/

----- Original Message -----
Subject: Re: [jdom-interest] JDOM2 and Performance.
Date: Fri, 14 Oct 2011 10:08:36 +0200
From: Noel Grandin

Hi

Performance testing on the Java VM is tricky.
To avoid getting caught out by cache-hot/cache-cold and JIT vs. not-JIT things, it's preferable to do something like this in PerfTest#timeRun(Runnable):

    // warm up the caches and get the JIT going
    for (int i = 0; i < 10; i++) {
        runnable.run();
    }

    // give the JIT time to run, and get GC to run - GC can be stubborn sometimes
    for (int i = 0; i < 3; i++) {
        Thread.sleep(100);
        System.gc();
    }

    // need 20 runs to get a decent average and standard deviation
    // (Mean and Variance are in jakarta-commons-math, stat.descriptive.moment)
    Mean mean = new Mean();
    Variance variance = new Variance();
    for (int i = 0; i < 20; i++) {
        long time1 = System.nanoTime();
        runnable.run();
        long time2 = System.nanoTime();
        mean.increment(time2 - time1);
        variance.increment(time2 - time1);
    }

    // report the mean and the standard deviation (square root of the variance)
    System.out.println("result = " + mean.getResult() + " +- " + Math.sqrt(variance.getResult()));

Regards, Noel Grandin
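
The warm-up, GC pause, and timed loop suggested in the message above can also be tried without commons-math. The following is a hedged, self-contained variant: the class name `TimeRunSketch`, the `Stats` holder, and the workload in `main` are illustrative assumptions, not code from the JDOM performance suite.

```java
// Hypothetical stand-alone harness; names are illustrative only.
class TimeRunSketch {
    /** Mean and sample standard deviation of the timed runs, in nanoseconds. */
    static final class Stats {
        final double mean;
        final double stdDev;
        Stats(double mean, double stdDev) {
            this.mean = mean;
            this.stdDev = stdDev;
        }
    }

    /** Computes the arithmetic mean and the sample standard deviation. */
    static Stats summarize(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        double mean = sum / samples.length;
        double squares = 0;
        for (long s : samples) {
            squares += (s - mean) * (s - mean);
        }
        double stdDev = samples.length > 1
                ? Math.sqrt(squares / (samples.length - 1)) : 0.0;
        return new Stats(mean, stdDev);
    }

    static Stats timeRun(Runnable runnable, int warmups, int runs) {
        // warm up the caches and give the JIT something to compile
        for (int i = 0; i < warmups; i++) {
            runnable.run();
        }
        // pause and request GC a few times before measuring
        for (int i = 0; i < 3; i++) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            System.gc();
        }
        // timed runs
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            runnable.run();
            samples[i] = System.nanoTime() - start;
        }
        return summarize(samples);
    }

    public static void main(String[] args) {
        final StringBuilder sink = new StringBuilder();
        Stats stats = timeRun(() -> {
            sink.setLength(0);
            for (int i = 0; i < 100_000; i++) {
                sink.append(i);
            }
        }, 10, 20);
        // printing the sink length keeps the workload's result observable
        System.out.println("result = " + stats.mean + " ns +- " + stats.stdDev
                + " ns (sink length " + sink.length() + ")");
    }
}
```

Reporting the standard deviation rather than the variance keeps the spread in the same unit as the mean.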

Disclaimer: http://www.peralex.com/disclaimer.html
_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From jdom at tuis.net Fri Oct 14 03:00:14 2011
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 14 Oct 2011 06:00:14 -0400
Subject: [jdom-interest] JDOM2 and Performance.
In-Reply-To: <4E97F2E3.9010406@expertsystems.se>
References: <4E97F2E3.9010406@expertsystems.se>
Message-ID: <4E98082E.20305@tuis.net>

Fascinating....

I could lose some hours looking in to that, but after a quick view, it could earn that time back in no time....

I will check it out properly... though I will probably have to come up with a different way of 'packaging' the code if I import something else. I am reluctant to do that.... and it may be overkill for what I want.

Thanks

Rolf

On 14/10/2011 4:29 AM, Mattias Jiderhamn wrote:
> Tip of the day: http://code.google.com/p/caliper/

From jdom at tuis.net Fri Oct 14 05:38:54 2011
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 14 Oct 2011 08:38:54 -0400
Subject: [jdom-interest] JDOM2 and Performance.
In-Reply-To: <4E9806DA.5020605@tuis.net>
References: <4E9781E5.1080609@tuis.net> <4E97EE04.3070307@peralex.com> <4E9806DA.5020605@tuis.net>
Message-ID: <845a11cf4091bbff5c2aa45545e27811@tuis.net>

Just got off the train, and I've bumped up the inner iterations to 12. It makes no difference w/r/t the timings, but while looking in to things I have identified the changes to the XMLOutputter ( https://github.com/hunterhacker/jdom/commit/bf4fa33d253035edd085c5d190bd818133871742 ) as being the cause of the performance regression between 1.1.2 and 2.x.

I am going to have to figure out how the performance of the 'hamlet' XMLOutputter.output(Document) goes from 3ms in JDOM 1.1.2 to 24ms in 2.x (on my laptop). I think it may have to do with the Element.getNamespacesIntroduced() code. I will need another train-ride to fix that!

I am certain that a tool that 'monitors' the performance of the core JDOM features is essential for the confidence required in any changes we make to JDOM itself, so I am committed to making sure such a tool is available. Right now what I have is 'just adequate', I think, but it has rough edges, and is narrowly scoped.

The question is how much effort should we put in to the 'tool' rather than the core code? What's the trade-off? Is there a better way of doing it?

Is anyone willing to put together a more thorough 'harness' for JDOM? I certainly would appreciate that! Something that:
1. is easy to run, so that contributors to JDOM can test their work before and after their submissions;
2. has a way to 'preserve' results in an easy fashion that makes it easy to update a web-page;
3. is more 'extensible' than what I have done so far (so that 'plugging in' additional tests is easy....).

I considered something 'on top' of jUnit, but the 'performance' benchmark is more important for the 'typical' usages of JDOM, whereas the jUnit tests are more targeted at the atypical execution paths....
We need *fast* core code, but we don't really care about the performance of exceptions and unusual code paths.

Rolf

On Fri, 14 Oct 2011 05:54:34 -0400, Rolf Lear wrote:
> Hi Noel
>
> Thanks for that.
>
> It comes out in the numbers, but, for the record, I am doing something
> very similar to that.... only the structures are slightly (very) different.
>
> I do a bunch of GC's, and I do one in a different thread with the
> current thread sleeping, then I repeat the GC's until the size becomes
> 'stable' at a change of less than 128 bytes.
> https://github.com/hunterhacker/jdom/commit/8b719c86913398ace8e197b6de145b33d9d300bb#L1R33
>
> I do a complete once-through of the test suite to warm things up.
> Each once-through runs the code through 6 times (hmmm... I thought it
> was 12, but that was something else I did yesterday). Each of the actual
> tests 'exercises' the code repeatedly because it's all sort of
> loop-based code (parsing, scanning, etc.).
>
> Anyway, the output of the 'warmup' run is always much slower than the
> remaining 5 'real' runs, and I do the 'real' runs multiple times to
> ensure there is some stability.
>
> What you see in the web-page is the result of what I believe to be fully
> JIT-compiled and 'clean' and 'reliable enough' for the purposes I want.
>
> I know that Java VM testing is 'tricky' when it comes to
> performance, and as such I understand that it's easy to get things
> wrong, and I'll spend more time looking at it to ensure I'm doing the
> reasonable thing, but are you suggesting that the code I am running is
> not actually getting reliable results?
>
> The code is structured differently to what you have suggested,
> but the entire 'main' loop is warmed up:
> https://github.com/hunterhacker/jdom/commit/8b719c86913398ace8e197b6de145b33d9d300bb#L1R124
>
> Then, the main loop is run 5 times, and I visually inspect the numbers
> to ensure that they are consistent:
> https://github.com/hunterhacker/jdom/commit/8b719c86913398ace8e197b6de145b33d9d300bb#L1R135
>
> Between each 'test' I do a full GC with 'bells and whistles':
> https://github.com/hunterhacker/jdom/commit/8b719c86913398ace8e197b6de145b33d9d300bb#L1R160
>
> It is quite obvious that the runs that come out of the 'real' loops are
> optimized, cached, etc.
>
> What is not clear is whether the optimizer has completely compiled out
> some of the code. I have tried to ensure that it does not by doing some
> sort of test on each element so that it is not completely ignored.
> Now that I think about it though, maybe the 'devnull' Writer is too
> 'light' and the optimizer may have skipped it entirely.....
> and the whole XMLOutputter code with it.... I will check.
>
> So, I appreciate the insight, and I will play around with things to see
> if increasing the number of warmup and actual 'real' runs changes the
> numbers.
>
> I'll look in to making sure that some of the code is not being optimized
> out completely.
>
> But, my code already is doing pretty much exactly what you are
> suggesting... (it does not calculate the deviation, but it does ignore
> the fastest and slowest run.....).
>
> In fact, it does more because it then repeats the exact same loops
> multiple times to ensure the averages remain consistent over runs (as it
> happens, it essentially does 20 'runs' of the code to get the results -
> 5 loops of 6 runs, but the 6 only counts as 4 because the best and worst
> are eliminated).
>
> Have you got specific concerns about the code? Did you run it? Do you
> think the results are 'wrong'?
>
> Thanks for the insight in to the commons-math code. I'm always
> 'discovering' more and more 'stuff' in commons code. I have some 'stuff'
> I've done at work I am trying to convince my boss (actually
> legal&compliance) to let me use in JDOM, but it's the sort of thing that
> belongs in a 'commons' type location, not JDOM....
>
> Rolf
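
The approach described in the quoted reply (request GC until the heap size settles, time a batch of runs, then discard the fastest and slowest before averaging) can be sketched roughly as follows. This is a simplified illustration and not the code from the linked commit: the class name `TrimmedTimingSketch`, the method names, the 50 ms pause, and the round limit are all assumptions, and the GC is requested on the calling thread rather than on a separate one.

```java
import java.util.Arrays;

// Hypothetical sketch of the "settle the heap, then trim best and worst" idea.
class TrimmedTimingSketch {
    /** Average of the samples after discarding the single fastest and slowest. */
    static double trimmedMean(long[] samples) {
        if (samples.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    /** Requests GC until used heap changes by less than the tolerance. */
    static long settleHeap(long toleranceBytes, int maxRounds) {
        Runtime rt = Runtime.getRuntime();
        long previous = rt.totalMemory() - rt.freeMemory();
        for (int i = 0; i < maxRounds; i++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            long current = rt.totalMemory() - rt.freeMemory();
            if (Math.abs(current - previous) < toleranceBytes) {
                return current;
            }
            previous = current;
        }
        return previous;
    }

    /** Settles the heap once, times the runs, and returns the trimmed mean in ns. */
    static double timeLoop(Runnable test, int runs) {
        settleHeap(128, 10);
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            test.run();
            samples[i] = System.nanoTime() - start;
        }
        return trimmedMean(samples);
    }

    public static void main(String[] args) {
        final StringBuilder sink = new StringBuilder();
        double nanos = timeLoop(() -> {
            sink.setLength(0);
            for (int i = 0; i < 50_000; i++) {
                sink.append(i);
            }
        }, 6);
        System.out.println("trimmed mean = " + nanos + " ns (sink length "
                + sink.length() + ")");
    }
}
```

With 6 runs per loop, the trimmed mean is taken over 4 samples, which matches the "6 only counts as 4" arithmetic in the message.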

From jdom at tuis.net Sat Oct 15 15:17:33 2011
From: jdom at tuis.net (Rolf)
Date: Sat, 15 Oct 2011 18:17:33 -0400
Subject: [jdom-interest] JDOM2 and Performance.
In-Reply-To: <4E9781E5.1080609@tuis.net>
References: <4E9781E5.1080609@tuis.net>
Message-ID: <4E9A067D.6030004@tuis.net>

Hi all.

I've come close to restoring the JDOM 1.1.2 levels of performance.

When 'fixing' code in JDOM2 I came across a number of different places where namespace processing is performed (calculating the 'in scope' and the 'added' namespaces for an Element). This code was scattered in various places, inconsistent, and in some places buggy. I ended up stripping all of these places and replacing them all with the Content.getNamespacesInScope() concepts.

While convenient, the Content.getNamespacesInScope() methods were (much too) slow because they dynamically calculate the Namespaces each time they are called (which is fine for unstructured requirements where the document structure could change from one moment to the next).
I have thus implemented a new 'Namespace Stack' which is much faster than a completely dynamic calculation, and it is able to replace the various other 'stacks' that were removed before.

This has (mostly) 'restored' the performance of JDOM2's 'guts'; I seem to be about 1-2% slower at the moment than JDOM 1.1.2.

If you look at the numbers you will see that the 'Dump' code is still slow though. The Dump code dumps the document in the three main formats: Pretty, Raw, and Compact. This is running slow, and is probably related to the changes made for Issue #31.

I'm going to fix up that performance in XMLOutputter, and hopefully that will pull back the performance numbers on the other areas (the 1% - 2%) because each of those processes uses the XMLOutputter in some way. (The Dump is particularly slow because it uses the more complicated Pretty and Compact mechanisms....)

The 'performance' page has been updated:
http://hunterhacker.github.com/jdom/jdom2/performance.html

Rolf
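
The idea of a namespace stack, as opposed to recomputing the in-scope namespaces on every call, can be illustrated with plain collections. This is a hypothetical sketch and not JDOM2's implementation: the class `NamespaceStackSketch`, its methods, and the use of prefix-to-URI string maps are all assumptions made for the example. Each push records the bindings in scope for one element and reports only those that are new or changed at that level (the 'added' namespaces); pop restores the parent's scope.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a scope stack of prefix -> URI bindings.
class NamespaceStackSketch {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    NamespaceStackSketch() {
        Map<String, String> root = new LinkedHashMap<>();
        root.put("", ""); // default namespace is "no namespace"
        root.put("xml", "http://www.w3.org/XML/1998/namespace");
        scopes.push(root);
    }

    /** Enters an element; returns only the bindings new or changed at this level. */
    Map<String, String> push(Map<String, String> declared) {
        Map<String, String> parent = scopes.peek();
        Map<String, String> added = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : declared.entrySet()) {
            if (!e.getValue().equals(parent.get(e.getKey()))) {
                added.put(e.getKey(), e.getValue());
            }
        }
        // Reuse the parent's map when nothing changed; copy only when needed.
        Map<String, String> scope = parent;
        if (!added.isEmpty()) {
            scope = new LinkedHashMap<>(parent);
            scope.putAll(added);
        }
        scopes.push(scope);
        return added;
    }

    /** Leaves the current element, restoring the parent's scope. */
    void pop() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot pop the root scope");
        }
        scopes.pop();
    }

    /** Read-only view of the bindings currently in scope. */
    Map<String, String> inScope() {
        return Collections.unmodifiableMap(scopes.peek());
    }

    public static void main(String[] args) {
        NamespaceStackSketch stack = new NamespaceStackSketch();
        Map<String, String> declared = new LinkedHashMap<>();
        declared.put("p", "urn:example");
        System.out.println("added at outer element: " + stack.push(declared));
        System.out.println("added at inner element: " + stack.push(declared));
        stack.pop();
        stack.pop();
        System.out.println("in scope after leaving: " + stack.inScope());
    }
}
```

Because the stack is maintained while walking the tree, each element costs work proportional to its own declarations rather than to its depth in the document, which is the kind of saving the message describes.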

From jdom at tuis.net Tue Oct 18 20:11:24 2011
From: jdom at tuis.net (Rolf)
Date: Tue, 18 Oct 2011 23:11:24 -0400
Subject: [jdom-interest] JDOM2 and Performance.
In-Reply-To: <4E9A067D.6030004@tuis.net>
References: <4E9781E5.1080609@tuis.net> <4E9A067D.6030004@tuis.net>
Message-ID: <4E9E3FDC.90306@tuis.net>

Hi Again.

Just committed a new snapshot of JDOM2 together with the JavaDocs, jUnit and coverage reports, and a performance update to:
http://hunterhacker.github.com/jdom/jdom2/

The performance has been mostly restored, and there are big improvements in the XPath processing (even though I changed nothing in that area... ;-) ); it is all to do with more efficient Iterator implementations in the ContentList. See http://hunterhacker.github.com/jdom/jdom2/performance.html

I have done the first major refactor of JDOM2 code, essentially rewriting the XMLOutputter code. It is much neater, more consistent, and, should you need it to be, it is now completely 'extensible'. By changing the way the code is structured, the XMLOutputter is now reentrant, and yet still just as fast, if not faster for some things.

I have found and fixed a lot of obscure bugs that may have been plaguing people even if they did not know it. For example, if you have xml:space="preserve" embedded in your XML document, then JDOM would happily insist on outputting whatever content was inside that in the UTF-8 encoding, even if you had requested some other encoding...

This particular refactor has taken a lot of time, so I have to back off a little and catch up on some other things in life... back to just 'JDOM on the train' for a bit.

Rolf
> > When 'fixing' code in JDOM2 I cam accross a numbr of different places > where namespace processing is performed (calculating the 'in scope' and > the 'added' namespaces for an Element). This code was scattered in > various places, inconsistent, and some places were buggy. I ended up > stripping all of these places and replacing them all with the > Content.getNamespacesInScope() concepts. > > While convenient, the Content.getNamespacesInScope() methods were (much > too) slow because they dynamically calculate the Namespaces each time > they are called (which is fine for unstructured requirements where the > document structure could change from one moment to the next). > > I have thus re-implemented a new 'Namespace Stack' which is much faster > than a completely dynamic calculation, and it is able to replace the > various other 'stacks' that were removed before. > > This has (mostly) 'restored' the performance of JDOM2's 'guts', I seem > to be about 1-2% slower at the moment than JDOM 1.1.2 > > If you look at the numbers you will see that the 'Dump' code is still > slow though. The Dump code dumps the document in the three main formats: > Pretty, Raw, and Compact. This is running slow, and is probably related > to the changes made for Issue #31. > > I'm going to fix up that performance in XMLOutputter, and hopefully that > will pull back the performance numbers on the other areas (the 1% - 2%) > because each of those processes use the XMLOutputter in some way. > The Dump is particularly slow because it uses the more complicated > Pretty and Compact mechanisms....). > > The 'performance' page below has been updated... > > Rolf > > On 13/10/2011 8:27 PM, Rolf wrote: >> Hi all. >> >> I have put together a 'simple' system for measuring the relative >> performance of JDOM2. The idea is that I need to know whether I am >> improving or breaking JDOM performance as the code evolves. 
>> >> Currently the metric code is only useful of you compare apples to >> apples, and, in this case, it means processing a single (medium size) >> XML document on my laptop, yada-yada-yada. But, it should be useful as a >> tool to get a feel for what a code-change does. >> >> Already I can see that I probably have an issue in the SAXHandler >> (possibly an issue in JDOM-1.1.2 actually) because 1.1.2 is 5-times >> faster in that area than JDOM2. >> >> I have put together a results page here: >> >> http://hunterhacker.github.com/jdom/jdom2/performance.html >> >> It also describes what each test does. If you are interested in seeing >> the code and what it does have a look here (it is not well documented >> and it is still perhaps evolving): >> >> https://github.com/hunterhacker/jdom/commit/8b719c86913398ace8e197b6de145b33d9d300bb >> >> >> >> >> Rolf > > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From jhunter at servlets.com Sun Oct 23 17:47:57 2011 From: jhunter at servlets.com (Jason Hunter) Date: Sun, 23 Oct 2011 17:47:57 -0700 Subject: [jdom-interest] [Announce] JDOM 1.1.2 is released Message-ID: <683DBC34-1A42-4525-BB64-CF368CE44FC1@servlets.com> I'm happy to announce the release of JDOM 1.1.2 today. It's a drop-in replacement for JDOM 1.1.1 with more than a dozen bugs fixed. You can download the release here: http://jdom.org/dist/binary/ You can see the changes here: https://github.com/hunterhacker/jdom/blob/jdom-1.1.2/core/CHANGES.txt It'll appear in maven-central shortly too. Thanks to Rolf Lear for doing all the heavy lifting for this release! 
-jh- From jdom at tuis.net Mon Oct 24 02:35:45 2011 From: jdom at tuis.net (Rolf Lear) Date: Mon, 24 Oct 2011 05:35:45 -0400 Subject: [jdom-interest] [Announce] JDOM 1.1.2 is released In-Reply-To: <683DBC34-1A42-4525-BB64-CF368CE44FC1@servlets.com> References: <683DBC34-1A42-4525-BB64-CF368CE44FC1@servlets.com> Message-ID: <4EA53171.7030705@tuis.net> And there it is in maven-central: http://search.maven.org/#artifactdetails%7Corg.jdom%7Cjdom%7C1.1.2%7Cjar Rolf From mike at saxonica.com Mon Oct 24 05:29:22 2011 From: mike at saxonica.com (Michael Kay) Date: Mon, 24 Oct 2011 13:29:22 +0100 Subject: [jdom-interest] Performance: JDOM2 and Saxon In-Reply-To: <4EA5540E.3080201@saxonica.com> References: <4EA5540E.3080201@saxonica.com> Message-ID: <4EA55A22.2050702@saxonica.com> My colleague O'Neil Delpratt has been doing some performance experiments with JDOM1 and JDOM2. Here are the results he is getting. Experiment: I ran a somewhat simplified test harness on the same two XPath expressions (i.e. "//@null" and "//node()") on the XML document hamlet.xml. Results: Average time taken over 50 runs, excluding the first run.
JDOM1: 273.15ms
JDOM2: 92.56ms
Saxon (TinyTree treeModel): 2.8ms
Saxon (JDOM treeModel): 10.36ms
Saxon (JDOM2 treeModel): 10.82ms

The # of tree nodes:
Saxon: 12097
Standalone JDOM(-2): 19840

The difference in results was down to whitespace between elements represented as text nodes in JDOM(-2).

So: JDOM2 is doing a good job relative to JDOM1, but the XPath engine is still very slow compared to Saxon's XPath engine.

The Saxon code for accessing JDOM2 uses the JDOM node.getDescendants() method rather than making recursive use of getChildren() as we do with JDOM1, and this benefits performance in that without this change, the JDOM2 code ran in 12.28ms; but we're still getting slightly slower results from JDOM2 despite this improvement.

I believe the way the measurements were done causes the XPath expression to be compiled once and executed repeatedly.

The differences we are seeing from these results are:

(a) The TinyTree is very fast when processing the descendant axis (because the nodes are held in an array in document order)
(b) In the scenario where XPath compile time is amortized over many executions (the only case we've measured), the Saxon XPath engine is much faster than the one built in to JDOM.
(c) JDOM2 is fractionally slower than JDOM1 in its navigational APIs, even though its XPath engine is now three times faster.

Michael Kay
Saxonica

From jdom at tuis.net Mon Oct 24 07:15:18 2011 From: jdom at tuis.net (Rolf Lear) Date: Mon, 24 Oct 2011 10:15:18 -0400 Subject: [jdom-interest] Performance: JDOM2 and Saxon In-Reply-To: <4EA55A22.2050702@saxonica.com> References: <4EA5540E.3080201@saxonica.com> <4EA55A22.2050702@saxonica.com> Message-ID: <3ec6e6c19f8da026efff1b706db90422@tuis.net>

Hi Michael, O'Neil

I simply have not looked in to Saxon yet, so I have no frame of reference, and bear with me on that as it will happen at some point...
There is issue #34 https://github.com/hunterhacker/jdom/issues/34 to track XSLTransform, which I created in response to your suggestions for Saxon... and I do keep looking at it. My overall plan has 'always' been to: 1. build a regression test system (junit testcases). 2. build a performance regression test system (PerfTest) 3. make changes for JDOM2 with confidence. Having built the 'PerfTest' process I've nailed down some of the performance regressions I introduced, and followed the 'thread' of changes into some other areas. It's a little 'aimless', but the current 'theme' is 'performance'. This is probably a mistake; I should be looking at 'structure' now that I have the (restored) performance baseline... but the 'performance' thing is always good, and I find it fun and challenging. The code is now 'ripe' for looking at structural changes though. Still, Saxon concerns me from a JDOM perspective because of the dual-licensing with the 'restricted' free/open version, and the 'complete' commercial version. My personal feel for this sort of situation is that the solution from a JDOM perspective is to keep the JDOM API open, and to make it possible/easy to use Saxon, but not to include either version of Saxon as the 'default engine'. Specifically, I don't see JDOM as being an advertising platform for some commercial product. I know this sort of issue is debatable/religious/etc., which is why it's important to understand that I am willing to defer to Jason's judgment on this one. For what it's worth, the company I work for would have to implement special protocol handling for JDOM if it were to bundle the Saxon code. On the other hand, I really do appreciate your taking the time to look into the integration of Saxon and JDOM. I have some comments/questions/suggestions: 1. I changed the 'implementation' API of the XPath code when I worked on the jaxen bugs/issues. The intention was to make it easier (than before) to have other engines (like Saxon).
Did this change help you with your tests? Could it be done better? 2. Is the integration 'glue' something that can be easily put in org.jdom2.xpath.saxon ? 3. I implemented new iterator() back-ends for ContentList which are significantly faster than before in change 41217056 (17th Oct). Is your test based on JDOM2 from before that? See: https://github.com/hunterhacker/jdom/commit/412170566ebdf8449b442e44f12ed8712d447a19 Those changes should bring the hamlet.getDescendants() down to about 3ms. 4. The 'missing' Text nodes are significant.... I am surprised that they are absent. What is the logic for skipping them? 5. Which leads to the question: How does the Saxon implementation fare on the unit tests? Can you create a Saxon version of: https://github.com/hunterhacker/jdom/blob/master/test/src/java/org/jdom2/test/cases/xpath/TestLocalJaxenXPath.java The 'snapshot' system I have started on the github pages is not very useful for figuring out what's in the snapshot, and naming the snapshot. I should fix that. But, the 'current' snapshots should have the improved iterator: http://hunterhacker.github.com/jdom/jdom2/snapshot/jdom-2.x-SNAPSHOT.jar It would be better though if you just pulled the latest code, because there are a couple of other changes that would improve performance too. Thanks again Rolf
From oneil at saxonica.com Tue Oct 25 03:46:39 2011 From: oneil at saxonica.com (O'Neil Delpratt) Date: Tue, 25 Oct 2011 11:46:39 +0100 Subject: [jdom-interest] Performance: JDOM2 and Saxon In-Reply-To: <3ec6e6c19f8da026efff1b706db90422@tuis.net> References: <4EA5540E.3080201@saxonica.com> <4EA55A22.2050702@saxonica.com> <3ec6e6c19f8da026efff1b706db90422@tuis.net> Message-ID: <4EA6938F.6050205@saxonica.com> Hi Rolf, The intention of doing these experiments was not to suggest that we can integrate Saxon with JDOM as a package; we recognize that this would create questions around the licensing. We were primarily interested in the performance of JDOM2 compared to JDOM1 using both the Saxon XPath engine and JDOM's embedded XPath engine. We wanted to check that we can support JDOM2 as well as JDOM1, and that the performance we get is acceptable. We thought we'd let you know the results as they seem to be interesting in the context of the JDOM2 project. Answers to your questions/comments: 1) I don't think we're interfacing with JDOM at that level - we don't attempt to make Saxon available using JDOM's APIs, only using Saxon's APIs. 2) Potentially - but I think there could be some difficulties because of the need to establish a Saxon Configuration. Using Saxon for individual XPath requests without giving Saxon any context that's reused across requests would probably perform badly. 3) I confirm the tests on JDOM2 were done using the build after the 17th October, therefore including the changes made to the Iterator() for ContentList.
The tests confirm the results you had published on http://hunterhacker.github.com/jdom/jdom2/performance.html 4) Whitespace: not sure of the exact details here, but the general rule for XPath 1.0 is that all whitespace is preserved unless otherwise specified, whereas in XPath 2.0 it's DTD-sensitive - whitespace in element-only content gets removed. We could do a performance comparison that eliminated this potential source of differences, but I'm not sure we would learn much more from it. 5) I'm not sure this would be productive. Our focus is on running the W3C XSLT and XQuery test suites and making sure that the results when JDOM is used underneath match the expected results. (We've generally only done this for a subset of the tests, and there tend to be some differences in test results for different tree models, caused for example because some models don't label nodes as IDs or IDREFs, some don't expose unparsed entities, etc.) regards, Mike and O'Neil -- O'Neil Delpratt Software Developer, Saxonica Limited Email: oneil at saxonica.com Tel: +44 118 946 5894 Web: http://www.saxonica.com
From jdom at tuis.net Tue Oct 25 05:26:32 2011 From: jdom at tuis.net (Rolf Lear) Date: Tue, 25 Oct 2011 08:26:32 -0400 Subject: [jdom-interest] Performance: JDOM2 and Saxon In-Reply-To: <4EA6938F.6050205@saxonica.com> References: <4EA5540E.3080201@saxonica.com> <4EA55A22.2050702@saxonica.com> <3ec6e6c19f8da026efff1b706db90422@tuis.net> <4EA6938F.6050205@saxonica.com> Message-ID: <065757700bb856fa9c6421011fb8128e@tuis.net> Excellent. I can work with that, and the feedback is appreciated. If nothing else, seeing the numbers creates something of a 'baseline' against which we can set expectations. Based on your response, and since you are the first 'users' of JDOM2 speaking up (thanks) perhaps some follow-up comments: 1. the XPath area (org.jdom2.xpath.*) is expected to be revised still (issues #42 and #45). This may impact your work. 2. Have you identified any areas of JDOM2 code which are underperforming?
You mention the "navigational API's" can you narrow that down to any particular iterators/methods? 3. In general is JDOM2 'better' to work with than JDOM1? Is it going in the right direction? Do you even notice it? 4. Are there any other changes that would make your life easier (API/etc.)? Thanks Rolf
From mike at saxonica.com Tue Oct 25 06:19:53 2011 From: mike at saxonica.com (Michael Kay) Date: Tue, 25 Oct 2011 14:19:53 +0100 Subject: [jdom-interest] Performance: JDOM2 and Saxon In-Reply-To: <065757700bb856fa9c6421011fb8128e@tuis.net> References: <4EA5540E.3080201@saxonica.com> <4EA55A22.2050702@saxonica.com> <3ec6e6c19f8da026efff1b706db90422@tuis.net> <4EA6938F.6050205@saxonica.com> <065757700bb856fa9c6421011fb8128e@tuis.net> Message-ID: <4EA6B779.4010807@saxonica.com> On 25/10/2011 13:26, Rolf Lear wrote: > Excellent. I can work with that, and the feedback is appreciated. > > If nothing else, seeing the numbers creates something of a 'baseline' > against which we can set expectations. > > Based on your response, and since you are the first 'users' of JDOM2 > speaking up (thanks) perhaps some follow-up comments: > 1. the XPath area (org.jdom2.xpath.*) is expected to be revised still > (issues #42 and #45). This may impact your work. Well, apart from the comparative testing, we don't actually use that part > 2. Have you identified any areas of JDOM2 code which are underperforming?
> You mention the "navigational API's" can you narrow that down to any > particular iterators/methods? I think that would need more careful study than we've carried out so far. But from what I've gleaned lurking on the list in the last couple of months, it wouldn't surprise me at all if namespaces are the culprit. They usually are. > 3. In general is JDOM2 'better' to work with than JDOM1? Is it going in > the right direction? Do you even notice it? I don't think we'd have noticed it at all if we hadn't been deliberately exploring the way the descendant axis navigation is now done. To be honest, my main motivation was to see if there were any ideas here worth stealing. > 4. Are there any other changes that would make your life easier > (API/etc.)? > > Well, apart from a total redesign to make it more strongly typed... The most tedious part is probably merging adjacent text nodes. I imagine that's a usability hazard for ordinary users too. Also, I suspect that sorting nodes into document order is probably more expensive than it needs to be. Michael Kay Saxonica From jdom at tuis.net Sun Oct 30 18:56:44 2011 From: jdom at tuis.net (Rolf) Date: Sun, 30 Oct 2011 21:56:44 -0400 Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6 Message-ID: <4EAE005C.6020608@tuis.net> Hi all. When I started with JDOM2 I discussed with Jason whether we should target a minimum supported version of Java5 or Java6. At the time I put together a list of Java6 features I thought could be remotely useful: - JAXB 2.0 including StAX interfaces (we could have a StAXOutputter, etc.) - Deque - ContentList/AttributeList could/should be a Deque too - making ContentList.removeLast() possible, etc. Those were just off-the-cuff examples of what could be useful. I don't think that the ContentList/AttributeLists should be Deques... not now, anyway. Based on that we decided Java5 was still a reasonable target, unless something came up. 
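To make the Deque item in the list above concrete, here is a minimal, stand-alone sketch using only java.util (Java 6 or later). It is not JDOM code: the Node class and the descendants() helper are hypothetical stand-ins invented for illustration. It shows the two things the Java 6 Deque API offers in this context: a stack of child iterators for walking a tree in document order, and removeLast() on a list-like container.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class DequeSketch {

    /** Hypothetical stand-in for a tree node; this is not a JDOM class. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<Node>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Collects the names of all descendants of root in document order (depth-first, pre-order). */
    static List<String> descendants(Node root) {
        List<String> out = new ArrayList<String>();
        // ArrayDeque (new in Java 6) used as a stack of partially consumed child iterators.
        Deque<Iterator<Node>> stack = new ArrayDeque<Iterator<Node>>();
        stack.push(root.children.iterator());
        while (!stack.isEmpty()) {
            Iterator<Node> it = stack.peek();
            if (!it.hasNext()) {
                stack.pop(); // this level is exhausted, go back up one level
                continue;
            }
            Node next = it.next();
            out.add(next.name);
            stack.push(next.children.iterator()); // descend before visiting siblings
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("root")
                .add(new Node("a").add(new Node("a1")).add(new Node("a2")))
                .add(new Node("b"));
        System.out.println(descendants(root)); // prints [a, a1, a2, b]

        // Deque also offers removeLast(), which a plain List lacks.
        Deque<String> content = new ArrayDeque<String>();
        content.add("first");
        content.add("last");
        System.out.println(content.removeLast()); // prints last
    }
}
```

On Java 5 the same walk would need a different stack structure (for example an ArrayList used as a stack), since Deque and ArrayDeque only arrived in Java 6.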
My biggest concern is that if we introduce JDOM2 as supporting Java5 then
we will have to support Java5 for a long time, introducing a testing
dependency, and potentially curtailing future options... and Java5 itself
has been unsupported for years....

Also, as it happens, I have inadvertently (out of habit) used an ArrayDeque
in the DescendantIterator code as a Stack. It could be easily replaced with
some other collections structure.

What has brought this issue to a head though is that I have been working
on some StAX code, and this is fairly well entrenched in Java6. Support for
it on Java5 is not nice to add (need to download special jars, etc.). It can
be done, but is a mess. Further, I have been looking at outputting to an
XMLStreamWriter, and the support for that would be useful to add to the
XMLOutputter class... and requiring an additional hard-to-get jar for that
would be a real drawback.... we may as well just declare it to require
Java6.

So, as a poll:

* Does anyone have a realistic need to run a future JDOM2 on Java5?
* If so, could you add additional jars to your classpath just to make JDOM2
  work?
* Any comments, suggestions.

Currently I feel that it is reasonable to set Java6 as a minimum and not
even bother trying to think about Java5 issues... anyone disagree?

Rolf

From ian.lea at gmail.com Mon Oct 31 02:36:12 2011
From: ian.lea at gmail.com (Ian Lea)
Date: Mon, 31 Oct 2011 09:36:12 +0000
Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6
In-Reply-To: <07476A68-6C3E-432D-AE6C-B557A7AB6561@hoplahup.net>
References: <4EAE005C.6020608@tuis.net> <07476A68-6C3E-432D-AE6C-B557A7AB6561@hoplahup.net>
Message-ID:

+1 for java6

--
Ian.

On Mon, Oct 31, 2011 at 6:53 AM, Paul Libbrecht wrote:
>
> On 31 Oct 2011 at 02:56, Rolf wrote:
>
> * Does anyone have a realistic need to run a future JDOM2 on Java5?
>
> The only one is lack of time of our sysadmin.
> I think it can be ignored.
>
> * If so, could you add additional jars to your classpath just to make JDOM2
> work?
>
> Sure, in any case!
>
> * Any comments, suggestions.
>
> I'm all for jdk 6.
> We also only build for java 6.
> paul
>
> Currently I feel that it is reasonable to set Java6 as a minimum and not
> even bother trying to think about Java5 issues... anyone disagree?
>

From mike at saxonica.com Mon Oct 31 03:31:30 2011
From: mike at saxonica.com (Michael Kay)
Date: Mon, 31 Oct 2011 10:31:30 +0000
Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6
In-Reply-To:
References: <4EAE005C.6020608@tuis.net> <07476A68-6C3E-432D-AE6C-B557A7AB6561@hoplahup.net>
Message-ID: <4EAE7902.5010100@saxonica.com>

* Does anyone have a realistic need to run a future JDOM2 on Java5?

I think in the context of a project where people can stick with JDOM1 if
they wish, having a dependency on Java 6 seems reasonable at first blush.
But...

In Saxon we want to support JDOM2. But we also want Saxon to work on Java
5. We don't mind having a restriction that you can't use JDOM2 with Saxon
if you're on Java 5. But we've certainly got some extra complexities making
this work, e.g. we might need to isolate the code that references JDOM2
into a separate package that is compiled under Java 6 and is not statically
referenced from the main Saxon JAR file. So it would definitely be simpler
for us and for our users if it all works under Java 5.

Similar complexities are likely to apply to many other components that want
to integrate JDOM2. From past experience the more complex the application,
the harder it is for people to move forward. Typical scenario: the user has
a license for Oracle N; upgrading it to Oracle N+1 will cost millions, but
Oracle N only runs under Java version J.
There's no business justification for spending the millions, so they're
stuck with Java J, and everything else they use in the same application
then also has to run under Java J.

Michael Kay
Saxonica

From mikeb at mitre.org Mon Oct 31 06:04:51 2011
From: mikeb at mitre.org (Brenner, Mike)
Date: Mon, 31 Oct 2011 13:04:51 +0000
Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6
In-Reply-To: <4EAE7902.5010100@saxonica.com>
References: <4EAE005C.6020608@tuis.net> <07476A68-6C3E-432D-AE6C-B557A7AB6561@hoplahup.net> <4EAE7902.5010100@saxonica.com>
Message-ID: <264449A6A521A14593F58479F46B34EB0241AA@IMCMBX03.MITRE.ORG>

I do not.

-----Original Message-----
From: jdom-interest-bounces at jdom.org [mailto:jdom-interest-bounces at jdom.org] On Behalf Of Michael Kay
Sent: Monday, October 31, 2011 6:32 AM
To: jdom-interest at jdom.org
Subject: Re: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6

* Does anyone have a realistic need to run a future JDOM2 on Java5?

I think in the context of a project where people can stick with JDOM1 if
they wish, having a dependency on Java 6 seems reasonable at first blush.
But...

In Saxon we want to support JDOM2. But we also want Saxon to work on Java
5. We don't mind having a restriction that you can't use JDOM2 with Saxon
if you're on Java 5. But we've certainly got some extra complexities making
this work, e.g. we might need to isolate the code that references JDOM2
into a separate package that is compiled under Java 6 and is not statically
referenced from the main Saxon JAR file. So it would definitely be simpler
for us and for our users if it all works under Java 5.

Similar complexities are likely to apply to many other components that want
to integrate JDOM2. From past experience the more complex the application,
the harder it is for people to move forward.
Typical scenario: the user has a license for Oracle N; upgrading it to
Oracle N+1 will cost millions, but Oracle N only runs under Java version J.
There's no business justification for spending the millions, so they're
stuck with Java J, and everything else they use in the same application
then also has to run under Java J.

Michael Kay
Saxonica

From jdom at tuis.net Mon Oct 31 06:26:19 2011
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 31 Oct 2011 09:26:19 -0400
Subject: [jdom-interest] Opinion Poll: - JDOM2 and minimum-required Java - Java5 or Java6
In-Reply-To: <4EAE005C.6020608@tuis.net>
References: <4EAE005C.6020608@tuis.net>
Message-ID: <716973aed0ca67fc72c533a66ebaf276@tuis.net>

I should add a time-line here. I think I will sit on this for a couple of
weeks... Say Friday the 18th - three weeks. At that point I will summarize
all the responses... and between now and then I will also see if I can come
up with a more detailed list of what the implications for supporting Java5
are... Then we can make a more informed decision.

A third option would be to only officially support Java6, but also put
together a document on how to make it work with Java5.

Rolf

On Sun, 30 Oct 2011 21:56:44 -0400, Rolf wrote:
> Hi all.
>
> ...
>
> So, as a poll:
>
> * Does anyone have a realistic need to run a future JDOM2 on Java5?
> * If so, could you add additional jars to your classpath just to make
> JDOM2 work?
> * Any comments, suggestions.
>
> Currently I feel that it is reasonable to set Java6 as a minimum and not
> even bother trying to think about Java5 issues... anyone disagree?
>
> Rolf
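
For reference, the ArrayDeque-as-a-Stack usage mentioned in the original
poll might look roughly like the sketch below. This is not the JDOM2
DescendantIterator code: Node, DescendantWalkSketch and descendantNames are
hypothetical stand-ins, and the only point is to show where a Java6-only
collection enters a document-order walk of a tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical tree walk; Node is a stand-in, not a JDOM class. */
class DescendantWalkSketch {

    /** Minimal tree node with a name and ordered children. */
    public static final class Node {
        final String name;
        final List<Node> children = new ArrayList<Node>();

        public Node(String name) { this.name = name; }

        public Node add(Node child) { children.add(child); return this; }
    }

    /** Iterates over all descendants of a root in document (pre-order) order. */
    public static final class DescendantIterator implements Iterator<Node> {
        // One child iterator per open ancestor level; ArrayDeque needs Java6.
        private final Deque<Iterator<Node>> stack = new ArrayDeque<Iterator<Node>>();

        public DescendantIterator(Node root) {
            stack.push(root.children.iterator());
        }

        public boolean hasNext() {
            // Discard exhausted levels until one with remaining nodes is on top.
            while (!stack.isEmpty() && !stack.peek().hasNext()) {
                stack.pop();
            }
            return !stack.isEmpty();
        }

        public Node next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Node node = stack.peek().next();
            // Descend into this node's children before visiting its siblings.
            stack.push(node.children.iterator());
            return node;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    /** Collects descendant names in document order. */
    public static List<String> descendantNames(Node root) {
        List<String> names = new ArrayList<String>();
        Iterator<Node> it = new DescendantIterator(root);
        while (it.hasNext()) {
            names.add(it.next().name);
        }
        return names;
    }

    public static void main(String[] args) {
        Node root = new Node("root")
            .add(new Node("a").add(new Node("a1")).add(new Node("a2")))
            .add(new Node("b"));
        System.out.println(descendantNames(root)); // prints [a, a1, a2, b]
    }
}
```

On Java5 the same walk should work with the stack held in an ArrayList
(adding and removing at the end), since ArrayDeque only arrived in Java6.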