[jdom-interest] JDOM parser reuse memory problem
jdom at tuis.net
Tue Nov 22 20:35:23 PST 2011
Hi again everyone.
I have been playing with SAXBuilder, trying to find a way to put it
together in such a way that it is still 'SAXBuilder' but improves parser
reuse, and still enables customization.
I think I have come up with a solution that is backward compatible
for 'everyday' use (but compatibility is broken for people who have
sub-classed either SAXHandler or SAXBuilder - and some methods have been
deprecated on SAXBuilder and others have been renamed with the old names
The performance improvement is larger than I expected. See:
where you can see that the code now re-uses the parser and the whole
process completes in a quarter of the previous time.
The changes are hard to describe in one place, but I have put together
some documentation here:
Finally, I have updated the performance page too:
You can see that the SAX builder has now re-taken the lead from the StAX
On Sat, 19 Nov 2011 23:12:49 -0500, Rolf <jdom at tuis.net> wrote:
> Hi all.
> I am looking to run some ideas past the group. I see a number of
> problems with the SAXBuilder as it currently is. It is somewhat hard to
> describe them all, but, the bottom line is that I think the API should
> be changed for it in a smallish way that will affect people who use a
> custom SAXHandler, or those who hard-code a SAXParser Driver classname
> in the SAXBuilder constructor. I believe the vast majority of people use
> the default constructor, and do not subclass the SAXHandler so this
> change will affect only a small subset of JDOM users.
> So, here are the problems I see, in addition to the bug related to
> long-living memory references.
> Problem 1: SAXParser creation
> JDOM uses 3 mechanisms to create a SAX parser:
> 1. if the user specifies a SAX 'Driver' classname, that class is used
> 2. else it falls back to JAXP
> 3. else it falls back to a 'default' SAX Driver (xerces)
> I believe that the 'default' fall-back should be removed because if JAXP
> fails there is nothing left to fall back to. At minimum, JAXP will find
> the parser embedded in the Java runtime, so the 'default' fallback will
> never happen. Put
> another way, if JAXP fails, there is no reason to expect
> that the 'default' "org.apache.xerces.parsers.SAXParser" will work
> (because if you have org.apache.xerces.parsers.SAXParser then you also
> have a working JAXP parser....)
> I also believe the user-specified 'driver' mechanism should be replaced
> with a straight XMLReaderFactory instance. This makes the JDOM user
> responsible for creating the factory. It also adds the ability for the
> user to have just a single Factory instance and not have JDOM creating a
> new instance each time a new SAXBuilder is created. This gives the
> user an opportunity to improve performance in a way that JDOM cannot.
> XMLReaderFactory is part of SAX2.0 and has been in Java since at least
> Java 1.4. It is the 'correct' way to get an XMLReader instance. Also,
> new JDOM users will not be confused by this string value, whereas
> XMLReaderFactory is a real, standard, and well documented entity.
> Further, there should be no fallback mechanism: if the user manually
> provides an XMLReaderFactory and it fails then it should all fail. If the
> user uses JAXP (the default), and JAXP fails then we fail. In the Java5+
> world JDOM should not need to be 'molly-coddling' the JAXP process.
> Also, we should not be using such outdated mechanisms as direct SAX
> driver classes.
> This change would 'neaten' up the API for creating SAXBuilders:
> 1. you either use the 'normal' JAXP process, or...
> 2. you use the standard non-JAXP mechanism XMLReaderFactory
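
As a rough illustration of the "JAXP only, no fallback" idea, the sketch below obtains an XMLReader through JAXP from a single shared factory. It uses only JDK classes, not JDOM. The class name JaxpReaderSource is hypothetical, and wrapping failures in IllegalStateException is an assumption made to keep the example short. Note that the JDK's org.xml.sax.helpers.XMLReaderFactory only exposes static methods, so a small holder class like this is one way to pass a "factory instance" around.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of JDOM: supplies XMLReaders through JAXP only.
final class JaxpReaderSource {
    // One factory instance is kept and shared; locating the JAXP
    // implementation happens once, in the constructor.
    private final SAXParserFactory factory;

    JaxpReaderSource(boolean validating) {
        factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(validating);
    }

    // No fallback to a hard-coded driver class: if JAXP cannot supply a
    // parser, the failure is reported to the caller.
    XMLReader createXMLReader() {
        try {
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("JAXP could not create a parser", e);
        }
    }
}
```

A caller would create one JaxpReaderSource and ask it for readers as needed. SAXParserFactory is not guaranteed to be thread-safe, so a shared instance would need external synchronization if used from several threads.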
> Problem 2: Parser reuse.
> XMLReader reuse is much more efficient than creating a new parser for
> each JDOM build. There have been a few attempts to improve the parser
> reuse in JDOM, but it could be taken even further by only re-configuring
> the XMLReader when the SAXBuilder configuration changes. In a typical
> use where the configuration is unchanged between consecutive JDOM builds
> then there does not need to be any reconfiguration at all.
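
A minimal sketch of this reuse idea, using plain JDK SAX rather than JDOM: the XMLReader is cached and only discarded when a setting actually changes. The class ReusingParser, its single namespaceAware setting, and the readersCreated counter are all hypothetical and exist only for illustration. For simplicity the sketch recreates the reader on a change instead of reconfiguring it in place.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not JDOM code: caches one XMLReader and replaces it
// only after a configuration change.
final class ReusingParser {
    private boolean namespaceAware = true;
    private XMLReader cached;     // null means "create a reader before the next parse"
    private int readersCreated;   // counter kept for illustration only

    void setNamespaceAware(boolean value) {
        if (value != namespaceAware) {
            namespaceAware = value;
            cached = null;        // configuration changed: drop the old reader
        }
    }

    int readersCreated() {
        return readersCreated;
    }

    void parse(String xml, DefaultHandler handler) {
        try {
            if (cached == null) {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(namespaceAware);
                cached = factory.newSAXParser().getXMLReader();
                readersCreated++;
            }
            cached.setContentHandler(handler);
            cached.setErrorHandler(handler);
            cached.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            cached = null;        // do not reuse a reader after a failed parse
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

With an unchanged configuration, consecutive parse calls share one reader. Calling setNamespaceAware with the value already in effect leaves the cached reader alone.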
> Problem 3: The long-linked memory
> The fix for this is probably going to need a 'reset' method on the
> SAXHandler that de-references the Document that was last parsed. This in
> turn will require an API change on SAXHandler.
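
To make the 'reset' idea concrete, here is a small sketch built on the JDK's DefaultHandler. It is not the JDOM SAXHandler: the class CollectingHandler is hypothetical, and a list of element names stands in for the parsed Document.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a SAXHandler-like class. The "document" is just
// the list of element names seen during the parse.
class CollectingHandler extends DefaultHandler {
    private List<String> document = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        document.add(qName);
    }

    // Returns the result of the most recent parse.
    List<String> getDocument() {
        return document;
    }

    // Replaces the internal reference with a fresh, empty result so that a
    // long-lived handler no longer keeps the previous result reachable.
    void reset() {
        document = new ArrayList<>();
    }
}
```

A builder that keeps its handler between builds could call reset() after handing the result to the caller, so the handler stops holding on to the last document.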
> Problem 4: SAXHandler sub-classing
> SAXHandler subclassing allows for custom event handling, but, in order
> to use a custom SAXHandler you also have to subclass SAXBuilder and
> override the createContentHandler() method. This is a cumbersome (and
> not well documented) mechanism.
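
One possible alternative to subclassing the builder is to hand it a handler source. The sketch below is only an illustration with plain JDK SAX: PluggableBuilder and CountingHandler are hypothetical names, and using java.util.function.Supplier as the injection point is an assumption, not what JDOM does.

```java
import java.io.StringReader;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical custom handler: counts the elements it sees.
class CountingHandler extends DefaultHandler {
    int elements;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}

// Hypothetical builder: the handler source is injected through the
// constructor, so a custom handler does not require a builder subclass.
final class PluggableBuilder<H extends DefaultHandler> {
    private final Supplier<H> handlerSource;

    PluggableBuilder(Supplier<H> handlerSource) {
        this.handlerSource = handlerSource;
    }

    H build(String xml) {
        H handler = handlerSource.get();   // a fresh handler for every build
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("build failed", e);
        }
        return handler;
    }
}
```

Usage would look like `new PluggableBuilder<>(CountingHandler::new).build("<a><b/></a>")`, with the custom behaviour living entirely in the handler class.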
> Given these (at least) 4 issues with SAXBuilder, it makes sense to
> change the API slightly to accommodate the 'new' way of doing things.
> This will impact the way that subclassing is done, and will impact those
> who use a non-JAXP SAX parser.
> If these changes (or others like them) need to happen (and I think they
> do), then it makes sense to do it right, and comprehensively.
> I am going to play with the code a little to get an idea of what can be
> done, but I am looking for any ideas, suggestions, criticisms.
> I have already made some changes affecting the JDOM2 API but I think
> this could be one of those changes that makes a real difference (for the
> On 18/11/2011 7:32 PM, Rolf wrote:
>> I have updated the issue with some performance numbers for some
>> Have a look at: https://github.com/hunterhacker/jdom/issues/52
>> It seems to indicate that fixing the 'back to raw JAXP for each loop'
>> will only save a little time, but parser reuse saves a lot.
>> I think both options need to be implemented: SAXFactory caching as
>> well as better memory management on Parser reuse.
>> Out of interest, I thought the default setting for parser reuse was
>> 'false', but it is true. XMLReaders will be reused unless you
>> This in turn means that my comments about 'normal' process should be
>> reversed, the normal case for this bug condition is that we keep a
>> reference from the SAXBuilder to the Document for as long as the
>> SAXBuilder is active, and not used to rebuild another document.