[jdom-interest] JDOM parser reuse memory problem

Rolf jdom at tuis.net
Tue Nov 22 20:35:23 PST 2011


Hi again everyone.

I have been playing with SAXBuilder, trying to find a way to put it
together in such a way that it is still 'SAXBuilder' but improves parser
reuse, and still enables customization.

I think I have come up with a solution that is backward compatible
for 'everyday' use (but compatibility is broken for people who have
sub-classed either SAXHandler or SAXBuilder - and some methods have been 
deprecated on SAXBuilder and others have been renamed with the old names 
deprecated too).

The performance results are better than I expected. See: 
https://github.com/hunterhacker/jdom/issues/52#issuecomment-2844750
where you can see that the code now re-uses the parser and the whole 
process completes in a quarter of the previous time.
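
For anyone who wants to see the reuse idea in isolation, here is a rough sketch using only the JDK's own SAX classes, not the JDOM code itself. The class and method names (ReuseSketch, CountingHandler, parseAll) are made up for illustration, and the counting handler just stands in for a real content handler. One XMLReader is created once and then fed several documents:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ReuseSketch {

    /** Counts start-element events; a stand-in for a real content handler. */
    static final class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses every document with ONE shared XMLReader; returns the total element count. */
    static int parseAll(String... docs) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // The expensive part (factory lookup, parser creation) happens once.
            XMLReader reader = factory.newSAXParser().getXMLReader();
            CountingHandler handler = new CountingHandler();
            reader.setContentHandler(handler);
            for (String doc : docs) {
                // An XMLReader may be reused once the previous parse() has returned.
                reader.parse(new InputSource(new StringReader(doc)));
            }
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAll("<a><b/></a>", "<c/>")); // prints 3
    }
}
```

Moving the factory and parser creation inside the loop gives the 'new parser for every document' behaviour to compare against.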

The changes are hard to describe in one place, but I have put together 
some documentation here: 
http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/input/sax/package-summary.html#package_description

Finally, I have updated the performance page too: 
http://hunterhacker.github.com/jdom/jdom2/performance.html

You can see that the SAX builder has now re-taken the lead from the StAX 
process.

Thanks

Rolf


On Sat, 19 Nov 2011 23:12:49 -0500, Rolf <jdom at tuis.net> wrote:
> Hi all.
>
> I am looking to run some ideas past the group. I see a number of
> problems with the SAXBuilder as it currently is. It is somewhat hard to
> describe them all, but, the bottom line is that I think the API should
> be changed for it in a smallish way that will affect people who use a
> custom SAXHandler, or those who hard-code a SAXParser Driver classname
> in the SAXBuilder constructor. I believe the vast majority of people use
> the default constructor, and do not subclass the SAXHandler so this
> change will affect only a small subset of JDOM users.
>
> So, here are the problems I see, in addition to the bug related to
> long-living memory references.
>
>
> Problem 1: SAXParser creation
>
> JDOM uses 3 mechanisms to create a SAX parser:
> 1. if the user specifies a specific SAX 'Driver' classname
> 2. else falls-back to JAXP
> 3. else falls back to a 'default' SAX Driver (xerces)
>
> I believe that the 'default' fall-back should be removed because if JAXP
> fails there's nothing. At minimum, JAXP will find the parser embedded in
> the Java runtime, and the 'default' fallback will never happen. Put
> another way, if JAXP fails, there is no reason to expect
> that the 'default' "org.apache.xerces.parsers.SAXParser" will work
> (because if you have org.apache.xerces.parsers.SAXParser then you also
> have a working JAXP parser....)
>
> I also believe the user-specified 'driver' mechanism should be replaced
> with a straight XMLReaderFactory instance. This makes the JDOM user
> responsible for creating the factory. It also adds the ability for the
> user to have just a single Factory instance and not have JDOM creating a
> new instance each time a new SAXBuilder is created. This will give the
> user the opportunity to improve performance that JDOM cannot do.
> XMLReaderFactory is part of SAX2.0 and has been in Java since at least
> Java 1.4. It is the 'correct' way to get an XMLReader instance. Also,
> new JDOM users will not be confused by this string value, whereas
> XMLReaderFactory is a real, standard, and well documented entity.
>
> Further, there should be no fallback mechanism: if the user manually
> provides an XMLReaderFactory and it fails then it should all fail. If the
> user uses JAXP (the default), and JAXP fails then we fail. In the Java5+
> world JDOM should not need to be 'molly-coddling' the JAXP process.
> Also, we should not be using such outdated mechanisms as direct SAX
> driver classes.
>
> This change would 'neaten' up the API for creating SAXBuilders:
> 1. you either use the 'normal' JAXP process, or...
> 2. you use the standard non-JAXP mechanism XMLReaderFactory
>
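To make the two options concrete, here is a minimal sketch using only JDK classes. The ReaderSource interface and the ReaderSources class are hypothetical names for illustration, not JDOM API. Note that in current JDKs (Java 9 and later) org.xml.sax.helpers.XMLReaderFactory is itself marked deprecated in favour of SAXParserFactory, so the second option is shown only to mirror the proposal:

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

final class ReaderSources {

    /** Hypothetical hook: whoever builds the document supplies the XMLReader. */
    interface ReaderSource {
        XMLReader newReader() throws Exception;
    }

    /** Option 1: the 'normal' JAXP process, with one factory kept for all readers. */
    static ReaderSource jaxp() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return () -> factory.newSAXParser().getXMLReader();
    }

    /** Option 2: the SAX2 helper class (deprecated since Java 9). */
    @SuppressWarnings("deprecation")
    static ReaderSource sax2() {
        return () -> XMLReaderFactory.createXMLReader();
    }

    /** No fallback: if the chosen source fails, the whole thing fails. */
    static XMLReader create(ReaderSource source) {
        try {
            return source.newReader();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create an XMLReader", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(jaxp()).getClass().getName());
        System.out.println(create(sax2()).getClass().getName());
    }
}
```

The point of the single create() method is that there is exactly one path per configuration and no silent retry with a different mechanism.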
>
> Problem 2: Parser reuse.
>
> XMLReader reuse is much more efficient than creating a new parser for
> each JDOM build. There have been a few attempts to improve the parser
> reuse in JDOM, but it could be taken even further by only re-configuring
> the XMLReader when the SAXBuilder configuration changes. In a typical
> use where the configuration is unchanged between consecutive JDOM builds
> then there does not need to be any reconfiguration at all.
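
One way this could look is a simple 'dirty' flag. The sketch below is an assumption about how it might be done, not what SAXBuilder actually does. LazyConfigSketch and its methods are invented names, and the namespaces feature is used only because it is a standard SAX2 feature that is easy to toggle:

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

final class LazyConfigSketch {

    private static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    private final XMLReader reader;
    private boolean namespaces = true;
    private boolean dirty = true;   // true until the reader has been configured once
    private int configureCount;     // only here so the effect can be observed

    LazyConfigSketch() {
        try {
            reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create an XMLReader", e);
        }
    }

    /** Changing a setting only marks the reader as needing reconfiguration. */
    void setNamespaces(boolean value) {
        if (value != namespaces) {
            namespaces = value;
            dirty = true;
        }
    }

    /** Returns the shared reader, reconfiguring it only if a setting changed. */
    XMLReader readerForBuild() {
        if (dirty) {
            try {
                reader.setFeature(NAMESPACES, namespaces);
            } catch (SAXException e) {
                throw new IllegalStateException("Feature not supported: " + NAMESPACES, e);
            }
            configureCount++;
            dirty = false;
        }
        return reader;
    }

    int configureCount() {
        return configureCount;
    }

    public static void main(String[] args) {
        LazyConfigSketch sketch = new LazyConfigSketch();
        sketch.readerForBuild();
        sketch.readerForBuild();      // unchanged settings: no reconfiguration
        sketch.setNamespaces(false);
        sketch.readerForBuild();      // setting changed: reconfigured once more
        System.out.println(sketch.configureCount()); // prints 2
    }
}
```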
>
>
> Problem 3: The long-linked memory
>
> The fix for this is probably going to need a 'reset' method on the
> SAXHandler that de-references the Document that was last parsed. This in
> turn will require an API change on SAXHandler.
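
A minimal sketch of the 'reset' idea, again with JDK classes only. ResetSketch, CollectingHandler and reset() are hypothetical names, and a list of element names stands in for the JDOM Document. The handler hands over its result and is then reset, so the long-lived builder no longer references what it last built:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ResetSketch {

    /** Stand-in for SAXHandler: it holds the result of the current parse. */
    static final class CollectingHandler extends DefaultHandler {
        private List<String> result = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            result.add(qName);
        }

        List<String> getResult() {
            return result;
        }

        /** Hypothetical reset(): drop the reference to the last result. */
        void reset() {
            result = new ArrayList<>();
        }
    }

    private final XMLReader reader;
    private final CollectingHandler handler = new CollectingHandler();

    ResetSketch() {
        try {
            reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not create an XMLReader", e);
        }
    }

    /** Builds one result, then resets the handler even if the parse failed. */
    List<String> build(String xml) {
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getResult();
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        } finally {
            handler.reset();
        }
    }

    /** True if the long-lived handler still references the given result. */
    boolean stillReferences(List<String> result) {
        return handler.getResult() == result;
    }

    public static void main(String[] args) {
        ResetSketch builder = new ResetSketch();
        List<String> first = builder.build("<a><b/></a>");
        System.out.println(first);                          // prints [a, b]
        System.out.println(builder.stillReferences(first)); // prints false
    }
}
```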
>
> Problem 4: SAXHandler sub-classing
>
> SAXHandler subclassing allows for custom event handling, but, in order
> to use a custom SAXHandler you also have to subclass SAXBuilder and
> override the createContentHandler() method. This is a cumbersome (and
> not well documented) mechanism.
>
>
>
> What with these (at least) 4 issues with SAXBuilder it makes sense to
> change the API slightly to accommodate the 'new' way of doing things.
> This will impact the way that subclassing is done, and will impact those
> who use a non-JAXP SAX parser.
>
> If these changes (or others like them) need to happen (and I think they
> do), then it makes sense to do it right, and comprehensively.
>
> I am going to play with the code a little to get an idea of what can be
> done, but I am looking for any ideas, suggestions, criticisms.
>
> I have already made some changes affecting the JDOM2 API but I think
> this could be one of those changes that makes a real difference (for the
> better).
>
> Rolf
>
>
> On 18/11/2011 7:32 PM, Rolf wrote:
>> I have updated the issue with some performance numbers for some
>> different conditions.
>>
>> Have a look at: https://github.com/hunterhacker/jdom/issues/52
>>
>> It seems to indicate that fixing the 'back to raw JAXP for each loop'
>> will only save a little time, but parser reuse saves a lot.
>>
>> Need to implement both options, I think, implement SAXFactory caching
>> as well as better memory management on Parser reuse.
>>
>> Out of interest, I thought the default setting for parser reuse was
>> 'false', but it is true. XMLReaders will be reused unless you
>> explicitly setReuseParser(false);
>>
>> This in turn means that my comments about 'normal' process should be
>> reversed, the normal case for this bug condition is that we keep a
>> reference from the SAXBuilder to the Document for as long as the
>> SAXBuilder is active, and not used to rebuild another document.
>>
>> Thanks
>>
>> Rolf
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>
>

