[jdom-interest] JDOM and memory

Rolf Lear jdom at tuis.net
Sat Jan 28 17:41:07 PST 2012


I have now compared the results of string-interning to the String-cache 
code.

The 'raw' code (neither SlimJDOMFactory nor string-interning) is:
2.06MB @ 4.55ms
The SlimJDOMFactory is:
1.57MB @ 8ms
The string-interning SAX Feature is:
2.06MB @ 6.1ms

Not sure how I got essentially zero memory improvement.... thought I had 
something wrong, but I have been checking, and I think the saving from 
using String.intern() on element names only is so insignificant that it 
does not even amount to 1%.... perhaps all the difference is 
coming from whitespace....

Not worth looking into further.... I don't believe String.intern() is 
the right answer regardless.

Rolf


On 28/01/2012 1:37 PM, Michael Kay wrote:
>
>>
>>
>> Finally, I have in the past had some success with the concept of
>> 'reusing' String values. XML Parsers (like SAX, etc.) typically create
>> a new String instance for all the variables they pass. For example,
>> the Element names, prefixes, etc. are all new instances of String.
>> Thus, if you have hundreds of Elements called 'car' in your input XML,
>> you will get hundreds of different String Element names with the value
>> 'car'. I have built a class that does something similar to
>> String.intern() in order to rationalize the hundreds of
>> different-but-equals() values that are passed in by the parsers.
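>>
>> A stripped-down sketch of that caching class (illustrative only; the real
>> one needs to manage its size, and the names here are made up):
>>
>> import java.util.HashMap;
>>
>> final class StringCache {
>>     private final HashMap<String, String> cache =
>>             new HashMap<String, String>();
>>
>>     /** Return one canonical instance for every equal value. */
>>     String reuse(String value) {
>>         String cached = cache.get(value);
>>         if (cached == null) {
>>             cache.put(value, value);
>>             return value;
>>         }
>>         return cached;
>>     }
>> }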
> Have you measured how your optimization compares with the effect of
> setting the http://xml.org/sax/features/string-interning property on the
> SAX parser?
>
> Are you doing the interning in a way that guarantees strings can be
> compared using "==", and if so, are you taking advantage of this when
> doing the comparisons? The big win comes with XPath searches such as
> //x. Does the interning introduce any synchronization? (This is the big
> disadvantage with Saxon's NamePool - it speeds up XPath searching
> substantially, but the contention in a highly concurrent workload can
> become quite significant.)
>
> Are you pooling the QName as a whole, or the local name, prefix and URI
> separately?
>
> Michael Kay
> Saxonica
>>
>> I have incorporated this 'caching' class into a new JDOMFactory
>> called 'SlimJDOMFactory'. This factory 'normalizes' all String values
>> to a single instance of each unique String value. This significantly
>> reduces the amount of memory used in the JDOM tree, especially if there
>> are lots of similarly named attributes and elements, or white-space
>> padding inside otherwise empty elements or between elements. This
>> process is significantly slower though...
>>
>> For example, with the 'hamlet' test case, the 'baseline' memory
>> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
>> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
>> With Lazy AttributeList it is: 2.06MB in 4.55ms
>> With both it is: 1.57MB in 8.3ms
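>>
>> Using it from SAXBuilder looks roughly like this (the exact package and
>> setter name may still shift before release, and "hamlet.xml" just stands
>> in for the test input):
>>
>> SAXBuilder builder = new SAXBuilder();
>> // Route all Element/Attribute/Text construction through the caching factory.
>> builder.setJDOMFactory(new SlimJDOMFactory());
>> Document doc = builder.build(new File("hamlet.xml"));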
>>
>> I am pushing both of these changes into GitHub. The AttributeList is
>> an easy one to justify. It is fully compatible with prior code, and it
>> has positive memory and performance impacts.
>>
>> The SlimJDOMFactory is also justifiable when you consider:
>> 1. The user has to decide to use it specifically.
>> 2. The memory saving can be very significant.
>> 3. Even though the parse time is slower, the GC time savings can be
>> significant if the document 'hangs around' for a long time - the
>> quicker GC time can add up fast.
>> 4. When you have lots of code doing comparisons it is much faster to
>> do equals() calls on Strings that are == as well. It saves a hashCode
>> calculation as well as a string character scan to prove equals().
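>>
>> As a small illustration of point 4 (the cache name is hypothetical): once
>> equal values are collapsed to one instance, equals() returns through its
>> initial identity check instead of scanning characters:
>>
>> String a = cache.reuse("car");
>> String b = cache.reuse(new String("car"));
>> // a == b is now true, so a.equals(b) is effectively a single reference
>> // comparison, and repeated name comparisons stay cheap.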
>>
>> Rolf
>>
>> On 02/01/2012 3:27 PM, Rolf wrote:
>>> Hi all.
>>>
>>> Memory optimization has never been a top priority for JDOM. At the same
>>> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
>>> have done some analysis, and I believe I can trim about a quarter to a
>>> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>>>
>>> The first is to merge the ContentList class into the Element class (and
>>> also into Document). This will reduce the number of Java objects by
>>> about half, and that will save about 32 bytes per Element at a minimum
>>> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
>>> array, we can save memory on otherwise 'empty' Elements.
>>>
>>> This can be done by making the Element (and perhaps Document) class
>>> implement 'List'. It can all be done in a 'backward compatible' way, but
>>> it also leads to some interesting possibilities, like:
>>>
>>> for (Content c : element) {
>>>     // ... do something with each child Content
>>> }
>>>
>>> (for backward compatibility, Element.getContent() will return 'this').
>>>
>>>
>>> The second change is to make the AttributeList instance in Element
>>> lazily initialized. This would save memory on all Elements that have no
>>> attributes, but would have an impact for people who sub-class the
>>> Element class and may expect the attributes field to be non-null.
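>>>
>>> The lazy initialization itself would be the usual pattern, something like
>>> this sketch (field and method names are illustrative only):
>>>
>>> // Inside Element: leave 'attributes' null until someone asks for it.
>>> AttributeList getAttributeList() {
>>>     if (attributes == null) {
>>>         attributes = new AttributeList(this);
>>>     }
>>>     return attributes;
>>> }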
>>>
>>>
>>> I am trying to get a feel for how important this sort of optimization
>>> may be. If there is interest then I will make some changes and test the
>>> impact. I may make a separate branch on GitHub to test it out....
>>>
>>> If the above changes are unrealistic then I don't think it makes sense
>>> to even try....
>>>
>>> Rolf