[jdom-interest] AttributeList performance

Mon Apr 25 11:47:22 PDT 2005

--- Elliotte Harold <elharo at metalab.unc.edu> wrote:
> Martin Schulz wrote:
> 
...
> > There might well be an implementation trade-off
> where the current 
> > default (array of 5 Attributes)
> > is the better choice compared to adding a
> datastructure with a bigger 
> > footprint, so I'd
> > be curious about what others might deem
> appropriate in this situation.

There is also a related issue of efficient building of
a DOM tree from the actual XML-parser. I think one
place where things could be optimized (if I recall
correctly) would be to to properly pre-allocate
"right" size attribute and child element lists, when
building the tree bottom-up. That way attribute and
child counts are fully known, and no resizes are
needed (extra slack can still be left if it's assumed
additions are likely to be done etc).
I didn't work on this when writing StAXBuilder, but
remember considering it, since profiler did show list
handling for attributes and elements consuming
surprisingly big chunk of time.

> My strong suspicion is that in the more normal use
> cases where elements 
> have half a dozen attributes at most, or less,
> ArrayLists are much more 
> efficient than HashMaps. You may have crossed the
> threshold where the 
> bigger constant factor of a HashMap vs. a list makes
> the HashMap faster, 
> so a custom subclass might make sense for you.
> However, I doubt it makes 
> sense for the majority of users.

One interesting thing is that many low-level parsers
are likely to implement a hash-structure, to detect
duplicate attributes, at low-level. It is possible to
implement a nicely performing light-weight
hash-structure, when implementation can guarantee
things like String interning (which most if not all
parsers do for names), and need not be as versatile as
generic java.util.Map implementations (only used
internally). Although one could theoretically use
simple Lists, that choice is less compelling since it
leads to n^2 complexity (each attribute needs to be
checked for name uniqueness against preceding ones).

What this means is that chances are that the low-level
parser already has data struct(s) necessary for fast
access, but there's often no way to percolate that
information up via standard APIs (neither SAX nor StAX
specify a way to do that; there'd be problems
guaranteeing immutability and other problems).

Anyway, it's kind of ironic that StAX-parsers may have
faster access to attribute information than, say,
JDOM, and that to get fast generic access (even for
just 5 attributes performance difference is actually
significant for the operation itself, although
hopefully not for total application performance) at
higher level, one needs to reconstruct fast lookup.

Just my 2c from low-level perspective,

-+ Tatu +-

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com