[jdom-interest] AttributeList performance
cowtowncoder at yahoo.com
Fri Apr 29 11:17:18 PDT 2005
--- Michael Kay <mike at saxonica.com> wrote:
Michael, thanks for very interesting comments. I fully
agree in that it all depends on how things are used.
Couple of comments:
> But it does depend very much on usage patterns. In
> one application with a
> Generally in this kind of application one needs
> dynamic, self-adjusting data
> structures. Saxon has gradually moved in that
> direction. For example, it
> adds "prior pointers" (previous sibling pointers) to
> a tree the first time
> someone attempts reverse navigation. Recently I've
Yes, and such lazy construction of helper data structs
very often useful (even without structs that are
dynamic... but obviously the two can be combined).
In case of JDOM's attributes, for example,
one could also consider just using raw Attribute array
created when constructed by a parser. That array could
encapsulated as a simple List (either the default
or a simple implementation... alas, default ArrayList
make an unnecessary extra copy since it can not
caller wouldn't mess with the array), if and when
Likewise, a simple index could be constructed if there
attributes and name-based-accesses to warrant that.
Or alternatively, one could also create an efficient
by putting next pointers in Attribute class itself,
and do without
array to contain the instances. It would obviously
lead to worse
indexed access, but not much so for named access...
and would save
some memory, and make insertions faster.
So many possibilities. :-)
> Saxon does linear
> searching for attributes (and for child elements),
Actually, the use case for attributes and elements is
different, since element ordering is strictly linear
definition order matters), whereas attribute access is
is implementation dependant (if one exists).
And like I said, at low-level (for actual parser), I
do think that
hash-like structure is superior to linear lists
(linked or array)
for attribute storage, due to the need to check for
But not so for elements.
> but if I wanted something
> better, then I would look to some kind of dynamic
> approach that adjusts to
> the actual volumetrics and usage patterns on the
> data. You can't allow
> solutions for the worst-case scenarios to impose
> costs on the common case,
You can, although you probably should not if such a
is significant (some companies are known to do just
though, like Oracle and their very conservative
This is of course assuming there is significant
cost for the common case.
But just to point out that "lots of attributes" use
case is not
a rare freak case, one can go and check out OpenOffice
format. It makes heavy use of (namespaced) attributes.
it's probably still in minority, it's not
And it's something I had to work on a lot during my
> and providing lots of user tuning options doesn't
> work unless you want a heavy support load.
I'm not a fan of tons of low-level tuning options, but
I do like the idea
of profiles that indicate what user wants to optimize
usage, optimal performance for long-running processes
and so on
are goals towards which implementation could aim by
implementation classes. And of those profile, use the
profile as the default. Although you can still expose
switches, they would seldom be used directly.
This does not preclude use of more dynamic data
structs and algorithms;
but it can give useful hints for those as well.
In this particular case even that may be an overkill;
whoever needs the specialized solution should write
implementation (like was suggested earlier). But I
think it's still
good to discuss potential approaches and pitfalls so
to do just that.
Anyway, it has been a useful discussion, although I
even have that much use for a solution right now. ;-D
-+ Tatu +-
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
More information about the jdom-interest