[jdom-interest] AttributeList performance

Fri Apr 29 11:17:18 PDT 2005

--- Michael Kay <mike at saxonica.com> wrote:
> 

Michael, thanks for very interesting comments. I fully
agree in that it all depends on how things are used.

Couple of comments:

> But it does depend very much on usage patterns. In
> one application with a
...
> Generally in this kind of application one needs
> dynamic, self-adjusting data
> structures. Saxon has gradually moved in that
> direction. For example, it
> adds "prior pointers" (previous sibling pointers) to
> a tree the first time
> someone attempts reverse navigation. Recently I've

Yes, and such lazy construction of helper data structs
is
very often useful (even without structs that are
themselves
dynamic... but obviously the two can be combined).
In case of JDOM's attributes, for example,
one could also consider just using raw Attribute array
being
created when constructed by a parser. That array could
be
encapsulated as a simple List (either the default
wrapper,
or a simple implementation... alas, default ArrayList
would
make an unnecessary extra copy since it can not
guarantee
caller wouldn't mess with the array), if and when
needed.
Likewise, a simple index could be constructed if there
are enough
attributes and name-based-accesses to warrant that.

Or alternatively, one could also create an efficient
linked list,
by putting next pointers in Attribute class itself,
and do without
array to contain the instances. It would obviously
lead to worse
indexed access, but not much so for named access...
and would save
some memory, and make insertions faster.

So many possibilities. :-)

...
> Saxon does linear
> searching for attributes (and for child elements),

Actually, the use case for attributes and elements is
significantly
different, since element ordering is strictly linear
(since by
definition order matters), whereas attribute access is
not; ordering
is implementation dependant (if one exists).

And like I said, at low-level (for actual parser), I
do think that
hash-like structure is superior to linear lists
(linked or array)
for attribute storage, due to the need to check for
name collisions.
But not so for elements.

> but if I wanted something
> better, then I would look to some kind of dynamic
> approach that adjusts to
> the actual volumetrics and usage patterns on the
> data. You can't allow
> solutions for the worst-case scenarios to impose
> costs on the common case,

You can, although you probably should not if such a
cost
is significant (some companies are known to do just
that,
though, like Oracle and their very conservative
optimization
choices).
This is of course assuming there is significant
additional base
cost for the common case.

But just to point out that "lots of attributes" use
case is not
a rare freak case, one can go and check out OpenOffice
file
format. It makes heavy use of (namespaced) attributes.
While
it's probably still in minority, it's not
insignificant one.
And it's something I had to work on a lot during my
last job.

> and providing lots of user tuning options doesn't
> work unless you want a heavy support load.

I'm not a fan of tons of low-level tuning options, but
I do like the idea
of profiles that indicate what user wants to optimize
for. Low-memory
usage, optimal performance for long-running processes
and so on
are goals towards which implementation could aim by
using suitable
implementation classes. And of those profile, use the
middle-of-the-road
profile as the default. Although you can still expose
the underlying
switches, they would seldom be used directly.
This does not preclude use of more dynamic data
structs and algorithms;
but it can give useful hints for those as well.

In this particular case even that may be an overkill;
and perhaps
whoever needs the specialized solution should write
specialized
implementation (like was suggested earlier). But I
think it's still
good to discuss potential approaches and pitfalls so
it's easier
to do just that.

Anyway, it has been a useful discussion, although I
don't personally
even have that much use for a solution right now. ;-D

-+ Tatu +-

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com