[jdom-interest] Content as a Text Node (was JDOM JSR)

Trimmer, Todd todd.trimmer at trizetto.com
Mon May 21 09:32:01 PDT 2001

From: "Brett McLaughlin" <brett at newInstance.com>
This is the second person in as many days that I consider to be reputable to
bring up problems with the String content of an Element being stored as a
String. I also know that Elliotte has the same concern, as do I.

I think the biggest issue with text nodes as non-Strings is the set of
assumptions about their implementation; as in "It will be like DOM, which
sucks in that area." First, I agree with this assessment. Second, I think we
can implement this in a better way.

For example, I don't propose changing the signatures of getText(),
getTextTrim(), or setText().

What I'd like to do is INTERNALLY store the Strings within a Text class (or
StringContent, or whatever). We just return the equivalent of a getValue()
on the node when the getText() style methods are invoked. However, we can
return this class from a call to getMixedContent(), which makes perfect
sense and helps some of the problems that Philip has come up with in adding
decorators to allow traversal.

I even would go as far as saying that this might significantly reduce the
problems I have run into that made me ask for interfaces or a common base
class. The ability to work within a JDOM context and add a decorator might
just do the trick.

Thoughts? Backlash?


The Node (also known as NodeX) source code I submitted weeks ago does
EXACTLY as described above: all the pre-existing String-based APIs work the
same even though Text objects are used internally, and those objects are not
exposed until getMixedContent() is called.
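
For readers without that submission at hand, here is a minimal sketch of the
idea, not the submitted code itself. The names Text and SketchElement and
their method bodies are assumptions made for illustration; only getText(),
getTextTrim(), setText(), and getMixedContent() are names taken from the
discussion above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper for character data; not an actual JDOM class.
final class Text {
    private final String value;

    Text(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }
}

// Simplified stand-in for an element; only the text-related methods are sketched.
class SketchElement {
    private final String name;
    // Mixed content: Text and SketchElement instances in document order.
    private final List<Object> content = new ArrayList<>();

    SketchElement(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Same String-based signature as before; the Text wrapper stays hidden.
    SketchElement setText(String text) {
        content.clear();
        if (text != null) {
            content.add(new Text(text));
        }
        return this;
    }

    SketchElement addContent(String text) {
        content.add(new Text(text));
        return this;
    }

    SketchElement addContent(SketchElement child) {
        content.add(child);
        return this;
    }

    // Concatenates the values of the Text children and returns a plain String.
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (Object item : content) {
            if (item instanceof Text) {
                sb.append(((Text) item).getValue());
            }
        }
        return sb.toString();
    }

    String getTextTrim() {
        return getText().trim();
    }

    // The only place where Text objects become visible to callers.
    List<Object> getMixedContent() {
        return Collections.unmodifiableList(content);
    }
}
```

A caller that only uses setText("...") and getText() never sees a Text
instance; one that walks getMixedContent() gets Text objects where it used to
get Strings.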

From: Jason Hunter <jhunter at apache.org>
1) Addition of another class and the loss of one aspect of JDOM that has
always been very Java-friendly.  People "get it" when they hear "JDOM
uses String instead of some funky Text class".  That direct message is
lost with the addition of a Text class.

Again, a programmer can use Java String with JDOM in every place BUT
Element.getMixedContent(). So it's not as bad as you think.

From: "Alex Rosen" <arosen at silverstream.com>
Yeah... I was starting to come around to this too. You're right, what really
matters is that getText() and setText() take Strings. Whether
getMixedContent() returns Strings is pretty irrelevant. You've already got
the crappiness of having to deal with mixed content, so it's not going to be
super-elegant anyway. Besides, now that we have CDATA objects, having Text
nodes is even less of a big deal. (Would we still have CDATA objects, or
would it just be a switch on the Text object specifying whether to use a
CDATA section or not?)

Switches on CDATA will make the patterns harder to implement. Does anyone
see mixed content like "<element>text<![CDATA[more text]]>even more
text</element>" ? Now THAT's funky!
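
To make the alternative to a switch concrete, here is a rough sketch in which
a CDATA section is its own subclass rather than a flag on the text object.
CharData, CDataSection, and MixedContentDemo are hypothetical names invented
for this example, and the escaping is deliberately minimal (it does not
handle "]]>" inside a CDATA value).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical text node; output form is chosen by the class, not by a flag.
class CharData {
    private final String value;

    CharData(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }

    // Ordinary text is escaped when written out.
    String toXml() {
        return value.replace("&", "&amp;").replace("<", "&lt;");
    }
}

// Hypothetical CDATA node; same value semantics, different serialization.
final class CDataSection extends CharData {
    CDataSection(String value) {
        super(value);
    }

    @Override
    String toXml() {
        return "<![CDATA[" + getValue() + "]]>";
    }
}

final class MixedContentDemo {
    // Models the content of <element>text<![CDATA[more text]]>even more text</element>.
    static List<CharData> example() {
        List<CharData> content = new ArrayList<>();
        content.add(new CharData("text"));
        content.add(new CDataSection("more text"));
        content.add(new CharData("even more text"));
        return content;
    }

    // What a String-based getText() would see: the section boundaries vanish.
    static String text(List<CharData> content) {
        StringBuilder sb = new StringBuilder();
        for (CharData item : content) {
            sb.append(item.getValue());
        }
        return sb.toString();
    }

    // What an outputter would write: each node picks its own form.
    static String serialize(List<CharData> content) {
        StringBuilder sb = new StringBuilder();
        for (CharData item : content) {
            sb.append(item.toXml());
        }
        return sb.toString();
    }
}
```

With separate classes, code that dispatches on node type needs no extra
branch on a flag, which is the concern about patterns raised above.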

From: Jason Hunter <jhunter at collab.net>
The problem is that the number of methods which JDOM objects have in
common or could have in common is close to zero.  Therefore to create a
Node we have to either allow methods which don't make sense for all
objects (like getMixedContent() on Attribute) or we have Node as really
just a marker interface.  Myself, I don't want to see non-sensical
methods because that's one of the core problems with DOM.  And I don't
see sufficient value for Node as just a marker interface for reasons
enumerated before.

Some people propose to get more methods into Node by restricting the set
of JDOM objects that would qualify as Nodes.  That tends to come at the
cost of other people's desire for everything in JDOM to be a Node.

My implementation of Node has NO methods that do not make any sense, for any
subclass of Node. Document.getParent() will always return null, but how can
this be non-sensical? A user expecting an XML tree to have infinite ancestry
is what would be non-sensical! Besides, when calling getParent(), the
programmer's algorithm will be dealing with Nodes generically. The fact that
the Node that returned null just HAPPENED to be a Document is irrelevant.
The user should expect that ANY Node may return null
for getParent(). This would be useful for algorithms that traverse tree
fragments whose root element is not attached to a document yet (in case you
were wondering how Element's implementation of getParent() can ever possibly
return null).
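
A small sketch of that argument, under the assumption of a common base class
with a single getParent() method. The class names Node, DocumentNode,
ElementNode, and Ancestry are illustrative only and are not claimed to match
the submitted source.

```java
// Hypothetical base type; every node may or may not have a parent.
abstract class Node {
    private Node parent; // null for a document or for a detached fragment root

    Node getParent() {
        return parent;
    }

    void setParent(Node parent) {
        this.parent = parent;
    }
}

// A document is never given a parent, so getParent() always returns null.
class DocumentNode extends Node {
    private ElementNode root;

    void setRoot(ElementNode root) {
        this.root = root;
        root.setParent(this);
    }

    ElementNode getRoot() {
        return root;
    }
}

class ElementNode extends Node {
    final String name;

    ElementNode(String name) {
        this.name = name;
    }

    ElementNode addChild(ElementNode child) {
        child.setParent(this);
        return this;
    }
}

final class Ancestry {
    // Walks upward until getParent() returns null. The loop does not care
    // whether the topmost node is a document or a detached element.
    static Node topOf(Node start) {
        Node current = start;
        while (current.getParent() != null) {
            current = current.getParent();
        }
        return current;
    }
}
```

For a fragment that has not been attached yet, Ancestry.topOf(leaf) returns
the fragment's root element; once that root is passed to setRoot(), the same
call returns the document.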

My implementation of Node is definitely NOT just a marker interface. I don't
have Attributes as Nodes, since as Elliotte pointed out, many of the other
tree-based XML technologies do not include attributes as nodes either. After
creating a working implementation of a NodeVisitor and a NodeIterator
without the benefit of Attribute being a Node, I didn't see what the big
deal was. At this point, anyone who vehemently believes Attribute should be
a Node can only persuade me by submitting real code to show why.
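
As an illustration of why a visitor does not need Attribute to be a Node,
here is a minimal sketch. It is not the submitted NodeVisitor; VNode, VText,
VElement, OutlineVisitor, and the two visit methods are names assumed for
this example. Attributes stay plain name/value pairs owned by their element,
and a visitor reaches them through the element it is handed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical visitor interface: one callback per node type, none for attributes.
interface NodeVisitor {
    void visitElement(VElement element);

    void visitText(VText text);
}

abstract class VNode {
    abstract void accept(NodeVisitor visitor);
}

final class VText extends VNode {
    final String value;

    VText(String value) {
        this.value = value;
    }

    @Override
    void accept(NodeVisitor visitor) {
        visitor.visitText(this);
    }
}

final class VElement extends VNode {
    final String name;
    // Attributes are owned by the element; they are not nodes in the tree.
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<VNode> children = new ArrayList<>();

    VElement(String name) {
        this.name = name;
    }

    VElement attr(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    VElement add(VNode child) {
        children.add(child);
        return this;
    }

    @Override
    void accept(NodeVisitor visitor) {
        visitor.visitElement(this);
        // Depth-first traversal in document order.
        for (VNode child : children) {
            child.accept(visitor);
        }
    }
}

// Example visitor that records element names, attributes, and text as it goes.
final class OutlineVisitor implements NodeVisitor {
    final StringBuilder out = new StringBuilder();

    @Override
    public void visitElement(VElement element) {
        out.append('<').append(element.name);
        for (Map.Entry<String, String> a : element.attributes.entrySet()) {
            out.append(' ').append(a.getKey()).append('=').append(a.getValue());
        }
        out.append('>');
    }

    @Override
    public void visitText(VText text) {
        out.append(text.value);
    }
}
```

The visitor still sees every attribute, but only in the context of its
element, so the traversal code never has to ask what the children or mixed
content of an attribute would be.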

From: philip.nelson at omniresources.com
I didn't think I'd see myself writing these words but...

I agree completely with this idea.  The api most people use with jdom to get
string content doesn't have to change.  Not having a String in the mixed
content will not affect many existing apps I would think.

Not only do the String APIs not change, I have working code submitted that
proves it!

From: "Brett McLaughlin" <brett at newInstance.com>
It interests me to put a simple implementation
of this together and run some perf. tests with JProbe and such. Right now,
we're all probably only going to be able to guess at the change in
performance and memory. It would be nice if we could actually do some
concrete comparison, no?

How about running some concrete tests on my concrete code :Þ

I'm surprised I have not received any feedback on my Node implementation
source code. I have a working sample of a NodeIterator and NodeVisitor with
that submission as well.

Just remember, Node and Text might not affect 99% of what you use JDOM for
today, but they will allow programmers to extend JDOM from its core in a
much easier and more flexible way. You can wait to teach a newcomer to JDOM about
Text until last, since getMixedContent() is an advanced topic anyway. And
Node should DEFINITELY be taught last, since it is for extensions anyway. In
light of this, I fail to see how cries of "Node and Text will make JDOM more
complicated and harder for a Java coder to understand!" can be taken
seriously.

Todd Trimmer
