[jdom-interest] Content as a Text Node (was JDOM JSR)

Brett McLaughlin brett at newInstance.com
Fri May 18 11:19:19 PDT 2001

This is the second person in as many days, both of whom I consider
reputable, to bring up problems with the text content of an Element being
stored as a String. I also know that Elliotte has the same concern, as do I.

I think the biggest issue with text nodes as non-Strings is the set of
assumptions about their implementation, as in "It will be like DOM, which
sucks in that area." First, I agree with that assessment of DOM. Second, I
think we can implement this in a better way.

For example, I don't propose changing the signatures of getText(),
getTextTrim(), or setText().

What I'd like to do is INTERNALLY store the Strings within a Text class (or
StringContent, or whatever). We just return the equivalent of a getValue()
on the node when the getText() style methods are invoked. However, we can
return this class from a call to getMixedContent(), which makes perfect
sense and helps some of the problems that Philip has come up with in adding
decorators to allow traversal.
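
A rough sketch of the idea, not actual JDOM code. The Text class name comes
from the proposal above; everything else here (its getValue(), setValue(),
getParent() and setParent() methods, the cut-down Element stand-in, and the
choice to have getText() concatenate the direct text children) is an
assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical node that wraps character data. The methods are assumptions.
final class Text {
    private String value;
    private Element parent;

    Text(String value) { this.value = value; }

    String getValue() { return value; }

    void setValue(String value) { this.value = value; }

    Element getParent() { return parent; }

    void setParent(Element parent) { this.parent = parent; }
}

// Simplified stand-in for Element, reduced to the methods under discussion.
class Element {
    private final String name;
    private Element parent;
    // Holds Text and Element children only; never a raw String.
    private final List<Object> content = new ArrayList<>();

    Element(String name) { this.name = name; }

    String getName() { return name; }

    Element getParent() { return parent; }

    // Signature unchanged for callers: replaces the content with one text node.
    // (This sketch does not bother to detach the old children.)
    Element setText(String text) {
        content.clear();
        return addContent(text);
    }

    // Strings are wrapped in a Text node on the way in.
    Element addContent(String text) {
        Text node = new Text(text);
        node.setParent(this);
        content.add(node);
        return this;
    }

    Element addContent(Element child) {
        child.parent = this;
        content.add(child);
        return this;
    }

    // Signature unchanged: callers still get a plain String.
    // Here that is the concatenated values of the direct Text children.
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (Object node : content) {
            if (node instanceof Text) {
                sb.append(((Text) node).getValue());
            }
        }
        return sb.toString();
    }

    String getTextTrim() { return getText().trim(); }

    // Mixed content hands back Text objects instead of Strings, so every
    // item in the list can report its parent.
    List<Object> getMixedContent() {
        return Collections.unmodifiableList(content);
    }
}
```

Code that only calls setText() and getText() would not notice the change,
while code that walks getMixedContent() gets a Text object it can decorate or
ask for its parent, rather than a bare String.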

I would even go so far as to say that this might significantly reduce the
problems I have run into that made me ask for interfaces or a common base
class. The ability to work within a JDOM context and add a decorator might
just do the trick.

Thoughts? Backlash?


> -----Original Message-----
> From: jdom-interest-admin at jdom.org
> [mailto:jdom-interest-admin at jdom.org]On Behalf Of Amy Lewis
> Sent: Friday, May 18, 2001 7:04 AM
> To: jdom-interest at jdom.org
> Subject: Re: [jdom-interest] JDOM JSR
> On Thu, May 17, 2001 at 10:41:53PM -0500, Brett McLaughlin wrote:
> >  Could you sum up your main concerns about scalability? How do factories
> >relate to that? Would the subclassing and specifying an implementation to
> >use to a builder suffice? I'd like to hear what you see as the problems
> >you have run into.
> All right.  Quickly this morning; if I'm unclear, ask, and I'll try to
> clarify this evening.
> JDOM, in my experience, has three major goals: ease of use, lightweight
> default implementation, and well-formedness verification.  Of these,
> lightweight and easy tend to reinforce each other; well-formedness
> provides a tension in the direction of greater complexity, but tends to
> reinforce the ease-of-use issue as well.
> The main difficulty that I see with the current implementation is the
> lack of a generic (oh, no, she's going to *say* it!  Break out the
> asbestos frillies!), unifying, 'node' interface for all classes that
> participate in the tree.  This is, in fact, largely bearable, except in
> one case: String as node.
> For all other classes, a custom implementation can do the work of
> defining the extensibility mechanism (that is, of defining the shared
> interface).  XML is a little odd: it is generally the case that if the
> particular node you're handed isn't a 'branch', then its parent is--all
> nodes are one step away from a crossroad.
> The particular implementations that drive this need have typically
> needed to modify both structure and content of the document being
> processed, in multiple ways.  The mechanism is often methods with a
> relatively simple signature (using DOM): Document doSomething(Document,
> Node) (it can be further simplified to void doSomething(Document,
> Node), but that's kinda poor style, and sometimes the return value is
> non-void, non-document, and the Document parameter may be changed).
> Sometimes the signature is just doSomething(Node), if the model isn't
> pipeline, but hub (which determines transformations and order of
> transformation) and spoke (what would be filters in the pipeline
> model).
> Using a Builder, I can decorate implementation classes (subclasses)
> without too much trouble.  Except for String.  Text nodes end up
> special-cased; the developers have to be warned to treat them
> completely differently (pass the parent, not the node that you care
> about, and maybe do a search to find the part that you care about, if
> there are multiple children).  Note that this doesn't require, but does
> encourage, the subclasser to create that unifying node interface, even
> if it only contains "getParent()" (and a test of some sort, perhaps
> instanceof, to see whether there are other available axes--there always
> will be for the parent of the given node, if one exists).
> A part of the concern is driven by the cost of xpathery--using internal
> APIs, simple XPaths (developers can be restricted to a subset of
> "cheap" XPath expressions) are *fast*.  Instantiating Xalan *isn't*;
> even descendant-or-self::node()/*[1] munches tens and hundreds of
> milliseconds.
> Summary: I'm not calling for the creation of a heavyweight API; one
> already exists.  I want a lightweight API *that can be extended*.
> Perhaps the extension will make it very heavy (lopsided?  :-) in one
> direction (memory, speed, complexity of the decorators); JDOM should
> not *prevent* that in the name of any of its goals, if it's possible
> not to.
> Right now, the chief impediments are the lack of a unifying interface
> (meaning that the implementor prolly has to define something, which
> means the implementor has to understand the API fully), and the
> impossibility of unifying String into a Node interface.  I understand
> the arguments against defining the interface ... but reject them; I
> have no problems with even marker interfaces, that contain no methods
> and really only provide an instanceof test.  But everything breaks on
> the rock of String.
> I realize that the choice of String is intended to make things faster
> (note that this is not always true; when there's a lot of content
> mangling going on in the document, each change creates at least one
> additional String object, and management of the problem rapidly becomes
> one of the major profiling issues) and lighter, but again, I don't
> accept the argument.  Equally good effect could probably be achieved
> (for instance) by storing char [], with String getValue() and
> StringBuffer getValueBuffer() and corresponding mutators ... on a
> "Text" or "Chars" node, not on Element.
> As a final note: about nine months ago, after I made a nuisance of
> myself, management sent one of the more-senior architects to look at
> JDOM (I was getting really sick of DOM, and even sicker of some of the
> Java==Perl string manipulation tricks that some others were doing to
> try to reduce DOMishness).  The critique was a one-liner: "It's fine
> for reading configuration files."  Actually, there was more, but that
> was the main substance; JDOM hasn't been something that can be
> customized, because it's specifically optimized, in several ways, for
> reading (or for static construction: build it once, don't change it
> afterwards).
> Hope that helps,
> Amy!
> --
> Amelia A. Lewis         alicorn at mindspring.com
> amyzing at talsever.com
> I don't know that I ever wanted greatness, on its own.  It seems rather
> like wanting to be an engineer, rather than wanting to design something--or
> wanting to be a writer, rather than wanting to write.  It should be a
> by-product, not a thing in itself.  Otherwise, it's just an ego trip.
>                 -- Merlin, son of Corwin, Prince of Chaos (Roger Zelazny)