[jdom-interest] SAXBuilder enhancement request /2

Jason Hunter jhunter at acm.org
Mon Apr 1 17:10:13 PST 2002


My own feeling on this issue is that it'd be nice to have whitespace
handling more customizable because there are times when you just don't
care about surrounding whitespace (a la web.xml files) and would like
not to pay the penalty.

One thing this means is any solution to this problem should be
implementable to be faster and lighter than the current behavior. 
Faster's going to be hard, because it's difficult to know what the
whitespace looks like until you're already done handling it.  The best
we may be able to get is lighter in memory usage.

Note that we currently have a setIgnoringElementContentWhitespace() that
tweaks the behavior of ignorable whitespace.  It's modeled after the
JAXP method.  I wouldn't mind an additional set of methods, perhaps one
like Alex's and another like Phil's.  I think Brad's CharacterHandler
idea has the laudable goal of being extensible, but it'll be pretty hard
to have any context on the call to know how to behave when you get the
callback.
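
For reference, a minimal sketch of the JAXP method that the JDOM setter is
said to be modeled after.  It uses only the JDK's DocumentBuilderFactory,
not JDOM; the class name IgnorableWhitespaceSketch and the sample document
are made up for illustration.  The sketch assumes a DTD is present and the
parser is validating, since JAXP documents that the setting relies on the
content model.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of JDOM or JAXP.
public class IgnorableWhitespaceSketch {

    // Parses xml with a validating JAXP parser and returns how many child
    // nodes the root element ends up with.
    public static int rootChildCount(String xml, boolean ignore) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Ignorable whitespace can only be recognized from the DTD's content
        // model, so the parser is put in validating mode.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(ignore);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: <list> has element-only content, <item> has text.
        String xml = "<!DOCTYPE list [<!ELEMENT list (item*)>"
                + "<!ELEMENT item (#PCDATA)>]>"
                + "<list>\n  <item> x </item>\n</list>";
        System.out.println("kept:    " + rootChildCount(xml, false));
        System.out.println("ignored: " + rootChildCount(xml, true));
    }
}
```

Only the whitespace directly inside <list> is a candidate for removal here;
the spaces inside <item> are ordinary character data and stay.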

A while back I looked at implementing something that did the basic
trim() of all surrounding whitespace, something I think is the obvious
next step beyond ignorable, but I couldn't find a way to implement it
efficiently.
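
A rough sketch of why that kind of trim is hard to make cheap, written
against plain SAX from the JDK rather than JDOM's own builder code.  The
class name TrimmingSketch and its methods are hypothetical.  Because a
parser may hand over one run of text in several characters() calls, the
handler cannot tell whether the text is all whitespace until the next
element boundary, so everything gets buffered first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; this is not how JDOM's SAXBuilder is implemented.
public class TrimmingSketch extends DefaultHandler {
    private final StringBuilder pending = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in chunks, so nothing can be trimmed or dropped
        // yet; the work of copying the characters has already been done.
        pending.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        flush();
    }

    // Only at an element boundary is the whole text run known.
    private void flush() {
        String trimmed = pending.toString().trim();
        pending.setLength(0);
        if (!trimmed.isEmpty()) {
            texts.add(trimmed);
        }
    }

    // Returns every non-empty, trimmed text run in document order.
    public static List<String> trimmedTexts(String xml) throws Exception {
        TrimmingSketch handler = new TrimmingSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trimmedTexts("<a>\n  <b>  hi  </b>\n</a>"));
    }
}
```

The savings show up only after flush(), in what gets stored, which matches
the point above that lighter is more achievable than faster.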

-jh-

Alex Rosen wrote:
> 
> > Alex wrote:
> > >>>I've always wanted an option that would throw away all
> > whitespace from Elements that have child Elements<<<
> > It is not only for Elements that have children, because you
> > may have cases like
> >     <MyElementList>
> >     </MyElementList>
> > Which is an empty list and should not contain any Text node.
> 
> But how do you know that? Maybe whitespace is significant in this case. For
> example:
> 
> <JavaFormatOptions>
>   <PrettyPrint>true</PrettyPrint>
>   <IndentString>    </IndentString>
> </JavaFormatOptions>
> 
> or
> 
> <UserList>
>   <User>
>     <Username>Alice</Username>
>     <Password>qwerty123</Password>
>   </User>
>   <User>
>     <Username>Bob</Username>
>     <Password>   </Password>
>   </User>
> </UserList>
> 
> My proposal will not lose data for ANY data-oriented (i.e.
> non-mixed-content) XML file. Your proposal would not lose data for 99.9% of
> these XML files, but that .1% is a killer. Especially since sometimes (e.g.
> in the second example above), it'll work fine until you hit an edge case, so
> QA may not find the problem. Not that this second example is a good idea,
> but you get the picture.
> 
> It's true that my proposal will leave unnecessary whitespace in certain
> cases, like empty lists, but that shouldn't really cause an actual problem,
> should it?
> 
> Alex
> 
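
One way to picture Alex's rule (drop whitespace-only text only from
Elements that have child Elements) is the sketch below.  It works on a JDK
DOM tree after parsing rather than inside JDOM's builder, and the class
name ChildAwareStripper and its methods are hypothetical.  It treats text
as whitespace when String.trim() leaves nothing, which is slightly broader
than XML's own definition of whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of the "only where there are child Elements" rule.
public class ChildAwareStripper {

    // Removes whitespace-only Text children from every element that has at
    // least one child element; leaf elements are left untouched.
    public static void strip(Element element) {
        boolean hasChildElement = false;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                strip((Element) n);
            }
        }
        if (!hasChildElement) {
            return; // e.g. <Password>   </Password> keeps its spaces
        }
        Node n = element.getFirstChild();
        while (n != null) {
            Node next = n.getNextSibling(); // grab before a possible removal
            if (n.getNodeType() == Node.TEXT_NODE
                    && n.getNodeValue().trim().isEmpty()) {
                element.removeChild(n);
            }
            n = next;
        }
    }

    public static Document parseAndStrip(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        strip(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAndStrip(
                "<User>\n  <Username>Bob</Username>\n"
                + "  <Password>   </Password>\n</User>");
        Element password = (Element) doc.getElementsByTagName("Password").item(0);
        System.out.println("[" + password.getTextContent() + "]");
    }
}
```

As the message says, an empty list such as <MyElementList> with only a
newline inside has no child Elements, so this sketch leaves its whitespace
in place.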


