[jdom-interest] String vs. StringBuffer in Text class

Thu May 31 14:46:52 PDT 2001

My last e-mail made me think about whether using StringBuffer in the Text class
really buys us anything. The supposed advantage is for reading in large
documents, or more specifically documents that have elements with long text
content strings. For these elements, the parser presumably can't fit the whole
text string in the buffer it passes to characters(), so it has to call us more
than once, with pieces of the text. Using StringBuffer would presumably make
that more efficient.
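
For reference, the accumulation pattern in question looks roughly like the sketch below. ChunkCollector is a made-up name for illustration, not JDOM's SAXHandler, and the two characters() calls in main() only simulate a parser that delivers one run of text in two pieces.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only (not JDOM's SAXHandler).
class ChunkCollector extends DefaultHandler {
    private final StringBuffer text = new StringBuffer();
    private int calls = 0;

    // A SAX parser is allowed to call this several times for one run of text.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
        calls++;
    }

    String getText() {
        return text.toString();
    }

    int getCalls() {
        return calls;
    }

    public static void main(String[] args) {
        ChunkCollector c = new ChunkCollector();
        char[] first = "long text, ".toCharArray();
        char[] second = "second piece".toCharArray();
        // Simulate the parser handing over the text in two chunks.
        c.characters(first, 0, first.length);
        c.characters(second, 0, second.length);
        System.out.println(c.getCalls() + " calls: " + c.getText());
    }
}
```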

But - I'm now thinking that using String would almost certainly be better.
First, remember that even StringBuffer still has to grow its internal char
array. So the copy-and-append that would happen with String would still happen
with StringBuffer, just not as often. Also:

- The StringBuffer(String) constructor creates an internal buffer that's 16
chars longer than the string passed in, so in the common case where
characters() is only called once, you've got 32 extra bytes lying around per
Text object. In the uncommon case, where characters() is called repeatedly, you
may end up with much more memory wasted (depending on the input document),
since StringBuffer roughly doubles its capacity each time it grows.

- In all cases, using a StringBuffer would mean that we'd have to create a new
String for every call to getValue() (though this wouldn't normally copy the
char array, as mentioned in my last message). If we used a String, we could
just return it directly.

- The case that StringBuffer is supposedly good for presumably only happens
when the element's text value is larger than the XML parser's internal buffer,
right? Xerces uses a 16K buffer, I think? It would certainly be nice to be fast
for documents with text content in the hundreds of kilobytes, but I'd think
this is the 10-20% case. And, since StringBuffer still needs to do some
copying, it wouldn't necessarily be much faster in many cases.
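
The capacity numbers in the first point are easy to check with a few lines. This is just a sketch: BufferCapacityDemo and its helper methods are names I made up, and the amount of slack left after a buffer has had to grow depends on the JDK's growth policy, so the code prints it instead of assuming a value.

```java
// Hypothetical demo class; the names here are for illustration only.
class BufferCapacityDemo {
    // Unused chars that StringBuffer(String) reserves beyond the string itself.
    static int slack(String s) {
        return new StringBuffer(s).capacity() - s.length();
    }

    // Unused chars left over after appending more text to an initial string.
    static int slackAfterAppend(String initial, String more) {
        StringBuffer sb = new StringBuffer(initial);
        sb.append(more);
        return sb.capacity() - sb.length();
    }

    public static void main(String[] args) {
        // Common case: characters() called once, nothing appended afterwards.
        System.out.println("slack, single chunk: " + slack("hello") + " chars");

        // Uncommon case: a second chunk that forces the buffer to grow.
        String big = "0123456789012345678901234567890123456789";
        System.out.println("slack, after growing: "
                + slackAfterAppend("hello", big) + " chars");
    }
}
```

The first line should report 16 chars, which at two bytes per char is the 32 bytes mentioned above.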

So, I suspect that using StringBuffer will almost always result in significant
wasted memory, and will only be faster a small percentage of the time. (Of
course, testing both ways would be the best way to go...)

In either case, we should try to make sure that the character data that we
receive goes from SAXHandler to the Text class with as little copying as
possible. That might mean that Text wants to have a constructor and an append
method that take char arrays.
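
Something along these lines is what I have in mind. It's only a sketch of the String-backed approach: TextSketch is a hypothetical class, not the actual JDOM Text class, and the method signatures are my assumption about what SAXHandler would want to call.

```java
// Hypothetical String-backed text node; not the real org.jdom.Text.
final class TextSketch {
    private String value;

    // Common case: the whole text arrives in one characters() call,
    // so the chars are copied exactly once, straight into the String.
    TextSketch(char[] ch, int start, int length) {
        value = new String(ch, start, length);
    }

    // Uncommon case: a later chunk arrives. This allocates a new String
    // and copies the existing text plus the new piece.
    void append(char[] ch, int start, int length) {
        value = value.concat(String.valueOf(ch, start, length));
    }

    // No per-call allocation: the stored String is returned directly.
    String getValue() {
        return value;
    }

    public static void main(String[] args) {
        char[] buf = "xxHello, worldyy".toCharArray();
        TextSketch t = new TextSketch(buf, 2, 5);
        t.append(buf, 7, 7);
        System.out.println(t.getValue());
    }
}
```

SAXHandler could then pass the parser's buffer straight through from characters() without building an intermediate String first.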

Alex Rosen
SilverStream Software