[jdom-interest] skipping a huge text node

Tue Jun 20 09:06:40 PDT 2006

--- Mattias Jiderhamn <mj-lists at expertsystems.se>
wrote:

> To me it sounds like you are not using the file the
> way XML was 
> intended, and possibly, you shouldn't try to parse

Or perhaps even that the XML structure is incorrectly
defined, and shouldn't have such huge text nodes.
It's also good to note that, independent of the size of
individual text nodes, JDOM by default always builds
the full document, so even though the underlying
parser may report large text as smaller chunks, this
doesn't necessarily help much, unless the builder
discards pieces.
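
As a rough sketch of what "discarding pieces" could look
like at the SAX level (the class name SkipTextHandler and
the idea of skipping by element name are my own
assumptions for illustration, not anything JDOM ships):
the handler counts the chunks it receives for one element
and throws them away, and keeps everything else.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: drops all text inside elements named skipName.
public class SkipTextHandler extends DefaultHandler {
    private final String skipName;
    private int skipDepth = 0;        // > 0 while inside a skipped element
    private long skippedChars = 0;    // how much text was thrown away
    private final StringBuilder kept = new StringBuilder();

    public SkipTextHandler(String skipName) {
        this.skipName = skipName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default SAXParserFactory is not namespace-aware, so match on qName.
        if (skipDepth > 0 || qName.equals(skipName)) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may call this many times for one large text node.
        if (skipDepth > 0) {
            skippedChars += length;   // discard the chunk, keep only its size
        } else {
            kept.append(ch, start, length);
        }
    }

    public long skippedChars() {
        return skippedChars;
    }

    public String keptText() {
        return kept.toString();
    }

    // Convenience entry point for trying the handler on an in-memory string.
    public static SkipTextHandler parse(String xml, String skipName) throws Exception {
        SkipTextHandler handler = new SkipTextHandler(skipName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

For a real file you would pass a stream-backed InputSource
instead of a String, so the huge node is never held in
memory as a whole.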

However:

> with an XML parser 
> but rather a proprietary "data file" parser.
> (See below)
> 
> At 2006-06-20 08:49, Tobias Thierer wrote:
...
> Please note that JDOM is not itself a parser, but
> uses the underlying 
> SAX parser, such as Xerces. If you are looking into
> customized 
> parsing, you better look at the underlying parser. 

Exactly. JDOM implements the document tree model, and
defers parsing (and optionally, deserialization) to a
low-level streaming parser.
Besides SAX parsers, you can also have a look at StAX
parsers, since for these kinds of tasks they may be an
even better fit. There is a StAX-based JDOM builder
available at:

http://woodstox.codehaus.org/StaxMisc

(which is a sub-project of the Woodstox StAX XML
processor)
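
To give an idea of why a pull parser fits here, below is a
minimal sketch using only the standard javax.xml.stream
API (the class StaxSkipDemo, the method textExcept and the
skip-by-name rule are hypothetical, just for
illustration). Since the application drives the loop, it
can simply not ask for the text of the element it wants to
skip.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSkipDemo {
    // Collects text from the document, except text inside elements
    // whose local name equals skipName (a made-up selection rule).
    public static String textExcept(Reader in, String skipName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Leave coalescing off so large text may arrive as several events.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        StringBuilder kept = new StringBuilder();
        int skipDepth = 0; // > 0 while inside a skipped element
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (skipDepth > 0 || reader.getLocalName().equals(skipName)) {
                            skipDepth++;
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (skipDepth > 0) {
                            skipDepth--;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // Only touch the text when we actually want it.
                        if (skipDepth == 0) {
                            kept.append(reader.getText());
                        }
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return kept.toString();
    }
}
```

Called as StaxSkipDemo.textExcept(new FileReader(path),
"blob"), with "blob" standing in for whatever your huge
element is called.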

Or, you may choose to leave JDOM out altogether for
this special case. Handling of huge XML files is one
area where many standard processing components
(DOM/JDOM/dom4j, XSL) may not be usable, and you need
to use lower-level methods (or perhaps streaming
XQuery).

-+ Tatu +-
