[jdom-interest] skipping a huge text node

Tobias Thierer lists-2006 at tobias-thierer.de
Mon Jun 19 23:49:26 PDT 2006


Hi,

I am trying to parse a very large XML document, 99% of which consists of one
huge text node:

  <sequence>ACGGAAAT[...]</sequence>

which is too large to fit into memory. So instead of having the parser
return the whole String, I'd like to get just the length of the text and its
offset in the XML file, so that whenever I want to access part of the
sequence, I can seek to the correct position and read just the substring
that I am interested in.

Is it somehow possible to tell JDOM to consume the text node and report
its offset in the file and its length, rather than storing it in memory?
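
To illustrate the "consume without storing" half of this, here is a rough
sketch that bypasses JDOM and uses the plain SAX API from the JDK (that is
an assumption for illustration, not something JDOM provides). The names
SequenceLengthCounter and countSequenceLength are made up. It only yields
the length: the SAX Locator reports line and column numbers rather than a
byte offset, so the offset part remains open.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts the characters inside <sequence> elements
// without ever accumulating them into a String.
public class SequenceLengthCounter extends DefaultHandler {
    private boolean inSequence = false;
    private long length = 0;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("sequence".equals(qName)) {
            inSequence = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        // The parser delivers text in chunks; only the chunk sizes are kept.
        if (inSequence) {
            length += len;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("sequence".equals(qName)) {
            inSequence = false;
        }
    }

    // Parses the stream and returns the total text length seen inside
    // <sequence> elements.
    public static long countSequenceLength(InputStream in) throws Exception {
        SequenceLengthCounter handler = new SequenceLengthCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.length;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<genome><sequence>ACGGAAAT</sequence></genome>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countSequenceLength(new ByteArrayInputStream(xml)));
    }
}
```

Memory use stays bounded by the parser's internal buffer because each
chunk is discarded once its size has been added to the running total.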

I've looked at jdom-contrib, which provides an ElementListener interface,
but its elementMatched() method is only called *after* the element
(including the close tag) has been fully read. Classes like SAXBuilder only
seem to handle the events that come from the parser, whereas what I want is
to change the events that the parser reports.

Is there any way to do this with JDOM (or jdom-contrib)? If not, do you know
of any other XML parser with which I could do that?

Cheers,

  Tobias


