[jdom-interest] Converting elements from a SAX stream to JDOM elements

jdom jdom at tuis.net
Sat Jul 18 08:47:48 PDT 2009


Colin.

My instinct would be to investigate a different approach...

Perhaps a mechanism similar to how ZipInputStreams work, where the
stream can be read as separate streams for each 'element'.

Build a 'tee' or 'branched' custom InputStream that sits between the main
SAX parser and the underlying 'infinite' stream. This intermediate stream
can feed 'child' streams to JDOM's SAX parser, while a standard SAX
parser decides when to terminate each child stream, using a mechanism
similar to the one you described below.

This way you have just one 'infinite' stream, and you feed its contents
to one 'global' parser, which implements the 'break logic' on a separate
version of the stream that feeds JDOM. When the end of the element is
encountered in the main stream, it causes the JDOM stream to reach 'end
of file', and the JDOM side of things can then open a new 'child' stream
for the next 'document'.
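
As a rough illustration of the 'break logic' only, here is a minimal sketch of a SAX handler that counts nesting depth and notices when a target element closes. The class name BreakLogicHandler, the run() helper and the 'closed' list are hypothetical names made up for this example. SAX does not report byte offsets, so this sketch does not show how the notification would be tied back to truncating the child stream; that wiring is left open.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: detects where each target element ends, honouring nesting. */
class BreakLogicHandler extends DefaultHandler {
    private final String target;
    private int depth = 0; // 0 = outside any target element
    private final StringBuilder text = new StringBuilder();
    /** Text content of each completed target element, in document order. */
    final List<String> closed = new ArrayList<>();

    BreakLogicHandler(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            depth++;                 // nested element inside a target element
        } else if (qName.equals(target)) {
            depth = 1;               // entering a new target element
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only the close that brings depth back to 0 ends the 'document'.
        if (depth > 0 && --depth == 0) {
            closed.add(text.toString()); // this is where a child stream would be ended
        }
    }

    /** Convenience wrapper for trying the handler on a small in-memory document. */
    static List<String> run(String xml, String target) {
        try {
            BreakLogicHandler handler = new BreakLogicHandler(target);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.closed;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The depth counter is what keeps a nested element with the same name from ending the 'document' early.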

Little or no memory overhead. No need to buffer complete documents, etc.

InputStreams are relatively simple to implement ;-)
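
To make the 'child stream' idea concrete, here is a minimal sketch under some loud assumptions. It swaps the SAX-based break logic for a naive byte-level scan for the closing tag, so it only works when the element has an ASCII name, no attributes on its start tag, no nested element of the same name, and no comment or CDATA section containing the tag text. ElementSplitter, nextChild() and splitToStrings() are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: hands out one child stream per <name>...</name> in a parent stream. */
class ElementSplitter {
    private final InputStream parent;
    private final byte[] startTag;
    private final byte[] endTag;

    ElementSplitter(InputStream parent, String name) {
        this.parent = parent;
        this.startTag = ("<" + name + ">").getBytes(StandardCharsets.US_ASCII);
        this.endTag = ("</" + name + ">").getBytes(StandardCharsets.US_ASCII);
    }

    // A tag contains '<' only at index 0, so a mismatch can only restart at 0 or 1.
    private static int advance(byte[] tag, int matched, int b) {
        if (b == tag[matched]) {
            return matched + 1;
        }
        return (b == tag[0]) ? 1 : 0;
    }

    /** Skips ahead to the next start tag; returns null once the parent stream is exhausted. */
    InputStream nextChild() throws IOException {
        int matched = 0;
        while (matched < startTag.length) {
            int b = parent.read();
            if (b < 0) {
                return null;
            }
            matched = advance(startTag, matched, b);
        }
        return new Child();
    }

    /** Reports end of file right after the closing tag; close() never touches the parent. */
    private final class Child extends InputStream {
        private int replayed = 0; // bytes of the already-consumed start tag re-emitted
        private int matched = 0;  // bytes of the end tag seen so far

        @Override
        public int read() throws IOException {
            if (replayed < startTag.length) {
                return startTag[replayed++];
            }
            if (matched == endTag.length) {
                return -1; // the element is complete: pretend the stream ended
            }
            int b = parent.read();
            if (b < 0) {
                matched = endTag.length; // parent ended early; stay at end of file
                return -1;
            }
            matched = advance(endTag, matched, b);
            return b;
        }
    }

    /** Demo helper: drains every child stream into a String. */
    static List<String> splitToStrings(String xml, String name) {
        try {
            ElementSplitter splitter = new ElementSplitter(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), name);
            List<String> result = new ArrayList<>();
            for (InputStream child = splitter.nextChild(); child != null; child = splitter.nextChild()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                for (int b = child.read(); b >= 0; b = child.read()) {
                    out.write(b);
                }
                result.add(new String(out.toByteArray(), StandardCharsets.UTF_8));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use each child stream would presumably go to JDOM, e.g. new SAXBuilder().build(child), rather than being drained into a String. Because Child only overrides the single-byte read(), a parser that asks for a large buffer still cannot read past the closing tag into the next element. The price is byte-at-a-time reads, so the parent should be buffered. Each child has to be read to its end before nextChild() is called again.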

Rolf



Colin Horne wrote:
> Hello,
>
> I have a long (infinite) XML stream, which I intend to parse with SAX.
> Each individual element in the stream is small, and should be parsed
> with JDOM:
>
> <stream>
> <element>...</element>
> <element>...</element>
> </stream>
>
> So each <element> (and its children) is parsed with JDOM, but the
> <stream> as a whole is parsed with SAX. It would be preferable if each
> <element> does not have to be serialized to an encoded string, and if
> elements are not processed twice (e.g., using SAX to echo the
> <element>'s XML to a stream, which is then read by JDOM).
>
> I've found several references to this problem from the past, but could
> not find a complete solution.
>
> My initial approach was to use the SAXHandler, like so:
>
>         jdomHandler = new SAXHandler() {
>             public void endElement(String uri, String localName, String qName) {
>                 if (qName.equals("an element which I want JDOM to parse")) {
>                     // change the SAX handler to myHandler
>                 } else {
>                     super.endElement(uri, localName, qName);
>                 }
>             }
>         };
>
>
>         myHandler = new DefaultHandler() {
>             public void startElement(String uri, String localName,
>                                      String qName, Attributes attributes) {
>                 if (qName.equals("an element which I want JDOM to parse")) {
>                     // change the SAX handler to jdomHandler
>                 }
>             }
>         };
>
>
> (Ignoring for now that the endElement() method needs to keep track of
> its nesting level)
>
> However, trivially doing the above does not work, and fails after
> calling jdomHandler.getDocument():
>
> Exception in thread "main" java.lang.IllegalStateException: Root element not set
> 	at org.jdom.Document.getRootElement(Document.java:218)
>
>
> I've looked at the initialization code that JDOM normally runs in
> the SAXBuilder.build() method, and am reluctant to copy/modify the
> code, because I suspect it will break with future releases, and can't
> help but wonder if it would be over-complicating things.
>
> Is there a Right Way(TM) to do this? If so, I might also suggest that
> it's referenced from the FAQ.
>
> Many thanks,
>   Colin



