[jdom-interest] ElementScanner - causing SAXHandler to mistake nonroot element for root element

Richard Allen rallen at tacitgroup.com
Wed May 5 04:35:09 PDT 2004





Hi Laurent,

no worries about the delay, we are all busy people ;-)

Your solution sounds good.
I just checked that entries are removed from the activeRules hashmap in
endElement(); otherwise we would have ended up building every element
after the first match was found. It does get cleaned up, so this patch
should give a nice solution.

Doesn't this make a case for moving the FragmentHandler class out of
JDOMResult, where it is currently an inner class, into a class in its own
right?

Ryan Cox will be very interested in this solution: he applied the last
patch but noticed memory/performance problems, probably because a JDOM doc
was being built for the whole document, which was very large!

sorry about that Ryan ;-) hehe

cheers,
Rich


Laurent Bihanic <laurent.bihanic at atosorigin.com> wrote on 05/05/2004
11:10:30 PM:

> Hi Richard,

> Sorry for the long delay. I had a chance to look at your problem. Indeed,
> this is a problem in ElementScanner and your analysis is correct.

> > To fix, I removed the if (this.activeRules.size() != 0) test that
> > contained the startElement() call to XMLScanner, so that it always
> > propagates the event to the SAXHandler.
>
> Your proposed fix, always propagating the startElement events to
> SAXHandler, is quite dangerous as it forces SAXHandler to build a full
> JDOM document from the parser output (which is precisely what
> ElementScanner aims to avoid).
> Thus, I think we should keep the "if (this.activeRules.size() != 0)" test
> to support extracting some nodes from a huge document while using as
> little memory as possible.

> Attached is another patch proposal: instead of directly using SAXHandler,
> it relies on a subclass (FragmentHandler, borrowed from JDOMResult) that
> inserts a dummy root element in SAXHandler's document.
> This guarantees that, whatever your matching rules, SAXHandler will
> always have a single root element.

> What do you think?

> Laurent

>
> Richard Allen wrote:
> > Hi All,
> >
> > With the following XML:
> > <blah>
> >       <huh>1234</huh>
> >       <blam>
> >             <yay>woohoo</yay>
> >       </blam>
> >       <blam>
> >             <yay>mwuhahaha</yay>
> >       </blam>
> >       <nah>5678</nah>
> > </blah>
> >
> > And listeners on the following:
> > /blah/huh
> > /blah/blam
> >
> > The /blah/huh element is processed sweet as..
> > But when the /blah/blam element is being processed, the
> > SAXHandler.startElement() throws the following exception:
> >
> > org.xml.sax.SAXException: Ill-formed XML document (multiple root elements detected)
> >       at org.jdom.input.SAXHandler.getCurrentElement(SAXHandler.java:906)
> >       at org.jdom.input.SAXHandler.startElement(SAXHandler.java:553)
> >       at org.jdom.contrib.input.scanner.ElementScanner.startElement(ElementScanner.java:554)
> >
> > This is a bit weird, given that the //blam element isn't the root
> > element ;-)
> >
> > The problem is that the XMLScanner is not being notified until after
> > the first element that contains active rules has been found.
> > This causes SAXHandler to think that the /blah/huh element is actually
> > the root.
> > When the ElementScanner notifies SAXHandler of the /blah/blam element,
> > it throws a hissy fit as it has already ended what it thinks is the
> > root element ;-)
> >
> > To fix, I removed the if (this.activeRules.size() != 0) test that
> > contained the startElement() call to XMLScanner, so that it always
> > propagates the event to the SAXHandler.
> >
> > Comments appreciated as to whether this fix is the ideal fix, or if
> > there is a better way to fix this problem.
> > cheers,
> > Rich
>
> Index: ElementScanner.java
> ===================================================================
> RCS file: /home/cvspublic/jdom-contrib/src/java/org/jdom/contrib/input/scanner/ElementScanner.java,v
> retrieving revision 1.11
> diff -u -r1.11 ElementScanner.java
> --- ElementScanner.java 28 Feb 2004 03:47:08 -0000 1.11
> +++ ElementScanner.java 5 May 2004 12:53:26 -0000
> @@ -707,7 +707,7 @@
> //----------------------------------------------------------------------
>
> protected SAXHandler createContentHandler() {
> -         return (new SAXHandler(new EmptyDocumentFactory(getFactory())));
> +         return (new FragmentHandler(new EmptyDocumentFactory(getFactory())));
> }
>
> //----------------------------------------------------------------------
> @@ -768,6 +768,31 @@
> }
>
> //-------------------------------------------------------------------------
> +   // FragmentHandler nested class
> +   //-------------------------------------------------------------------------
> +
> +   /**
> +    * FragmentHandler extends SAXHandler to support matching nodes
> +    * without a common ancestor. This class inserts a dummy root
> +    * element in the being-built document. This prevents the document
> +    * to have, from SAXHandler's point of view, multiple root
> +    * elements (which would cause the parse to fail).
> +    */
> +   private static class FragmentHandler extends SAXHandler {
> +      /**
> +       * Public constructor.
> +       */
> +      public FragmentHandler(JDOMFactory factory) {
> +         super(factory);
> +
> +         // Add a dummy root element to the being-built document as XSL
> +         // transformation can output node lists instead of well-formed
> +         // documents.
> +         this.pushElement(new Element("root", null, null));
> +      }
> +   }
> +
> +
> //-------------------------------------------------------------------------
> // EmptyDocumentFactory nested class
> //-------------------------------------------------------------------------



