[jdom-interest] ElementScanner - causing SAXHandler to mistake nonroot element for root element

Laurent Bihanic laurent.bihanic at atosorigin.com
Wed May 5 04:10:30 PDT 2004


Hi Richard,

Sorry for the long delay. I had a chance to look at your problem. Indeed, this 
is a problem in ElementScanner and your analysis is correct.

 > To fix, I removed the if (this.activeRules.size() != 0) test that contained
 > the startElement() call to XMLScanner, so that it always propagates the
 > event to the SAXHandler.

Your proposed fix, always propagating the startElement events to SAXHandler, is 
quite dangerous: it forces SAXHandler to build a full JDOM document from the 
parser output, which is precisely what ElementScanner aims to avoid.
So I think we should keep the "if (this.activeRules.size() != 0)" test to 
support extracting some nodes from huge documents while using as little memory 
as possible.
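
To make the trade-off concrete, here is a minimal sketch of path-based filtering using only the JDK's SAX API. It is not ElementScanner code: the class PathFilterSketch, its method forwardedRoots, and the exact-path matching are assumptions made for illustration. It records which elements a downstream builder would see as top-level when only matched subtrees are forwarded.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the scanner's filtering; not ElementScanner code.
final class PathFilterSketch extends DefaultHandler {
    // The sample document from the original report, without whitespace.
    static final String SAMPLE =
        "<blah><huh>1234</huh><blam><yay>woohoo</yay></blam>"
        + "<blam><yay>mwuhahaha</yay></blam><nah>5678</nah></blah>";

    private final Set<String> rules;
    private final Deque<String> path = new ArrayDeque<>();
    private int activeDepth = 0; // > 0 while inside a matched subtree
    private final List<String> forwardedRoots = new ArrayList<>();

    private PathFilterSketch(String... rules) {
        this.rules = new HashSet<>(Arrays.asList(rules));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        if (activeDepth > 0) {
            // Already inside a matched subtree: this element would be forwarded too.
            activeDepth++;
        } else if (rules.contains("/" + String.join("/", path))) {
            // First forwarded element of a subtree: the builder sees it as top-level.
            activeDepth = 1;
            forwardedRoots.add(qName);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (activeDepth > 0) {
            activeDepth--;
        }
        path.removeLast();
    }

    // Parses xml and returns the names of the elements forwarded as top-level.
    static List<String> forwardedRoots(String xml, String... rules) {
        PathFilterSketch filter = new PathFilterSketch(rules);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), filter);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return filter.forwardedRoots;
    }

    public static void main(String[] args) {
        System.out.println(forwardedRoots(SAMPLE, "/blah/huh", "/blah/blam"));
    }
}
```

With the rules /blah/huh and /blah/blam, the enclosing blah element is never forwarded, so the builder is offered huh, blam and blam as three separate top-level elements. That keeps memory low, but it is also why a builder that insists on a single root rejects the second one.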

Attached is another patch proposal: instead of using SAXHandler directly, it 
relies on a subclass (FragmentHandler, borrowed from JDOMResult) that inserts 
a dummy root element into SAXHandler's document.
This guarantees that, whatever your matching rules, SAXHandler will always 
see a single root element.
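
The effect of the dummy root can be sketched without JDOM. The toy builder below is an assumption for illustration only (ToyBuilder, Node and feedFragments are hypothetical names, not SAXHandler internals): it accepts a single root element, and pre-opening a placeholder root in the constructor lets any number of matched fragments attach beneath it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy stand-in for a tree builder that allows exactly one root element.
// Hypothetical; it only mimics the single-root check, not SAXHandler itself.
final class ToyBuilder {
    // Minimal element node: a name plus child nodes.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final Deque<Node> open = new ArrayDeque<>();
    private Node root;

    // Optionally pre-opens a dummy root, similar in spirit to the patch's constructor.
    ToyBuilder(boolean withDummyRoot) {
        if (withDummyRoot) {
            start("root");
        }
    }

    void start(String name) {
        Node node = new Node(name);
        if (open.isEmpty()) {
            if (root != null) {
                // A second top-level element arrived after the first was closed.
                throw new IllegalStateException("multiple root elements detected");
            }
            root = node;
        } else {
            open.peek().children.add(node);
        }
        open.push(node);
    }

    void end() {
        open.pop();
    }

    // Feeds three sibling fragments and returns how many ended up under the root.
    static int feedFragments(boolean withDummyRoot) {
        ToyBuilder builder = new ToyBuilder(withDummyRoot);
        for (String name : new String[] {"huh", "blam", "blam"}) {
            builder.start(name);
            builder.end();
        }
        return builder.root.children.size();
    }

    public static void main(String[] args) {
        System.out.println(feedFragments(true));
        try {
            feedFragments(false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the dummy root, all three fragments become children of the placeholder. Without it, the second fragment triggers the multiple-root failure, which matches the behaviour described in the report.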

What do you think?

Laurent


Richard Allen wrote:
> Hi All,
> 
> With the following XML:
> <blah>
>       <huh>1234</huh>
>       <blam>
>             <yay>woohoo</yay>
>       </blam>
>       <blam>
>             <yay>mwuhahaha</yay>
>       </blam>
>       <nah>5678</nah>
> </blah>
> 
> And listeners on the following:
> /blah/huh
> /blah/blam
> 
> The /blah/huh element is processed sweet as..
> But when the /blah/blam element is being processed, the
> SAXHandler.startElement() throws the following exception:
> 
> org.xml.sax.SAXException: Ill-formed XML document (multiple root elements detected)
>       at org.jdom.input.SAXHandler.getCurrentElement(SAXHandler.java:906)
>       at org.jdom.input.SAXHandler.startElement(SAXHandler.java:553)
>       at org.jdom.contrib.input.scanner.ElementScanner.startElement(ElementScanner.java:554)
> 
> This is a bit weird, given that the //blam element isn't the root element
> ;-)
> 
> The problem is that the XMLScanner is not being notified until after the
> first element that contains active rules has been found.
> This causes SAXHandler to think that the /blah/huh element is actually the
> root.
> When the ElementScanner notifies SAXHandler of the /blah/blam element it
> throws a hissy fit as it has already ended what it thinks is the root
> element ;-)
> 
> To fix, I removed the if (this.activeRules.size() != 0) test that contained
> the startElement() call to XMLScanner, so that it always propagates the
> event to the SAXHandler.
> 
> Comments appreciated as to whether this fix is the ideal fix, or if there
> is a better way to fix this problem.
> cheers,
> Rich
-------------- next part --------------
Index: ElementScanner.java
===================================================================
RCS file: /home/cvspublic/jdom-contrib/src/java/org/jdom/contrib/input/scanner/ElementScanner.java,v
retrieving revision 1.11
diff -u -r1.11 ElementScanner.java
--- ElementScanner.java	28 Feb 2004 03:47:08 -0000	1.11
+++ ElementScanner.java	5 May 2004 12:53:26 -0000
@@ -707,7 +707,7 @@
       //----------------------------------------------------------------------
 
       protected SAXHandler createContentHandler() {
-         return (new SAXHandler(new EmptyDocumentFactory(getFactory())));
+         return (new FragmentHandler(new EmptyDocumentFactory(getFactory())));
       }
 
       //----------------------------------------------------------------------
@@ -768,6 +768,31 @@
    }
 
    //-------------------------------------------------------------------------
+   // FragmentHandler nested class
+   //-------------------------------------------------------------------------
+
+   /**
+    * FragmentHandler extends SAXHandler to support matching nodes
+    * without a common ancestor. This class inserts a dummy root
+    * element in the document being built. This prevents the document
+    * from having, from SAXHandler's point of view, multiple root
+    * elements (which would cause the parse to fail).
+    */
+   private static class FragmentHandler extends SAXHandler {
+      /**
+       * Public constructor.
+       */
+      public FragmentHandler(JDOMFactory factory) {
+         super(factory);
+
+         // Add a dummy root element to the document being built, as the
+         // matching rules can select several elements that share no
+         // matched ancestor, i.e. a node list rather than a single tree.
+         this.pushElement(new Element("root", null, null));
+      }
+   }
+
+   //-------------------------------------------------------------------------
    // EmptyDocumentFactory nested class
    //-------------------------------------------------------------------------
 

