[jdom-interest] ElementScanner and Memory

Wed Nov 15 08:49:06 PST 2006

Laurent,

Commenting out those lines results in an exception:

Exception in thread "main" org.xml.sax.SAXException: Ill-formed XML document (multiple root elements detected)
    at org.jdom.input.SAXHandler.getCurrentElement(SAXHandler.java:918)
    at org.jdom.input.SAXHandler.startElement(SAXHandler.java:556)
    at org.jdom.contrib.input.scanner.ElementScanner.startElement(ElementScanner.java:548)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:533)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:330)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1693)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:333)
    at org.jdom.contrib.input.scanner.ElementScanner.parse(ElementScanner.java:442)

It looks like that causes SAXHandler to add multiple root elements to the document.  I tried tracking this down but couldn't find the cause, and I don't have more time to dedicate to it.  I think it has something to do with the isRoot flag in SAXHandler not being reset.

-Brian

----- Original Message ----
From: Laurent Bihanic <laurent.bihanic at atosorigin.com>
To: Brian Nahas <briannahas at yahoo.com>
Cc: jdom-interest at jdom.org
Sent: Tuesday, November 14, 2006 6:53:40 PM
Subject: Re: [jdom-interest] ElementScanner and Memory

Hi,

Yes, this looks like a bug.

ElementScanner relies on the EmptyDocument and EmptyDocumentFactory nested 
classes to prevent any document from being built.
So, something has gone wrong here!

Comparing the current code (from CVS) and the original code (2002), I think 
the problem may come from FragmentHandler which was imported from JDOMResult 
as a replacement for the original ElementBuilder.

The following statement was imported:
          // Add a dummy root element to the being-built document as XSL
          // transformation can output node lists instead of well-formed
          // documents.
          this.pushElement(new Element("root", null, null));

It makes sense for JDOMResult (the comment explains why) but not here.

I suspect the root element it adds allows SAXHandler to attach the built 
Elements, hence causing the memory leak.

Could you remove these lines from FragmentHandler's constructor and verify 
that this fixes the problem?

Laurent

Brian Nahas wrote:
> I have a 1.2 GB XML file I need to parse.  Since it's nicely 
> partitioned, I planned on using ElementScanner from the contrib package 
> to only load one item at a time.  Here's an equivalent structure:
> 
> <data>
>     <item>...</item>
>     <item>...</item>
>     <item>...</item>
>     ...
> </data>
> 
> The path I'm using for my listener is "/data/item".
> 
> I assumed any previous items would be released by the parser upon 
> completion.  ElementScanner was very simple to set up to handle this, 
> however I ran into an OutOfMemory error on my first try.  I was a little 
> confused as I thought ElementScanner was specifically designed to 
> prevent this.  Upon investigation, I found that the SAXHandler used by 
> the ElementScanner was holding onto the previous items after I was done 
> with them.  It adds them to the default root element that 
> FragmentHandler creates and nothing removes them after the listeners are 
> called.  This seems to be in direct conflict with this message I found 
> which states that ElementScanner doesn't build a document (this message 
> is fairly old though):
> 
> http://www.servlets.com/archive/servlet/ReadMsg?msgId=350607&listName=jdom-interest 
> 
> I worked around this by explicitly detaching the element in my listener 
> when I was done with it, but since this seems like a common pattern and 
> a subtle trap, I thought I'd ask whether I was missing some setting or 
> using ElementScanner improperly.  There's a namespace declared on the 
> data element, so I don't know if that has something to do with it.
> 
> Thanks,
> -Brian
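
For readers who only need the "one item at a time" behaviour described above, here is a minimal sketch using plain SAX from the JDK rather than ElementScanner. It is not the contrib code and not a fix for it; the class name ItemStreamer and the helper streamItems are hypothetical, and it assumes each item can be reduced to its text content. The point it illustrates is the same one as the detach workaround: once the callback returns, nothing keeps a reference to the finished item.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ItemStreamer {

    /**
     * Hypothetical helper: passes the text content of every /data/item
     * element to onItem and returns the number of items seen.
     */
    public static int streamItems(InputSource in, Consumer<String> onItem) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // local names stay usable when <data> declares a namespace
        final int[] count = {0};

        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;
            private boolean inData = false;
            private StringBuilder text; // non-null only while inside an item

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                depth++;
                if (depth == 1) {
                    inData = "data".equals(local);
                } else if (depth == 2 && inData && "item".equals(local)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (depth == 2 && text != null) {
                    onItem.accept(text.toString());
                    text = null; // drop the only reference so the item can be collected
                    count[0]++;
                }
                depth--;
            }
        };

        factory.newSAXParser().parse(in, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data xmlns=\"urn:example\">"
                + "<item>a</item><item>b</item><item>c</item></data>";
        List<String> seen = new ArrayList<>();
        int n = streamItems(new InputSource(new StringReader(xml)), seen::add);
        System.out.println(n + " items: " + seen);
    }
}
```

With ElementScanner itself, the equivalent step is the one described in the quoted message: detach the element inside the listener once it has been processed.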
