[jdom-interest] SAX Parser/DTD Validation EXTREMELY slow

Dan Temple Dan.Temple at wcom.com
Wed Oct 16 15:57:13 PDT 2002

Thanks Frank, that did it. All I had to do was put the latest xercesImpl.jar in my classpath. It parsed in 1282 milliseconds instead of the estimated 26+ hours. Nasty!

I appreciate it.


  ----- Original Message ----- 
  From: Frank Sauer 
  To: Dan Temple ; jdom-interest at jdom.org 
  Sent: Wednesday, October 16, 2002 3:23 PM
  Subject: RE: [jdom-interest] SAX Parser/DTD Validation EXTREMELY slow

  Have you tried an XML parser other than the one packaged in the 1.4 JRE?

  Here's something I plucked off the web somewhere on how to do that (it talks about Xalan, but the same applies to Xerces):

  Annoyingly, the Xalan-J classes included in Java 1.4 are zipped into the rt.jar archive so it's hard to replace them with a less-buggy release version of Xalan. It can be done, but you have to put the xalan.jar file in your $JAVA_HOME/jre/lib/endorsed directory (lib/endorsed under a standalone JRE) rather than in the normal jre/lib/ext directory. The exact location of $JAVA_HOME varies from system to system, but it's probably something like C:\j2sdk1.4.0 on Windows. None of this is an issue with Java 1.3 and earlier, which don't bundle these classes. On these systems you just need to install whatever jar files your XSLT engine vendor provides in the usual locations, the same as you would any other third-party library.
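After dropping in a different parser jar, it can help to confirm which implementation is actually being picked up. The following is a minimal sketch using only the standard JAXP API; the class name WhichParser and its helper methods are made up for illustration, and what it prints depends entirely on the local JVM and classpath.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports which SAX parser implementation JAXP selects.
class WhichParser {

    /** Returns the class name of the SAX parser implementation JAXP hands out. */
    static String parserClassName() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        return parser.getClass().getName();
    }

    /** Returns the location the parser class was loaded from, if the JVM reports one. */
    static String parserLocation() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CodeSource source = parser.getClass().getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes loaded by the bootstrap loader may not report a code source.
            return "(no code source reported; likely part of the runtime itself)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX parser class: " + parserClassName());
        System.out.println("Loaded from:      " + parserLocation());
    }
}
```

If the reported location is not the jar you installed, the bundled parser is probably still taking precedence.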


    -----Original Message-----
    From: Dan Temple [mailto:Dan.Temple at wcom.com]
    Sent: Wednesday, October 16, 2002 5:14 PM
    To: jdom-interest at jdom.org
    Subject: [jdom-interest] SAX Parser/DTD Validation EXTREMELY slow

    I have a DTD ELEMENT sequence that has 47 children. When I turn on SAX validation (SAXBuilder(true)) it takes forever to build (saxBuilder.build(File)). OK, not actually forever, but 26 hours! It takes so long that at first I thought that it was in an infinite loop - CPU was at 100%.

    By playing with the XML & DTD, I was able to narrow down the problem to just the number of entries in the DTD ELEMENT sequence. It starts falling down at about 20 entries and the time goes up exponentially from there. For every additional child entry, the time it takes to validate DOUBLES (almost exactly).

    Entries Millis
    ======= ======
       1       711
       5       641
      10       621
      15       631
      20       811
      21      1031
      22      1422
      23      2163
      24      3685
      25      6709
      26     12769
      27     24796
      28     48830
      29    102367
      30    206056
      31    405963

    My largest actual test was 35 entries & sure enough - over 1 1/2 hours. By my calculations, to process my 47 entries will take over 26 hours. The only solution for me is to turn off SAX parser validation.
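The doubling pattern can be checked directly from the timings in the table by dividing each measurement by the previous one. This is only a sketch over the figures posted above (20 to 31 entries); the class name GrowthRatios is made up for illustration.

```java
// Hypothetical helper: computes step-to-step growth ratios for the posted timings.
class GrowthRatios {
    // Entry counts and milliseconds copied from the table, from 20 entries upward.
    static final int[] ENTRIES = {20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31};
    static final long[] MILLIS = {811, 1031, 1422, 2163, 3685, 6709,
                                  12769, 24796, 48830, 102367, 206056, 405963};

    /** Returns MILLIS[i] / MILLIS[i-1] for each consecutive pair of measurements. */
    static double[] ratios() {
        double[] result = new double[MILLIS.length - 1];
        for (int i = 1; i < MILLIS.length; i++) {
            result[i - 1] = (double) MILLIS[i] / MILLIS[i - 1];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] r = ratios();
        for (int i = 0; i < r.length; i++) {
            System.out.println(String.format("%d -> %d entries: x%.2f",
                    ENTRIES[i], ENTRIES[i + 1], r[i]));
        }
    }
}
```

On these figures the ratio climbs from about 1.27 at 21 entries and stays between roughly 1.9 and 2.1 from 26 entries on, which matches the near-doubling described.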

    I am using the latest code from CVS - Beta8 (tag: jdom_1_0_b8). I am using Java 1.4.0_02. The problem occurs on both Windows 2000 & Solaris 8.

    Attached are XMLTest.java, test.xml, & test.dtd. Just compile & run the XMLTest class; it's hard coded to use the test.xml file, which references the test.dtd file.
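A timing test along these lines can be sketched without the attachments. The version below is not the original XMLTest.java: it uses the JDK's JAXP API directly rather than JDOM's SAXBuilder, generates the document and an internal DTD subset in memory instead of reading test.xml and test.dtd, and assumes a content model of N required children in sequence. The class name DtdTiming and the element names c1..cN are made up, and the real test.dtd may be shaped differently.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the attached test: times a validating SAX parse.
class DtdTiming {

    /** Builds a document whose root element declares n children in sequence (assumed shape). */
    static String buildXml(int n) {
        StringBuilder model = new StringBuilder();
        StringBuilder decls = new StringBuilder();
        StringBuilder body = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            if (i > 1) {
                model.append(", ");
            }
            model.append("c").append(i);
            decls.append("<!ELEMENT c").append(i).append(" (#PCDATA)>\n");
            body.append("<c").append(i).append(">x</c").append(i).append(">");
        }
        return "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE root [\n"
                + "<!ELEMENT root (" + model + ")>\n"
                + decls
                + "]>\n"
                + "<root>" + body + "</root>";
    }

    /** Parses the generated document with DTD validation on; returns elapsed milliseconds. */
    static long timeValidatingParse(int n) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // treat validation errors as failures instead of ignoring them
            }
        };
        long start = System.nanoTime();
        parser.parse(new InputSource(new StringReader(buildXml(n))), handler);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        for (int n : new int[] {1, 10, 20, 30, 47}) {
            System.out.println(n + " entries: " + timeValidatingParse(n) + " ms");
        }
    }
}
```

Running it against different parser implementations gives a quick way to compare how validation time scales with the number of entries.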


    Dan Temple
