[jdom-interest] question about running speed - patch

Alex Rosen arosen at silverstream.com
Thu Apr 11 16:51:40 PDT 2002


> I did a couple of tests last night. Since the common case is
> that SAXHandler only gets called once per text section, I
> tried optimizing for that. When the first text chunk comes
> in, it creates a String from the given char array. In the
> common case where that's the only text chunk before
> flushCharacters() occurs, then we've done the absolute
> minimal work necessary, so it can't get any faster or
> memory-efficient than that. If we then get a second text
> chunk, we switch over to using a StringBuffer instead. It
> turns out that this doesn't cost us too much. Plus, it's JDK
> 1.1 compatible. Does this sound like a reasonable strategy? I
> can't think of any drawbacks. I'll try a few more tests and
> then send in a patch sometime soon.

I tried some more tests using different buffering strategies. I tried 5 new
strategies, mostly with Xerces 1 and Crimson on JDK 1.2.2, but also some
with Xerces 2 and some with JDK 1.4. Basically, it doesn't much matter what
strategy you use, as long as you don't try to reuse the StringBuffer. (The
strategies included using a new StringBuffer each time, plus 4 variations on
the idea of putting the first chunk into a String, and the rest into either
a StringBuffer or an explicitly-managed char array.)

- The beta 8 version is fine, because it uses StringBuffer.substring(),
which creates a String that doesn't share the underlying char array. But,
it's not JDK 1.1 compatible.

- The current CVS version, which is 1.1 compatible, has horrible
performance. I'm not 100% sure why that is. At least part of it has to do
with the fact that it's creating a minimum 16-char array for each string.
That eats up memory quickly when a lot of the text sections consist of only
a newline, for example.

- All 5 of the schemes I used come out about the same, and are comparible in
speed to beta 8. The only noticable difference is that when you use a
StringBuffer, you can end up with extra, unused memory laying around
(because StringBuffer.toString() tries to use its internal buffer, which
might be bigger than necessary). Even with Crimson, this is only significant
in edge-case documents, where characters() gets called many times per text
section, but I guess it's worth fixing.

So I used a TextBuffer class that Brad sent me, which puts the first chunk
into a String, and then for subsequent chunks it uses its own char array
instead of using a StringBuffer. This should get us pretty optimal
performance, and remain 1.1 compatible. This class could be reused if we
have other builders in the future, e.g. XNIBuilder.

I also changed name of the internal subset StringBuffer from "buffer" to
"internalSubset", to avoid confusion.

Alex

PS I noticed that characters() checks inCDATA, but ignorableWhitespace()
doesn't. Is this correct? Should both characters() and ignorableWhitespace()
call the same internal method, to avoid duplicate code?
-------------- next part --------------
Index: SAXHandler.java
===================================================================
RCS file: /home/cvspublic/jdom/src/java/org/jdom/input/SAXHandler.java,v
retrieving revision 1.42
diff -c -r1.42 SAXHandler.java
*** SAXHandler.java	2002/04/12 04:19:04	1.42
--- SAXHandler.java	2002/04/12 07:24:10
***************
*** 134,143 ****
      protected LinkedList availableNamespaces;
  
      /** Temporary holder for the internal subset */
!     private StringBuffer buffer = new StringBuffer();
  
      /** Temporary holder for Text and CDATA */
!     private StringBuffer textBuffer = new StringBuffer();
  
      /** The external entities defined in this document */
      private Map externalEntities;
--- 134,143 ----
      protected LinkedList availableNamespaces;
  
      /** Temporary holder for the internal subset */
!     private StringBuffer internalSubset = new StringBuffer();
  
      /** Temporary holder for Text and CDATA */
!     private TextBuffer textBuffer = new TextBuffer();
  
      /** The external entities defined in this document */
      private Map externalEntities;
***************
*** 336,345 ****
  
          if (!inInternalSubset) return;
  
!         buffer.append("  <!ENTITY ")
                .append(name);
          appendExternalId(publicID, systemID);
!         buffer.append(">\n");
      }
  
      /**
--- 336,345 ----
  
          if (!inInternalSubset) return;
  
!         internalSubset.append("  <!ENTITY ")
                .append(name);
          appendExternalId(publicID, systemID);
!         internalSubset.append(">\n");
      }
  
      /**
***************
*** 359,365 ****
  
          if (!inInternalSubset) return;
  
!         buffer.append("  <!ATTLIST ")
                .append(eName)
                .append(" ")
                .append(aName)
--- 359,365 ----
  
          if (!inInternalSubset) return;
  
!         internalSubset.append("  <!ATTLIST ")
                .append(eName)
                .append(" ")
                .append(aName)
***************
*** 367,384 ****
                .append(type)
                .append(" ");
          if (valueDefault != null) {
!               buffer.append(valueDefault);
          } else {
!             buffer.append("\"")
                    .append(value)
                    .append("\"");
          }
          if ((valueDefault != null) && (valueDefault.equals("#FIXED"))) {
!             buffer.append(" \"")
                    .append(value)
                    .append("\"");
          }
!         buffer.append(">\n");
      }
  
      /**
--- 367,384 ----
                .append(type)
                .append(" ");
          if (valueDefault != null) {
!               internalSubset.append(valueDefault);
          } else {
!             internalSubset.append("\"")
                    .append(value)
                    .append("\"");
          }
          if ((valueDefault != null) && (valueDefault.equals("#FIXED"))) {
!             internalSubset.append(" \"")
                    .append(value)
                    .append("\"");
          }
!         internalSubset.append(">\n");
      }
  
      /**
***************
*** 393,399 ****
          // Skip elements that come from the external subset
          if (!inInternalSubset) return;
  
!         buffer.append("  <!ELEMENT ")
                .append(name)
                .append(" ")
                .append(model)
--- 393,399 ----
          // Skip elements that come from the external subset
          if (!inInternalSubset) return;
  
!         internalSubset.append("  <!ELEMENT ")
                .append(name)
                .append(" ")
                .append(model)
***************
*** 414,426 ****
          // Skip entities that come from the external subset
          if (!inInternalSubset) return;
  
!         buffer.append("  <!ENTITY ");
          if (name.startsWith("%")) {
!            buffer.append("% ").append(name.substring(1));
          } else {
!            buffer.append(name);
          }
!         buffer.append(" \"")
                .append(value)
                .append("\">\n");
      }
--- 414,426 ----
          // Skip entities that come from the external subset
          if (!inInternalSubset) return;
  
!         internalSubset.append("  <!ENTITY ");
          if (name.startsWith("%")) {
!            internalSubset.append("% ").append(name.substring(1));
          } else {
!            internalSubset.append(name);
          }
!         internalSubset.append(" \"")
                .append(value)
                .append("\">\n");
      }
***************
*** 678,688 ****
              flushCharacters();
          }
  
!         textBuffer.append( ch, start, length);
      }
  
      /**
       * <p>
       * This will flush any characters from SAX character calls we've
       * been buffering.
       * </p>
--- 678,709 ----
              flushCharacters();
          }
  
!         textBuffer.append(ch, start, length);
      }
  
      /**
       * <p>
+      * Capture ignorable whitespace as text.  If
+      * setIgnoringElementContentWhitespace(true) has been called then this
+      * method does nothing.
+      * </p>
+      *
+      * @param ch <code>[]</code> - char array of ignorable whitespace
+      * @param start <code>int</code> - starting position within array
+      * @param length <code>int</code> - length of whitespace after start
+      * @throws SAXException when things go wrong
+      */
+     public void ignorableWhitespace(char[] ch, int start, int length)
+                                                      throws SAXException {
+         if (suppress) return;
+         if (ignoringWhite) return;
+         if (length == 0) return;
+ 
+         textBuffer.append(ch, start, length);
+     }
+ 
+     /**
+      * <p>
       * This will flush any characters from SAX character calls we've
       * been buffering.
       * </p>
***************
*** 691,707 ****
       */
      protected void flushCharacters() throws SAXException {
  
!         if (textBuffer.length() == 0) {
              previousCDATA = inCDATA;
              return;
          }
  
-         /*
-          * Note: When we stop supporting JDK1.1, use substring instead
-         String data = textBuffer.substring(0);
-          */
          String data = textBuffer.toString();
!         textBuffer.setLength(0);
  
  /**
   * This is commented out because of some problems with
--- 712,724 ----
       */
      protected void flushCharacters() throws SAXException {
  
!         if (textBuffer.size() == 0) {
              previousCDATA = inCDATA;
              return;
          }
  
          String data = textBuffer.toString();
!         textBuffer.clear();
  
  /**
   * This is commented out because of some problems with
***************
*** 726,752 ****
  
      /**
       * <p>
-      * Capture ignorable whitespace as text.  If
-      * setIgnoringElementContentWhitespace(true) has been called then this
-      * method does nothing.
-      * </p>
-      *
-      * @param ch <code>[]</code> - char array of ignorable whitespace
-      * @param start <code>int</code> - starting position within array
-      * @param length <code>int</code> - length of whitespace after start
-      * @throws SAXException when things go wrong
-      */
-     public void ignorableWhitespace(char[] ch, int start, int length)
-                                                      throws SAXException {
-         if (suppress) return;
-         if (ignoringWhite) return;
-         if (length == 0) return;
- 
-         textBuffer.append( ch, start, length);
-     }
- 
-     /**
-      * <p>
       * Indicates the end of an element
       *   (<code>&lt;/[element name]&gt;</code>) is reached.  Note that
       *   the parser does not distinguish between empty
--- 743,748 ----
***************
*** 815,821 ****
       */
      public void endDTD() throws SAXException {
  
!         document.getDocType().setInternalSubset(buffer.toString());
          inDTD = false;
          inInternalSubset = false;
      }
--- 811,817 ----
       */
      public void endDTD() throws SAXException {
  
!         document.getDocType().setInternalSubset(internalSubset.toString());
          inDTD = false;
          inInternalSubset = false;
      }
***************
*** 925,931 ****
  
          String commentText = new String(ch, start, length);
          if (inDTD && inInternalSubset && (expand == false)) {
!             buffer.append("  <!--")
                    .append(commentText)
                    .append("-->\n");
              return;
--- 921,927 ----
  
          String commentText = new String(ch, start, length);
          if (inDTD && inInternalSubset && (expand == false)) {
!             internalSubset.append("  <!--")
                    .append(commentText)
                    .append("-->\n");
              return;
***************
*** 955,964 ****
  
          if (!inInternalSubset) return;
  
!         buffer.append("  <!NOTATION ")
                .append(name);
          appendExternalId(publicID, systemID);    
!         buffer.append(">\n");
      }
  
      /**
--- 951,960 ----
  
          if (!inInternalSubset) return;
  
!         internalSubset.append("  <!NOTATION ")
                .append(name);
          appendExternalId(publicID, systemID);    
!         internalSubset.append(">\n");
      }
  
      /**
***************
*** 977,988 ****
  
          if (!inInternalSubset) return;
  
!         buffer.append("  <!ENTITY ")
                .append(name);
          appendExternalId(publicID, systemID);
!         buffer.append(" NDATA ")
                .append(notationName);
!         buffer.append(">\n");
      }
  
      /**
--- 973,984 ----
  
          if (!inInternalSubset) return;
  
!         internalSubset.append("  <!ENTITY ")
                .append(name);
          appendExternalId(publicID, systemID);
!         internalSubset.append(" NDATA ")
                .append(notationName);
!         internalSubset.append(">\n");
      }
  
      /**
***************
*** 996,1013 ****
       */
      protected void appendExternalId(String publicID, String systemID) {
          if (publicID != null) {
!             buffer.append(" PUBLIC \"")
                    .append(publicID)
                    .append("\"");
          }
          if (systemID != null) {
              if (publicID == null) {
!                 buffer.append(" SYSTEM ");
              }
              else {
!                 buffer.append(" ");
              }
!             buffer.append("\"")
                    .append(systemID)
                    .append("\"");
          }
--- 992,1009 ----
       */
      protected void appendExternalId(String publicID, String systemID) {
          if (publicID != null) {
!             internalSubset.append(" PUBLIC \"")
                    .append(publicID)
                    .append("\"");
          }
          if (systemID != null) {
              if (publicID == null) {
!                 internalSubset.append(" SYSTEM ");
              }
              else {
!                 internalSubset.append(" ");
              }
!             internalSubset.append("\"")
                    .append(systemID)
                    .append("\"");
          }
-------------- next part --------------
/*--

 $Id: TextBuffer.java,v 1.1 2002/03/15 05:36:48 jhunter Exp $

 Copyright (C) 2000 Brett McLaughlin & Jason Hunter.
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:

 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions, and the following disclaimer.

 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions, and the disclaimer that follows
    these conditions in the documentation and/or other materials
    provided with the distribution.

 3. The name "JDOM" must not be used to endorse or promote products
    derived from this software without prior written permission.  For
    written permission, please contact license at jdom.org.

 4. Products derived from this software may not be called "JDOM", nor
    may "JDOM" appear in their name, without prior written permission
    from the JDOM Project Management (pm at jdom.org).

 In addition, we request (but do not require) that you include in the
 end-user documentation provided with the redistribution and/or in the
 software itself an acknowledgement equivalent to the following:
     "This product includes software developed by the
      JDOM Project (http://www.jdom.org/)."
 Alternatively, the acknowledgment may be graphical using the logos
 available at http://www.jdom.org/images/logos.

 THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 DISCLAIMED.  IN NO EVENT SHALL THE JDOM AUTHORS OR THE PROJECT
 CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.

 This software consists of voluntary contributions made by many
 individuals on behalf of the JDOM Project and was originally
 created by Brett McLaughlin <brett at jdom.org> and
 Jason Hunter <jhunter at jdom.org>.  For more information on the
 JDOM Project, please see <http://www.jdom.org/>.

 */

package org.jdom.input;

/**
 * <p>
 * <code>TextBuffer</code> is similar to StringBuffer, but optimized
 * for XML parsing, where the common case is that you get only one chunk
 * of characters per text section. <code>TextBuffer</code> stores the
 * first chunk of characters in a String, which can just be returned
 * directly if no second chunk is received. Subsequent chunks are stored
 * in a supplemental char array (like StringBuffer uses). In this case,
 * the returned text will be the first String chunk, concatenated with
 * the subsequent chunks stored in the char array. This provides optimal
 * performance in the common case, while still providing very good 
 * performance in the uncommon case. Furthermore, avoiding StringBuffer
 * means that no extra unused char array space will be kept around
 * after parsing is through. 
 * </p>
 *
 * @author Bradley S. Huffman
 * @author Alex Rosen
 * @version $Revision: 1.1 $, $Date: 2002/03/15 05:36:48 $
 */
public class TextBuffer {
    /** The first part of the text value (the "prefix"). If null, the text value is the empty string. */
    private String prefixString;
    /** The rest of the text value (the "suffix"). Only the first <code>arraySize</code> characters are valid. */
    private char[] array;
    /** The size of the rest of the text value. If zero, then only <code>prefixString</code>
    contains the text value. */
    private int arraySize;

    /** Constructor */
    public TextBuffer() {
		array = new char[4096]; // initial capacity
        arraySize = 0;
    }

    /** Append the specified text to the text value of this buffer. */
    public void append(char[] source, int start, int count) {
        if (prefixString == null) {
            // This is the first chunk, so we'll store it in the prefix string.
            prefixString = new String(source, start, count);
        }
        else {
            // This is a subsequent chunk, so we'll add it to the char array.
            ensureCapacity(arraySize + count);
            System.arraycopy(source, start, array, arraySize, count);
            arraySize += count;
        }
    }

    /** Returns the size of the text value. */
    public int size() {
        if (prefixString == null)
            return 0;
        else
            return prefixString.length() + arraySize;
    }

    /** Clears the text value and prepares the TextBuffer for reuse. */
    public void clear() {
        arraySize = 0;
        prefixString = null;
    }

    /** Returns the text value stored in the buffer. */
    public String toString() {
        if (prefixString == null) {
            return "";
        }

        String str = "";
        if (arraySize == 0) {
            // Char array is empty, so the text value is just prefixString.
            str = prefixString;
        }
        else {
            // Char array is not empty, so the text value is prefixString plus the char array.
            str = new StringBuffer(prefixString.length() + arraySize)
                    .append(prefixString)
                    .append(array, 0, arraySize)
                    .toString();
        }
        return str;
    }

    // Ensure that the char array has room for at least "csize" characters.
    private void ensureCapacity(int csize) {
        int capacity = array.length;
        if (csize > capacity) {
            char[] old = array;
            int nsize = capacity;
            while (csize > nsize) {
                nsize += (capacity/2);
            }
            array = new char[nsize];
            System.arraycopy(old, 0, array, 0, arraySize);
        }
    }
}


More information about the jdom-interest mailing list