[jdom-interest] RE: A suggested performance improvement

Tom Oke tomo at elluminate.com
Mon Mar 17 20:01:05 PST 2003


In my previous mailing, I indicated some significant
performance improvements possible through short-circuiting
the most-used character checking path in checkCharacterData,
so that it did not require a call to isXMLCharacter.

As significant as those improvements looked, they were
illusory: an artifact of the analysis environment. They did
not hold up when I later measured with simple timing code
using System.currentTimeMillis().

The HotSpot compiler seems to do a pretty good job
of doing the short-circuiting itself, given it was
working on a couple of megabytes of character data.
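Simple metering of that kind might look like the sketch below. The class and method names, the sample text and the round counts are hypothetical, and isXMLCharacter is a stand-in written from the XML 1.0 Char production, not JDOM's source. Warm-up rounds run before anything is measured so that HotSpot has a chance to compile both loops. The sketch uses method references, so it assumes Java 8 or later.

```java
import java.util.function.Function;

// Hypothetical timing harness: names, sizes and round counts are illustrative.
public final class CheckTiming {

    // Stand-in for Verifier.isXMLCharacter, following the XML 1.0 Char production.
    static boolean isXMLCharacter(int c) {
        if (c == 0x9 || c == 0xA || c == 0xD) return true;
        if (c >= 0x20 && c <= 0xD7FF) return true;
        if (c >= 0xE000 && c <= 0xFFFD) return true;
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    // Unguarded loop: calls isXMLCharacter for every char.
    static String checkPlain(String text) {
        for (int i = 0, len = text.length(); i < len; i++) {
            char c = text.charAt(i);
            if (!isXMLCharacter(c)) {
                return "0x" + Integer.toHexString(c) + " is not a legal XML character";
            }
        }
        return null;
    }

    // Guarded loop: skips the call for the common 0x20-0xD7FF range.
    static String checkGuarded(String text) {
        for (int i = 0, len = text.length(); i < len; i++) {
            char c = text.charAt(i);
            if (!(c > 0x1F && c < 0xD800) && !isXMLCharacter(c)) {
                return "0x" + Integer.toHexString(c) + " is not a legal XML character";
            }
        }
        return null;
    }

    // Runs the check `rounds` times; returns elapsed wall-clock milliseconds.
    static long time(Function<String, String> check, String text, int rounds) {
        long start = System.currentTimeMillis();
        for (int r = 0; r < rounds; r++) {
            if (check.apply(text) != null) {
                throw new IllegalStateException("sample text should be legal");
            }
        }
        return System.currentTimeMillis() - start;
    }

    // Builds at least `size` chars of ordinary, legal XML text.
    static String sampleText(int size) {
        StringBuilder sb = new StringBuilder(size + 64);
        while (sb.length() < size) {
            sb.append("Some ordinary element text, with a tab\tand a newline.\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = sampleText(2_000_000);
        // Warm-up: let HotSpot compile both loops before measuring.
        time(CheckTiming::checkPlain, text, 20);
        time(CheckTiming::checkGuarded, text, 20);
        System.out.println("plain:   " + time(CheckTiming::checkPlain, text, 50) + " ms");
        System.out.println("guarded: " + time(CheckTiming::checkGuarded, text, 50) + " ms");
    }
}
```

Absolute timings depend on the JVM and the machine, so no expected numbers are given here. The warm-up exists so that the measurement reflects compiled code rather than the interpreter.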

Re-running the analysis program (Optimizeit 4.01; version
5, I believe, works with HotSpot), I still get the same
numbers, but real timings indicate they come from the
profiler disabling HotSpot.

Sorry for the premature eureka, I may not be ALL wet,
but I am certainly feeling rather damp.

Tom

>>> Tom Oke <tomo at elluminate.com> 3/16/2003 8:18:44 PM >>>
I have noticed, on large XML files, that the majority of the CPU time
is going into the routines: Verifier.isXMLCharacter and 
Verifier.checkCharacterData.

I had initially modified isXMLCharacter to check the most
likely range of data first, to get an early exit, and this cut
about 25% of the CPU time used by the JDOM read on some large files.
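A minimal sketch of that reordering, assuming the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The class name FastPathVerifier is hypothetical, and this is not JDOM's actual Verifier source; it only shows the ordinary-text range being tested first so that typical characters return after one comparison pair.

```java
// Hypothetical sketch of a range-reordered check; not JDOM's Verifier source.
public final class FastPathVerifier {

    private FastPathVerifier() {}

    /**
     * Returns true if c is a legal XML 1.0 character.
     * The ordinary-text range is tested first so typical input exits early.
     */
    public static boolean isXMLCharacter(int c) {
        if (c >= 0x20 && c <= 0xD7FF) return true;          // most common case first
        if (c == 0x9 || c == 0xA || c == 0xD) return true;  // tab, LF, CR
        if (c >= 0xE000 && c <= 0xFFFD) return true;
        return c >= 0x10000 && c <= 0x10FFFF;                // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(isXMLCharacter('A'));    // true
        System.out.println(isXMLCharacter(0x0B));   // false (vertical tab)
        System.out.println(isXMLCharacter(0xFFFE)); // false
    }
}
```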

However, in the thread doing the JDOM input, 62% of the time
was still in isXMLCharacter and 16% was in checkCharacterData,
which calls isXMLCharacter.

The biggest bang for the buck came from wrapping the
if statement that calls isXMLCharacter in a test for the
most likely valid range. This is shown in the two
lines below:

            char c = text.charAt(i);
            if (!(c > 0x1F && c < 0xD800)) {

This reduced checkCharacterData to 1.32% of the thread use,
and isXMLCharacter doesn't really show up at all.
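The guard is safe because every char from 0x20 through 0xD7FF is a legal XML character, so skipping the call in that range cannot change any result. A small exhaustive check over all 65,536 char values can confirm this. The class name and the isXMLCharacter stand-in are hypothetical, written from the XML 1.0 Char production rather than copied from JDOM.

```java
// Hypothetical exhaustive check that the fast-path guard never changes a result.
public final class GuardEquivalence {

    // Stand-in for Verifier.isXMLCharacter (XML 1.0 Char production).
    static boolean isXMLCharacter(int c) {
        if (c == 0x9 || c == 0xA || c == 0xD) return true;
        if (c >= 0x20 && c <= 0xD7FF) return true;
        if (c >= 0xE000 && c <= 0xFFFD) return true;
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    // The guarded test from the post: only consult isXMLCharacter outside 0x20-0xD7FF.
    static boolean guardedIsLegal(char c) {
        return (c > 0x1F && c < 0xD800) || isXMLCharacter(c);
    }

    // Counts char values for which the guarded and unguarded checks disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int v = Character.MIN_VALUE; v <= Character.MAX_VALUE; v++) {
            char c = (char) v;
            if (guardedIsLegal(c) != isXMLCharacter(c)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches()); // prints "mismatches: 0"
    }
}
```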

Hopefully this is a reasonable change to submit to JDOM?

What follows is the full code for Verifier.checkCharacterData.



    public static final String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }

        // do check
        for (int i = 0, len = text.length(); i<len; i++) {
            char c = text.charAt(i);
            // Fast path: 0x20-0xD7FF are all legal XML characters,
            // so only call isXMLCharacter outside that range.
            if (!(c > 0x1F && c < 0xD800)) {
                if (!isXMLCharacter(c)) {
                    // This character likely can't be displayed easily
                    // because it's a control character, so we use its
                    // hexadecimal representation in the reason.
                    return ("0x" + Integer.toHexString(c)
                            + " is not a legal XML character");
                }
            }
        }

        // If we got here, everything is OK
        return null;
    }

Tom Oke


