[jdom-interest] Not legal JDOM characters?

Dave Byrne dave-lists at intelligentendeavors.com
Mon Aug 23 14:54:34 PDT 2004

In Verifier.isXMLCharacter() I added:

        if (c >= 0xD800 && c <= 0xDBFF) return true;
        if (c >= 0xDC00 && c <= 0xDFFF) return true;

which seems to work for me, but it is definitely not the correct way to go
about it.

I looked into the surrogate pair decoding that you brought up, and from what
I can tell from the J2SE docs, String.charAt() will always split surrogate
pairs in half.

I'm not familiar with the low-level handling of UTF-16 in Java, but is there
a way to examine strings char by char without splitting the surrogate pairs
in half?  It seems that it may offer a better long-term way of handling
these chars.
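
One possible way to do this, sketched under the assumption of a J2SE 5.0 or
later runtime, is to step through the string by code point with
String.codePointAt() and Character.charCount(). The class and method names
below (CodePointWalk, codePoints) are made up for illustration and are not
part of JDOM.

```java
// Sketch only: CodePointWalk and codePoints are hypothetical names, not JDOM API.
// Assumes J2SE 5.0+, where String.codePointAt and Character.charCount exist.
final class CodePointWalk {

    // Collects the code points of text, reading each surrogate pair as one value.
    static int[] codePoints(String text) {
        int[] result = new int[text.codePointCount(0, text.length())];
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // the whole pair if one starts at i
            result[n++] = cp;
            i += Character.charCount(cp);   // 1 for a BMP char, 2 for a pair
        }
        return result;
    }

    public static void main(String[] args) {
        // U+1D538 is stored in a String as the two chars 0xD835 0xDD38.
        int[] cps = codePoints("A\uD835\uDD38");
        for (int i = 0; i < cps.length; i++) {
            System.out.println("0x" + Integer.toHexString(cps[i]));
        }
    }
}
```

Here the loop index advances by two chars whenever a pair was read, so
0xD835 is never seen on its own.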

Dave Byrne

-----Original Message-----
From: Elliotte Rusty Harold [mailto:elharo at metalab.unc.edu] 
Sent: Monday, August 23, 2004 12:49 PM
To: Dave Byrne
Cc: jdom-interest at jdom.org
Subject: Re: [jdom-interest] Not legal JDOM characters?

At 11:18 AM -0700 8/23/04, Dave Byrne wrote:

>The data "?" is not legal for a JDOM character content: 0xd835 is not a
>legal XML character.

Notice that this error message does not reference the character 
you're including. Furthermore, 0xd835 is a Unicode high surrogate. I 
therefore surmise that this is a bug in JDOM, where JDOM is not 
decoding surrogate pairs before passing them to the Verifier. Looking 
at the source, my surmise is correct:

         for (int i = 0, len = text.length(); i<len; i++) {
             if (!isXMLCharacter(text.charAt(i))) {
                 // Likely this character can't be easily displayed
                 // because it's a control so we use its hexadecimal
                 // representation in the reason.
                 return ("0x" + Integer.toHexString(text.charAt(i))
                         + " is not a legal XML character");
             }
         }

JDOM should be recognizing that this character is half of a surrogate 
pair, decoding the surrogate pair, and checking that. That it's 
failing to do so is a bug.
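
A minimal sketch of what such a check could look like follows. It is not the
actual JDOM Verifier: SurrogateAwareCheck, isXMLCodePoint, and the
null-when-legal return convention are assumptions made for illustration. The
ranges are those of the XML 1.0 Char production, and the pair is decoded by
hand so that no newer Java API is required.

```java
// Sketch only: SurrogateAwareCheck is a hypothetical class, not the JDOM Verifier.
final class SurrogateAwareCheck {

    // XML 1.0 Char production, applied to a full code point rather than a char.
    static boolean isXMLCodePoint(int cp) {
        if (cp == 0x9 || cp == 0xA || cp == 0xD) return true;
        if (cp >= 0x20 && cp <= 0xD7FF) return true;
        if (cp >= 0xE000 && cp <= 0xFFFD) return true;
        return cp >= 0x10000 && cp <= 0x10FFFF;
    }

    // Returns null if every character is legal, otherwise a reason string.
    static String checkCharacterData(String text) {
        for (int i = 0, len = text.length(); i < len; i++) {
            int cp = text.charAt(i);
            // A high surrogate followed by a low surrogate forms one code point.
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
                char low = text.charAt(i + 1);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                    i++; // skip the low surrogate that was just consumed
                }
            }
            // An unpaired surrogate is still in 0xD800-0xDFFF and is rejected here.
            if (!isXMLCodePoint(cp)) {
                return "0x" + Integer.toHexString(cp)
                        + " is not a legal XML character";
            }
        }
        return null;
    }
}
```

Unlike simply accepting the ranges 0xD800-0xDBFF and 0xDC00-0xDFFF in
isXMLCharacter(), this accepts a surrogate only when it is part of a
well-formed pair, and still reports a lone 0xd835 as illegal.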


   Elliotte Rusty Harold
   elharo at metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
