[jdom-interest] Not legal JDOM characters?

Elliotte Rusty Harold elharo at metalab.unc.edu
Mon Aug 23 12:48:38 PDT 2004


At 11:18 AM -0700 8/23/04, Dave Byrne wrote:

>The data "?" is not legal for a JDOM character content: 0xd835 is not a
>legal XML character.

Notice that this error message does not reference the character 
you're including. Furthermore, 0xd835 is a Unicode high surrogate. I 
therefore surmise that this is a bug in JDOM, where JDOM is not 
decoding surrogate pairs before passing them to the Verifier. Looking 
at the source, my surmise is correct:

         for (int i = 0, len = text.length(); i<len; i++) {
             if (!isXMLCharacter(text.charAt(i))) {
                 // Likely this character can't be easily displayed
                 // because it's a control, so we use its hexadecimal
                 // representation in the reason.
                 return ("0x" + Integer.toHexString(text.charAt(i))
                  + " is not a legal XML character");
             }
         }

JDOM should be recognizing that this character is half of a surrogate 
pair, decoding the surrogate pair, and checking that. That it's 
failing to do so is a bug.
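
For illustration, here is a minimal sketch of what a surrogate-aware 
check could look like. It is not JDOM's actual code or fix: the class 
name SurrogateAwareCheck is hypothetical, isXMLCharacter here is a 
stand-in that takes a full code point and follows the XML 1.0 Char 
production, and it assumes a JVM with String.codePointAt (Java 5 or 
later).

```java
public final class SurrogateAwareCheck {

    // Stand-in for the verifier's character test (an assumption, not
    // JDOM's method): the XML 1.0 Char production applied to a whole
    // code point rather than to a single UTF-16 code unit.
    static boolean isXMLCharacter(int c) {
        if (c == 0x9 || c == 0xA || c == 0xD) return true;
        if (c >= 0x20 && c <= 0xD7FF) return true;
        if (c >= 0xE000 && c <= 0xFFFD) return true;
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    // Returns null if every code point in text is a legal XML
    // character, otherwise a reason naming the first offender.
    static String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }
        for (int i = 0, len = text.length(); i < len; ) {
            // codePointAt combines a high surrogate with the low
            // surrogate that follows it; an unpaired surrogate comes
            // back as itself and is rejected below.
            int cp = text.codePointAt(i);
            if (!isXMLCharacter(cp)) {
                return "0x" + Integer.toHexString(cp)
                        + " is not a legal XML character";
            }
            // Step over one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return null;
    }
}
```

With this approach the pair 0xd835 0xdc00 is checked as the single 
code point U+1D400 and accepted, while a lone 0xd835 with no low 
surrogate after it is still reported as illegal.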

-- 

   Elliotte Rusty Harold
   elharo at metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA

