[jdom-interest] Text class

Elliotte Rusty Harold elharo at metalab.unc.edu
Sun May 27 08:12:48 PDT 2001

At 7:48 PM -0500 5/26/01, Brett McLaughlin wrote:

>The only reason that I didn't do that was to ensure that we aren't
>"hard-wired" to any character based format. This was something Elliotte
>pointed out, that made sense to me. Like even if String wasn't final, I
>wouldn't extend it for the sake of Unicode and so forth...
>Elliotte, any thoughts here?

I agree. We don't want to hardwire it if we don't have to.

I've been rethinking my initial objections to Sun's approach to 
handling non-BMP characters in Strings. I need to look more closely 
at the just released JDK 1.4 to see what they're up to, but I'm 
thinking maybe surrogate pairs will work. However, our logic will 
have to decode the surrogate pairs before processing. In particular, 
this affects the Verifier class. For name characters we're OK because 
those can't use non-BMP characters. However, verifying the text 
content of an element or attribute may require the isCharacter() 
method to look ahead to the next char, since a single char may be 
only half of a surrogate pair. Or we may need to rethink the API 
completely so it only verifies whole strings, not individual 
characters.
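
Here is a minimal sketch of the whole-string approach described above. It is not JDOM's actual Verifier code. The class name SurrogateAwareVerifier and both method names are hypothetical, chosen only for illustration. The sketch decodes surrogate pairs by hand using the standard UTF-16 arithmetic, then tests each decoded code point against the Char production of XML 1.0.

```java
// Hypothetical illustration only; not part of JDOM.
class SurrogateAwareVerifier {

    // XML 1.0 Char production, applied to a full Unicode code point
    // rather than to a single 16-bit Java char.
    public static boolean isXMLCharacter(int cp) {
        if (cp == 0x9 || cp == 0xA || cp == 0xD) return true;
        if (cp >= 0x20 && cp <= 0xD7FF) return true;
        if (cp >= 0xE000 && cp <= 0xFFFD) return true;
        return cp >= 0x10000 && cp <= 0x10FFFF;
    }

    // Verifies a whole string. Returns null if every character is
    // legal, otherwise a short description of the first problem.
    public static String checkCharacterData(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int cp = c;
            if (c >= 0xD800 && c <= 0xDBFF) {
                // High surrogate: it must be followed by a low surrogate.
                if (i + 1 >= text.length()) {
                    return "Unpaired high surrogate at index " + i;
                }
                char low = text.charAt(i + 1);
                if (low < 0xDC00 || low > 0xDFFF) {
                    return "Unpaired high surrogate at index " + i;
                }
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                i++; // the low surrogate has been consumed
            } else if (c >= 0xDC00 && c <= 0xDFFF) {
                // A low surrogate with no preceding high surrogate.
                return "Unpaired low surrogate at index " + i;
            }
            if (!isXMLCharacter(cp)) {
                return "0x" + Integer.toHexString(cp)
                        + " is not a legal XML character";
            }
        }
        return null;
    }
}
```

Because the loop owns the index, the lookahead stays inside one method and callers never see half of a pair. A per-char isCharacter(char) cannot make that decision on its own.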

What I've come to realize is that we may be OK with strings and 
string buffers if we no longer assume one Java char equals one 
Unicode character. I need to do some more research and 

On the other hand, it's very important that we do support all these 
non-BMP characters. The latest discovery here is that the new Han 
ideographs include some essential characters, including, for example, 
the ideogram for "I" (1st person singular pronoun) used in one 
dialect of Chinese spoken by more than 30 million people.

| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
