[jdom-interest] CDATA inconsistency

Malachi de AElfweald malachi at tremerechantry.com
Sat Nov 2 11:32:12 PST 2002


"Unmatched halves of surrogate pairs"... That would be assuming UTF-16 specifically,
would it not? ISO-8859-1, for example, does not have surrogate pairs.

Malachi


11/2/2002 8:22:01 AM, Elliotte Rusty Harold <elharo at metalab.unc.edu> wrote:

>At 11:08 PM -0800 11/1/02, Malachi de AElfweald wrote:
>>It would be against the XML spec to check the characters within the
>>CDATA, since the spec says that CDATA is "unparsed character data".
>>Seems like parsing it wouldn't fit the description, eh?
>>
>
>No, that's not quite true. There are a number of characters which 
>cannot appear in a CDATA section. These include many C0 controls such 
>as null and vertical tab, unmatched halves of surrogate pairs, and a 
>few other undefined code points. The three character sequence ]]> is 
>also illegal.
>
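For illustration, the rule Elliotte describes can be sketched in Java roughly as follows. This is only a minimal sketch, not JDOM's actual code: the class and method names (CdataCheck, isXmlChar, checkCdata) are made up for this example, and it assumes the Char production from the XML 1.0 spec. Note that it works on decoded characters in a Java String, so it does not depend on the byte encoding the document was stored in.

```java
final class CdataCheck {
    private CdataCheck() {}

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: returns null if the text may appear in a
    // CDATA section, otherwise a short reason why it may not.
    static String checkCdata(String text) {
        if (text.contains("]]>")) {
            return "contains the CDATA end delimiter ]]>";
        }
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a matched surrogate pair into one code
            // point; an unmatched half comes back as itself (D800-DFFF),
            // which isXmlChar rejects.
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return String.format("illegal character U+%04X at index %d", cp, i);
            }
            i += Character.charCount(cp);
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "plain text",
            "a ]]> b",
            "vertical" + (char) 0x0B + "tab",
            "lone " + (char) 0xD800 + " surrogate",
            "pair " + new String(Character.toChars(0x1F600)) + " ok"
        };
        for (String s : samples) {
            String problem = checkCdata(s);
            System.out.println(problem == null ? "legal" : "illegal: " + problem);
        }
    }
}
```

A caller would run a check like checkCdata before building the CDATA node and raise an error if it returns a non-null reason.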





