[jdom-interest] XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

Jason Hunter jhunter at servlets.com
Sat Mar 21 19:55:02 PDT 2009


Note that this had a pretty good debate on xml-dev (while our list was
down):

http://markmail.org/message/wqcmohlf7srpqhkl

General consensus seems to be that the current behavior is the lesser
of two evils.

-jh-

On Mar 19, 2009, at 2:50 PM, Klotz, Leigh wrote:

> JDOM 1.1 won't create elements whose names contain characters in the
> following ranges:
>  Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH
> LATIN SMALL LETTER Z)
>  Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH
> LATIN CAPITAL LETTER Z)
>
> The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
> 84 of the XML 1.0 Recommendation for its table of allowed characters.
>
> However, according to http://www.w3.org/TR/REC-xml/ the whole of
> Appendix B (which contains Production 84) is obsolete and is not used
> within the recommendation.  The XML Rec instead uses production [4]
> for NameStartChar, [4a] for NameChar, and [5] for Name.
>
> The productions at [4] and [4a] are considerably smaller than those of
> Appendix B, and are more inclusive, providing for greater utility in
> I18N applications of XML.
>
> Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J
> (Non-Normative), the characters I mention above are not only allowed,
> but encouraged for use in XML Names, because the Unicode ID_Start
> and ID_Continue properties of these Unicode code points are True.
>
> The XML REC says:
>
>    1. The first character of any name should have a Unicode property
> of ID_Start, or else be '_' #x5F.
>    2. Characters other than the first should have a Unicode property
> of ID_Continue, or ...
>
> You can see that ID_Start and ID_Continue are True on the individual
> pages for the small letters here:
> http://unicode.org/cldr/utility/character.jsp?a=FF41
> to
> http://unicode.org/cldr/utility/character.jsp?a=FF5A
>
> I recommend that org.jdom.Verifier.isXMLLetter be updated to use
> productions [4], [4a], and [5] of XML 1.0 Fifth Edition.
> It's quite likely that some of the other character class verifiers
> need updating as well, but I didn't examine them.
>
> Leigh.
>
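
For readers who want to see what the Fifth Edition rules amount to, here
is a minimal sketch of productions [4] (NameStartChar), [4a] (NameChar),
and [5] (Name) in plain Java. The class and method names are hypothetical
and are not part of JDOM; this is not how org.jdom.Verifier is written, only
an illustration of the ranges in the current Recommendation.

```java
// Hypothetical helper, not part of JDOM: a sketch of the XML 1.0
// Fifth Edition name productions [4], [4a], and [5].
final class XmlNameSketch {
    private XmlNameSketch() {}

    /** Production [4]: NameStartChar. */
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6)
            || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF)
            || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF)
            || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F)
            || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF)
            || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD)   // includes the fullwidth Latin letters
            || (c >= 0x10000 && c <= 0xEFFFF);
    }

    /** Production [4a]: NameChar. */
    static boolean isNameChar(int c) {
        return isNameStartChar(c)
            || c == '-' || c == '.'
            || (c >= '0' && c <= '9')
            || c == 0xB7
            || (c >= 0x0300 && c <= 0x036F)
            || (c >= 0x203F && c <= 0x2040);
    }

    /** Production [5]: Name ::= NameStartChar (NameChar)*. */
    static boolean isName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are handled.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }
}
```

Under this sketch, XmlNameSketch.isName("\uFF41\uFF42") is accepted because
U+FF41 and U+FF42 fall inside the #xFDF0-#xFFFD range. Note that it checks
an XML Name, not a namespace-aware NCName, so a check for element local
names would also need to reject ':'.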

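The ID_Start / ID_Continue point in the quoted message can also be
spot-checked from Java without the CLDR pages. The sketch below uses
Character.isUnicodeIdentifierStart and Character.isUnicodeIdentifierPart
as stand-ins for those properties. That is an assumption: the results
depend on the Unicode version bundled with the JDK, and
isUnicodeIdentifierPart also accepts some ignorable characters, so treat
it as an approximation of ID_Continue rather than an exact match. The
class name is hypothetical.

```java
// Hypothetical helper: approximates ID_Start / ID_Continue with the
// JDK's own identifier predicates (an assumption, see the text above).
final class IdPropertySketch {
    private IdPropertySketch() {}

    /** True if every code point in [from, to] passes both JDK checks. */
    static boolean allIdStartAndPart(int from, int to) {
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isUnicodeIdentifierStart(cp)
                    || !Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling IdPropertySketch.allIdStartAndPart(0xFF41, 0xFF5A) and
allIdStartAndPart(0xFF21, 0xFF3A) covers the two fullwidth ranges listed
in the original report.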

