[jdom-interest] XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

Klotz, Leigh Leigh.Klotz at xerox.com
Thu Mar 26 10:30:15 PDT 2009


I agree.
Thank you both for researching the issue and for getting the list back
up.
Leigh. 

-----Original Message-----
From: Jason Hunter [mailto:jhunter at servlets.com] 
Sent: Saturday, March 21, 2009 7:55 PM
To: jdom interest
Cc: Klotz, Leigh
Subject: Re: [jdom-interest] XML Element name Verifier is overly strict
and doesn't match current XML 1.0 REC

Note that this had a pretty good debate on xml-dev (while our list was
down):

http://markmail.org/message/wqcmohlf7srpqhkl

General consensus seems to be the current behavior is the lesser of two
evils.

-jh-

On Mar 19, 2009, at 2:50 PM, Klotz, Leigh wrote:

> JDOM 1.1 won't create elements whose characters are in the following
> ranges:
>  Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH
>  LATIN SMALL LETTER Z)
>  Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH
>  LATIN CAPITAL LETTER Z)
>
> The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
> 84 of the XML 1.0 Recommendation for its table of allowed characters.
>
> However, according to http://www.w3.org/TR/REC-xml/ the whole of 
> Appendix B (which contains Production 84) is obsolete and is not used 
> within the recommendation.  The XML Rec instead uses production [4] 
> for NameStartChar, [4a] for NameChar, and [5] for Name.
>
> The productions at [4] and [5] are considerably smaller than those of 
> Appendix B, and are more inclusive, providing for greater utility in 
> I18N applications of XML.
>
> Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J 
> (Non-Normative), the characters I mention above are not only allowed,
> but encouraged for use in XML Names, because the Unicode ID_Start 
> and ID_Continue properties of these code points are True.
>
> The XML REC says:
>
>    1. The first character of any name should have a Unicode property 
> of ID_Start, or else be '_' #x5F.
>    2. Characters other than the first should have a Unicode property 
> of ID_Continue, or ...
>
> You can see that ID_Start and ID_Continue are True on the individual 
> pages for the small letters here:
> http://unicode.org/cldr/utility/character.jsp?a=FF41
> to
> http://unicode.org/cldr/utility/character.jsp?a=FF5A
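
As a rough local cross-check of the ID_Start / ID_Continue claim above, the JDK's own identifier tests can be queried for the endpoints of both ranges. This is only a sketch: Character.isUnicodeIdentifierStart and Character.isUnicodeIdentifierPart approximate the Unicode properties (the "Part" test also accepts ignorable characters), and the class name IdPropertyCheck is made up for illustration.

```java
public final class IdPropertyCheck {
    private IdPropertyCheck() {}

    // Approximates Unicode ID_Start using the JDK's built-in test.
    public static boolean looksLikeIdStart(int codePoint) {
        return Character.isUnicodeIdentifierStart(codePoint);
    }

    // Approximates Unicode ID_Continue; the JDK test is slightly broader.
    public static boolean looksLikeIdContinue(int codePoint) {
        return Character.isUnicodeIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        // Endpoints of the fullwidth capital and small letter ranges.
        int[] samples = {0xFF21, 0xFF3A, 0xFF41, 0xFF5A};
        for (int cp : samples) {
            System.out.printf("U+%04X start=%b continue=%b%n",
                    cp, looksLikeIdStart(cp), looksLikeIdContinue(cp));
        }
    }
}
```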
>
> I recommend that org.jdom.Verifier.isXMLLetter be updated to use 
> productions [4], [4a], and [5] of XML 1.0 Fifth Edition.
> It's quite likely that some of the other character class verifiers 
> need updating as well, but I didn't examine them.
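
For concreteness, a check based on the Fifth Edition productions might look roughly like the sketch below. It is not JDOM code: the class XmlNameSketch and its method names are hypothetical, and only the character ranges are taken from productions [4], [4a], and [5]. It tests a plain XML Name, so it accepts ':' and would need further restrictions before it could stand in for JDOM's element-name check.

```java
public final class XmlNameSketch {
    private XmlNameSketch() {}

    // Production [4] NameStartChar, XML 1.0 Fifth Edition.
    public static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // Production [4a] NameChar: NameStartChar plus a few extra characters.
    public static boolean isNameChar(int c) {
        return isNameStartChar(c)
            || c == '-' || c == '.' || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x0300 && c <= 0x036F) || (c >= 0x203F && c <= 0x2040);
    }

    // Production [5] Name: NameStartChar (NameChar)*, walked by code point
    // so that supplementary characters are handled as single units.
    public static boolean isName(String s) {
        if (s == null || s.length() == 0) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }
}
```

With these ranges, the fullwidth letters U+FF21-U+FF3A and U+FF41-U+FF5A fall inside [#xFDF0-#xFFFD] and are accepted as name start characters.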
>
> Leigh.



