[jdom-interest] XML Element name Verifier is overly strict and
doesn't match current XML 1.0 REC
Leigh.Klotz at xerox.com
Thu Mar 19 14:50:34 PDT 2009
JDOM 1.1 won't create elements whose names contain characters in the
following ranges:
Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH
LATIN SMALL LETTER Z)
Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH
LATIN CAPITAL LETTER Z)
The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
84 of the XML 1.0 Recommendation for its table of allowed characters.
However, according to http://www.w3.org/TR/REC-xml/ the whole of
Appendix B (which contains Production 84) is obsolete and is not used
within the recommendation. The XML Rec instead uses production [4] for
NameStartChar and [4a] for NameChar.
The productions at [4] and [4a] are considerably smaller than those of
Appendix B, and are more inclusive, providing for greater utility in
I18N applications of XML.
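For illustration, here is a minimal sketch of what checks based on the
NameStartChar and NameChar productions of XML 1.0 Fifth Edition might
look like. The class XmlNameChars and its method names are hypothetical
and are not part of JDOM; the ranges are transcribed from the
Recommendation and should be checked against it before use.

```java
// Sketch only: XmlNameChars is a hypothetical helper, not JDOM code.
final class XmlNameChars {
    private XmlNameChars() {}

    // Inclusive code point ranges of NameStartChar, in addition to
    // ':', '_', A-Z and a-z, which are tested separately below.
    private static final int[][] START_RANGES = {
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF}, {0x370, 0x37D},
        {0x37F, 0x1FFF}, {0x200C, 0x200D}, {0x2070, 0x218F},
        {0x2C00, 0x2FEF}, {0x3001, 0xD7FF}, {0xF900, 0xFDCF},
        {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // NameStartChar: characters allowed at the start of a Name.
    static boolean isNameStartChar(int cp) {
        if (cp == ':' || cp == '_'
                || (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')) {
            return true;
        }
        for (int[] r : START_RANGES) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // NameChar: NameStartChar plus '-', '.', digits, #xB7,
    // #x0300-#x036F and #x203F-#x2040.
    static boolean isNameChar(int cp) {
        return isNameStartChar(cp)
                || cp == '-' || cp == '.'
                || (cp >= '0' && cp <= '9')
                || cp == 0xB7
                || (cp >= 0x300 && cp <= 0x36F)
                || (cp >= 0x203F && cp <= 0x2040);
    }

    // Name: one NameStartChar followed by zero or more NameChar.
    // Iterates by code point so supplementary characters are handled.
    static boolean isName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // Three fullwidth small letters (0xFF41 to 0xFF43).
        System.out.println(isName("\uFF41\uFF42\uFF43")); // true
        // A digit may not start a Name.
        System.out.println(isName("1abc"));               // false
    }
}
```

Both fullwidth ranges fall inside #xFDF0-#xFFFD, so a check of this
shape accepts them.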
Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J
(Non-Normative), the characters I mention above are not only allowed
but encouraged for use in XML Names, because the Unicode ID_Start and
ID_Continue properties of these code points are both True.
The XML REC says:
1. The first character of any name should have a Unicode property of
ID_Start, or else be '_' #x5F.
2. Characters other than the first should have a Unicode property of
ID_Continue, or ...
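As a rough cross-check of the two properties quoted above, the JDK's
Character.isUnicodeIdentifierStart and Character.isUnicodeIdentifierPart
can be used. This is only a sketch: those methods approximate ID_Start
and ID_Continue rather than reproduce them exactly, their results depend
on the Unicode version shipped with the JDK, and the class name
IdPropertyCheck is hypothetical.

```java
// Sketch only: approximates ID_Start / ID_Continue with JDK methods.
final class IdPropertyCheck {
    private IdPropertyCheck() {}

    // True if every code point in [from, to] is reported by the JDK as
    // both a Unicode identifier start and a Unicode identifier part.
    static boolean allIdStartAndContinue(int from, int to) {
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isUnicodeIdentifierStart(cp)
                    || !Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // FULLWIDTH LATIN SMALL LETTER A to Z
        System.out.println(allIdStartAndContinue(0xFF41, 0xFF5A)); // true
        // FULLWIDTH LATIN CAPITAL LETTER A to Z
        System.out.println(allIdStartAndContinue(0xFF21, 0xFF3A)); // true
    }
}
```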
You can see that ID_Start and ID_Continue are True on the individual
pages for the small letters here:
I recommend that org.jdom.Verifier.isXMLLetter be updated to use
productions [4], [4a], and [5] of XML 1.0 Fifth Edition.
It's quite likely that some of the other character class verifiers need
updating as well, but I didn't examine them.