[jdom-interest] Kana symbols and UTF-8? (was Re: Kana characters?)
grzegorz.kaczor at gmail.com
Tue May 22 13:52:29 PDT 2007
> I believe it's legal (but somebody might shoot me down on this) for the
> prolog to have an encoding of 'us-ascii' (ie single byte characters) and
> then to use the XML character escapes (&#xnnnn) to represent the extended
> character set.
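Yes, it is legal. These are two separate worlds. The encoding specified in the XML declaration only governs how the document's bytes are decoded into characters. A character reference such as &#x307B; is itself plain ASCII text that names a character by its Unicode code point. According to the XML specification, a character reference may point to any character allowed by the Char production: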
  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
           /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
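To make the "two separate worlds" point concrete, here is a small Java sketch. It uses only the JDK's javax.xml.parsers DOM API, not JDOM. The class and helper names (AsciiCharRefs, toAsciiWithCharRefs, rootText) are made up for illustration. The sketch writes the kana from the thread as hexadecimal character references into a document declared encoding="US-ASCII" and rejects any code point outside the Char production above. It then parses the bytes back, assuming the JDK's default parser accepts the US-ASCII encoding name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class AsciiCharRefs {

    // Transcription of the XML 1.0 Char production quoted above.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: keeps ASCII as-is and turns every other code point
    // into a hexadecimal character reference such as &#x307B;.
    // Markup characters ('<', '&', ...) are NOT escaped; text content only.
    static String toAsciiWithCharRefs(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException(
                        "U+" + Integer.toHexString(cp).toUpperCase() + " is not an XML 1.0 Char");
            }
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return sb.toString();
    }

    // Parses a document with the JDK's DOM parser and returns the root element's text.
    static String rootText(byte[] document) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(document));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String kana = "\u307B\u3093"; // hiragana HO + N ("hon", book)
        String xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
                + "<hiraganaSym>" + toAsciiWithCharRefs(kana) + "</hiraganaSym>";
        System.out.println(xml);

        // Every character of xml is 7-bit ASCII, so this conversion loses nothing.
        byte[] bytes = xml.getBytes(StandardCharsets.US_ASCII);

        // The parser resolves the references back to the original kana.
        System.out.println(rootText(bytes).equals(kana)); // true
    }
}
```

Running main should print the ASCII-only document, with the element content written as &#x307B;&#x3093;, followed by true. The declared encoding and the characters the document can carry are independent.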
> _______________________________
> From: jdom-interest-bounces at jdom.org [mailto:jdom-interest-bounces at jdom.org]
> On Behalf Of Alan Deikman
> Sent: Tuesday, May 22, 2007 9:44 AM
> To: jdom-interest at jdom.org
> Subject: [jdom-interest] Kana symbols and UTF-8? (was Re: Kana characters?)
>
> OK, now I'm a little confused. I guess this is an XML question and not
> really a JDOM question, but perhaps someone can explain it.
>
> Angela Amoateng wrote:
>
> This is the code in my XML document (by the way, romaji is romanised
> Japanese):
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <dictionary>
> <word>
> <noun>
> <english>book</english>
> <romaji>hon</romaji>
> <hiraganaSym>ほん</hiraganaSym>
> <hiraganaNum>ほん</hiraganaNum>
> </noun>
>
> Where I get lost is in the <hiriganaSym> tag. Those characters inside are
> not part of any 8-bit code (ASCII, UTF-8 or whatever). Java has no problem
> with it because all String objects are built on unicode, but what does the
> encoding="UTF-8" mean in the header if these symbols can show up in the
> document?
>
> --Alan Deikman
> ZNYX Networks
-- 
"Though this much we know from our own experience:
within us are Paradise and Hell, and paths to both."
J.K.