[jdom-interest] Kana symbols and UTF-8? (was Re: Kana characters?)

Alan Deikman Alan.Deikman at znyx.com
Tue May 22 09:43:55 PDT 2007


OK, now I'm a little confused. I guess this is an XML question and not
really a JDOM question, but perhaps someone can explain it.

Angela Amoateng wrote:
>
> This is the code in my XML document (by the way, romaji is romanised 
> Japanese):
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <dictionary>
>    <word>
>        <noun>
>            <english>book</english>
>            <romaji>hon</romaji>
>            <hiraganaSym>ほん</hiraganaSym>
>            <hiraganaNum>&#x307B;&#x3093;</hiraganaNum>
>        </noun>

Where I get lost is in the <hiraganaSym> tag. The characters inside it
are not part of any 8-bit code (ASCII, UTF-8 or whatever). Java has no
problem with them because all String objects are built on Unicode, but
what does the _encoding="UTF-8"_ in the header mean if these symbols can
show up in the document?
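
One way to poke at this, as a minimal sketch: the class and method names
below (KanaBytes, utf8Hex) are made up for illustration, and only the
standard JDK is assumed, no JDOM. It prints the bytes Java produces when
the two characters from the <hiraganaSym> element are written out as UTF-8.
Each character comes out as three bytes, which suggests UTF-8 is a way of
spelling Unicode characters as sequences of 8-bit units rather than a
256-character code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of JDOM or of the original document.
class KanaBytes {

    // Encodes s as UTF-8 and returns the bytes as space-separated hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+307B and U+3093, the same two characters as in <hiraganaNum>.
        // Unicode escapes keep this source file independent of its own encoding.
        String hon = "\u307B\u3093";

        System.out.println(hon.length());   // prints 2 (two UTF-16 chars)
        System.out.println(utf8Hex(hon));   // prints E3 81 BB E3 82 93
    }
}
```

So the two kana take six bytes in a UTF-8 file, while "hon" in the
<romaji> element would take three, one per ASCII letter.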

-- 
Alan Deikman
ZNYX Networks
