[jdom-interest] how to parse UTF-8 encoded files with JDOM b7 and Xerces

Alexey Solofnenko A.Solofnenko at mdl.com
Mon Dec 2 10:00:48 PST 2002


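Please do not read the file yourself. Just pass the file path to the parser
and the parser will deal with its encoding. At the very least, create a
FileInputStream rather than a FileReader: a FileReader decodes the bytes
with the platform default encoding, while a FileInputStream hands the raw
bytes to the parser without interpreting characters.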
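
As a rough illustration of the advice above, here is a minimal sketch using
the JDK's own JAXP DocumentBuilder as a stand-in for the JDOM/Xerces setup in
the question. The class name Utf8ParseSketch and both method names are made
up for this example. JDOM's SAXBuilder has build(File) and
build(InputStream) overloads that should be usable in the same way, but that
is not shown or tested here.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for this sketch only.
final class Utf8ParseSketch {

    // Preferred: hand the file itself to the parser. The parser reads the
    // raw bytes and works out the encoding from the BOM / XML declaration.
    public static Document parseFromFile(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(xmlFile);
    }

    // Fallback: if you must open the file yourself, use a byte stream.
    // A FileInputStream does not decode characters, so the parser still
    // sees the original bytes and can detect the encoding on its own.
    public static Document parseFromStream(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new FileInputStream(xmlFile)) {
            return builder.parse(in);
        }
    }
}
```

Neither method depends on the JVM's file.encoding setting, so the encoding is
effectively decided per parse by the document itself. Calling
Utf8ParseSketch.parseFromFile(new File("some.xml")) on a UTF-8 file (the
path is a placeholder) should return the non-ASCII text intact.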

- Alexey.

--
{ http://trelony.cjb.net/   } Alexey N. Solofnenko
Pleasant Hill, CA (GMT-8 usually)

-----Original Message-----
From: Aron Kramlik [mailto:aron.kramlik at itouch.com.au] 
Sent: Monday, December 02, 2002 5:11 AM
To: jdom-interest at jdom.org
Subject: [jdom-interest] how to parse UTF-8 encoded files with JDOM b7 and
Xerces

Hi,

I have an XML file that is encoded in UTF-8 and it has a mixture of English
and Cantonese characters.  When I parse the file using SAXParser, I don't
get the real Cantonese characters back but rather just question marks (?).

If I set the file.encoding option of the JVM to be UTF-8 it works fine.

Now, my question is: how can I control the file encoding per parse within a
JVM, if changing this global setting from the default is not suitable for me?

Thanks for your time and help,

Aron Kramlik.


