[jdom-interest] how to parse UTF-8 encoded files with JDOM b7 and Xerces
aron.kramlik at itouch.com.au
Mon Dec 2 19:45:14 PST 2002
Thanks Jason and Alexey. It was my fault: the parsing was working fine,
as you both said, but the XMLOutputter's encoding was not being
explicitly set to UTF-8, so I guess it was defaulting to the
system property. Now it works. Thank you both.
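
For anyone hitting the same symptom, here is a minimal sketch of why a
mismatched output encoding shows up as question marks. It uses only the
JDK, not JDOM. The class name EncodingDemo and the helper encode() are
made up for illustration, and US-ASCII merely stands in for a platform
default that cannot represent the characters. With JDOM the equivalent
fix would be setting the outputter's encoding explicitly (I believe via
XMLOutputter.setEncoding("UTF-8"), but check the API of your JDOM
version) and writing to an OutputStream, or to a Writer created with
the same charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JDOM or Xerces.
class EncodingDemo {

    // Writes text through a Writer bound to the named charset
    // and returns the raw bytes that were produced.
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charsetName)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u4e2d\u6587"; // two Chinese characters

        // Explicit UTF-8: the characters survive a round trip.
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true

        // A charset that cannot map the characters: each one becomes '?'.
        byte[] ascii = encode(text, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ??
    }
}
```

The point is that the replacement happens silently at write time, so
the parsed document can be perfectly fine while the output still looks
broken.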
----- Original Message -----
From: "Jason Hunter" <jhunter at servlets.com>
To: "Aron Kramlik" <aron.kramlik at itouch.com.au>
Cc: <jdom-interest at jdom.org>
Sent: Tuesday, December 03, 2002 4:09 AM
Subject: Re: [jdom-interest] how to parse UTF-8 encoded files with JDOM b7
> > I have an XML file that is encoded in UTF-8 and it has a mixture of
> > and Cantonese characters. When I parse the file using SAXParser I don't
> > get the real Cantonese characters back but rather just question marks.
> > If I set the file.encoding option of the JVM to be UTF-8 it works fine.
> Does your XML declaration include: encoding="UTF-8"?
> I believe UTF-8 is assumed if the encoding is missing from the decl, but
> the fact that changing file.encoding fixes the problem makes me think
> perhaps your parser isn't following that rule and is instead using the
> JVM's default charset.
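
The rule mentioned above can be illustrated with a small sketch that
uses the JDK's built-in JAXP parser rather than JDOM. It assumes one
common way of ending up with the wrong charset, namely handing the
parser a Reader that has already decoded the bytes; the thread does not
say that this is what happened here. The class ParseEncodingDemo and
its methods are hypothetical names, and ISO-8859-1 stands in for a
non-UTF-8 JVM default.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of JDOM or Xerces.
class ParseEncodingDemo {

    // Parses the source and returns the text content of the root element.
    static String rootText(InputSource source) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(source).getDocumentElement().getTextContent();
    }

    // Byte stream: the parser detects the encoding itself (UTF-8 when
    // the declaration names no encoding and there is no byte order mark).
    static String parseFromBytes(byte[] xml) throws Exception {
        return rootText(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Character stream: the bytes are decoded with the given charset
    // before the parser ever sees them.
    static String parseFromReader(byte[] xml, String charsetName) throws Exception {
        Reader reader =
                new InputStreamReader(new ByteArrayInputStream(xml), charsetName);
        return rootText(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        String text = "\u4e2d\u6587"; // two Chinese characters
        byte[] xml = ("<?xml version=\"1.0\"?><greeting>" + text + "</greeting>")
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(parseFromBytes(xml).equals(text));                // true
        System.out.println(parseFromReader(xml, "ISO-8859-1").equals(text)); // false
    }
}
```

Passing an InputStream (or a File) leaves encoding detection to the
parser, which is usually what you want for XML.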