[jdom-interest] how to parse UTF-8 encoded files with JDOM b7 and Xerces

Aron Kramlik aron.kramlik at itouch.com.au
Mon Dec 2 19:45:14 PST 2002


Thanks Jason and Alexey.  It was my fault: the parsing was working fine,
as you both said, but the XMLOutputter's encoding was not being explicitly
set to UTF-8, so I guess it was defaulting to the file.encoding system
property.  So now it works, and thank you both.
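A minimal sketch of the output-side pitfall, using only the JDK rather than JDOM (the class and method names are illustrative, not from the thread). It assumes ISO-8859-1 as a stand-in for a non-UTF-8 `file.encoding`; the thread does not say which default this JVM actually had. Text encoded with a charset that cannot represent the Cantonese characters comes out as `?`, even though the XML declaration still claims UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OutputEncodingDemo {
    // "Hello" plus a Cantonese greeting (U+4F60 U+597D), written as Unicode escapes
    // so this source file compiles the same regardless of the platform encoding.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<greeting>Hello \u4f60\u597d</greeting>\n";

    // Encodes the text with an explicit charset instead of the JVM default.
    // Characters the charset cannot represent are replaced ('?' for ISO-8859-1).
    static byte[] serialize(String xml, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(xml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = serialize(XML, StandardCharsets.UTF_8);
        // Assumption: ISO-8859-1 plays the role of a non-UTF-8 default charset.
        byte[] latin1 = serialize(XML, StandardCharsets.ISO_8859_1);
        // UTF-8 round-trips the Cantonese characters intact: prints true
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(XML));
        // ISO-8859-1 cannot represent them, so each becomes '?': prints true
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1).contains("Hello ??"));
    }
}
```

In JDOM 1.0 and later the output encoding is set through `Format`, roughly `new XMLOutputter(Format.getPrettyFormat().setEncoding("UTF-8")).output(doc, new FileOutputStream(file))`. Writing to an `OutputStream` rather than a `Writer` lets the outputter encode the bytes itself; with a `Writer`, the writer's own charset decides what reaches the file.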

Cheers,
Aron Kramlik.

----- Original Message -----
From: "Jason Hunter" <jhunter at servlets.com>
To: "Aron Kramlik" <aron.kramlik at itouch.com.au>
Cc: <jdom-interest at jdom.org>
Sent: Tuesday, December 03, 2002 4:09 AM
Subject: Re: [jdom-interest] how to parse UTF-8 encoded files with JDOM b7 and Xerces


> > I have an XML file that is encoded in UTF-8 and it has a mixture of English
> > and Cantonese characters.  When I parse the file using SAXParser I don't
> > get the real Cantonese characters back but rather just question marks (?).
> >
> > If I set the file.encoding option of the JVM to be UTF-8 it works fine.
>
> Does your XML declaration include: encoding="UTF-8"?
>
> I believe UTF-8 is assumed if the encoding is missing from the decl, but
> the fact that changing file.encoding fixes the problem makes me think
> perhaps your parser isn't following that rule and is instead using the
> JVM's default charset.
>
> -jh-
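Jason's hypothesis about the parsing side can be checked with a minimal JAXP sketch (JDK only, no JDOM; the class and method names are illustrative). Given raw bytes, the parser reads the encoding declaration itself; given a `Reader`, it disregards the declaration and trusts whatever charset already decoded the bytes. Here ISO-8859-1 stands in, as an assumption, for a non-UTF-8 JVM default, and the Cantonese text is garbled before the parser ever sees it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InputEncodingDemo {
    // The bytes a UTF-8 file on disk would contain (Cantonese greeting U+4F60 U+597D).
    static final byte[] UTF8_BYTES =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><greeting>Hello \u4f60\u597d</greeting>"
            .getBytes(StandardCharsets.UTF_8);

    // Collects all character data; SAX may deliver it in several chunks.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static String parse(InputSource source) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector collector = new TextCollector();
            parser.parse(source, collector);
            return collector.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Correct: hand the parser raw bytes so it honours encoding="UTF-8" itself.
    static String parseBytes() {
        return parse(new InputSource(new ByteArrayInputStream(UTF8_BYTES)));
    }

    // Pitfall: decode the bytes with the wrong charset before parsing.
    // Assumption: ISO-8859-1 stands in for a non-UTF-8 JVM default charset.
    static String parseWithWrongReader() {
        Reader reader = new InputStreamReader(
            new ByteArrayInputStream(UTF8_BYTES), StandardCharsets.ISO_8859_1);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) {
        System.out.println(parseBytes().equals("Hello \u4f60\u597d"));           // true
        System.out.println(parseWithWrongReader().equals("Hello \u4f60\u597d")); // false
    }
}
```

This is why handing JDOM's `SAXBuilder` a `File` or `InputStream`, rather than a `FileReader` (which decodes with the JVM's default charset), keeps parsing independent of `file.encoding`.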