[jdom-interest] how to parse UTF-8 encoded files with JDOM b7 and Xerces

Jason Hunter jhunter at servlets.com
Mon Dec 2 09:09:43 PST 2002


> I have an XML file that is encoded in UTF-8 and it has a mixture of English
> and Cantonese characters.  When I parse the file using SAXParser I don't
> get the real Cantonese characters back, only question marks (?).
> 
> If I set the file.encoding option of the JVM to be UTF-8 it works fine.

Does your XML declaration include: encoding="UTF-8"?

I believe the XML spec has the parser assume UTF-8 when the encoding is
missing from the declaration (and there's no byte-order mark saying
otherwise). The fact that changing file.encoding fixes the problem makes
me think your parser isn't following that rule and is instead decoding
with the JVM's default charset.
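A minimal sketch of the difference. It assumes the JDK's built-in SAX
parser (javax.xml.parsers) rather than JDOM b7 on a separate Xerces, so
it runs on its own. The class and method names (Utf8ParseDemo,
parseViaByteStream, parseViaReader) are hypothetical. Handing the parser
raw bytes lets it apply the UTF-8 default itself. Decoding the bytes
first with the wrong charset (ISO-8859-1 stands in for a non-UTF-8 JVM
default here) gives it text that is already corrupted. The symptom in
this sketch is mojibake rather than '?', but it shows how text can be
damaged before the parser ever sees it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of JDOM or Xerces.
public class Utf8ParseDemo {

    // No encoding attribute in the declaration, so a conforming parser
    // reading raw bytes should fall back to UTF-8. The element text is the
    // Chinese word for "Cantonese", written as Unicode escapes so this file
    // compiles the same whatever javac's default source encoding is.
    static final String SAMPLE_XML =
        "<?xml version=\"1.0\"?><msg>\u5ee3\u6771\u8a71</msg>";

    // Collects all character data reported by the SAX parser.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the input and returns the concatenated character data.
    static String parse(InputSource source) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector handler = new TextCollector();
            parser.parse(source, handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Byte stream: the parser detects the encoding from the declaration
    // (UTF-8 when none is given) and decodes the bytes itself.
    static String parseViaByteStream() {
        byte[] bytes = SAMPLE_XML.getBytes(StandardCharsets.UTF_8);
        return parse(new InputSource(new ByteArrayInputStream(bytes)));
    }

    // Character stream: the bytes are decoded with the given charset before
    // the parser sees them, so the parser never gets a chance to use UTF-8.
    static String parseViaReader(Charset charset) {
        byte[] bytes = SAMPLE_XML.getBytes(StandardCharsets.UTF_8);
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) {
        String expected = "\u5ee3\u6771\u8a71";
        System.out.println("byte stream matches: "
            + parseViaByteStream().equals(expected));
        System.out.println("ISO-8859-1 reader matches: "
            + parseViaReader(StandardCharsets.ISO_8859_1).equals(expected));
    }
}
```

In JDOM terms the analogous choice is presumably to give SAXBuilder a
File or InputStream rather than a Reader such as FileReader, which always
decodes with the JVM default charset.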

-jh-
