[jdom-interest] UTF8 charset issues...
arosen at novell.com
Fri Oct 10 09:34:24 PDT 2003
"just calling Element.setText("Æ") does not generate a correct UTF-8 encoded document."
How did you determine this? That is, what tool did you use to look at the document? I suspect the document was right, and the tool you used to view it made it look "wrong". The *bytes* of the UTF-8 encoding of Æ look like garbage characters in any tool that decodes the file with an encoding other than UTF-8, even though the file itself is not mangled. The viewer you used (maybe Notepad or another text editor) probably read the file using your machine's default encoding (such as Latin-1), so it looked garbled even though it was really OK; a viewer that decodes the file as UTF-8 would show it correctly.
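To make that concrete, here is a minimal sketch using only the standard library (the class and method names are made up for illustration). It encodes Æ (U+00C6) as UTF-8 and then decodes the same bytes the way a Latin-1 viewer would, assuming the viewer really does use ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // Format each byte as two uppercase hex digits, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode as UTF-8, then decode the bytes as a Latin-1 viewer would.
    static String viewAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode as UTF-8, then decode the bytes as a UTF-8 aware viewer would.
    static String viewAsUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ae = "\u00C6"; // the character Æ, written as an escape

        // Æ occupies two bytes in UTF-8.
        System.out.println(hex(ae.getBytes(StandardCharsets.UTF_8))); // C3 86

        // A Latin-1 viewer turns those two bytes into two characters:
        // U+00C3 followed by the control character U+0086.
        System.out.println(viewAsLatin1(ae).length()); // 2

        // A UTF-8 viewer recovers the original single character.
        System.out.println(viewAsUtf8(ae).equals(ae)); // true
    }
}
```

The bytes on disk are the same in both cases; only the decoding step differs.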
Encoding issues are really confusing, unfortunately.
>>> Patrick JUSSEAU <patrick at openbase.com> 10/10/2003 8:35:20 AM >>>
I am trying to understand how JDOM handles character encodings. Here is
what I am doing:
I have a Java app which reads data from an XML file (UTF-8 encoded). I
am able to get text just fine using
String str = anElement.getText();
The resulting str string (Unicode encoded) contains exactly what was
defined in my XML file. The charset translation is transparent to me
here. For example, if my XML document is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT SYSTEM "annonce.dtd">
I get Æ in my str string.
However, when I try to generate an XML document with this exact
same Æ value, just calling Element.setText("Æ") does not generate a
correct UTF-8 encoded document. I first have to do this manually in my
code:
String text = "Æ";
byte[] bytes = text.getBytes("UTF8");
String newText = new String(bytes);
Why do I have to do this for the XML generation to work? Why isn't JDOM
taking care of the charset translation for me, since the resulting file
has UTF-8 encoding specified in it?
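
One possible explanation, offered only as a sketch: the message does not show how the document is written out, so this assumes it goes through a Writer that uses the platform default charset, and that the default is Latin-1 (ISO-8859-1). Under those assumptions the round trip through getBytes/new String only appears to help, because the Writer then happens to emit the UTF-8 byte sequence. The class and method names below are hypothetical and use only the standard library:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {
    // Push text through a Writer bound to the given charset; return the raw bytes.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // The workaround from the message, with the platform default made explicit.
    static String workaround(String text, Charset assumedDefault) {
        return new String(text.getBytes(StandardCharsets.UTF_8), assumedDefault);
    }

    // Format each byte as two uppercase hex digits, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset assumedDefault = StandardCharsets.ISO_8859_1; // assumption
        String text = "\u00C6"; // the character Æ

        // Plain text through a Latin-1 Writer: one byte, not valid UTF-8.
        System.out.println(hex(write(text, assumedDefault))); // C6

        // Workaround text through the same Writer: happens to be the UTF-8 bytes.
        System.out.println(hex(write(workaround(text, assumedDefault), assumedDefault))); // C3 86

        // Plain text through a Writer that is explicitly UTF-8: no workaround needed.
        System.out.println(hex(write(text, StandardCharsets.UTF_8))); // C3 86
    }
}
```

If that is what is happening, the more robust fix would be to make the output encoding explicit (for example an OutputStreamWriter constructed with UTF-8, or handing the outputter an OutputStream instead of a Writer) rather than pre-mangling the string.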
Thanks for any help