[jdom-interest] Encoding not working as expected - Copyright Symbol

Thomas Koch Thomas.Koch at atlantec-es.com
Fri Jun 22 06:49:27 PDT 2001


Your problem is caused by FileWriter, which always uses the
default encoding of your Java VM when it converts the characters
to bytes on their way to the disk.
XMLOutputter has no way to find out which encoding a
FileWriter uses, so you have to make sure the Writer you pass
in really uses the encoding you specified.
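
To see the difference in bytes, here is a small sketch using only the
standard library. The class name CopyrightBytes is made up for this
illustration, and ISO-8859-1 is only my assumption about what your VM
default might be (your output is consistent with a single-byte
Latin-1-style encoding, but I can't know for sure).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JDOM.
class CopyrightBytes {
    // Render the bytes of s in the given charset as octal escapes,
    // the same notation used in the quoted XML (e.g. \302\251).
    static String octal(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append('\\').append(Integer.toOctalString(b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // the copyright sign
        // Two bytes in UTF-8, matching the input file.
        System.out.println(octal(copyright, StandardCharsets.UTF_8));      // \302\251
        // One byte in ISO-8859-1 (assumed default), matching output.xml.
        System.out.println(octal(copyright, StandardCharsets.ISO_8859_1)); // \251
    }
}
```

A lone \251 (0xA9) byte is not a valid UTF-8 sequence, which would
explain the "Unconvertible UTF-8 character beginning with 0xa9" error
when the file is read back.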

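Make sure the Writer you pass in uses the UTF-8 encoding
(for example an OutputStreamWriter wrapped around a
FileOutputStream), or just pass a FileOutputStream to
XMLOutputter instead, and things should be OK.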
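
A minimal sketch of the Writer variant, using only java.io so it runs
without JDOM. The class and method names (Utf8FileWriting,
openUtf8Writer) are made up for this example, and the JDOM call is
only indicated in a comment. It uses newer Java conveniences; on an
older VM you could pass the charset name "UTF-8" as a String to the
OutputStreamWriter constructor instead.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JDOM.
class Utf8FileWriting {
    // Open a Writer that encodes characters as UTF-8 regardless of
    // the platform default encoding.
    static Writer openUtf8Writer(File file) throws IOException {
        return new OutputStreamWriter(new FileOutputStream(file),
                                      StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        File file = new File("output.xml");
        try (Writer writer = openUtf8Writer(file)) {
            // With JDOM, out.output(doc, writer) would go here in
            // place of the hand-written string.
            writer.write("<copyright>\u00A9 Copyright 2001</copyright>");
        }
    }
}
```

The point is that the encoding declared in the XML header and the
encoding the Writer actually applies are now the same.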

Thomas

On Friday 22 June 2001 08:59, you wrote:
> First off, let me just say that JDom is DA BOMB DIGITY!  Congrats on
> building such a great product.
>
> But unfortunately, something's been confusing me.  I have an XML document
> that is UTF-8 encoded and contains the (C) symbol encoded with UTF-8 (at
> least I assume that it is).  It shows up in my XML as...
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <article section="ECONOMIC">
> 	<copyright>\302\251 Copyright 2001 USA TODAY, a division of Gannett Co.
> Inc.</copyright>
> </article>
>
>
> ....the "\302\251" being the character for (C).  I load this XML file using
> a SAXBuilder and then just spit it right out again into another file using
> an XMLOutputter like so...
>
>
>   public static void main(String[] args) throws Exception {
>     SAXBuilder builder = new SAXBuilder();
>     // pass in the xml file containing the copyright symbol
>     Document doc = builder.build(args[0]);
>     XMLOutputter out = new XMLOutputter("  ", true, "UTF-8");
>     FileWriter writer = new FileWriter("output.xml");
>     out.output(doc, writer);
>     writer.close();
>   }
>
>
> BUT, for some reason this modifies the XML data and messes up the copyright
> symbol...
>
> output.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <article section="ECONOMIC">
> 	<copyright>\251 Copyright 2001 USA TODAY, a division of Gannett Co.
> Inc.</copyright>
> </article>
>
>
> What happened to the copyright symbol?  Am I missing something?
> Subsequently, if I try to read in the resulting output.xml file I get a
> JDOMException which reports "Character conversion error: "Unconvertible
> UTF-8 character beginning with 0xa9" (line number may be too low)."
>
> Any help would be very much appreciated.  I've been using JDom with a lot
> of success so far and just hit this snag, but otherwise have found it to be
> an exceptional product.
>
> Thanks in advance!
>
> -Christian Cabanero
>


