[jdom-interest] Encoding not working as expected - Copyright Symbol

Christian Cabanero chumpboy at yahoo.com
Thu Jun 21 23:59:03 PDT 2001


First off, let me just say that JDOM is DA BOMB DIGITY!  Congrats on
building such a great product.

But unfortunately, something's been confusing me.  I have an XML document
that is UTF-8 encoded and contains the (C) symbol encoded with UTF-8 (at
least I assume that it is).  It shows up in my XML as...


<?xml version="1.0" encoding="UTF-8"?>
<article section="ECONOMIC">
	<copyright>\302\251 Copyright 2001 USA TODAY, a division of Gannett Co.
Inc.</copyright>
</article>


...the "\302\251" being the octal escapes for the two bytes (0xC2 0xA9) that
encode (C) in UTF-8.  I load this XML file using a SAXBuilder and then just
spit it right out again into another file using an XMLOutputter like so...


  public static void main(String[] args) throws Exception {
    SAXBuilder builder = new SAXBuilder();
    // args[0] is the XML file containing the copyright symbol
    Document doc = builder.build(args[0]);
    XMLOutputter out = new XMLOutputter("  ", true, "UTF-8");
    FileWriter writer = new FileWriter("output.xml");
    out.output(doc, writer);
    writer.close();
  }


BUT, for some reason this modifies the XML data and messes up the copyright
symbol...

output.xml:

<?xml version="1.0" encoding="UTF-8"?>
<article section="ECONOMIC">
	<copyright>\251 Copyright 2001 USA TODAY, a division of Gannett Co.
Inc.</copyright>
</article>


What happened to the copyright symbol?  Am I missing something?
Subsequently, if I try to read in the resulting output.xml file I get a
JDOMException which reports:

  Character conversion error: "Unconvertible UTF-8 character beginning
  with 0xa9" (line number may be too low).
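
One possible explanation, sketched here under the assumption that the
platform default encoding on this machine is ISO-8859-1 (Latin-1): a
FileWriter created without an explicit charset encodes with the platform
default, so the "UTF-8" passed to XMLOutputter may only affect the XML
declaration and not the bytes that reach the file.  The class below is a
hypothetical stand-alone check (CopyrightBytes and its helper methods are
illustrative names, not part of JDOM).  It encodes the copyright character
through a Writer in both charsets and tests whether the result is
well-formed UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CopyrightBytes {

    // Encode text through a Writer bound to the given charset. This stands in
    // for a FileWriter, which (by assumption here) uses the platform default.
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Strict check: true only if the bytes decode as well-formed UTF-8.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Render bytes as backslash-octal escapes, the notation used in this post.
    public static String octal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append('\\').append(Integer.toOctalString(b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9"; // the (C) symbol, U+00A9
        byte[] utf8 = encode(copyright, StandardCharsets.UTF_8);
        byte[] latin1 = encode(copyright, StandardCharsets.ISO_8859_1);
        // prints: \302\251 valid UTF-8: true
        System.out.println(octal(utf8) + " valid UTF-8: " + isValidUtf8(utf8));
        // prints: \251 valid UTF-8: false
        System.out.println(octal(latin1) + " valid UTF-8: " + isValidUtf8(latin1));
    }
}
```

If that assumption holds, it would account for both symptoms: the lone
"\251" in output.xml and the parser rejecting 0xa9 as the start of a UTF-8
character.  Wrapping a FileOutputStream in an OutputStreamWriter with
"UTF-8", or passing the XMLOutputter an OutputStream rather than a Writer,
would presumably keep both bytes, though I have not confirmed that against
this JDOM build.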

Any help would be very much appreciated.  I've been using JDOM with a lot of
success so far and just hit this snag, but otherwise have found it to be an
exceptional product.

Thanks in advance!

-Christian Cabanero



