[jdom-interest] Character encoding from UTF-8 to ISO-8859-1
mnott at vignette.com
Wed Feb 6 07:57:53 PST 2002
You cannot transcode UTF-8 to Latin1 exactly, as UTF-8 can represent a
far richer set of characters and, especially, as UTF-8 is not a
byte-level superset of Latin1 (only the ASCII range is encoded
identically). That being said, I've written a mapping table which maps
commonly used characters to their HTML entities. Some characters,
like the Euro symbol, simply do not exist in Latin1, so I transcode
them to EUR. You might do something like that, or you could change
the character set of your database (update props$, but make sure
you export all your data first).
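A minimal sketch of that kind of mapping table, in plain Java. The class name `Latin1Fallback`, the method `toLatin1Safe`, and the table entries are hypothetical and only illustrate the idea; they are not the actual table described above. Characters up to U+00FF exist in Latin1 and pass through unchanged, mapped characters get their replacement, and anything else falls back to a numeric character reference (an assumption of this sketch).

```java
import java.util.HashMap;
import java.util.Map;

final class Latin1Fallback {
    // Hypothetical fallback table: these entries are illustrative only.
    private static final Map<Integer, String> FALLBACKS = new HashMap<>();
    static {
        FALLBACKS.put(0x20AC, "EUR");     // euro sign
        FALLBACKS.put(0x201C, "&ldquo;"); // left double quotation mark
        FALLBACKS.put(0x201D, "&rdquo;"); // right double quotation mark
        FALLBACKS.put(0x2014, "&mdash;"); // em dash
    }

    // Returns a string that contains only characters representable in ISO-8859-1.
    static String toLatin1Safe(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp <= 0xFF) {
                // ISO-8859-1 covers exactly the code points 0x00 to 0xFF.
                out.appendCodePoint(cp);
            } else {
                String replacement = FALLBACKS.get(cp);
                // Assumption: unmapped characters become numeric character references.
                out.append(replacement != null ? replacement : "&#" + cp + ";");
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // The input uses Unicode escapes for the euro sign and the em dash.
        System.out.println(toLatin1Safe("Price: \u20ac5 \u2014 ok"));
        // prints "Price: EUR5 &mdash; ok"
    }
}
```

The result can then be encoded as ISO-8859-1 without loss, although the replacement itself is of course not reversible for characters such as the Euro symbol.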
From: Alex Rosen [mailto:arosen at silverstream.com]
Sent: Tuesday, February 05, 2002 5:50 PM
To: 'dumdum 420'; jdom-interest at jdom.org
Subject: RE: [jdom-interest] Character encoding from UTF-8 to ISO-8859-1
Trying to figure out character encoding issues is very tricky. There are a
number of places in the process where the problem could be. Even worse, it's
often confusing to look at a file and determine if it's correct or not,
because the character set you're using to view it is a factor.
If you have a JDOM document, then XMLOutputter will correctly output it in
the character encoding that you specify. Call setEncoding() on it, and then
call the version of output() that takes an OutputStream (not a Writer). (Or
use the version that takes a Writer, but make sure you set the Writer's
encoding properly.) This should work fine.
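To illustrate the point about the Writer's encoding, here is a small sketch using only the standard library (no JDOM calls). The class `WriterEncodingDemo` and its `write` method are hypothetical names. It shows that the bytes you get depend on the charset given to the `OutputStreamWriter`, which is why a Writer passed to an outputter should be created with the same encoding that the XML declaration announces.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WriterEncodingDemo {
    // Writes text through a Writer that uses an explicit charset
    // and returns the bytes that reached the underlying stream.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // "e" with acute accent
        // One byte (0xE9) in ISO-8859-1, two bytes (0xC3 0xA9) in UTF-8.
        System.out.println(write(eAcute, StandardCharsets.ISO_8859_1).length);
        System.out.println(write(eAcute, StandardCharsets.UTF_8).length);
    }
}
```

A Writer created without an explicit charset uses the platform default, which is a common source of this kind of mismatch.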
The "funny characters" you're seeing could have several causes: the data is
being stored in the database with the wrong encoding; it's being retrieved
with the wrong encoding; or, most likely, you're viewing it using the wrong
encoding. The funny characters may just be the ISO-8859-1 characters, if
your system uses a different default character set.
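As a rough illustration of how a wrong encoding at any of these stages produces "funny characters", the following standard-library sketch encodes text in one charset and decodes it in another. The class `WrongCharsetDemo` and the method `misread` are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WrongCharsetDemo {
    // Encodes text with one charset, then decodes the bytes with another,
    // simulating data that is written correctly but read or viewed wrongly.
    static String misread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // "e" with acute accent
        String garbled = misread(eAcute, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // The two UTF-8 bytes 0xC3 0xA9 are shown as two Latin-1 characters.
        System.out.println(garbled.length()); // prints 2
        // Reading with the charset that was used for writing gives the text back.
        String intact = misread(eAcute, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(eAcute)); // prints true
    }
}
```

Seeing two characters where one accented letter was expected is a typical sign that UTF-8 data is being displayed as ISO-8859-1, rather than that the stored data is damaged.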
> -----Original Message-----
> From: jdom-interest-admin at jdom.org
> [mailto:jdom-interest-admin at jdom.org]On Behalf Of dumdum 420
> Sent: Sunday, February 03, 2002 4:29 PM
> To: jdom-interest at jdom.org
> Subject: [jdom-interest] Character encoding from UTF-8 to ISO-8859-1
> I have an XML feed with encoding set to UTF-8. It contains certain
> European characters. I have to parse this XML and insert it into an
> Oracle database which supports ISO-8859-1.
> Once I do the insert, some of the characters in the output look
> very, very funny.
> Is there a way in Java, or a workaround, to convert from UTF-8 to
> ISO-8859-1 in an exact fashion?
> Take care.
> Bhanu Pabreja.
> Here is a simple line; try converting it from UTF-8 to ISO-8859-1 and
> you can figure out how it is done.
> The purpose of this abstract is to test passing foreign characters from
> the Notes system, through UTF-8 xml, into Oracle, and then ultimately
> back out into UTF-8 XML that sits on the file system. This abstract
> contains the following foreign characters:
> [ Euro symbol: EUR ] [ Full Quotes: "Competing on the Edge" ]
> [ Normal Quotes: "Competing on the Edge" ] [ m-dash: - ]
> [ characters: ØÅ ] [ "e" accent aigu: é ] [ "e" with circumflex: ê ]
> [ "a" accent grave: à ] [ "u" with umlaut: ü ] [ "o" with umlaut: ö ]
> [ "a" with umlaut: ä ]