[jdom-interest] Re: Getting original Encoding and changing the default UTF-8

Jason Hunter jhunter at xquery.com
Fri Sep 10 00:53:43 PDT 2004


Young Matthew wrote:

> Hi,
> 
> Regarding the default encoding, I'm thinking more about the front end than
> about printing.  In other words, before parsing a document it would be cool if
> I could switch the encoding to something other than UTF-8 to handle Swedish
> characters.

XML files generally have their encoding listed in the XML declaration if 
they're not UTF-8 (or UTF-16), so the parser can automatically determine 
the proper encoding to use.  Getting the data in correctly isn't an issue; 
the issue arises if you want to encode the document the same way on output 
instead of using the universal UTF-8 encoding.  SAX doesn't report what 
the original encoding was; it just returns the already-decoded characters.
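
To make the input/output split concrete, here is a minimal sketch that 
uses the JDK's built-in JAXP classes rather than JDOM, so it runs without 
extra jars.  The class name EncodingRoundTrip, its helper methods, and the 
sample document are made up for illustration.  The parser picks up 
ISO-8859-1 from the declaration on its own; the output encoding has to be 
requested explicitly by the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of JDOM.
public class EncodingRoundTrip {

    // A small Latin-1 document containing Swedish characters.
    static byte[] sample() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<ord>sm\u00f6rg\u00e5s</ord>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // No encoding is passed in: the parser reads it from the declaration.
    static Document parse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    // On output the caller chooses the encoding.
    static byte[] write(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(sample());
        System.out.println(doc.getDocumentElement().getTextContent());
        byte[] latin1 = write(doc, "ISO-8859-1");
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

The same shape applies with any builder/outputter pair: parsing needs no 
encoding hint, while writing in something other than UTF-8 means someone 
has to remember (or be told) which encoding to ask for.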

Another builder, like an XNI builder, could report the encoding.  The 
Document class doesn't currently have an encoding property but we could 
add one if we had a parser that reported it.  That is, assuming it's a 
document-level notion.  The story's less clear when pulling together 
elements from multiple documents.  If the original Document node was 
Latin-1 but you included an Element from a Shift_JIS document, you can't 
reliably assume Latin-1 for the new document.
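
As a rough illustration of both points, the sketch below again uses the 
JDK's JAXP DOM rather than JDOM.  It relies on the DOM Level 3 method 
Document.getXmlEncoding(), which is one existing API that exposes the 
declared encoding; that is an assumption about the environment, not a 
proposal for how a JDOM property would work.  The class DeclaredEncoding 
and its helpers are hypothetical, and the second document is fed in as 
UTF-8 with Japanese text (standing in for a Shift_JIS source) so the 
example doesn't depend on optional charsets being installed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical demo class; not part of JDOM.
public class DeclaredEncoding {

    // Encode the XML text with the given charset, then parse the bytes.
    static Document parse(String xml, Charset bytesCharset) throws Exception {
        byte[] bytes = xml.getBytes(bytesCharset);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    // Can every character of the document's text be written in this encoding?
    static boolean canRepresent(Document doc, String encoding) {
        String text = doc.getDocumentElement().getTextContent();
        return Charset.forName(encoding).newEncoder().canEncode(text);
    }

    // Copy the root element of source under the root element of target.
    static void graft(Document target, Document source) {
        Node copy = target.importNode(source.getDocumentElement(), true);
        target.getDocumentElement().appendChild(copy);
    }

    public static void main(String[] args) throws Exception {
        Document latin = parse(
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
            + "<doc><ord>sm\u00f6rg\u00e5s</ord></doc>",
            StandardCharsets.ISO_8859_1);
        Document other = parse(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<go>\u65e5\u672c\u8a9e</go>",
            StandardCharsets.UTF_8);

        String declared = latin.getXmlEncoding();   // document-level property
        System.out.println(declared + " fits: " + canRepresent(latin, declared));

        graft(latin, other);                        // mix in foreign content
        System.out.println(declared + " fits: " + canRepresent(latin, declared));
    }
}
```

After the graft the document still reports its original declared encoding, 
but that encoding can no longer represent all of the text directly, which 
is why a remembered document-level encoding is only a hint once content 
from several sources gets combined.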

-jh-


