[jdom-interest] encoding problems when using get text

Jason Hunter jhunter at acm.org
Wed Jun 19 16:19:42 PDT 2002

> I think I'd just like to know why. Is my entity being translated when
> the JDom document is being built or is it java that's translating it on
> the way back out when I call getText()?

The key is to stop thinking about its binary representation and think of
its semantic meaning.  The character entity is a way to represent a
character but isn't the character.  Inside Java and JDOM you're lucky
enough to get to manipulate the actual character.  If you ask for a
String with getText(), you get a string of characters.  You don't have
to worry about which crazy encoding it happened to come from on disk.  

It's the SAX parser that's doing the translation.  The characters()
callback delivers characters, not raw encoded bytes.
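
If you want to see this for yourself, here is a minimal sketch using only
the SAX parser that ships with the JDK (no JDOM involved).  The class name
CharRefDemo, the helper parseText() and the sample document are made up for
illustration.  It collects whatever characters() hands over, so you can
check that the reference &#233; has already become a single character by
the time your code sees it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharRefDemo {

    // Hypothetical helper: parse the XML string and return all character
    // data the SAX parser reports through characters().
    static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver the text in several chunks,
                // so append rather than assign.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document spells the last letter as a character reference.
        String text = parseText("<name>Caf&#233;</name>");

        // Four characters, not nine: the reference was resolved by the parser.
        System.out.println(text.length());              // prints 4
        System.out.println(text.equals("Caf\u00e9"));   // prints true
    }
}
```

JDOM's SAXBuilder builds its document from these same callbacks, which is
why getText() gives you the character and not the reference.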

