[jdom-interest] special characters and JDOM
jozart at csi.com
Sun Jul 22 00:39:05 PDT 2001
Andrew Freeman writes:
> I am trying to use JDOM to parse an XML file that contains
> the following character: '–'. However, I am getting a parsing
> error indicating that that Unicode character is invalid.
What you have there is an "en dash". In CP1252 (the Windows code page) it
is the single byte 150 (0x96). In Unicode 2.0 it is code point 8211 (U+2013).
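To see both numbers side by side, here is a minimal sketch. The class and method names are made up for illustration, and it assumes your Java runtime ships the "windows-1252" charset (it is not one of the charsets every JVM is required to support).

```java
import java.nio.charset.Charset;

public class EnDashCodes {
    // Returns the unsigned value of the first byte that encodes c in the
    // named charset. For a single-byte charset that is the only byte.
    static int singleByteValue(char c, String charsetName) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName(charsetName));
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        // Written as a Unicode escape so the source file's own encoding
        // cannot change which character the compiler sees.
        char enDash = '\u2013';
        System.out.println((int) enDash);                            // prints 8211
        System.out.println(singleByteValue(enDash, "windows-1252")); // prints 150
    }
}
```

The same character therefore shows up as 8211 inside Java (which works in Unicode) and as 150 in a tool that reports raw CP1252 bytes.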
The root of your troubles is that an ASCII "-" (45) is used to represent a
minus sign *and* several different flavors of dashes and hyphens. In
Unicode each of these is given a unique code.
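As a rough illustration of "each of these is given a unique code", the sketch below lists a few of the dash-like characters with their decimal Unicode values. The class name and the selection of characters are mine; the list is not exhaustive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DashCodes {
    // Maps a descriptive name to the decimal Unicode code point.
    static Map<String, Integer> dashes() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("hyphen-minus (ASCII)", 0x002D);
        m.put("hyphen", 0x2010);
        m.put("en dash", 0x2013);
        m.put("em dash", 0x2014);
        m.put("minus sign", 0x2212);
        return m;
    }

    public static void main(String[] args) {
        // Print each name with its decimal value and U+XXXX form.
        for (Map.Entry<String, Integer> e : dashes().entrySet()) {
            System.out.printf("%-22s %5d  U+%04X%n",
                    e.getKey(), e.getValue(), e.getValue());
        }
    }
}
```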
If you really want one of those narrow dashes in your document, I recommend
You could try, instead, changing your document's encoding to cp1252. (Does
You should be safe if you "&entify;" anything with a Unicode value greater
than 127, though this isn't always the most user-friendly thing to do.
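One way to do that "entifying" is with numeric character references, which any XML parser accepts without a DTD (named entities such as &ndash; are not predefined in XML). The following is only a sketch: EntityEscaper and escapeNonAscii are hypothetical names, not part of JDOM, and the method deliberately leaves "&" and "<" alone, so it is meant for character data you already know to be well-formed.

```java
public class EntityEscaper {
    // Replaces every code point above 127 with a decimal numeric
    // character reference such as &#8211; and copies the rest unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u2013 is the en dash from the original question.
        System.out.println(escapeNonAscii("1999\u20132001")); // prints 1999&#8211;2001
    }
}
```

After this pass the file contains only ASCII bytes, so it parses the same way whichever ASCII-compatible encoding the parser assumes.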
--- original message ---
From: Andrew Freeman aefreeman at earthlink.net
Date: Fri, 20 Jul 2001 20:17:46 -0400
I am trying to use JDOM to parse an XML file that contains the following
character: '–'. However, I am getting a parsing error indicating that that
Unicode character is invalid.
When I print it out in Java:
System.out.println("" + (int) '–');
I get 8211.
If I print out its ASCII character in another editor I get 150.
Does the XML file need a specific encoding in order to parse the file? Do I
need to have the character escaped with – prior to parsing the file? If
I need to escape the character, is there a rule that tells me what I have to
escape and what I don't? Also, what is special about this character that
it has such a funky int value when I print it out?