[jdom-interest] JDOM Entity references for special characters

Jason Hunter jhunter at xquery.com
Fri May 27 19:13:53 PDT 2005

JDOM never sees the entity in its entity form.  When you parse a 
document using SAX, the SAX parser converts the input document text into 
simple characters handed over with the characters() callback.  No matter 
how the character was originally encoded, SAX treats it the same and so 
does the JDOM internal document model.  Thus you can't rely on an input 
encoding to influence an output encoding.
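
To make the point concrete, here is a minimal sketch using only the JDK's 
built-in SAX parser (no JDOM involved). The class name EntityDemo and its 
two helper methods are made up for illustration. It assumes the entities 
are declared in an internal DTD subset the same way the XHTML .ent files 
declare them. A named entity and a numeric character reference reach 
characters() as the same text, so any escaping on the way out is a 
separate, output-side decision:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the name and both helpers are illustrative only.
final class EntityDemo {

    // Parses the XML and returns all character data reported by the SAX parser.
    static String parseText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Entity references arrive here already expanded to characters.
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    // Output-side choice: write every character above 127 as a numeric
    // character reference. Not a full XML escaper ('&' and '<' are untouched).
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed declarations, mirroring those in xhtml-special.ent.
        String dtd = "<!DOCTYPE p [<!ENTITY ldquo \"&#8220;\">"
                   + "<!ENTITY rdquo \"&#8221;\">]>";
        String viaEntity = parseText(dtd + "<p>&ldquo; test valid zone &rdquo;</p>");
        String viaNumeric = parseText("<p>&#8220; test valid zone &#8221;</p>");

        System.out.println(viaEntity.equals(viaNumeric)); // prints "true"
        System.out.println(escapeNonAscii(viaEntity));
        // prints "&#8220; test valid zone &#8221;"
    }
}
```

The parse step doubles as the validity check: an undeclared entity makes 
the parser throw. Writing a named entity such as &ldquo; back out would 
need a similar output-side mapping from character to entity name.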


Sven Deckers wrote:

> Ok, I'll have to look up how to subclass the XMLOutputter then.
> But I was actually looking for a way that XMLOutputter just *checks* whether
> the Entity is ok, and then *doesn't* translate it.
> The &ldquo; may remain in the insert statement in fact (or the numeric code
> for that matter), since it'll be written in a JSP page, and it's valid
> HTML...
> So I don't need the UTF-8 character, I just need a check that it's valid.
> Thanks,
> Sven.
> ----- Original Message ----- 
> From: "Jason Hunter" <jhunter at xquery.com>
> To: "Sven Deckers" <svedec at kava.be>
> Cc: <jdom-interest at jdom.org>
> Sent: Thursday, May 26, 2005 8:29 PM
> Subject: Re: [jdom-interest] JDOM Entity references for special characters
>
>>Your characters are being written out in UTF-8.  It's most definitely
>>not "gibberish".  :)  It just looks like it when your viewer isn't UTF-8
>>aware.  You can change the output encoding to ASCII and then chars > 127
>>will be auto-escaped.  Or you can use Latin-1 which will escape > 255.
>>If you want to control the escaping behavior to write special entities
>>for certain chars, you'll need to subclass XMLOutputter.
>>
>>Sven Deckers wrote:
>>>I've encountered the following problem in a project I'm working on :
>>>1. When I parse an XML-file with the following special characters :
>>>&ldquo; test valid zone &rdquo;
>>>    they obviously have to be referenced in the DTD, otherwise the JDOM
>>>parser will give an exception.
>>>2. In the DTD I've included the following W3C .ent-files :
>>>"xhtml-lat1.ent", "xhtml-special.ent" and "xhtml-symbol.ent" as
>>>recommended in "XML in a nutshell" (O'Reilly)
>>>3. The file now is correct according to XMLSpy
>>>4. When I start parsing it with JDOM, in order to generate
>>>INSERT-statements for a MySQL Database, these Entities are *translated*
>>>to gibberish : “ test invalid zone �
>>>5. When I explicitly put them in my DTD : <!ENTITY ldquo "[ldquo ]">
>>>they are translated to [ldquo ] in the INSERT statement.
>>>My question : how can I tell JDOM to just check if the Entity is ok, and
>>>then DON'T translate it.
>>>Thank you in advance,
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
