[jdom-interest] JDOM Entity references for special characters

Tatu Saloranta cowtowncoder at yahoo.com
Fri May 27 10:10:46 PDT 2005


--- Sven Deckers <svedec at kava.be> wrote:
...
> But I was actually looking for a way that
> XMLOutputter just *checks* whether
> the Entity is ok, and then *doesn't* translate it.
> The &ldquo; may remain in the insert statement in
> fact (or the numeric code
> for that matter), since it'll be written in a JSP
> page, and it's valid
> HTML...
> So I don't need the UTF-8 character, I just need a
> check that it's valid.

Unfortunately this is both against the way XML is
supposed to be handled, and outside the scope of the
JDOM codebase. It's the underlying parser that deals
with entities, and expands them into actual XML
content. Think of them as the equivalent of C
preprocessor macros -- the compiler never sees them,
only the preprocessor does. In the same way, JDOM
normally never sees any entities, as long as the
parser does its job according to the XML specs.
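
To make that concrete, here is a minimal sketch using
the JDK's own DOM parser rather than JDOM (the class
name and the sample document are made up for
illustration, and the entities are declared in an
internal subset instead of the xhtml .ent files). By
the time the tree exists, only the expanded characters
are left:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExpandedEntityDemo {
    // Hypothetical sample input: the internal DTD subset declares both entities.
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "  <!ENTITY ldquo \"&#8220;\">\n"
        + "  <!ENTITY rdquo \"&#8221;\">\n"
        + "]>\n"
        + "<doc>&ldquo; test valid zone &rdquo;</doc>";

    // Parses the document and returns the text content of the root element.
    static String parsedText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String text = parsedText(XML);
        // No '&' survives: the references were replaced during parsing.
        System.out.println("contains '&': " + (text.indexOf('&') >= 0));
        // The first character is U+201C, the left double quotation mark.
        System.out.println("first char code: " + (int) text.charAt(0));
    }
}
```

Whether that character then shows up as "gibberish" is
purely a question of the output encoding and of the
viewer, as Jason already pointed out.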

It would actually be possible to configure the
underlying parser to leave entities unexpanded, though.
In fact, at least StAX parsers have such an option;
maybe StAXBuilder.java would be able to build a JDOM
tree that keeps general entities (i.e. entities other
than the pre-defined ones [amp, lt, gt, apos, quot] and
character entities) intact, as long as the input
factory has been configured so that automatic entity
expansion is disabled.
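
A rough sketch of the StAX side of this, assuming a
StAX implementation that honors the standard
IS_REPLACING_ENTITY_REFERENCES property (the class
name and sample document are again made up, and the
hand-off to StAXBuilder is not shown). With expansion
turned off, declared general entities should be
reported as ENTITY_REFERENCE events instead of text:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class UnexpandedEntityDemo {
    // Hypothetical sample input: the internal DTD subset declares both entities.
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "  <!ENTITY ldquo \"&#8220;\">\n"
        + "  <!ENTITY rdquo \"&#8221;\">\n"
        + "]>\n"
        + "<doc>&ldquo; test valid zone &rdquo;</doc>";

    // Collects the names of all entity references the reader reports.
    static List<String> entityReferenceNames(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser NOT to replace general entity references.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        // Coalescing merges adjacent text, so keep it off.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);

        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.ENTITY_REFERENCE) {
                    // For ENTITY_REFERENCE events getLocalName() is the entity name.
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entityReferenceNames(XML));
    }
}
```

Note that the parser still needs to see the entity
declarations, so the "is it valid" check Sven asked
for comes for free: an undeclared entity is still an
error (or at least reported differently, depending on
the implementation).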

-+ Tatu +-

> 
> Thanks,
> Sven.
> 
> ----- Original Message ----- 
> From: "Jason Hunter" <jhunter at xquery.com>
> To: "Sven Deckers" <svedec at kava.be>
> Cc: <jdom-interest at jdom.org>
> Sent: Thursday, May 26, 2005 8:29 PM
> Subject: Re: [jdom-interest] JDOM Entity references for special characters
> 
> 
> > Your characters are being written out in UTF-8.  It's most definitely
> > not "gibberish".  :)  It just looks like it when your viewer isn't UTF-8
> > aware.  You can change the output encoding to ASCII and then chars > 127
> > will be auto-escaped.  Or you can use Latin-1 which will escape > 255.
> > If you want to control the escaping behavior to write special entities
> > for certain chars, you'll need to subclass XMLOutputter.
> >
> > -jh-
> >
> > Sven Deckers wrote:
> >
> > > Hello,
> > >
> > > I've encountered the following problem in a project I'm working on :
> > >
> > > 1. When I parse an XML-file with the following special characters :
> > > &ldquo; test valid zone &rdquo;
> > >     they obviously have to be referenced in the DTD, otherwise the JDOM
> > > parser will give an exception.
> > >
> > > 2. In the DTD I've included the following W3C .ent-files :
> > > "xhtml-lat1.ent", "xhtml-special.ent" and "xhtml-symbol.ent" as
> > > recommended in "XML in a nutshell" (O'Reilly)
> > >
> > > 3. The file now is correct according to XMLSpy
> > >
> > > 4. When I start parsing it with JDOM, in order to generate
> > > INSERT-statements for a MySQL Database, these Entities are *translated*
> > > to gibberish : “ test invalid zone �
> > >
> > > 5. When I explicitly put them in my DTD : <!ENTITY ldquo "[ldquo ]">
> > > they are translated to [ldquo ] in the INSERT statement.
> > >
> > > My question : how can I tell JDOM to just check if the Entity is ok, and
> > > then DON'T translate it.
> > >
> > > Thank you on beforehand,
> > >
> > > Sven.
> > >
> > >
> > >
> > >
> > > ------------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > To control your jdom-interest membership:
> > > http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
