[jdom-interest] Dealing with binary characters in-memory -> outputter (sorry to keep things going but...)

Mon Sep 24 22:31:44 PDT 2001

Actually not quite, see comments.

-----Original Message-----
From: philip.nelson at omniresources.com
[mailto:philip.nelson at omniresources.com]
Sent: Monday, September 24, 2001 5:57 PM
To: jhunter at collab.net; todd.trimmer at trizetto.com
Cc: jdom-interest at jdom.org; szegedia at freemail.hu; mbennett at ideaeng.com
Subject: RE: [jdom-interest] Dealing with binary characters in-memory ->
outputter

> >
> > I'm sorry to be so dense but I don't think this works.
> >

> right!  I don't think it can work.  A character entity will be expanded by
> the parser.  In element content we don't check for valid characters for
> performance reasons.  So now 0xA9 is expanded into a char.  On output you no
> longer have any character entity so the char is output as is.  On the second
> read, there is no way the parser should accept this.  It is not a valid xml
> character and is rejected on the second pass.
>
> Am I close?

So you're saying Outputter is broken?

I think you're assuming that this high byte begins life as
part of an XML file (you talk about entities being expanded by
the parser, etc.), and that it's decoded from that source
into a single byte.

It doesn't come from an initial external XML file.

It gets into the tree in memory, via a call to addContent(String).

Then you say "On output you no longer have any character
entity so the char is output as is."  That last phrase,
"is output as is", is the crux of the problem.  Are you
saying that's proper behavior or a bug?
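
To make "output as is" concrete, here is a minimal sketch in plain Java
(no JDOM involved; the class and method names are made up for this
illustration).  It assumes the character in question is U+00A9 held in a
String, and shows which bytes that one char becomes in two encodings, and
that a lone 0xA9 byte is not well-formed UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JDOM.
class HighCharDemo {
    // Encode the string in the given charset and return the bytes as hex.
    static String hexBytes(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if the bytes decode cleanly as UTF-8. A decoder obtained from
    // newDecoder() reports malformed input by throwing, instead of
    // silently substituting a replacement character.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00A9"; // one char in memory, as after addContent(String)
        System.out.println(hexBytes(text, StandardCharsets.UTF_8));      // C2A9
        System.out.println(hexBytes(text, StandardCharsets.ISO_8859_1)); // A9
        System.out.println(isValidUtf8(new byte[] {(byte) 0xA9}));       // false
    }
}
```

So if a writer emitted that char as the single byte 0xA9 into a stream
that is labelled (or defaulted to) UTF-8, a strict reader would be
expected to reject it on the way back in.  Whether that is what the
Outputter actually does is the open question here.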

A message to the list:

Folks, I'm sorry for all the traffic this has generated on
the list, but as you see it seems like we're getting different
answers.  And no answer (aside from overriding the Outputter)
seems to mesh with the observed behavior.

I do think that the Outputter is broken, even in its default UTF-8 mode.
(Sorry but I'm not quite brave enough to take a whack at fixing it.)
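
For reference, the sort of escaping an overridden outputter might apply
can be sketched in plain Java.  This is only an illustration under my own
assumptions, not JDOM code: the class and method names are hypothetical,
and it simply writes every character above 0x7E as a numeric character
reference so the result is pure ASCII and reads back the same under any
ASCII-compatible encoding declaration.

```java
// Hypothetical helper, not part of JDOM.
class CharRefEscaper {
    // Escape markup characters and replace everything above 0x7E with a
    // hexadecimal character reference. Control characters that XML does
    // not allow at all are passed through unchanged in this sketch.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one unit
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp > 0x7E) {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &#xA9; 2001 A&amp;B
        System.out.println(escapeNonAscii("\u00A9 2001 A&B"));
    }
}
```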

Mark