[jdom-interest] Dealing with binary characters in-memory -> outputter

Mon Sep 24 15:00:48 PDT 2001

I like your proposed approach.  Our plan thus far has been: For UTF-8
encoding (the default) no escaping is necessary except for special
characters (e.g. <), which we take care of.  For any other encoding you
set, you're responsible for handling things yourself.

Your approach is to help handle other encodings.  Sounds good to me.
The problem we hit when thinking about it earlier is that there's no
standard library listing which chars are in each character set.  So
there may have to be a few supported "other encodings" (e.g. Latin-1),
and any others you want to use you'd have to write yourself (and perhaps
donate).
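
One possible way around the missing per-encoding character list, assuming
a JDK 1.4 or later runtime: java.nio.charset.CharsetEncoder can report
whether a given char is representable in a charset, and anything it
rejects can be written as a numeric character reference.  This is only a
sketch, not JDOM code.  The class name CharsetEscaper and its escape()
method are made up for illustration, and surrogate pairs are not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of JDOM.
class CharsetEscaper {
    private final CharsetEncoder encoder;

    CharsetEscaper(String encodingName) {
        // Charset.forName throws if the runtime does not support the encoding.
        this.encoder = Charset.forName(encodingName).newEncoder();
    }

    // Escapes XML special characters, plus any char the target
    // encoding cannot represent (written as a decimal "&#NNN;" reference).
    String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (encoder.canEncode(c)) {
                        out.append(c);
                    } else {
                        out.append("&#").append((int) c).append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CharsetEscaper latin1 = new CharsetEscaper("ISO-8859-1");
        // U+00E9 exists in Latin-1 and is kept; U+20AC does not and is escaped.
        System.out.println(latin1.escape("caf\u00e9 \u20ac5 <b>"));
    }
}
```

With "ISO-8859-1" the e-acute passes through unchanged and the euro sign
comes out as &#8364;.  With "US-ASCII" both would be escaped.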

Thoughts?

-jh-

"Trimmer, Todd" wrote:
> 
> Attila Szegedi writes:
> 
> The XMLOutputter authors do a pretty good job of &# escaping "common
> renegade" characters, so maybe the ultimate solution is to add this one to
> the set... The problem is that for every encoding, the set of chars that
> must be escaped is different, and solving this problem on a per-encoding
> basis would be too expensive, either in memory or in time terms. Using the
> newly-introduced Encoder interface in java.io of JDK1.4 should help, but
> it'll take time until it gets mainstream...
> 
> -=-=-=-
> 
> I have never seen XMLOutputter produce a "&#" escaping under any encoding.
> Looking at the source for escapeAttributeEntities() and
> escapeElementEntities(), I don't see how it possibly could.
> org.jdom.output.XMLWriter DOES escape characters this way, yet it does not
> take the encoding into consideration.
> 
> If different encodings need different characters escaped, then why not have
> a static inner class for each encoding? Sounds like a good use of a Strategy
> Pattern to me.
> 
> By having them be inner classes we are marrying the encodings to the
> XMLOutputter. It would be better if a programmer can supply his own Encoder
> via a setter method for a more esoteric encoding. Yes, java.io.Encoder is a
> JDK1.4 thing, but it doesn't look too hard for us to roll our own
> org.jdom.output.Encoder interface, with stock implementations for the most
> common encodings.
> 
> I, too, came across the same problems with XMLOutputter that Bennett was
> having. I was also trying to use JDOM to read and manipulate HTML and then
> spit it out to another process. The lack of "&#" disturbed me so much that I
> subclassed XMLOutputter as HTMLOutputter and overrode
> escapeAttributeEntities() and escapeElementEntities() to "&#"-escape
> ISO-Latin characters above 168. Yes, it's a specific fix to a specific
> problem, but, Bennett, I propose you use this workaround until the solution
> with the Strategy Pattern can be written.
> 
> To get the ball rolling, what do readers of this newsgroup propose
> org.jdom.output.Encoder have other than the following?
> 
> package org.jdom.output;
> 
> public interface Encoder
> {
>         // Interface methods are implicitly public; "protected" is not
>         // allowed on them and would not compile.
>         String escapeAttributeEntities(String st);
> 
>         String escapeElementEntities(String st);
> }
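
A rough sketch of how the Strategy idea could fit together, with the
escaping strategy supplied through a setter.  Everything here is
hypothetical: EscapeStrategy stands in for the proposed
org.jdom.output.Encoder interface, SketchOutputter is a toy stand-in for
XMLOutputter, and the Latin-1 rule (escape anything above 0xFF) is just
the simplest stock implementation that needs no JDK 1.4 classes.

```java
// Hypothetical stand-in for the proposed org.jdom.output.Encoder interface.
interface EscapeStrategy {
    String escapeAttributeEntities(String st);
    String escapeElementEntities(String st);
}

// Stock strategy for ISO-8859-1: every char up to 0xFF is representable,
// so only chars above that need a numeric character reference.
class Latin1EscapeStrategy implements EscapeStrategy {
    public String escapeAttributeEntities(String st) { return escape(st, true); }
    public String escapeElementEntities(String st) { return escape(st, false); }

    private static String escape(String st, boolean inAttribute) {
        StringBuilder out = new StringBuilder(st.length());
        for (int i = 0; i < st.length(); i++) {
            char c = st.charAt(i);
            if (c == '<') out.append("&lt;");
            else if (c == '>') out.append("&gt;");
            else if (c == '&') out.append("&amp;");
            else if (c == '"' && inAttribute) out.append("&quot;");
            else if (c > 0xFF) out.append("&#").append((int) c).append(';');
            else out.append(c);
        }
        return out.toString();
    }
}

// Toy stand-in for XMLOutputter showing only the setter and one use of it.
class SketchOutputter {
    private EscapeStrategy strategy = new Latin1EscapeStrategy();

    // Callers with a more esoteric encoding plug in their own strategy here.
    void setEscapeStrategy(EscapeStrategy strategy) { this.strategy = strategy; }

    String outputElement(String name, String attrName, String attrValue, String text) {
        return "<" + name + " " + attrName + "=\""
                + strategy.escapeAttributeEntities(attrValue) + "\">"
                + strategy.escapeElementEntities(text)
                + "</" + name + ">";
    }
}
```

Keeping the strategy behind a setter rather than in static inner classes
means a new encoding only needs a new EscapeStrategy implementation; the
outputter itself does not change.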