[jdom-interest] Dealing with binary characters in-memory -> outputter

Trimmer, Todd todd.trimmer at trizetto.com
Mon Sep 24 08:36:58 PDT 2001

Attila Szegedi writes:

> The XMLOutputter authors do a pretty good job of &#-escaping "common
> renegade" characters, so maybe the ultimate solution is to add this one to
> the set. The problem is that the set of characters that must be escaped is
> different for every encoding, and solving this on a per-encoding basis
> would be too expensive in either memory or time. The CharsetEncoder class
> newly introduced in JDK 1.4's java.nio.charset package should help, but it
> will take time until it goes mainstream.
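For reference, the JDK 1.4 approach would not need a hand-maintained table per encoding: CharsetEncoder.canEncode(char) answers "can this character be written literally?" for any supported charset. The class and method names below (CharsetAwareEscaper, escape) are hypothetical; only Charset and CharsetEncoder are real JDK APIs. It is a minimal sketch that assumes BMP-only text and leaves markup characters such as & and < to the outputter's normal escaping.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: writes any char the target encoding cannot represent
// as a &#N; character reference. Assumes BMP-only input (surrogate pairs are
// not combined) and does NOT escape markup characters like & or <.
class CharsetAwareEscaper {
    static String escape(String text, String encodingName) {
        // CharsetEncoder is stateful and not thread-safe, so make one per call.
        CharsetEncoder enc = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (enc.canEncode(c)) {
                out.append(c);                                   // representable as-is
            } else {
                out.append("&#").append((int) c).append(';');    // decimal reference
            }
        }
        return out.toString();
    }
}
```

The cost Attila mentions shifts from memory to a per-character canEncode() call, which is slower than a table lookup but works for every charset the JVM supports.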


I have never seen XMLOutputter produce a "&#" character reference under any
encoding. Looking at the source for escapeAttributeEntities() and
escapeElementEntities(), I don't see how it possibly could.
org.jdom.output.XMLWriter DOES escape characters this way, yet it does not
take the encoding into consideration.

If different encodings need different characters escaped, then why not have
a static nested class for each encoding? Sounds like a good use of the
Strategy pattern to me.
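A rough sketch of that idea, with one static nested class per encoding acting as the strategy. All names here (EscapeStrategy, StrategyOutputter, Ascii, Latin1, Unicode, forEncoding, escapeText) are hypothetical and not part of JDOM; the encoding-name matching is a simplifying assumption.

```java
import java.util.Locale;

// Hypothetical strategy interface: decides whether a char can be written
// literally in the target encoding. Not a JDOM type.
interface EscapeStrategy {
    boolean isRepresentable(char c);
}

// Hypothetical outputter fragment; the strategies live inside it as
// static nested classes, one per encoding.
class StrategyOutputter {
    static final class Ascii implements EscapeStrategy {
        public boolean isRepresentable(char c) { return c < 0x80; }
    }

    static final class Latin1 implements EscapeStrategy {
        public boolean isRepresentable(char c) { return c < 0x100; }
    }

    static final class Unicode implements EscapeStrategy {
        // UTF-8/UTF-16 can carry any well-formed text, so nothing is escaped.
        public boolean isRepresentable(char c) { return true; }
    }

    // Assumed mapping from encoding name to strategy; unknown names fall
    // back to ASCII, the choice that escapes the most.
    static EscapeStrategy forEncoding(String encoding) {
        switch (encoding.toUpperCase(Locale.ROOT)) {
            case "ISO-8859-1":
                return new Latin1();
            case "UTF-8":
            case "UTF-16":
                return new Unicode();
            default:
                return new Ascii();
        }
    }

    // Escapes markup characters itself and delegates the encoding-specific
    // decision to the strategy.
    static String escapeText(String s, EscapeStrategy strategy) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (strategy.isRepresentable(c)) {
                        out.append(c);
                    } else {
                        out.append("&#").append((int) c).append(';');
                    }
            }
        }
        return out.toString();
    }
}
```

Because the strategies are nested in the outputter, adding an encoding means editing the outputter itself.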

Making them nested classes, though, marries the encodings to XMLOutputter.
It would be better if a programmer could supply his own Encoder via a setter
method for a more esoteric encoding. Yes, java.nio.charset.CharsetEncoder is
a JDK 1.4 thing, but it doesn't look too hard for us to roll our own
org.jdom.output.Encoder interface, with stock implementations for the most
common encodings.
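A minimal sketch of the setter-based alternative, assuming the two-method shape proposed at the end of this message. EntityEscaper, AsciiEntityEscaper, PluggableOutputter, setEscaper and outputElement are all hypothetical names (EntityEscaper stands in for the proposed org.jdom.output.Encoder); none of them exist in JDOM.

```java
import java.util.Objects;

// Hypothetical stand-in for the proposed org.jdom.output.Encoder.
interface EntityEscaper {
    String escapeAttributeEntities(String st);
    String escapeElementEntities(String st);
}

// Hypothetical stock implementation for US-ASCII output: markup characters
// become predefined entities, everything above 0x7F becomes &#N;.
class AsciiEntityEscaper implements EntityEscaper {
    public String escapeElementEntities(String st) { return escape(st, false); }
    public String escapeAttributeEntities(String st) { return escape(st, true); }

    private static String escape(String st, boolean inAttribute) {
        StringBuilder out = new StringBuilder(st.length());
        for (int i = 0; i < st.length(); i++) {
            char c = st.charAt(i);
            if (c == '&') out.append("&amp;");
            else if (c == '<') out.append("&lt;");
            else if (c == '>') out.append("&gt;");
            else if (inAttribute && c == '"') out.append("&quot;");
            else if (c > 0x7F) out.append("&#").append((int) c).append(';');
            else out.append(c);
        }
        return out.toString();
    }
}

// Hypothetical outputter that owns no encoding knowledge of its own.
class PluggableOutputter {
    private EntityEscaper escaper = new AsciiEntityEscaper();

    // Lets a caller plug in an escaper for a more esoteric encoding.
    public void setEscaper(EntityEscaper escaper) {
        this.escaper = Objects.requireNonNull(escaper);
    }

    // Writes one element with one attribute: just enough to show delegation.
    public String outputElement(String name, String attrName, String attrValue, String text) {
        return "<" + name + " " + attrName + "=\""
                + escaper.escapeAttributeEntities(attrValue) + "\">"
                + escaper.escapeElementEntities(text) + "</" + name + ">";
    }
}
```

Compared with nested strategies, the outputter no longer needs to change when someone needs an encoding the stock implementations do not cover.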

I, too, ran into the same problems with XMLOutputter that Bennett was
having. I was also trying to use JDOM to read and manipulate HTML and then
pass it on to another process. The lack of "&#" escaping bothered me so much
that I subclassed XMLOutputter as HTMLOutputter and overrode
escapeAttributeEntities() and escapeElementEntities() to "&#"-escape
ISO-Latin characters above 168. Yes, it's a specific fix to a specific
problem, but, Bennett, I propose you use this workaround until the Strategy
pattern solution can be written.
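The HTMLOutputter source isn't included here, so the following is only a hypothetical sketch of what the two overridden method bodies might do. It is written as a standalone class (HtmlEscaping, a made-up name) so it compiles without JDOM; in a real subclass the bodies would go into the escapeAttributeEntities() and escapeElementEntities() overrides, whose exact signatures and visibility depend on the JDOM release (not checked here).

```java
// Hypothetical sketch of the HTMLOutputter override bodies, as a standalone
// class so it compiles without JDOM on the classpath.
class HtmlEscaping {
    // Characters with a code point above this are written as &#N;,
    // following the "above 168" cutoff described in the post.
    private static final int CUTOFF = 168;

    static String escapeElementEntities(String st) {
        return escape(st, false);
    }

    static String escapeAttributeEntities(String st) {
        return escape(st, true);
    }

    private static String escape(String st, boolean inAttribute) {
        StringBuilder out = new StringBuilder(st.length());
        for (int i = 0; i < st.length(); i++) {
            char c = st.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"':
                    // Quotes only need escaping inside attribute values.
                    out.append(inAttribute ? "&quot;" : "\"");
                    break;
                default:
                    if (c > CUTOFF) {
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

Every character above the cutoff is escaped, not just Latin-1 ones, which is harmless for HTML consumers since &#N; references work for any code point.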

To get the ball rolling, what do readers of this list think
org.jdom.output.Encoder should have beyond the following?

package org.jdom.output;

public interface Encoder {
	// Escapes text that will appear inside an attribute value.
	String escapeAttributeEntities(String st);

	// Escapes text that will appear as element content.
	String escapeElementEntities(String st);
}
