[jdom-interest] XMLOutputter & StringWriter

Gabor Greif gabor at no.netopia.com
Tue Jul 4 02:12:10 PDT 2000


On Mon, Jul 3, 2000 19:38, Jason Hunter <mailto:jhunter at collab.net>
wrote:
>What's wrong with the following?
>
>res.setContentType("text/html");
>XMLOutputter outputter = new XMLOutputter();
>outputter.output(doc, res.getWriter());
>
>What's wrong is that the Writer is Latin-1 so that's what the document
>chars will be encoded in, but in the document's declaration we'll write
>that it's encoded in UTF-8 because that's our default.  Any reader will
>get confused trying to read Latin-1 as UTF-8.
>
>-jh-

If I understand correctly, you mean the encoding as in

<?xml version="1.0" encoding="UTF-8" ?>

then I agree.

However, not everything is clear. Take the pound sign, for example. In a
Java string it has an unambiguous Unicode character code, say L. If you
output a string containing the pound sign to a Writer such as a
StringWriter, no conversion of L happens. If you then grab the content
string from the Writer and stream it out to a file, the character code L
is converted: it becomes a single high (non-ASCII) byte in a Latin
encoding, or a two-byte sequence in UTF-8.
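
To make the pound sign example concrete, here is a minimal sketch using only
the standard library (no JDOM). The class and method names are made up for
illustration, and StandardCharsets assumes a newer JDK than the one current
at the time of this thread.

```java
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

public class PoundSignEncoding {
    // Pushes text through a StringWriter; only chars are stored, no bytes are produced.
    static String throughStringWriter(String text) {
        StringWriter writer = new StringWriter();
        writer.write(text);
        return writer.toString();
    }

    // Formats bytes as upper-case hex so the encodings can be compared by eye.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign, Unicode code point U+00A3
        String content = throughStringWriter(pound);

        // The character code is unchanged after the Writer.
        System.out.println(Integer.toHexString(content.charAt(0)));

        // Conversion happens only when bytes are requested.
        System.out.println(hex(content.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(hex(content.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The Latin-1 line shows one byte (A3) and the UTF-8 line shows two (C2A3),
while the string taken back from the StringWriter is identical to the input.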


The main problem I see is that you have to decide which encoding will
later be used as early as the moment you pass a Writer to the
XMLOutputter.output method. However, this method does not actually encode
anything; it simply concatenates a string. The encoding happens only when
you transfer the string over a stream (file, socket, etc.). At that point
you provide a byte encoding, and nothing stops you from cheating by
specifying a different one than promised in the <?xml ...?> declaration.
Doing so would lead to major confusion on the receiving side.
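
The "cheating" case can be sketched as follows, again with the standard
library only. The document text and the class name are invented for the
example; the receiver is assumed to trust the declaration and decode as
UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeclarationMismatch {
    // A document string whose declaration promises UTF-8.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?><price>\u00A35</price>";

    // Encodes with one charset and decodes with another, the way a sender
    // and a receiver would if they disagreed about the encoding.
    static String roundTrip(String text, Charset written, Charset read) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, read);
    }

    public static void main(String[] args) {
        // Honest sender: bytes match the declaration, the text survives.
        String ok = roundTrip(XML, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(XML));

        // Cheating sender: Latin-1 bytes behind a UTF-8 declaration.
        String broken = roundTrip(XML, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(broken.equals(XML));
        System.out.println(broken);
    }
}
```

In the second case the lone Latin-1 byte for the pound sign is not valid
UTF-8, so the decoder substitutes the replacement character U+FFFD and the
original character is lost.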

This brings me to the idea that XMLOutputter.output(Document, Writer)
should not put any encoding information into the <?xml version="1.0"?>
declaration, while XMLOutputter.output(Document, Writer, String encoding)
should put the encoding into the <?xml version="1.0" encoding=...?>
declaration.
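
A rough sketch of what the proposal might look like. This is not JDOM code:
DeclarationSketch and its methods are hypothetical, and a plain String body
stands in for the Document so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class DeclarationSketch {
    // No encoding supplied by the caller, so the declaration promises none.
    static void output(String body, Writer out) throws IOException {
        out.write("<?xml version=\"1.0\"?>");
        out.write(body);
    }

    // The caller names the encoding, so the declaration repeats it.
    // It remains the caller's job to encode the bytes accordingly.
    static void output(String body, Writer out, String encoding) throws IOException {
        out.write("<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>");
        out.write(body);
    }

    // Convenience helpers that collect the output in a StringWriter.
    static String render(String body) {
        StringWriter writer = new StringWriter();
        try {
            output(body, writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    static String render(String body, String encoding) {
        StringWriter writer = new StringWriter();
        try {
            output(body, writer, encoding);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("<root/>"));
        System.out.println(render("<root/>", "ISO-8859-1"));
    }
}
```

With this split, the two-argument form never makes a promise about bytes it
cannot see, and the three-argument form makes the promise explicit at the
call site.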


This is the way I see the issue, but not being an expert, I would gladly
stand corrected.

	Gabor





