[jdom-interest] Dealing with binary characters in-memory -> outputter

Fri Sep 21 09:21:04 PDT 2001

Set it in both places, interesting.
I will try that.

I was under the impression that this was
the default (or that some other encoding was
the default) and that, as the default, it
would be effectively set in both places.
In other words, presumably it is using some
encoding by default, and presumably this byte
is not mapped, so regardless it should encode
it, right?

But yes, I will try specifically setting it in
both places.  I had seen the warning that the encoding
would not be remembered from the original input
doc; that much I knew.

-----Original Message-----
From: Attila Szegedi [mailto:szegedia at freemail.hu]
Sent: Friday, September 21, 2001 6:34 AM
To: mbennett at ideaeng.com; jdom-interest at jdom.org
Subject: Re: [jdom-interest] Dealing with binary characters in-memory ->
outputter

Strange. I've never come across a situation where XMLOutputter "ignores"
UTF-8. It may be optimized in a way that it does not output encoding
specification into the output XML declaration. It is completely legal, as
the default encoding per XML spec is UTF-8, so it can be omitted in this
case.

If this is not the issue, then it might be that you're not specifying UTF-8
everywhere you should. I hope you're aware that in order to have
XMLOutputter use a specific encoding, you must specify the encoding BOTH to
the Writer AND to XMLOutputter's setEncoding, like:

File outputFile = ...;
String encoding = "UTF-8";
Document doc = ....;

Writer w = new BufferedWriter(
    new OutputStreamWriter(new FileOutputStream(outputFile), encoding));
try
{
    XMLOutputter outputter = new XMLOutputter();
    outputter.setEncoding(encoding);
    outputter.output(doc, w);
}
finally
{
    w.close();
}

Attila.
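
To see why the encoding given to the Writer matters independently of
whatever the XML declaration says, here is a minimal sketch using only the
JDK (no JDOM). The class name WriterEncodingDemo and the encode helper are
made up for illustration. It pushes the same string through an
OutputStreamWriter with two different encodings; as far as I know, the
writer silently substitutes '?' for a character the encoding cannot
represent, rather than escaping it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class, not part of JDOM.
final class WriterEncodingDemo {

    // Encodes text through an OutputStreamWriter, as the Writer handed to
    // an outputter would, and returns the raw bytes that were produced.
    static byte[] encode(String text, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            Writer w = new OutputStreamWriter(bytes, encoding);
            try {
                w.write(text);
            } finally {
                w.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // ends in e-acute (U+00E9)
        // UTF-8 needs two bytes for U+00E9.
        System.out.println(encode(text, "UTF-8").length);
        // US-ASCII has no mapping for U+00E9; the writer substitutes '?'.
        byte[] ascii = encode(text, "US-ASCII");
        System.out.println((char) ascii[ascii.length - 1]);
    }
}
```

So if the Writer's encoding cannot represent a character, the character is
lost at the Writer no matter what encoding name was passed to setEncoding.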

----- Original Message -----
From: "Mark Bennett" <mbennett at ideaeng.com>
To: "Attila Szegedi" <szegedia at freemail.hu>; <jdom-interest at jdom.org>
Sent: September 21, 2001 11:14
Subject: RE: [jdom-interest] Dealing with binary characters in-memory ->
outputter

> Hello Attila,
>
> Thanks for your suggestion.
>
> I had tried UTF-8, but the outputter seemed to ignore it.
> I agree, if authoring XML in an ASCII editor, that would
> be a fine way to do it.
>
> And I hear what you're saying about the different encodings
> having different characters.
>
> But how about for a given encoder:
> * Is this character in my map?
> Yes
> then output it as it is mapped
> No
> then use the generic escape sequence &#xNN;
>
> So instead of tracking rules for every character, it would
> simply need to know that this wasn't in its map, so it should
> therefore use the generic escaping.
>
> I think...
>
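
The "is this character in my map?" rule quoted above can be sketched with
the JDK's CharsetEncoder.canEncode. This is only an illustration of the
idea, not how XMLOutputter is implemented; the class name
UnmappableEscaper and its escape method are hypothetical. It also assumes
the text is otherwise ready for output: it does not escape markup
characters such as < and &.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of JDOM.
final class UnmappableEscaper {

    // Returns text with every character the target encoding cannot
    // represent replaced by a hexadecimal character reference (&#xNN;).
    static String escape(String text, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Work per code point so surrogate pairs stay together.
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String ch = text.substring(i, i + len);
            if (encoder.canEncode(ch)) {
                out.append(ch);              // in the map: output as is
            } else {
                out.append("&#x")            // not in the map: generic escape
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("caf\u00e9", "US-ASCII")); // caf&#xE9;
        System.out.println(escape("caf\u00e9", "UTF-8"));    // unchanged
    }
}
```

With this approach the escaping decision depends only on the target
encoding, so no per-character rules need to be tracked.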