<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
William Krick wrote:<br>
<blockquote cite="midKEEAIOEEIFICLFEIJDHMMEGMCGAA.wkrick@eio-online.com"
type="cite">
<pre wrap="">...
XMLOutputter fmt = new XMLOutputter("", false, "UTF-8");
</pre>
</blockquote>
<br>
Puzzling. Has this constructor been removed recently? It's not in the
CVS trunk.<br>
<br>
<blockquote cite="midKEEAIOEEIFICLFEIJDHMMEGMCGAA.wkrick@eio-online.com"
type="cite">
<pre wrap="">...and the problem seems to be gone.
The "byte order mark" FF FE is still there when viewed
in a hex editor but the XML output is no longer clipped
at the beginning.
</pre>
</blockquote>
<br>
<br>
Doesn't this suggest a bug somewhere? When writing UTF-8, the
BOM should be EF BB BF, not FF FE (FF FE indicates UTF-16LE). A quick
look at XMLOutputter makes me think it is not the culprit: it merely
calls the standard Java APIs.<br>
<br>
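One way to see what the standard Java APIs emit on their own is a small
probe like the one below. It is only a sketch: the class name BomProbe and
the helper encodedHex are made up for illustration, and it assumes the
output ends up going through a plain OutputStreamWriter, which is an
assumption rather than something confirmed in the XMLOutputter source.<br>
<br>
<pre wrap="">
```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class BomProbe {

    // Hypothetical helper: pushes text through an OutputStreamWriter using
    // the named encoding and returns the resulting bytes as spaced hex.
    static String encodedHex(String text, String charsetName) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charsetName)) {
            writer.write(text);
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for a bad charset name.
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            if (hex.length() != 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", Byte.toUnsignedInt(b)));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Any leading bytes before the encoded "x" are a BOM added by the encoder.
        System.out.println("UTF-8 : " + encodedHex("x", "UTF-8"));
        System.out.println("UTF-16: " + encodedHex("x", "UTF-16"));
    }
}
```
</pre>
On a standard JDK the UTF-8 line shows just 78 (no BOM at all), and the
UTF-16 line starts with FE FF. Neither produces FF FE, so if that holds on
your machine too, the FF FE is probably being added somewhere else.<br>
<br>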
From <a class="moz-txt-link-freetext" href="http://www.unicode.org/unicode/faq/utf_bom.html#BOM">http://www.unicode.org/unicode/faq/utf_bom.html#BOM</a> :<br>
<br>
<table border="1" cellpadding="2" cellspacing="0">
<tbody>
<tr>
<th width="50%">Bytes</th>
<th width="50%">Encoding Form</th>
</tr>
<tr>
<td width="50%">00 00 FE FF</td>
<td width="50%">UTF-32, big-endian</td>
</tr>
<tr>
<td width="50%">FF FE 00 00</td>
<td width="50%">UTF-32, little-endian</td>
</tr>
<tr>
<td width="50%">FE FF</td>
<td width="50%">UTF-16, big-endian</td>
</tr>
<tr>
<td width="50%">FF FE</td>
<td width="50%">UTF-16, little-endian</td>
</tr>
<tr>
<td width="50%">EF BB BF</td>
<td width="50%">UTF-8</td>
</tr>
</tbody>
</table>
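<br>
If you would rather not eyeball a hex editor, the table above can be
turned into a quick check. This is a minimal sketch, not part of JDOM: the
class BomSniffer and its methods are hypothetical names. The four-byte
signatures are tested first because FF FE 00 00 (UTF-32, little-endian)
begins with the same two bytes as FF FE (UTF-16, little-endian).<br>
<br>
<pre wrap="">
```java
import java.util.Arrays;

public class BomSniffer {

    // Signatures copied from the Unicode FAQ table.
    static final byte[] UTF32_BE = {0x00, 0x00, (byte) 0xFE, (byte) 0xFF};
    static final byte[] UTF32_LE = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00};
    static final byte[] UTF16_BE = {(byte) 0xFE, (byte) 0xFF};
    static final byte[] UTF16_LE = {(byte) 0xFF, (byte) 0xFE};
    static final byte[] UTF8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True when data begins with exactly the bytes in signature.
    static boolean startsWith(byte[] data, byte[] signature) {
        if (signature.length > data.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, signature.length), signature);
    }

    // Returns the encoding form named in the table, or "no BOM".
    static String detect(byte[] data) {
        // Longest signatures first, so UTF-32LE is not mistaken for UTF-16LE.
        if (startsWith(data, UTF32_BE)) return "UTF-32, big-endian";
        if (startsWith(data, UTF32_LE)) return "UTF-32, little-endian";
        if (startsWith(data, UTF8)) return "UTF-8";
        if (startsWith(data, UTF16_BE)) return "UTF-16, big-endian";
        if (startsWith(data, UTF16_LE)) return "UTF-16, little-endian";
        return "no BOM";
    }

    public static void main(String[] args) {
        // FF FE followed by 3C 00, i.e. an opening angle bracket in UTF-16LE.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(detect(sample));
    }
}
```
</pre>
For the FF FE case described above this prints "UTF-16, little-endian".
Note that a UTF-16LE stream whose first character after the BOM is U+0000
would be reported as UTF-32, which should not happen with XML output.<br>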
<br>
<br>
Rick :-)<br>
<br>
</body>
</html>