<html>
<body>
Hello,<br><br>
I'm not sure if this is a bug or a feature, but I thought I would report
it anyway. I have attached (and reproduced below) a simple example
that illustrates the problem. I have tested this with Java 1.6EE, using both
JDOM's Jan 9th, 2009 nightly build and the standard 1.1
release.<br><br>
In this example, I am trying to prevent the expansion of the entity
"&minus;" in an XHTML document that is read in and
then immediately written out. I create an instance of SAXBuilder,
call setExpandEntities(false), then call the build() method on the input XHTML
doc. For simplicity, I then use an instance of XMLOutputter to print the
parsed document to standard out. (Even though I don't think it's necessary
for standard out, I also make sure the encoding is consistent between the
Format and the OutputStreamWriter, and that it is the common "US-ASCII"
encoding.)<br><br>
The original XHTML document uses the entity:<br>
&minus;<br><br>
But, the resulting XHTML printed to standard out shows:<br>
&minus;&#x2212;<br><br>
Apparently, calling setExpandEntities(false) causes the output to contain
both the original entity reference and its expanded character. I would
expect that disabling entity expansion would simply leave the
"&minus;" in place, without also writing the character as a numeric
reference in the US-ASCII output.<br><br>
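For anyone trying to narrow this down, it may help to look at what the underlying SAX parser reports for an entity reference. The sketch below is independent of JDOM and uses only the JDK's built-in parser. The class name EntityEvents, the trace() helper, and the inline DTD declaring "minus" are assumptions made for illustration (the real XHTML DTD declares the entity externally). It logs the entity boundaries reported through the SAX LexicalHandler interface next to the characters() callbacks. Whether this is related to the duplication is only a guess.<br><br>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper for illustration only; not part of JDOM.
final class EntityEvents {

    // Parses the XML text and returns a log of entity and character events.
    static String trace(String xml) {
        final StringBuilder log = new StringBuilder();
        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startEntity(String name) {
                log.append("startEntity(").append(name).append(")\n");
            }

            @Override
            public void endEntity(String name) {
                log.append("endEntity(").append(name).append(")\n");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.append("characters(");
                for (int i = start; i < start + length; i++) {
                    log.append(String.format("U+%04X", (int) ch[i]));
                }
                log.append(")\n");
            }
        };
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        // Assumption: an internal DTD subset stands in for the XHTML DTD.
        String xml = "<!DOCTYPE html [<!ENTITY minus \"&#x2212;\">]>"
                   + "<html><body><p>&minus;</p></body></html>";
        System.out.print(trace(xml));
    }
}
```

If the log shows the expanded character arriving between the startEntity and endEntity events, then a builder that keeps the entity reference also has to skip the characters reported inside it.<br><br>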
This isn't a big problem because if the default value, 'true', is used
for entity expansion, the resulting output simply contains
"&#x2212;" and nothing is duplicated. Even
though the original entity reference is replaced by a numeric character
reference, the output will still behave and appear the same as the original,
which is probably what's normally required.<br><br>
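The "&#x2212;" form is what one would expect for a character that US-ASCII cannot represent: U+2212 (MINUS SIGN) has to be written as a numeric character reference. A minimal sketch of that kind of escaping, using only the JDK; the class AsciiEscape and the method escapeForCharset() are hypothetical names and not JDOM's actual implementation:<br><br>

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper for illustration only; not part of JDOM.
final class AsciiEscape {

    // Replaces every code point the charset cannot encode with &#xHHHH;.
    static String escapeForCharset(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            String piece = new String(Character.toChars(codePoint));
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: a &#x2212; b
        System.out.println(escapeForCharset("a \u2212 b", "US-ASCII"));
    }
}
```

With "UTF-8" as the charset name, the same call returns the text unchanged, since UTF-8 can encode U+2212 directly.<br><br>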
- Thanks for any feedback, and Happy 2009,<br>
- David W.<br><br>
======= INPUT XHTML DOCUMENT START =======<br>
<?xml version="1.0" encoding="UTF-8"?><br>
<?xml-stylesheet type="text/xsl"
href="http://www.w3.org/Math/XSL/pmathml.xsl"?><br>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/2000/REC-xhtml1-20000126/DTD/xhtml1-strict.dtd"><br>
<html><br>
<head><br>
</head><br>
<body><br>
<p>&minus;</p><br>
</body><br>
</html><br>
======= INPUT XHTML DOCUMENT END =======<br><br>
<br>
======= TEST JAVA CODE START =======<br>
import java.io.File;<br>
import java.io.OutputStreamWriter;<br><br>
import org.jdom.Document;<br>
import org.jdom.input.SAXBuilder;<br>
import org.jdom.output.Format;<br>
import org.jdom.output.XMLOutputter;<br><br>
public class Test {<br>
<x-tab> </x-tab>public static void main(String[] args) throws Exception {<br>
<x-tab> </x-tab><x-tab> </x-tab>File fileInput = new File("testEntity.xml");<br>
<x-tab> </x-tab><x-tab> </x-tab>Document doc;<br><br>
<x-tab> </x-tab><x-tab> </x-tab>SAXBuilder b = new SAXBuilder();<br>
<x-tab> </x-tab><x-tab> </x-tab>b.setIgnoringElementContentWhitespace(true);<br>
<x-tab> </x-tab><x-tab> </x-tab>b.setExpandEntities(false);<br>
<x-tab> </x-tab><x-tab> </x-tab>doc = b.build(fileInput);<br>
<x-tab> </x-tab><x-tab> </x-tab>doc.getDocType().setInternalSubset(null);<br><br>
<x-tab> </x-tab><x-tab> </x-tab>XMLOutputter outputter = new XMLOutputter();<br>
<x-tab> </x-tab><x-tab> </x-tab>Format format = Format.getPrettyFormat();<br>
<x-tab> </x-tab><x-tab> </x-tab>format.setEncoding("US-ASCII");<br>
<x-tab> </x-tab><x-tab> </x-tab>outputter.setFormat(format);<br><br>
<x-tab> </x-tab><x-tab> </x-tab>outputter.output(doc, new OutputStreamWriter(System.out, format.getEncoding()));<br>
<x-tab> </x-tab>}<br>
}<br>
======= TEST JAVA CODE END =======</body>
</html>