<div>According to the XML 1.1 spec:</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><a name="sec-physical-struct" id="sec-physical-struct" style="font-family:arial,helvetica,sans-serif">4 Physical Structures ...</a><span style="font-family:arial,helvetica,sans-serif"><br>
</span><span style="font-family:arial,helvetica,sans-serif">[</span><a name="dt-unparsed" id="dt-unparsed" title="Unparsed Entity" style="font-family:arial,helvetica,sans-serif">Definition</a><span style="font-family:arial,helvetica,sans-serif">: An </span><b style="font-family:arial,helvetica,sans-serif">unparsed entity</b><span style="font-family:arial,helvetica,sans-serif"> is a resource whose contents may or may not be </span><a title="Text" href="http://www.w3.org/TR/xml11/#dt-text" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">text</a><span style="font-family:arial,helvetica,sans-serif">, and if text, may be other than XML. Each unparsed entity has an associated </span><a title="Notation" href="http://www.w3.org/TR/xml11/#dt-notation" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">notation</a><span style="font-family:arial,helvetica,sans-serif">, identified by name. Beyond a requirement that an XML processor make the identifiers for the entity and notation available to the application, XML places no constraints on the contents of unparsed entities.]</span></blockquote>
<div> </div><div><br></div><div>AND </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,helvetica,sans-serif">Entities may be either parsed or unparsed. [</span><a name="dt-parsedent" id="dt-parsedent" title="Text Entity" style="font-family:arial,helvetica,sans-serif">Definition</a><span style="font-family:arial,helvetica,sans-serif">: The contents of a </span><b style="font-family:arial,helvetica,sans-serif">parsed entity</b><span style="font-family:arial,helvetica,sans-serif"> are referred to as its </span><a title="Replacement Text" href="http://www.w3.org/TR/xml11/#dt-repltext" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">replacement text</a><span style="font-family:arial,helvetica,sans-serif">; this </span><a title="Text" href="http://www.w3.org/TR/xml11/#dt-text" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">text</a><span style="font-family:arial,helvetica,sans-serif"> is considered an integral part of the document.]</span></blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><font face="arial, helvetica, sans-serif">Parsed entities are invoked by name using entity references; unparsed entities by name, given in the value of <b>ENTITY</b> or <b>ENTITIES</b> attributes.</font></blockquote><font face="arial, helvetica, sans-serif"><div>
<font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif"><br></font></div>In the current JDOM version, the Element methods setText(String) and addContent(CDATA) refuse text that contains illegal characters. They treat the data provided as 'parsed' when, according to the spec, they should be treating it as free content.</font><div>
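<div>For reference, the "illegal characters" in question are the ones outside XML's Char production, and that production differs between XML 1.0 and XML 1.1. The sketch below is a self-contained illustration, not JDOM's own Verifier code; the names XmlCharCheck, isXml10Char, isXml11Char and firstIllegalXml10 are hypothetical. Which of the two productions JDOM actually applies is worth confirming in its Verifier source. Note that XML 1.1 admits most C0 control characters, but only as character references, and neither version admits NUL.</div>

```java
// Hypothetical helper (not JDOM API) contrasting the XML 1.0 and XML 1.1 Char productions.
class XmlCharCheck {

    // XML 1.0 Char:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // The "restricted" controls inside this range (e.g. #x1-#x8) may only
    // appear in a serialized document as character references such as &#x1;.
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index (in chars) of the first code point failing the XML 1.0 test, or -1.
    // Unpaired surrogates are reported too, since they fall in no allowed range.
    static int firstIllegalXml10(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String bell = "ring" + (char) 0x07 + "ring";              // BEL is plain 7-bit ASCII
        System.out.println(firstIllegalXml10("tab\tok"));         // -1
        System.out.println(firstIllegalXml10(bell));              // 4
        System.out.println(isXml11Char(0x07));                    // true
        System.out.println(isXml10Char(0x0) || isXml11Char(0x0)); // false
    }
}
```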
<font face="arial, helvetica, sans-serif"><br></font></div><div><font face="arial, helvetica, sans-serif">As I understand it:</font></div><div><font face="arial, helvetica, sans-serif"><br></font></div><div><div class="gmail_quote">
<font face="arial, helvetica, sans-serif">1) The XML 1.1 spec refers to the contents of a parsed entity as its 'replacement text'.</font></div><div class="gmail_quote"><font face="arial, helvetica, sans-serif"><br></font></div><div class="gmail_quote">
<font face="arial, helvetica, sans-serif">2) 'Replacement text'</font> would refer to the actual textual makeup of a serialized Element, not to the data an Element holds in its Text content.</div><div class="gmail_quote">
<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">If the above is true, then the current implementation is actually wrong to verify this data.</div><div class="gmail_quote"><br></div><div class="gmail_quote">
I propose that JDOM stop verifying data set as Element text and CDATA, and leave it to Xerces (or whichever parser is in use) to make sure the document is proper XML 1.1.</div><div class="gmail_quote"><br></div><div class="gmail_quote">Am I understanding everything correctly?</div>
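<div>On the "leave it to the parser" point: a conforming parser does enforce these character rules on input, according to the version the document declares. The minimal sketch below uses the JDK's bundled JAXP DocumentBuilder (itself derived from Xerces) as a stand-in for "xerces (or whatever)"; ParserCharCheck and parseText are hypothetical names. It only illustrates parse-time checking and says nothing about what happens when a document is built in memory and then serialized.</div>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: lets the JDK's built-in parser judge well-formedness.
class ParserCharCheck {

    // Returns the root element's text, or null if the parser rejects the document.
    static String parseText(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints "[Fatal Error]" to stderr.
            db.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // not well-formed for the declared XML version
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+0001 as a character reference: not well-formed in XML 1.0...
        String v10 = parseText("<?xml version=\"1.0\"?><e>&#x1;</e>");
        // ...but allowed in XML 1.1, where it is a RestrictedChar.
        String v11 = parseText("<?xml version=\"1.1\"?><e>&#x1;</e>");
        System.out.println(v10 == null);
        System.out.println(v11 != null && v11.equals(String.valueOf((char) 0x1)));
    }
}
```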
<div class="gmail_quote"><br></div><div class="gmail_quote">Thoughts?</div><div class="gmail_quote"><br></div><div class="gmail_quote">---------- Forwarded message ----------</div><div class="gmail_quote">From: <b class="gmail_sendername">Canadian Wilf</b> <span dir="ltr"><<a href="mailto:canwilf@gmail.com">canwilf@gmail.com</a>></span><br>
Date: Thu, Sep 6, 2012 at 9:52 PM<br>Subject: XML 1.1 -- Please stab me with a dull knife and trample my dead body<br>To: <a href="mailto:jdom-interest@jdom.org">jdom-interest@jdom.org</a><br><br><br><div>Hi All,</div><div>
<br></div><div>I just learned that in order to safely use JDOM2, I will need to sanitize the strings I pass to Element.setText(String) so that the data does not contain characters that are verboten under the XML 1.1 spec.</div>
<div><br></div><div>I have an ASCII processor, and it needs to be able to use XML as a document format. Unfortunately, not every ASCII character is allowed in Element text.</div><div><br></div><div>Stab me with a dull knife and trample my dead body. But... please, please, please don't make me sanitize all my data before putting it into XML Elements.</div>
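<div>If sanitizing does turn out to be unavoidable, it can at least be a single pass over each string before it is handed to setText(String). A minimal sketch, assuming the rule being enforced is XML 1.0's Char production (the stricter of the two versions); XmlSanitizer and sanitize are hypothetical names, not JDOM API.</div>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM: drops or replaces characters that
// XML 1.0 does not allow in text content.
class XmlSanitizer {

    // Matches one code point outside the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
    // java.util.regex works on code points, so a valid surrogate pair falls in
    // the supplementary range, while an unpaired surrogate is matched as illegal.
    private static final Pattern ILLEGAL = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Replaces every illegal code point with the literal replacement ("" drops it).
    static String sanitize(String text, String replacement) {
        return ILLEGAL.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String raw = "before" + (char) 0x1B + "after"; // ESC (0x1B) is 7-bit ASCII
        System.out.println(sanitize(raw, ""));         // beforeafter
        System.out.println(sanitize(raw, "?"));        // before?after
    }
}
```

<div>Dropping characters silently loses data; for content that is really binary, encoding it (for example as Base64) before storing it as text sidesteps the question entirely.</div>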
<div><br></div><div>1) It makes my programming task much more cumbersome, because I must make sure never to store any of the newly verboten (and doomed) ASCII/UTF-8 characters as XML text.</div><br>
<div>2) No one uses XML 1.1, do they?</div><div><br></div><div>3) It slows down parsing (by a very small amount) because of all the element text checking.</div><div><br></div><div>Now that JDOM2 is XML 1.1 compatible, is there any turning back? Can this be undone? </div>
<div><br></div><div>Does everyone understand that their software will break if data provided as text does not adhere to the new standard?</div><div><br></div><div>What about you? How do you deal with this when using the libraries?</div>
<span class="HOEnZb"><font color="#888888">
<div><br></div><div>Wilf</div>
</font></span></div><br></div>