Then what you are referring to is &lt;!ENTITY bob "Bob Weaver"&gt;, and that part of the spec has nothing to do with the JDOM content in a Text or CDATA node.<div><br></div><div>Wilf</div><div><br><div><div><br>
<div class="gmail_quote">On Fri, Sep 7, 2012 at 6:05 AM, Michael Kay <span dir="ltr">&lt;<a href="mailto:mike@saxonica.com" target="_blank">mike@saxonica.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
No, that's all wrong. The contents of an unparsed entity are always
an external resource; they are never part of a text or attribute
node. Parsed entities do become part of the content, but they must
always use the XML character set.<br>
<br>
Michael Kay<br>
Saxonica<br>
<br>
<div>On 07/09/2012 13:10, Canadian Wilf
wrote:<br>
</div>
<blockquote type="cite">
<div>According to the XML 1.1 spec:</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><a name="139a0ed3ca62ae5a_sec-physical-struct" style="font-family:arial,helvetica,sans-serif">4 Physical
Structures ...</a><span style="font-family:arial,helvetica,sans-serif"><br>
</span><span style="font-family:arial,helvetica,sans-serif">[</span><a name="139a0ed3ca62ae5a_dt-unparsed" title="Unparsed Entity" style="font-family:arial,helvetica,sans-serif">Definition</a><span style="font-family:arial,helvetica,sans-serif">: An </span><b style="font-family:arial,helvetica,sans-serif">unparsed entity</b><span style="font-family:arial,helvetica,sans-serif"> is a resource
whose contents may or may not be </span><a title="Text" href="http://www.w3.org/TR/xml11/#dt-text" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)" target="_blank">text</a><span style="font-family:arial,helvetica,sans-serif">, and if text,
may be other than XML. Each unparsed entity has an associated </span><a title="Notation" href="http://www.w3.org/TR/xml11/#dt-notation" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)" target="_blank">notation</a><span style="font-family:arial,helvetica,sans-serif">, identified by
name. Beyond a requirement that an XML processor make the
identifiers for the entity and notation available to the
application, XML places no constraints on the contents of
unparsed entities.]</span></blockquote>
<div> </div>
<div><br>
</div>
<div>AND </div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-family:arial,helvetica,sans-serif">Entities may be
either parsed or unparsed. [</span><a name="139a0ed3ca62ae5a_dt-parsedent" title="Text Entity" style="font-family:arial,helvetica,sans-serif">Definition</a><span style="font-family:arial,helvetica,sans-serif">: The contents
of a </span><b style="font-family:arial,helvetica,sans-serif">parsed
entity</b><span style="font-family:arial,helvetica,sans-serif"> are
referred to as its </span><a title="Replacement Text" href="http://www.w3.org/TR/xml11/#dt-repltext" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)" target="_blank">replacement
text</a><span style="font-family:arial,helvetica,sans-serif">;
this </span><a title="Text" href="http://www.w3.org/TR/xml11/#dt-text" style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)" target="_blank">text</a><span style="font-family:arial,helvetica,sans-serif"> is considered
an integral part of the document.]</span></blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><font face="arial, helvetica, sans-serif">[<a name="139a0ed3ca62ae5a_dt-unparsed" title="Unparsed Entity">Definition</a>:
An <b>unparsed entity</b> is a resource whose contents may or
may not be <a title="Text" href="http://www.w3.org/TR/xml11/#dt-text" style="color:rgb(102,0,153);background-color:transparent;background-repeat:initial initial" target="_blank">text</a>, and if text, may be other than XML. Each
unparsed entity has an associated <a title="Notation" href="http://www.w3.org/TR/xml11/#dt-notation" style="color:rgb(102,0,153);background-color:transparent;background-repeat:initial initial" target="_blank">notation</a>, identified by name. Beyond a
requirement that an XML processor make the identifiers for the
entity and notation available to the application, XML places
no constraints on the contents of unparsed entities.]<br>
Parsed entities are invoked by name using entity references;
unparsed entities by name, given in the value of <b>ENTITY</b> or <b>ENTITIES</b> attributes.</font></blockquote>
<font face="arial, helvetica, sans-serif">
<div>
<font face="arial, helvetica, sans-serif"><br>
</font></div>
<div><font face="arial, helvetica, sans-serif"><br>
</font></div>
In the current JDOM version, the Element methods setText(String) and
addContent(CDATA) refuse text that contains illegal
characters. JDOM is treating the data provided as 'parsed' when,
by the spec, it should be treating it as free content.</font>
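<div>For illustration, the kind of character check at issue can be sketched in plain Java. This is a minimal sketch that assumes the XML 1.0 Char production; the names XmlCharCheck, inRange, isXml10Char and firstIllegalIndex are hypothetical, and this is not JDOM's own verification code.</div>
<pre>
```java
final class XmlCharCheck {

    /** True if cp lies in the inclusive range lo..hi. */
    static boolean inRange(int cp, int lo, int hi) {
        return cp >= lo && hi >= cp;
    }

    /** True if the code point matches the XML 1.0 "Char" production (assumed rule set). */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || inRange(cp, 0x20, 0xD7FF)
            || inRange(cp, 0xE000, 0xFFFD)
            || inRange(cp, 0x10000, 0x10FFFF);
    }

    /** Index of the first illegal code point in text, or -1 if every one is legal. */
    static int firstIllegalIndex(String text) {
        int i = 0;
        while (text.length() > i) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            // Step over one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("plain ASCII text")); // prints -1
        System.out.println(firstIllegalIndex("bell\u0007here"));   // prints 4
    }
}
```
</pre>
<div>The second call reports index 4 because the BEL control character (0x07) is ASCII but falls outside the ranges listed above.</div>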
<div>
<font face="arial, helvetica, sans-serif"><br>
</font></div>
<div><font face="arial, helvetica, sans-serif">I understand:</font></div>
<div><font face="arial, helvetica, sans-serif"><br>
</font></div>
<div>
<div class="gmail_quote">
<font face="arial, helvetica, sans-serif">1) The XML 1.1 spec
defines the contents of a parsed entity as its 'replacement text'.</font></div>
<div class="gmail_quote"><font face="arial, helvetica,
sans-serif"><br>
</font></div>
<div class="gmail_quote">
<font face="arial, helvetica, sans-serif">2) 'R</font>eplacement
text' would refer to the actual textual makeup of a serialized
Element, not to the data an Element holds in a Text content
node.</div>
<div class="gmail_quote">
<br>
</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Then, if the above is true, the current
implementation is actually wrong to verify data.</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">
I propose that JDOM stop verifying data set as Element text
and CDATA and leave it to Xerces (or whichever parser is in use) to make
sure the document is proper XML 1.1.</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Am I understanding everything
correctly?</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Thoughts?</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">---------- Forwarded message ----------</div>
<div class="gmail_quote">From: <b class="gmail_sendername">Canadian
Wilf</b> <span dir="ltr">&lt;<a href="mailto:canwilf@gmail.com" target="_blank">canwilf@gmail.com</a>&gt;</span><br>
Date: Thu, Sep 6, 2012 at 9:52 PM<br>
Subject: XML 1.1 -- Please stab me with a dull knife and
trample my dead body<br>
To: <a href="mailto:jdom-interest@jdom.org" target="_blank">jdom-interest@jdom.org</a><br>
<br>
<br>
<div>Hi All,</div>
<div>
<br>
</div>
<div>I just learned that in order to use JDOM2 safely, I will
need to sanitize every string I pass to Element.setText(String) so that the
parsed data does not contain characters that are verboten under the
XML 1.1 spec.</div>
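<div>The sanitizing step described above might look roughly like this. It is a minimal sketch under the assumption that illegal characters are judged by the XML 1.0 Char production and are replaced with U+FFFD rather than dropped; XmlTextSanitizer and its methods are hypothetical names, not part of JDOM.</div>
<pre>
```java
final class XmlTextSanitizer {

    /** True if the code point matches the XML 1.0 "Char" production (assumed rule set). */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && 0xD7FF >= cp)
            || (cp >= 0xE000 && 0xFFFD >= cp)
            || (cp >= 0x10000 && 0x10FFFF >= cp);
    }

    /** Returns a copy of raw with every illegal code point replaced by U+FFFD. */
    static String sanitize(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        raw.codePoints()
           .forEach(cp -> out.appendCodePoint(isXml10Char(cp) ? cp : 0xFFFD));
        return out.toString();
    }

    public static void main(String[] args) {
        // The BEL character is replaced; everything else passes through unchanged.
        System.out.println(sanitize("bell\u0007here").equals("bell\uFFFDhere")); // prints true
    }
}
```
</pre>
<div>The result would then be passed on, for example as element.setText(XmlTextSanitizer.sanitize(raw)). Note that this is lossy: the original control characters cannot be recovered from the stored text.</div>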
<div><br>
</div>
<div>I have an ASCII processor and it needs to be able to use
XML as a document format. Unfortunately, not all ASCII characters are
allowed in Element text.</div>
<div><br>
</div>
<div>Stab me with a dull knife and trample my dead body. But
..... please please please don't make me sanitize all my
data before putting it into XML Elements.</div>
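<div>Where losing characters is not acceptable, one option is to encode the raw data so that only legal characters ever reach the Element. This is a minimal sketch, assuming UTF-8 and the JDK's java.util.Base64; the class name Base64TextCodec is hypothetical, and whoever reads the document has to know that the text must be decoded.</div>
<pre>
```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64TextCodec {

    /** Encodes arbitrary text as Base64, which uses only letters, digits, '+', '/' and '='. */
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(), restoring the original text including control characters. */
    static String decode(String stored) {
        return new String(Base64.getDecoder().decode(stored), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "bell\u0007and escape\u001b";
        String stored = encode(raw);
        // Round trip: the decoded value equals the original input.
        System.out.println(decode(stored).equals(raw)); // prints true
    }
}
```
</pre>
<div>The trade-off is that the stored text is no longer human-readable and grows by roughly a third.</div>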
<div><br>
</div>
<div>1) It makes my programming task much more cumbersome
because I must make sure not to feed any of the newly verboten
and doomed ASCII/UTF-8 characters into XML text.</div>
<br>
<div>2) No one uses XML 1.1, do they?</div>
<div><br>
</div>
<div>3) All the element text checking slows down the parsing
(by a very small amount).</div>
<div><br>
</div>
<div>Now that JDOM2 is XML 1.1 compatible, is there any
turning back? Can this be undone? </div>
<div><br>
</div>
<div>Does everyone understand that their software will break if
data provided as text does not adhere to the new standard?</div>
<div><br>
</div>
<div>What about you? How do you deal with it when using the
libraries?</div>
<span><font color="#888888">
<div><br>
</div>
<div>Wilf</div>
</font></span></div>
<br>
</div>
<br>
<br>
<pre>_______________________________________________
To control your jdom-interest membership:
<a href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com" target="_blank">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</a></pre>
</blockquote>
<br>
</div>
</blockquote></div><br></div></div></div>