<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
No, that's all wrong. The contents of an unparsed entity are always
an external resource; they are never part of a text or attribute
node. Parsed entities do become part of the content, but they must
always use the XML character set.<br>
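<div>A minimal sketch of what "the XML character set" means for text content, assuming the XML 1.0 Char production (XML 1.1 draws the line somewhat differently). It is plain Java with no JDOM calls, and the class and method names are hypothetical, chosen only for illustration:</div>
<pre>
```java
/** Hypothetical helper: checks text against the XML 1.0 Char production. */
final class XmlCharCheck {
    private XmlCharCheck() {}

    /** True if lo is at most cp and cp is at most hi. */
    static boolean inRange(int cp, int lo, int hi) {
        return cp >= lo ? hi >= cp : false;
    }

    /** XML 1.0 Char: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || inRange(cp, 0x20, 0xD7FF)
            || inRange(cp, 0xE000, 0xFFFD)
            || inRange(cp, 0x10000, 0x10FFFF);
    }

    /** Index of the first char that starts an illegal code point, or -1 if none. */
    static int firstIllegalIndex(String s) {
        int i = 0;
        while (i != s.length()) {
            int cp = s.codePointAt(i);      // an unpaired surrogate comes back as itself
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);   // step over both halves of a surrogate pair
        }
        return -1;
    }
}
```
</pre>
<div>Under these assumptions, XmlCharCheck.firstIllegalIndex("ab\bcd") returns 2, because the backspace character (U+0008) falls outside the production, while tab, line feed and carriage return pass.</div>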
<br>
Michael Kay<br>
Saxonica<br>
<br>
<div class="moz-cite-prefix">On 07/09/2012 13:10, Canadian Wilf
wrote:<br>
</div>
<blockquote
cite="mid:CAL8g3USfsG4UY3UzWddY=nQm2HHO8Dcz8ypd2AATC7DWFoUAKQ@mail.gmail.com"
type="cite">
<div>According to the XML 1.1 spec:</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><a
moz-do-not-send="true" name="sec-physical-struct"
id="sec-physical-struct"
style="font-family:arial,helvetica,sans-serif">4 Physical
Structures ...</a><span
style="font-family:arial,helvetica,sans-serif"><br>
</span><span style="font-family:arial,helvetica,sans-serif">[</span><a
moz-do-not-send="true" name="dt-unparsed" id="dt-unparsed"
title="Unparsed Entity"
style="font-family:arial,helvetica,sans-serif">Definition</a><span
style="font-family:arial,helvetica,sans-serif">: An </span><b
style="font-family:arial,helvetica,sans-serif">unparsed entity</b><span
style="font-family:arial,helvetica,sans-serif"> is a resource
whose contents may or may not be </span><a
moz-do-not-send="true" title="Text"
href="http://www.w3.org/TR/xml11/#dt-text"
style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">text</a><span
style="font-family:arial,helvetica,sans-serif">, and if text,
may be other than XML. Each unparsed entity has an associated </span><a
moz-do-not-send="true" title="Notation"
href="http://www.w3.org/TR/xml11/#dt-notation"
style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">notation</a><span
style="font-family:arial,helvetica,sans-serif">, identified by
name. Beyond a requirement that an XML processor make the
identifiers for the entity and notation available to the
application, XML places no constraints on the contents of
unparsed entities.]</span></blockquote>
<div> </div>
<div><br>
</div>
<div>AND </div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span
style="font-family:arial,helvetica,sans-serif">Entities may be
either parsed or unparsed. [</span><a moz-do-not-send="true"
name="dt-parsedent" id="dt-parsedent" title="Text Entity"
style="font-family:arial,helvetica,sans-serif">Definition</a><span
style="font-family:arial,helvetica,sans-serif">: The contents
of a </span><b style="font-family:arial,helvetica,sans-serif">parsed
entity</b><span style="font-family:arial,helvetica,sans-serif"> are
referred to as its </span><a moz-do-not-send="true"
title="Replacement Text"
href="http://www.w3.org/TR/xml11/#dt-repltext"
style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">replacement
text</a><span style="font-family:arial,helvetica,sans-serif">;
this </span><a moz-do-not-send="true" title="Text"
href="http://www.w3.org/TR/xml11/#dt-text"
style="font-family:arial,helvetica,sans-serif;color:rgb(102,0,153)">text</a><span
style="font-family:arial,helvetica,sans-serif"> is considered
an integral part of the document.]</span></blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><font
face="arial, helvetica, sans-serif">[<a moz-do-not-send="true"
name="dt-unparsed" id="dt-unparsed" title="Unparsed Entity">Definition</a>:
An <b>unparsed entity</b> is a resource whose contents may or
may not be <a moz-do-not-send="true" title="Text"
href="http://www.w3.org/TR/xml11/#dt-text"
style="color:rgb(102,0,153);background-color:transparent;background-repeat:initial
initial">text</a>, and if text, may be other than XML. Each
unparsed entity has an associated <a moz-do-not-send="true"
title="Notation"
href="http://www.w3.org/TR/xml11/#dt-notation"
style="color:rgb(102,0,153);background-color:transparent;background-repeat:initial
initial">notation</a>, identified by name. Beyond a
requirement that an XML processor make the identifiers for the
entity and notation available to the application, XML places
no constraints on the contents of unparsed entities.]<br>
Parsed entities are invoked by name using entity references;
unparsed entities by name, given in the value of <b>ENTITY</b> or <b>ENTITIES</b> attributes.</font></blockquote>
<font face="arial, helvetica, sans-serif">
<div>
<font face="arial, helvetica, sans-serif"><br>
</font></div>
<div><font face="arial, helvetica, sans-serif"><br>
</font></div>
In the current JDOM version, the Element methods setText(String) and
addContent(CDATA) refuse text that contains illegal
characters. JDOM is treating the data provided as 'parsed' when,
by the spec, it should be treating it as free content.</font>
<div>
<font face="arial, helvetica, sans-serif"><br>
</font></div>
<div><font face="arial, helvetica, sans-serif">I understand:</font></div>
<div><font face="arial, helvetica, sans-serif"><br>
</font></div>
<div>
<div class="gmail_quote">
<font face="arial, helvetica, sans-serif">1) The XML 1.1 spec
defines a parsed entity as its 'replacement text'.</font></div>
<div class="gmail_quote"><font face="arial, helvetica,
sans-serif"><br>
</font></div>
<div class="gmail_quote">
<font face="arial, helvetica, sans-serif">2) 'R</font>eplacement
text' would refer to the actual textual makeup of a serialized
Element, not the data an Element holds in a Text content
element.</div>
<div class="gmail_quote">
<br>
</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Then, if the above is true, the current
implementation is actually wrong to verify data.</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">
I propose that JDOM stop verifying data set as Element text
and CDATA, and leave it to Xerces (or whatever parser is in use) to make
sure the document is proper XML 1.1.</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Am I understanding everything
correctly?</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">Thoughts?</div>
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">---------- Forwarded message ----------</div>
<div class="gmail_quote">From: <b class="gmail_sendername">Canadian
Wilf</b> <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:canwilf@gmail.com">canwilf@gmail.com</a>&gt;</span><br>
Date: Thu, Sep 6, 2012 at 9:52 PM<br>
Subject: XML 1.1 -- Please stab me with a dull knife and
trample my dead body<br>
To: <a moz-do-not-send="true"
href="mailto:jdom-interest@jdom.org">jdom-interest@jdom.org</a><br>
<br>
<br>
<div>Hi All,</div>
<div>
<br>
</div>
<div>I just learned that in order to safely use JDOM2, I will
need to sanitize every string I pass to Element.setText(String) so that the
parsed data does not contain verboten characters under the
XML 1.1 spec.</div>
<div><br>
</div>
<div>I have an ASCII processor and it needs to be able to use
XML as a document format. Unfortunately, not all ASCII is
allowed in an Element's text.</div>
<div><br>
</div>
<div>Stab me with a dull knife and trample my dead body. But
..... please please please don't make me sanitize all my
data before putting it into XML Elements.</div>
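<div>For illustration only, the kind of sanitizing step described above might look roughly like the sketch below. It assumes the XML 1.0 Char production as the definition of a legal character and simply drops everything else, which loses data and may not be acceptable for every application. The class and method names are hypothetical, and nothing here is a JDOM API:</div>
<pre>
```java
/** Hypothetical helper: drops code points outside the XML 1.0 Char production. */
final class XmlTextSanitizer {
    private XmlTextSanitizer() {}

    /** True if lo is at most cp and cp is at most hi. */
    static boolean inRange(int cp, int lo, int hi) {
        return cp >= lo ? hi >= cp : false;
    }

    /** XML 1.0 Char: #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || inRange(cp, 0x20, 0xD7FF)
            || inRange(cp, 0xE000, 0xFFFD)
            || inRange(cp, 0x10000, 0x10FFFF);
    }

    /** Returns a copy of s with every illegal code point removed. */
    static String stripIllegal(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i != s.length()) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);    // keep legal characters unchanged
            }
            i += Character.charCount(cp);   // advance by one or two chars
        }
        return out.toString();
    }
}
```
</pre>
<div>The result of XmlTextSanitizer.stripIllegal(raw) could then be handed to setText. A caller that cannot afford to lose the dropped characters would need a reversible encoding instead.</div>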
<div><br>
</div>
<div>1) It makes my programming task much more cumbersome
because I must make sure never to feed in any of the newly verboten
and doomed ASCII/UTF-8 characters to be stored as XML text.</div>
<br>
<div>2) No one uses XML 1.1, do they?</div>
<div><br>
</div>
<div>3) It slows down the parsing (a very small amount) with
all the element text checking.</div>
<div><br>
</div>
<div>Now that JDOM2 is XML 1.1 compatible, is there any
turning back? Can this be undone?</div>
<div><br>
</div>
<div>Does everyone understand that their software will bust if
data provided as text is not adhering to the new standard?</div>
<div><br>
</div>
<div>What about you? How do you deal with it when using the
libraries?</div>
<span class="HOEnZb"><font color="#888888">
<div><br>
</div>
<div>Wilf</div>
</font></span></div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>