It sounds like you're trying to include binary data (converted to a string) in XML.  That won't work, because arbitrary bytes can map to characters that XML text does not allow.  Try Base64 encoding the data first.<div>  (*Chris*)<br><br><div class="gmail_quote">On Fri, Sep 7, 2012 at 1:22 PM, Oliver Ruebenacker <span dir="ltr">&lt;<a href="mailto:curoli@gmail.com" target="_blank">curoli@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">     Hello,<br>

<div class="im"><br>

On Fri, Sep 7, 2012 at 3:17 PM, Canadian Wilf <<a href="mailto:canwilf@gmail.com">canwilf@gmail.com</a>> wrote:<br>

> Let's focus on valid character data for xml. How to do this:<br>

><br>

> String s = someRandomBytesNowAsString();<br>

<br>

</div>  Java Strings are not actually random bytes. The bytes are UTF-16, if<br>

I remember correctly.<br>

<div class="im"><br>

> Element e = new Element("random")<br>

> e.setText(s) or e.addContent(new CDATA(s))<br>

><br>

> Currently this will fail.<br>

<br>

</div>  Sorry, you lost me here. How will this fail? Will it throw an<br>

exception? Or will it otherwise do something undesired?<br>

<br>

  Maybe I'm missing something, but it sounds to me as if you are<br>

referring to specs that apply to XML character streams and not to JDOM<br>

objects.<br>

<br>

     Take care<br>

     Oliver<br>

<div class="HOEnZb"><div class="h5"><br>

>.. Which seems wrong because I should be able to<br>

> send whatever data I want as text  in xml content.<br>

><br>

> What use is xml (1.0 or 1.1) if I cannot represent various data? Is the<br>

> solution to make a custom escaper for my data?<br>

><br>

> e.setText(encodeSpecial(s)) and decodeSpecial(e.getText())<br>

><br>

> Crazy!<br>

><br>

> Wilf<br>

><br>

><br>

> On Fri, Sep 7, 2012 at 11:48 AM, Rolf Lear <<a href="mailto:jdom@tuis.net">jdom@tuis.net</a>> wrote:<br>

>><br>

>><br>

>> Hi Wilf.<br>

>><br>

>> You are getting your wires crossed. In your mail you referenced parsed<br>

>> and external entities. These have nothing to do with PCDATA (parsed<br>

>> character data - regular XML text) and CDATA (unparsed character data -<br>

>> &lt;![CDATA[ ... ]]&gt;).<br>

>><br>

>> Michael was answering your question based on the 'entities', whereas you<br>

>> want the details on the 'PCDATA' and the 'CDATA'.<br>

>><br>

>> So, forget about the 'entity' references, and focus on the valid character<br>

>> data for XML.<br>

>><br>

>> The only difference between CDATA (character blocks between &lt;![CDATA[ and<br>

>> ]]&gt;) and PCDATA (element 'text') is that the XML parser will look for<br>

>> '&lt;' and '&amp;' characters in PCDATA, but not in CDATA.<br>

>><br>

>> With the correct escaping, all CDATA content can be expressed as PCDATA<br>

>> content.<br>

>><br>

>> This does not help you though, because not all Java 'char' characters are<br>

>> valid Unicode characters, and thus not all chars are valid as either CDATA<br>

>> or PCDATA.<br>

>><br>

>> In XML 1.0 this distinction was clear.<br>

>><br>

>> In XML 1.1 I am not certain how to interpret the difference between<br>

>> 'Chars' and 'RestrictedChars': <a href="http://www.w3.org/TR/xml11/#charsets" target="_blank">http://www.w3.org/TR/xml11/#charsets</a><br>

>><br>

>> JDOM takes a 1.0 perspective on Characters... which may be a problem, but<br>

>> it is not going to solve your issues even if it supports 1.1 chars.<br>

>><br>

>> Rolf<br>

>><br>

>><br>

>><br>

>><br>

>> On Fri, 7 Sep 2012 08:45:33 -0700, Canadian Wilf <<a href="mailto:canwilf@gmail.com">canwilf@gmail.com</a>><br>

>> wrote:<br>

>> > Then what is the proper mode:<br>

>> ><br>

>> > Element e = new Element("foo")<br>

>> ><br>

>> > Should I do this:<br>

>> ><br>

>> > e.setText(string_of_sanitized_data_with_illegal_characters_escaped);<br>

>> ><br>

>> > or<br>

>> ><br>

>> > e.setText(any_text);<br>

>> ><br>

>> ><br>

>> > Wilf<br>

>> ><br>

>> ><br>

>> > On Fri, Sep 7, 2012 at 6:05 AM, Michael Kay <<a href="mailto:mike@saxonica.com">mike@saxonica.com</a>> wrote:<br>

>> ><br>

>> >>  No, that's all wrong. The contents of an unparsed entity are always an<br>

>> >> external resource, they are never part of a text or attribute node. Parsed<br>

>> >> entities do become part of the content, but they must always use the XML<br>

>> >> character set.<br>

>> >><br>

>> >> Michael Kay<br>

>> >> Saxonica<br>

>> >><br>

>> >> On 07/09/2012 13:10, Canadian Wilf wrote:<br>

>> >><br>

>> >> According to the xml 1.1 spec:<br>

>> >><br>

>> >>  4 Physical Structures ...<br>

>> >>> [Definition: An *unparsed entity* is a resource whose contents may or<br>

>> >>> may not be text <<a href="http://www.w3.org/TR/xml11/#dt-text" target="_blank">http://www.w3.org/TR/xml11/#dt-text</a>>, and if text, may<br>

>> >>> be other than XML. Each unparsed entity has an associated<br>

>> >>> notation<<a href="http://www.w3.org/TR/xml11/#dt-notation" target="_blank">http://www.w3.org/TR/xml11/#dt-notation</a>>,<br>

>> >>> identified by name. Beyond a requirement that an XML processor make the<br>

>> >>> identifiers for the entity and notation available to the application, XML<br>

>> >>> places no constraints on the contents of unparsed entities.]<br>

>> >><br>

>> >><br>

>> >><br>

>> >>  AND<br>

>> >><br>

>> >>  Entities may be either parsed or unparsed. [Definition: The contents of<br>

>> >>> a *parsed entity* are referred to as its replacement<br>

>> >>> text<<a href="http://www.w3.org/TR/xml11/#dt-repltext" target="_blank">http://www.w3.org/TR/xml11/#dt-repltext</a>>;<br>

>> >>> this text <<a href="http://www.w3.org/TR/xml11/#dt-text" target="_blank">http://www.w3.org/TR/xml11/#dt-text</a>> is considered an<br>

>> >>> integral part of the document.]<br>

>> >><br>

>> >> [Definition: An *unparsed entity* is a resource whose contents may or may<br>

>> >>> not be text <<a href="http://www.w3.org/TR/xml11/#dt-text" target="_blank">http://www.w3.org/TR/xml11/#dt-text</a>>, and if text, may be<br>

>> >>> other than XML. Each unparsed entity has an associated<br>

>> >>> notation<<a href="http://www.w3.org/TR/xml11/#dt-notation" target="_blank">http://www.w3.org/TR/xml11/#dt-notation</a>>,<br>

>> >>> identified by name. Beyond a requirement that an XML processor make the<br>

>> >>> identifiers for the entity and notation available to the application, XML<br>

>> >>> places no constraints on the contents of unparsed entities.]<br>

>> >>> Parsed entities are invoked by name using entity references; unparsed<br>

>> >>> entities by name, given in the value of *ENTITY* or *ENTITIES*<br>

>> >>>  attributes.<br>

>> >><br>

>> >><br>

>> >><br>

>> >>  In the current JDOM version, the Element methods setText(string) and<br>

>> >> addContent(CDATA) refuse text that contains illegal characters. It is<br>

>> >> treating the data provided as 'parsed' when it should by the spec be<br>

>> >> treating it as free content.<br>

>> >><br>

>> >>  I understand:<br>

>> >><br>

>> >>   1) The xml 1.1 spec defines a parsed entity as its 'replacement text'.<br>

>> >><br>

>> >>  2) 'Replacement text' would refer to the actual textual makeup of a<br>

>> >> serialized Element, not the data an Element holds in a Text content<br>

>> >> element.<br>

>> >><br>

>> >><br>

>> >>  Then, if the above is true, the current implementation is actually wrong<br>

>> >> to verify data.<br>

>> >><br>

>> >>  I propose that JDOM stop verifying data set as Element text and CDATA<br>

>> >> and leave it to the xerces (or whatever) to make sure the document is<br>

>> >> proper 1.1.<br>

>> >><br>

>> >>  Am I understanding everything correctly?<br>

>> >><br>

>> >>  Thoughts?<br>

>> >><br>

>> >>  ---------- Forwarded message ----------<br>

>> >> From: Canadian Wilf <<a href="mailto:canwilf@gmail.com">canwilf@gmail.com</a>><br>

>> >> Date: Thu, Sep 6, 2012 at 9:52 PM<br>

>> >> Subject: XML 1.1 -- Please stab me with a dull knife and trample my dead body<br>

>> >> To: <a href="mailto:jdom-interest@jdom.org">jdom-interest@jdom.org</a><br>

>> >><br>

>> >><br>

>> >> Hi All,<br>

>> >><br>

>> >>  I just learned that in order to safely use JDOM2, I will need to<br>

>> >> sanitize my Element .setText(string) so that the parsed data does not<br>

>> >> contain verboten characters under the XML 1.1 spec.<br>

>> >><br>

>> >>  I have an ascii processor and it needs to be able to use xml as a<br>

>> >> document format. Unfortunately, not all ascii is allowed in an Element<br>

>> >> text.<br>

>> >><br>

>> >>  Stab me with a dull knife and trample my dead body. But ..... please<br>

>> >> please please don't make me sanitize all my data before putting it into<br>

>> >> XML Elements.<br>

>> >><br>

>> >>  1) It makes my programming task much more cumbersome because I must<br>

>> >> ensure not to feed any of the new verboten and doomed ascii/UTF-8<br>

>> >> characters to store as xml text.<br>

>> >><br>

>> >> 2) No one uses xml 1.1, do they?<br>

>> >><br>

>> >>  3) It slows down the parsing (a very small amount) with all the element<br>

>> >> text checking.<br>

>> >><br>

>> >>  Now that JDOM2 is xml 1.1 compatible, is there any turning back? Can<br>

>> >> this be undone?<br>

>> >><br>

>> >>  Does everyone understand that their software will bust if data provided<br>

>> >> as text is not adhering to the new standard?<br>

>> >><br>

>> >>  What about you? How do you deal with it when using the libraries?<br>

>> >><br>

>> >>  Wilf<br>

>> >><br>

>> >><br>

>> >><br>

>> >> _______________________________________________<br>

>> >> To control your jdom-interest membership:<br>

>> >> <a href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com" target="_blank">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</a><br>

>> >><br>

>> >><br>

>> >><br>


>> >><br>

><br>

><br>

><br>


<br>

<br>

<br>

</div></div><span class="HOEnZb"><font color="#888888">--<br>

Scientific Developer at PanGenX (<a href="http://www.pangenx.com" target="_blank">http://www.pangenx.com</a>)<br>

<br>

"Stagnation and the search for truth are always opposites." - Nadezhda<br>

Tolokonnikova<br>

</font></span><div class="HOEnZb"><div class="h5">

</div></div></blockquote></div><br></div>
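For what it's worth, here is a rough sketch of the Base64 suggestion at the top of this message. It uses only the JDK (java.util.Base64, so Java 8 or later is assumed) and does not touch JDOM at all. The class name XmlSafeText and its helper methods are made-up names for illustration, not part of any library. The character check follows the XML 1.0 Char production, which is the perspective Rolf says JDOM takes; treat it as an approximation of what a validating library might do, not as JDOM's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, for illustration only (not JDOM API).
class XmlSafeText {

    // True if the code point matches the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if every code point in the string is a legal XML 1.0 character.
    // An unpaired surrogate shows up as a code point in D800-DFFF and fails.
    public static boolean isXml10Text(String s) {
        return s.codePoints().allMatch(XmlSafeText::isXml10Char);
    }

    // Encode arbitrary bytes as Base64; the Base64 alphabet (A-Z, a-z, 0-9,
    // '+', '/', '=') contains only characters that are legal in XML text.
    public static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recover the original bytes from the Base64 text.
    public static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x01, 0x1B, (byte) 0xFF};

        // Treating the bytes directly as characters yields U+0000 etc.,
        // which XML 1.0 does not allow.
        String naive = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(isXml10Text(naive));              // false

        String safe = encode(raw);
        System.out.println(safe);                            // AAEb/w==
        System.out.println(isXml10Text(safe));               // true
        System.out.println(Arrays.equals(raw, decode(safe))); // true
    }
}
```

The encoded string could then be handed to e.setText(...) and decoded again from e.getText() on the way back, at the cost of roughly a third more characters than the original byte count.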