[jdom-interest] raw bytes turned into string and inserted into xml

Elliotte Rusty Harold elharo at metalab.unc.edu
Mon Mar 18 12:20:57 PST 2002

At 4:47 PM +0200 3/18/02, Jeff Singer wrote:
>Hi all,
>I have a situation in which my application's input is a raw stream of
>bytes. These bytes are actually ASCII strings which occasionally will
>contain characters which are illegal in XML, like 0x4, 0x0, 0x1 - quite a
>few in this range. I use the String constructor which takes an encoding
>to form them into java.lang.String objects. This works fine. I then
>insert them as content onto a JDOM Element object and eventually after

Aha! You've just demonstrated one case in which JDOM does need to be 
verifying the character data content. From the moment you inserted 
the first 0x4, 0x0, etc., your document was no longer well-formed XML, 
and JDOM should have immediately thrown an exception to let you know 
this. I apologize that it didn't.
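
Until the library performs this check itself, application code can do it 
before handing a string to JDOM. What follows is a minimal sketch in plain 
Java, not JDOM's own verification code. It tests each code point against 
the Char production of the XML 1.0 specification. The class and method 
names (XmlChars, isLegal, firstIllegal) are hypothetical, chosen only for 
this illustration.

```java
// Sketch only: XmlChars and its methods are illustrative names, not JDOM API.
final class XmlChars {

    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first illegal character, or -1 if none.
    static int firstIllegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegal(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

A caller could reject the input, or switch to one of the encodings 
discussed below, whenever firstIllegal returns something other than -1.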

As to how to fix your code, that depends on what the strings are for 
and what you're ultimately doing with them. Some have suggested that 
you Base-64 encode your data. This is one possibility. Another is 
that you replace each illegal character by an element like this:

<control value="4"/>
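
The replacement could be sketched roughly as follows. To stay 
self-contained, this version builds the serialized markup as a string 
rather than using JDOM. With JDOM, the same loop would instead add text 
and empty child elements to the parent Element. The class name 
ControlEscaper and its methods are hypothetical, and the escaping of 
&, < and > is an assumption about what the surrounding content needs.

```java
// Sketch only: ControlEscaper is an illustrative name, not part of JDOM.
final class ControlEscaper {

    // XML 1.0 Char production.
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Emits legal characters as escaped text and each illegal character
    // as an empty <control value="N"/> element, N being its decimal code.
    static String toXmlContent(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegal(cp)) {
                out.append("<control value=\"").append(cp).append("\"/>");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

A reader of the document has to apply the reverse mapping, turning each 
control element back into the character it stands for.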

There are other solutions. However, the simple fact is you can't 
include these characters directly in an XML 1.0 document.
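
For the Base-64 option mentioned above, a minimal sketch follows. It 
assumes a JDK that provides java.util.Base64 (Java 8 or later), and the 
class name Base64Payload is hypothetical. The raw bytes are encoded 
before they ever become element content, so the document contains only 
legal characters.

```java
import java.util.Base64;

// Sketch only: Base64Payload is an illustrative name.
final class Base64Payload {

    // Encode the raw bytes into text that is safe to use as element content.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recover the original bytes from the element's text.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

The basic decoder rejects whitespace, so text read back from a 
pretty-printed document would need trimming first, or the MIME decoder 
from Base64.getMimeDecoder() could be used instead.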

| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|             http://www.cafeconleche.org/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
