[jdom-interest] non-ascii characters in xml document

Ian Lea ian.lea at blackwell.co.uk
Fri Nov 30 01:28:46 PST 2001


You might also like to look at the Javaworld article
"Java Tip 117: Transfer binary data in an XML document"
at http://www.javaworld.com/javaworld/javatips/jw-javatip117.html
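Transferring binary data or "suspect" text in XML usually means using an ASCII-safe encoding such as Base64, which is also what John suggests in the quoted message below. The following is a minimal sketch of that round trip. It assumes Java 8 or later for `java.util.Base64`, which did not exist when this thread was written. The helper names `encodeForXml` and `decodeFromXml` are hypothetical, and the JDOM calls are left out: the encoded string would simply be stored as element text or as an attribute value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64XmlText {
    // Hypothetical helper: turn the text into UTF-8 bytes, then Base64, so the
    // stored value contains only A-Z, a-z, 0-9, '+', '/' and '='.
    static String encodeForXml(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Hypothetical helper: reverse both steps when the text is needed again.
    static String decodeFromXml(String encoded) {
        byte[] utf8 = Base64.getDecoder().decode(encoded);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly quotes: U+201C, U+2019, U+201D
        String original = "\u201CIt\u2019s quoted\u201D";
        String encoded = encodeForXml(original);
        System.out.println(encoded);
        System.out.println(decodeFromXml(encoded).equals(original)); // true
    }
}
```

The trade-off is size and readability. Base64 emits 4 characters for every 3 input bytes, so the stored value grows by roughly a third and is no longer human-readable in the file.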


--
Ian.
ian.lea at blackwell.co.uk


"John L. Webber - Jentro AG" wrote:
> 
> Dave,
> 
> This solution is pretty inelegant and may seem like overkill, but it
> works pretty well (as long as we're talking about attribute values or
> text content): try Base64-encoding the "suspect" strings before
> inserting them, and simply decode them when you need to use the text. We
> use that method frequently for handling things like encrypted passwords
> in files, and I've even sent rather large (7000+ lines) files completely
> Base64-encoded. The performance loss is small, as long as the operations
> are not too frequent.
> 
> Regards,
> 
> John
> 
> Dave Neuendorf wrote:
> >
> > To look at a simpler test case, I commented out my code that saves XML in gzip format,
> > and just used straight UTF-8 XML to and from a file. The "curly" single and double
> > quote characters give me exceptions like this:
> >
> >      [java] org.jdom.JDOMException: Error on line 1 of document
> > file:/C:/Development/Projects/HierarchicalPIM/default.xml: Character
> > conversion error: "Unconvertible UTF-8 character beginning with
> > 0x92" (line number may be too low).
> >      [java]     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:296)
> >
> > It sees the single and double quote chars as 0x92 and 0x93, respectively. Maybe these
> > characters aren't Unicode. Could they be Windows-specific character codes, since the
> > text is being pasted from a Windows application into a Java app?
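On Dave's question: in the windows-1252 code page, 0x92 is the right single quotation mark (U+2019) and 0x93 is the left double quotation mark (U+201C). Both are ordinary Unicode characters, but in UTF-8 each one is a three-byte sequence starting with 0xE2. A lone 0x92 byte is never valid UTF-8. One plausible explanation, which is an assumption rather than something confirmed in the thread, is that the file was written using the platform default encoding (windows-1252 on many Windows JVMs) while being read as UTF-8. The sketch below shows the byte difference. The class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CurlyQuoteBytes {
    // Render each byte as two hex digits separated by spaces, e.g. "E2 80 99".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if the bytes decode as well-formed UTF-8 (strict: errors are reported).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String quotes = "\u2019\u201C"; // right single quote, left double quote
        byte[] asCp1252 = quotes.getBytes(cp1252);
        byte[] asUtf8 = quotes.getBytes(StandardCharsets.UTF_8);
        System.out.println("windows-1252: " + hex(asCp1252)); // 92 93
        System.out.println("UTF-8:        " + hex(asUtf8));   // E2 80 99 E2 80 9C
        System.out.println(isValidUtf8(asCp1252)); // false: 0x92 is a stray continuation byte
        System.out.println(isValidUtf8(asUtf8));   // true
    }
}
```

If that is the cause, writing the file through `new OutputStreamWriter(out, "UTF-8")` instead of a `FileWriter` should keep the bytes consistent with a UTF-8 declaration. `FileWriter` uses the platform default encoding on older JDKs.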


