[jdom-interest] Patch - surrogate pair output support

Dave Byrne dbyrne at mdb.com
Mon Dec 10 06:20:22 PST 2007


I agree that the IOException should be a RuntimeException. I didn't see
unchecked exceptions used anywhere else, and since the calling methods
already threw IOException I used that to stay consistent. If a bad
surrogate pair is encountered it is most likely unrecoverable, so I'll
change the code to throw an IllegalDataException instead.

The tough thing about decoding the chars in a separate method is that you
have to advance the loop index as well as return the decoded char. The
cleanest solution I thought of was to wrap the String parameter in an
immutable class whose charAt(int) and length() take surrogate pairs into
account. The downside is that, as far as I can tell, it would really only
be useful to XMLOutputter, and it would make the code less clear (String
vs JDOMString?). Also, these supplementary chars don't fit well into the
API, since they have to be represented as ints, not chars, so I am not
sure how far the support for them should go.

What do you think?



-----Original Message-----
From: Jason Hunter [mailto:jhunter at servlets.com] 
Sent: Friday, December 07, 2007 10:13 PM
To: Dave Byrne
Cc: jdom-interest at jdom.org
Subject: Re: [jdom-interest] Patch - surrogate pair output support

Hi Dave,

This looks like really good code.  Thanks for the submission!

I'm worried about these two signature changes:

- public String escapeAttributeEntities(String str) {
+ public String escapeAttributeEntities(String str) throws IOException {

- public String escapeElementEntities(String str) {
+ public String escapeElementEntities(String str) throws IOException {

They're public methods.  If we add a new checked exception, it breaks 
drop-in backward compatibility.  Perhaps the IOException should be made 
into a runtime exception.  The internal calls to the methods in 
XMLOutputter could catch the runtime exception and convert it to an 
IOException so normal callers would see an IOException, but any direct 
callers we've accumulated over the years won't break because they'll see
the runtime.  Thoughts?
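
A rough sketch of that idea, not the actual XMLOutputter code: the public
escape method keeps its old signature and throws an unchecked exception,
while an internal caller converts it to IOException. The class
OutputterSketch, the method printText, and BadSurrogateException are
hypothetical names for illustration only, and the escaping logic is cut
down to the bare minimum.

```java
import java.io.IOException;
import java.io.Writer;

final class OutputterSketch {

    // Hypothetical unchecked exception used only in this sketch.
    static class BadSurrogateException extends IllegalArgumentException {
        BadSurrogateException(String message) {
            super(message);
        }
    }

    // Public signature unchanged: no new checked exception is declared,
    // so existing direct callers still compile.
    public String escapeElementEntities(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                if (i + 1 >= str.length()
                        || !Character.isLowSurrogate(str.charAt(i + 1))) {
                    throw new BadSurrogateException(
                            "Unpaired high surrogate at index " + i);
                }
                // Copy the valid pair through and skip its second half.
                out.append(ch).append(str.charAt(++i));
            } else if (Character.isLowSurrogate(ch)) {
                throw new BadSurrogateException(
                        "Unpaired low surrogate at index " + i);
            } else if (ch == '<') {
                out.append("&lt;");
            } else if (ch == '>') {
                out.append("&gt;");
            } else if (ch == '&') {
                out.append("&amp;");
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    // Internal caller: converts the unchecked exception so that normal
    // output callers keep seeing an IOException.
    void printText(Writer out, String text) throws IOException {
        try {
            out.write(escapeElementEntities(text));
        } catch (BadSurrogateException e) {
            IOException wrapped = new IOException(e.getMessage());
            wrapped.initCause(e);
            throw wrapped;
        }
    }
}
```

Code that calls escapeElementEntities() directly sees only the unchecked
exception; code that goes through printText() sees an IOException with
the original exception attached as its cause.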

Also, do you think you could write a test case for this?  Something so 
if we ever break it later we'll know.  I'm asking you since you're more 
expert than I am on this topic.

Lastly, is there a reason you didn't break out the repeated code inside 
the shouldEscape() block into its own method?

Thanks again,
-jh-

Dave Byrne wrote:
> Attached is a patch to provide support for outputting documents with 
> XMLOutputter that contain non-BMP utf-16 characters as surrogate
> pairs.
> 
> Patch is against cvs HEAD.



