[jdom-interest] Need to optionally cancel automatic escaping

Bradley S. Huffman hip at cs.okstate.edu
Thu Jul 17 07:30:03 PDT 2003


If

    Text text = new Text("The rain in Spain,\nfalls mainly on the plain");

and the line separator is "\r\n", then XMLOutputter will convert "\n"
to "\r\n", as in

    The rain in Spain,\r\nfalls mainly on the plain

Since JDOM already scans text for < and &, this will cost almost nothing and
may help applications that rely on a specific line terminator.
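A minimal sketch of that single pass, written as a hypothetical helper (LineEndEscaper and escapeText are illustrative names, not XMLOutputter's real API). It escapes &, < and > as the outputter already does, writes each "\n" as the configured line separator, and also escapes "\r" as &#xD; per Cowan's rule #1 quoted below, so an existing "\r\n" keeps its carriage return through a parser's line-end normalization.

```java
// Hypothetical helper for illustration only; not JDOM's XMLOutputter API.
class LineEndEscaper {

    /**
     * Escapes element text content in one pass:
     *  - '&', '<', '>' become entity references
     *  - '\r' becomes the character reference "&#xD;" (Cowan's rule #1)
     *  - '\n' becomes the configured line separator (Cowan's rule #3)
     */
    static String escapeText(String text, String lineSeparator) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '>':  out.append("&gt;"); break;
                case '\r': out.append("&#xD;"); break;
                case '\n': out.append(lineSeparator); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sep = "\r\n";
        // A bare "\n" is written as the "\r\n" line terminator.
        System.out.println(escapeText("The rain in Spain,\nfalls mainly on the plain", sep)
                .equals("The rain in Spain,\r\nfalls mainly on the plain"));      // true
        // An existing "\r\n" keeps its "\r" as a character reference.
        System.out.println(escapeText("The rain in Spain,\r\nfalls mainly on the plain", sep)
                .equals("The rain in Spain,&#xD;\r\nfalls mainly on the plain")); // true
    }
}
```

Doing the line-separator conversion inside the same switch as the markup escapes is what keeps the extra cost near zero: every character is already being examined.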

However, if the user does

    Text text = new Text("The rain in Spain,\r\nfalls mainly on the plain");

and the line separator is "\r\n", then XMLOutputter will escape the "\r" as a
character reference and convert the "\n" to "\r\n", outputting

    The rain in Spain,&#xD;\r\nfalls mainly on the plain

so on the return trip through an XML parser the original line

    The rain in Spain,\r\nfalls mainly on the plain

is rebuilt.
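To check the return trip, the sketch below parses both candidate outputs with the JDK's built-in DOM parser (javax.xml.parsers), which applies XML 1.0 line-end normalization. It assumes the first "\r" is written as the character reference &#xD;; written as a literal character, that "\r" is normalized along with the following "\r\n", and the text comes back as "\n\n". RoundTripCheck and parseText are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical check, for illustration only.
class RoundTripCheck {

    // Parses a one-element document and returns the element's text content
    // after the parser has applied line-end normalization.
    static String parseText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String original = "The rain in Spain,\r\nfalls mainly on the plain";

        // "\r" escaped as &#xD;, "\n" written as "\r\n": the original comes back.
        String escaped = "<p>The rain in Spain,&#xD;\r\nfalls mainly on the plain</p>";
        System.out.println(parseText(escaped).equals(original));   // true

        // "\r" left literal: "\r\r\n" normalizes to "\n\n" and the "\r" is lost.
        String literal = "<p>The rain in Spain,\r\r\nfalls mainly on the plain</p>";
        System.out.println(parseText(literal)
                .equals("The rain in Spain,\n\nfalls mainly on the plain")); // true
    }
}
```

Character references are not subject to line-end normalization, which is why the escaped form survives and the literal form does not.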

Brad

"Alex Rosen" writes:

> Sounds fine to me.
> 
> I didn't understand #3 though.
> 
> Alex
> 
> 
> >>> "Bradley S. Huffman" <hip at cs.okstate.edu> 7/11/2003 5:23:33 PM >>>
> Perfect timing. A while back James Clark posted the following on the
> xml-dev mailing list:
> 
>     If your infoset contains a carriage return, you have to output
>     it as a numeric character reference, otherwise line-end
>     normalization will turn it into a line-feed. Similarly, if
>     attribute values in the infoset contain line-feeds or tabs, they
>     need to be output as numeric character references, otherwise
>     attribute value normalization will turn them into spaces...When
>     I'm creating XML, some parts of what I am creating may well have
>     come from parsing an XML document.  That means if there's any
>     XML infoset that my program cannot serialize correctly, it's
>     potentially a bug.
> 
> To which Elliotte Rusty Harold asked on his XOM mailing list (XOM's
> Serializer and JDOM's XMLOutputter are similar, so issues affecting one
> usually affect the other):
> 
>     I don't think the XOM serializer bothers to escape such carriage 
>     returns, line feeds, tabs and the like where Clark suggests it 
>     should. Should it? Or should this at least be an option in the 
>     Serializer? And if it is an option, should it be the default option?
>     Thoughts?
> 
> Which led to a two-day thread about what, if anything, should be done
> about carriage returns, line feeds, and tabs in attribute values and
> text content.
> 
> To which John Cowan responded with the following algorithm:
> 
>     In that case, the default mode should:
> 
>     1) Escape all \r characters;
>     2) Escape \t and \n characters in attribute values;
>     3) Output \n characters in character content as the line terminator;
>     4) Escape all non-encodable characters;
>     5) Encode everything else.
> 
>     Doing anything else will not preserve the infoset through a round trip.
> 
> #1-#3 would be fairly easy to do in XMLOutputter, since we already escape
> & and >. I think #4 and #5 are already handled by the default escape
> strategy, but I haven't looked deeply enough to give a definitive answer.
> This would provide round-tripping by default in the two cases of
> 
>     text -> SAXBuilder -> JDOM tree -> XMLOutputter -> text
>     JDOM tree -> XMLOutputter -> text -> SAXBuilder -> JDOM tree
> 
> which JDOM currently doesn't do.
> 
> Thoughts?
> 
> Brad
> _______________________________________________
> To control your jdom-interest membership:
> http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com