[jdom-interest] Need to optionally cancel automatic escaping

Bradley S. Huffman hip at cs.okstate.edu
Fri Jul 11 14:23:33 PDT 2003


Perfect timing. A while back James Clark posted this on the xml-dev mailing
list:

    If your infoset contains a carriage return, you have to output
    it as a numeric character reference, otherwise line-end
    normalization will turn it into a line-feed. Similarly, if
    attribute values in the infoset contain line-feeds or tabs, they
    need to be output as numeric character references, otherwise
    attribute value normalization will turn them into spaces...When
    I'm creating XML, some parts of what I am creating may well have
    come from parsing an XML document.  That means if there's any
    XML infoset that my program cannot serialize correctly, it's
    potentially a bug.
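
For anyone who wants to see the normalization Clark describes, here is a
small sketch using the parser that ships with the JDK (not JDOM). The class
and helper names are made up for illustration. It parses the same characters
once as literals and once as numeric character references and prints what
the parser reports.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of JDOM or XOM.
public class NormalizationDemo {
    // Parse a small document with the JDK's built-in parser and return its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    // Make control characters visible when printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) {
        // Literal tab and line feed in the attribute, literal carriage return in the text.
        Element literal = root("<a b=\"x\ty\nz\">p\rq</a>");
        // The same characters written as numeric character references.
        Element escaped = root("<a b=\"x&#x9;y&#xA;z\">p&#xD;q</a>");

        System.out.println(visible(literal.getAttribute("b")));  // x y z
        System.out.println(visible(escaped.getAttribute("b")));  // x\ty\nz
        System.out.println(visible(literal.getTextContent()));   // p\nq
        System.out.println(visible(escaped.getTextContent()));   // p\rq
    }
}
```

The literal versions come back changed (tab and line feed become spaces in
the attribute, the carriage return becomes a line feed in the text), while
the character-reference versions survive the parse unchanged.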

In response, Elliotte Rusty Harold asked on his XOM mailing list (XOM's
Serializer and JDOM's XMLOutputter are similar, so issues affecting one
usually affect the other):

    I don't think the XOM serializer bothers to escape such carriage 
    returns, line feeds, tabs and the like where Clark suggests it 
    should. Should it? Or should this at least be an option in the 
    Serializer? And if it is an option, should it be the default option? 
    Thoughts?

That led to a two-day thread about what, if anything, should be done about
carriage returns, line feeds, and tabs in attribute values and text content.

In that thread John Cowan came up with the following algorithm:

    In that case, the default mode should:

    1) Escape all \r characters;
    2) Escape \t and \n characters in attribute values;
    3) Output \n characters in character content as the line terminator;
    4) Escape all non-encodable characters;
    5) Encode everything else.

    Doing anything else will not preserve the infoset through a round trip.
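
A rough sketch of how those five rules might look in code, just to make the
discussion concrete. This is not XMLOutputter's actual implementation; the
class name, method names, and parameters are all hypothetical, and the
"encodable" test is assumed to come from a java.nio CharsetEncoder for the
output encoding. The usual escaping of &, <, > and " is included alongside
the five rules.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JDOM.
public class RoundTripEscaper {
    static String escape(String s, boolean inAttribute, String lineSeparator,
                         CharsetEncoder encoder) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append(inAttribute ? "&quot;" : "\""); break;
                case '\r': out.append("&#xD;"); break;                      // rule 1
                case '\t': out.append(inAttribute ? "&#x9;" : "\t"); break; // rule 2
                case '\n':
                    // rule 2 in attributes, rule 3 in character content
                    out.append(inAttribute ? "&#xA;" : lineSeparator);
                    break;
                default:
                    String chars = new String(Character.toChars(cp));
                    if (encoder.canEncode(chars)) {
                        out.append(chars);                                  // rule 5
                    } else {
                        out.append("&#x")                                   // rule 4
                           .append(Integer.toHexString(cp).toUpperCase())
                           .append(';');
                    }
            }
        }
        return out.toString();
    }

    static String escapeText(String s) {
        return escape(s, false, "\n", StandardCharsets.US_ASCII.newEncoder());
    }

    static String escapeAttribute(String s) {
        return escape(s, true, "\n", StandardCharsets.US_ASCII.newEncoder());
    }

    public static void main(String[] args) {
        System.out.println(escapeAttribute("a\rb\tc\nd")); // a&#xD;b&#x9;c&#xA;d
        System.out.println(escapeText("caf\u00e9"));       // caf&#xE9;
    }
}
```

The two convenience methods assume US-ASCII output and "\n" as the line
terminator purely for the example; a real outputter would take both from
its configured format.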

#1-#3 would be fairly easy to do in XMLOutputter since we already escape & and
>. #4 and #5 I think are already handled by the default escape strategy, but
I haven't looked deeply enough to give a definitive answer. This would provide
for round-tripping by default in the two cases of

    text -> SAXBuilder -> JDOM tree -> XMLOutputter -> text
    JDOM tree -> XMLOutputter -> text -> SAXBuilder -> JDOM tree

which JDOM currently doesn't do.

Thoughts?

Brad


