[jdom-interest] Need to optionally cancel automatic escaping

Alex Rosen arosen at novell.com
Thu Jul 17 06:52:35 PDT 2003


Sounds fine to me.

I didn't understand #3 though.

Alex


>>> "Bradley S. Huffman" <hip at cs.okstate.edu> 7/11/2003 5:23:33 PM >>>
Perfect timing. A while back James Clark posted this on the xml-dev
mailing list:

    If your infoset contains a carriage return, you have to output
    it as a numeric character reference, otherwise line-end
    normalization will turn it into a line-feed. Similarly, if
    attribute values in the infoset contain line-feeds or tabs, they
    need to be output as numeric character references, otherwise
    attribute value normalization will turn them into spaces...When
    I'm creating XML, some parts of what I am creating may well have
    come from parsing an XML document.  That means if there's any
    XML infoset that my program cannot serialize correctly, it's
    potentially a bug.

In response, Elliotte Rusty Harold asked on his XOM mailing list (XOM's
Serializer and JDOM's XMLOutputter are similar, so issues affecting one
usually affect the other):

    I don't think the XOM serializer bothers to escape such carriage 
    returns, line feeds, tabs and the like where Clark suggests it 
    should. Should it? Or should this at least be an option in the 
    Serializer? And if it is an option, should it be the default
    option? Thoughts?

Which led to a two-day thread about what, if anything, should be done
about carriage returns, line feeds, and tabs in attribute values and
text content.

John Cowan then came up with the following algorithm:

    In that case, the default mode should:

    1) Escape all \r characters;
    2) Escape \t and \n characters in attribute values;
    3) Output \n characters in character content as the line
       terminator;
    4) Escape all non-encodable characters;
    5) Encode everything else.

    Doing anything else will not preserve the infoset through a round
    trip.

#1-#3 would be fairly easy to do in XMLOutputter since we already
escape & and >. #4 and #5 I think are already handled by the default
escape strategy, but I haven't looked deep enough to give a definitive
answer. This would provide for roundtripping by default in the two
cases of

    text -> SAXBuilder -> JDOM tree -> XMLOutputter -> text
    JDOM tree -> XMLOutputter -> text -> SAXBuilder -> JDOM tree

which currently JDOM doesn't do.
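
For illustration, here is a rough sketch of what #1-#3 could look like.
It is plain Java and not the actual XMLOutputter code: the class
EscapeSketch and the methods escapeText, escapeAttribute, and roundTrip
are hypothetical names, and the JDK's built-in DOM parser stands in for
SAXBuilder. #4 and #5 (encoding) are not covered.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch of rules #1-#3; not part of JDOM.
class EscapeSketch {

    // Character content: escape \r (rule 1), write \n unchanged as the
    // line terminator (rule 3), plus the usual markup characters.
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                case '\r': out.append("&#xD;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Attribute values: escape \r (rule 1) and also \t and \n (rule 2),
    // so attribute value normalization cannot turn them into spaces.
    static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\t': out.append("&#x9;");  break;
                case '\n': out.append("&#xA;");  break;
                case '\r': out.append("&#xD;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Serialize one element by hand, parse it back with the JDK parser,
    // and return { attribute value, text content } as the parser saw them.
    static String[] roundTrip(String attr, String text) throws Exception {
        String xml = "<e a=\"" + escapeAttribute(attr) + "\">"
                   + escapeText(text) + "</e>";
        Element e = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return new String[] { e.getAttribute("a"), e.getTextContent() };
    }

    public static void main(String[] args) throws Exception {
        String value = "a\tb\r\nc";
        String[] back = roundTrip(value, value);
        // Compare what went in with what the parser handed back.
        System.out.println("attribute preserved: " + value.equals(back[0]));
        System.out.println("text preserved:      " + value.equals(back[1]));
    }
}
```

The two methods are kept separate because the rules differ by context:
a literal \t or \n is harmless in character content but gets normalized
to a space inside an attribute value.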

Thoughts?

Brad