[jdom-interest] Re: XmlOutputter : escaping of \r in element text

Thomas Fischer Fischer at seitenbau.net
Wed Oct 27 01:26:30 PDT 2004

Hi,

Thank you very much for your replies.

In my specific application, the problem is that the element text does not
come from an XML parser. Of course I could preprocess the text and strip
every \r from it, and that is probably what I should do, but it makes using
JDOM slightly more cumbersome (I would have to subclass Element and override
Element.setText()). I just wondered whether there was an easier way.
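For illustration, here is a minimal sketch of that preprocessing step, written as a plain helper rather than an Element subclass. It mimics the line-ending normalization an XML parser performs (XML 1.0, section 2.11): every \r\n pair and every remaining lone \r becomes \n. The class name LineEndings and its normalize() method are hypothetical, not part of JDOM.

```java
/** Hypothetical helper, not part of JDOM: normalizes line endings the way
 *  an XML 1.0 parser does (XML 1.0, section 2.11, End-of-Line Handling). */
public final class LineEndings {
    private LineEndings() {}

    /** Replaces every "\r\n" pair, then every remaining lone '\r', with '\n'. */
    public static String normalize(String text) {
        if (text == null) {
            return null;
        }
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // Intended use with JDOM (not compiled here):
        //   element.setText(LineEndings.normalize(rawText));
        System.out.println(normalize("test\r\ntest").equals("test\ntest")); // prints true
    }
}
```

Calling such a helper at every setText() site avoids subclassing Element, at the cost of having to remember the call wherever text enters the document.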

By the way, I did search the archives before writing to the list, but
obviously not hard enough. For anyone looking for the earlier discussion, it
can be found at
http://www.servlets.com/archive/servlet/ReadMsg?msgId=487722&listName=jdom-interest

Setting my specific problem aside, though:
as far as I can see, there are two competing goals:
1) doing a complete round trip (output, then re-parse) and ending up with
the same document
2) being able to produce any valid document with JDOM (including element
text containing literal \n and \t characters)

With JDOM 1.0, goal 1) is achieved but 2) is not.
Please correct me if I am wrong, but both goals could be achieved with the
'default escape strategy' proposed in my first mail.
Anyone who does not use a custom escape strategy would not see any
difference. I do see that it breaks backwards compatibility for people who
use a custom escape strategy, but hey, \r and \t are exactly the kind of
characters an escape strategy should control, don't you think?
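To make the proposal concrete, here is a hypothetical model (not JDOM source) of element-text escaping in which \r is passed through the escape strategy like any other character. The EscapeStrategy interface below is a local stand-in with the same shouldEscape(char) shape as JDOM's; ProposedEscaping, DEFAULT and escapeElementText() are invented names. Line-separator handling and output-encoding concerns are left out to keep the sketch short.

```java
/** Hypothetical model of the proposal; not JDOM code. */
public final class ProposedEscaping {
    private ProposedEscaping() {}

    /** Local stand-in with the same method shape as JDOM's EscapeStrategy. */
    public interface EscapeStrategy {
        boolean shouldEscape(char ch);
    }

    /** Proposed default: escape '\r', as JDOM 1.0 already does unconditionally. */
    public static final EscapeStrategy DEFAULT = ch -> ch == '\r';

    /** Escapes '<', '>' and '&' always; every other character, including '\r',
     *  only if the strategy asks for it. */
    public static String escapeElementText(String text, EscapeStrategy strategy) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (strategy.shouldEscape(ch)) {
                        // Hex character reference, e.g. "&#xD;" for '\r'.
                        out.append(String.format("&#x%X;", (int) ch));
                    } else {
                        out.append(ch);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "test\r\ntest";
        // Default strategy: '\r' becomes "&#xD;", so this prints true.
        System.out.println(escapeElementText(text, DEFAULT).equals("test&#xD;\ntest"));
        // A custom strategy that escapes nothing keeps the CR literally: prints true.
        EscapeStrategy keepCr = ch -> false;
        System.out.println(escapeElementText(text, keepCr).equals(text));
    }
}
```

With the default strategy, \r still becomes &#xD; as in JDOM 1.0, so the round trip of goal 1) is kept; a custom strategy that returns false for \r gives the literal output of goal 2).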

   Thomas

> We covered this very recently on the list, so go back to the archives
> if you're curious about the background. In short: since XML parsers
> are required to normalize incoming \r\n line endings to \n, we know that
> your \r was specifically requested, and so we try to preserve it on
> output for you. If you want \r\n line endings, use \n as a
> logical newline in your setText() call and set the XMLOutputter to use
> \r\n when it outputs line endings.
>
> -jh-
>
> Thomas Fischer wrote:
>
> >
> > Hi,
> >
> > I have a problem concerning the escaping of Windows line endings (\r\n)
> > in element text in JDOM 1.0:
> > ...
> > Element element = new Element("test");
> > element.setText("test\r\ntest");
> > ...
> > XMLOutputter outputter = new XMLOutputter();
> > outputter.output(document, System.out);
> >
> > The outputter prints the following
> > <test>test&#xD;
> > test</test>
> > while I just want to have
> > <test>test
> > test</test>
> >
> > I do not see how I could achieve this within JDOM. Using an
> > EscapeStrategy is of no use, because it is never consulted for the \r
> > character (see the JDOM sources, XMLOutputter.java, line 1455).
> >
> > In my opinion, this behaviour only makes sense for the special characters
> > defined in the XML spec (<, >, ", ' and &), but not for additional
> > characters, because e.g. the character \r is a perfectly valid character
> > within an XML document (see the XML specification,
> > http://www.w3.org/TR/1998/REC-xml-1998021, section 2.2).
> >
> > A solution that would be 'backward compatible' in most cases would be to
> > change the JDOM source so that a "default escape strategy" is used which
> > escapes the \r and \n characters, but which can be overridden.
> >
> > What do you think?
> >
> >    Thomas
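The reply quoted above can be checked against the output shown in the original mail with a simplified model of how element text is written, as described in this thread: \r is always emitted as &#xD;, while \n is replaced by the outputter's configured line separator. This is a sketch, not JDOM's actual XMLOutputter code, and the class name CurrentOutputModel is hypothetical. In JDOM 1.0 itself the separator would be configured along the lines of Format format = Format.getRawFormat(); format.setLineSeparator("\r\n"); new XMLOutputter(format).

```java
/** Simplified model, not JDOM source: element-text output as described in
 *  this thread for JDOM 1.0. '\r' is always written as "&#xD;", while '\n'
 *  is replaced by the configured line separator. */
public final class CurrentOutputModel {
    private CurrentOutputModel() {}

    public static String escapeElementText(String text, String lineSeparator) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '<':  out.append("&lt;"); break;
                case '>':  out.append("&gt;"); break;
                case '&':  out.append("&amp;"); break;
                case '\r': out.append("&#xD;"); break;        // explicit CR kept as a reference
                case '\n': out.append(lineSeparator); break;  // logical newline
                default:   out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Original mail: "\r\n" in setText() yields "test&#xD;" + CRLF + "test".
        System.out.println(escapeElementText("test\r\ntest", "\r\n"));
        // Suggested fix: a logical "\n" plus a "\r\n" separator yields plain CRLF (prints true).
        System.out.println(escapeElementText("test\ntest", "\r\n").equals("test\r\ntest"));
    }
}
```

Under this model the &#xD; only appears when the caller put a \r into the text explicitly, which is the distinction the reply relies on.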
