[jdom-interest] XML escaping and unescaping

David Wall d.wall at computer.org
Fri Nov 19 18:24:12 PST 2004


Very cool.  I'll give it a try!

David

----- Original Message ----- 
From: "Jason Hunter" <jhunter at xquery.com>
To: <d.wall at computer.org>
Cc: <jdom-interest at jdom.org>
Sent: Friday, November 19, 2004 4:58 PM
Subject: Re: [jdom-interest] XML escaping and unescaping


> When you call elt.getText() you get the decoded (semantic) form.  Think
> of JDOM as representing the XML infoset and the &quot; or CDATA
> representation as just one way to encode the XML data when written as a
> stream of bytes.  If you call elt.setText("This \"is\" a test") the
> outputter will write what you have below.
>
> In other words, it's not part of standard class libs since it's almost
> never needed by normal programmers.  JDOM via the parsers handles the
> input and JDOM via XMLOutputter handles the output.  You just deal with
> plain old strings and you don't mind which chars are special and which
> aren't.
>
> -jh-
>
> d.wall at computer.org wrote:
>
> > Thanks.  I'll take a look at your escapers and compare.  It's a bit
> > amazing that such functionality isn't just part of the standard class
> > libraries by now.
> >
> > As for coming back in, an XML parser won't decode a string for you, will
> > it?  I mean, if my XML looks like:
> >
> > <data>
> > <field>This &quot;is&quot; a test.</field>
> > </data>
> >
> > I would expect that getting the data->field text value would return:
> >      This &quot;is&quot; a test.
> >
> > Are you saying some XML parsers will return instead:
> >      This "is" a test.
> >
> > My impression is that such an encoded element would return the String
> > still encoded.
> >
> > David
> >
> >
> > Jason Hunter wrote:
> >
> >> XMLOutputter has escapeElementEntities() and escapeAttributeEntities()
> >> that do what you want and have a pluggable EscapeStrategy to handle
> >> characters outside the selected output encoding.  We don't have code
> >> to do the reverse as we rely on XML parsers for that.
> >>
> >> -jh-
> >>
> >> d.wall at computer.org wrote:
> >>
> >>> Does JDOM come with any utility routines that will take a String and
> >>> make it XML safe?  And also a routine that takes an XML safe encoding
> >>> and converts it back to a regular String?
> >>>
> >>> i.e.
> >>>
> >>> String -> XML Safe string -> String
> >>>
> >>> "This" -> "This"  -> "This"  (no change needed)
> >>> "4+3<4+4" -> "4+3&lt;4+4" -> "4+3<4+4"
> >>>
> >>> I only ask because I have some basic routines that do this, but they
> >>> only map the following:
> >>>
> >>>   >    &gt;
> >>>   <    &lt;
> >>>   &    &amp;
> >>>   '    &apos;
> >>>   "    &quot;
> >>>
> >>> They currently don't deal with escaped character codes like &#039;. It
> >>> seems that putting data into XML and getting it back from XML is so
> >>> common that there must be a general routine to do this rather than
> >>> having to rely on my own implementation.
> >>>
> >>> Thanks,
> >>> David
> >>>
> >>> _______________________________________________
> >>> To control your jdom-interest membership:
> >>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
> >>>
> >>
> >
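
Editor's note: the round trip discussed in this thread (String -> XML safe
string -> String) can be sketched with nothing but the JDK. This is a minimal
illustration, not JDOM's implementation: the class name XmlRoundTripSketch and
its two methods are hypothetical, the escaper covers only the five predefined
entities from the table quoted above, and the decoding step uses the JDK's
built-in DOM parser as a stand-in for the parser JDOM would sit on top of. It
assumes the input contains only characters that are legal in XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of JDOM or the JDK.
final class XmlRoundTripSketch {

    // Escape the five predefined XML entities (<, >, &, ', ").
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // No hand-written unescaper: wrap the escaped text in an element and let
    // the parser decode entities and numeric character references.
    static String unescapeViaParser(String escaped) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new InputSource(new StringReader("<field>" + escaped + "</field>")));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML text: " + escaped, e);
        }
    }

    public static void main(String[] args) {
        String escaped = escape("This \"is\" a test.");
        System.out.println(escaped);                     // This &quot;is&quot; a test.
        System.out.println(unescapeViaParser(escaped));  // This "is" a test.
        // Numeric character references are decoded by the parser as well.
        System.out.println(unescapeViaParser("It&#039;s 4+3&lt;4+4"));  // It's 4+3<4+4
    }
}
```

The point of the sketch is the asymmetry described in the thread: escaping is
a small, mechanical mapping, while unescaping is left to the parser, which
already handles both named entities and references such as &#039;.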


