[jdom-interest] d-o-e, rendering embedded HTML and XMLOutputter

Gary Lawrence Murphy garym at canada.com
Sun Apr 14 14:10:15 PDT 2002


This problem has taken me across three mailing lists, and I'm not
getting any closer ... but I do have some evidence it's a problem
with the way I am using transformation on JDOM objects.

Here's a scenario: users enter free-hand (often broken) HTML into a
webform textarea. Their text is wrapped in an XML envelope, with the
HTML enclosed in <![CDATA[ ]]> to escape all the <, > and & characters
so that the envelope still parses despite the broken markup. This
envelope document is stored, transported, retrieved and unpacked so
the original user-entered HTML can be displayed on a weblog. This is
more or less the scenario for "disable-output-escaping".
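
A minimal sketch of that envelope round trip, using the JDK's built-in DOM and
JAXP classes as a stand-in for JDOM so it runs without extra jars. The class
and method names (EnvelopeSketch, wrap, unwrap) are hypothetical and are not
part of the application described here. It only illustrates that CDATA is an
escaping device: after parsing, the text content is the original HTML again.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
class EnvelopeSketch {

    // Wrap raw user HTML in a <highlight> element as a CDATA section
    // and return the serialized envelope.
    // Caveat: input containing "]]>" is not handled specially here.
    static String wrap(String userHtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element highlight = doc.createElement("highlight");
            highlight.appendChild(doc.createCDATASection(userHtml));
            doc.appendChild(highlight);

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse the envelope and return the text content of its root element.
    static String unwrap(String envelopeXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(envelopeXml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "First paragraph <P> Tom & Jerry <b>unclosed";
        String xml = wrap(html);
        System.out.println(xml);
        System.out.println(unwrap(xml).equals(html));
    }
}
```

The same holds if the envelope uses &lt; and &gt; entity references instead of
CDATA, as the sample data further down does: both forms parse to the same text.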

The XML gets transformed using Xalan:

    JDOMResult resultdom = new JDOMResult();
    Transformer transformer = TransformerFactory.newInstance()
                        .newTransformer(new StreamSource(xsl));

    transformer.transform(new JDOMSource(doc), resultdom);
    Document newDoc = resultdom.getDocument();

In pure Java (JDOM/Jaxen/XPath), I can select the highlight element from
the transformation result with XPath and print its text to the servlet
output stream. The result is the HTML I expect to see, and it renders
correctly in the browser:

     // selectNodes() returns a List, so take the first match before getText()
     print(((Element) new XPath("//highlight").selectNodes(newDoc).get(0)).getText());

and if I use Xalan (1.2.2) from the command line I also get exactly what
I expect to get, the HTML rendered as HTML:

java org.apache.xalan.xslt.Process -in preview.xml -xsl preview.xsl

But when I use XMLOutputter on newDoc, the <, > and & characters remain
escaped: the originally escaped HTML appears as entity references
(&lt;P&gt;), while all the other XSL-introduced markup renders
correctly. It is as if the Transformer had ignored d-o-e, except that
the XPath test proves that it did not.

    XMLOutputter xml = new XMLOutputter("\t", true);
    xml.setTextNormalize(true);
    xml.output((Document) getDocument(), out);

Is it the setTextNormalize(true) or some unspecified option that is causing
the <xsl:value-of select="//highlight"/> contents to be re-escaped?
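
One way to narrow this down is to separate what a text node holds from what a
serializer writes for it. The sketch below uses the JDK's DOM and identity
Transformer as a stand-in for JDOM and XMLOutputter (an assumption: it does
not exercise XMLOutputter itself), and the class name TextEscapingSketch is
hypothetical. It shows that reading a node's text returns the raw characters,
while serializing that same node to XML escapes them again.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; JDK DOM stands in for JDOM here.
class TextEscapingSketch {

    // Build <div>text</div> with the given string as a single text node.
    static Document docWithText(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element div = doc.createElement("div");
            div.appendChild(doc.createTextNode(text));
            doc.appendChild(div);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize the tree to XML text with an identity transform.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = docWithText("one <P> two");
        // Raw characters as stored in the tree.
        System.out.println(doc.getDocumentElement().getTextContent());
        // Markup characters are escaped on the way out.
        System.out.println(serialize(doc));
    }
}
```

If JDOM behaves the same way, the getText() result and the XMLOutputter result
would differ for any text node containing markup characters, whatever the
setTextNormalize setting.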

Any guesses what might be going on here?  Even wild guesses are welcome.
Here's the sample data and the problem transform:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sports-content SYSTEM "dtds/sportsml-core.dtd">
<?cocoon-format type="text/xml"?>
<sports-content id="icehockey/NHL/82900/preview">
  <sports-metadata date-time=" April 12, 2002, at 10:52 AM ET " language="en-US" fixture-key="event-preview" fixture-source="iptc.org" fixture-name="Event Preview">
    <sports-title>Recap: Phoenix vs. Minnesota</sports-title>
  </sports-metadata>
  <sports-event>
    <event-metadata event-key="82900" site-key="" site-name="" site-source="iptc.org" event-status="pre-event" />
    <highlight class="snbody">

 (Sports  Network) - The Phoenix Coyotes hope to wrap up a playoff spot tonight
 when they welcome the Minnesota Wild to America West Arena.
 
&lt;P&gt;
 With  91 points, the Coyotes are eighth in the Western Conference -- one ahead
 of  the Edmonton  Oilers and one behind the Vancouver Canucks. Should they top
 the  Wild this  evening and  the  Oilers fall  in regulation  to Calgary,  the
 Coyotes would end their one-year playoff drought.
&lt;P&gt;
 

    </highlight>
  </sports-event>
</sports-content>


================================================================
and the simplest possible XSL

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
<xsl:template match="/">

<h3><xsl:value-of select="sports-content/sports-metadata/sports-title"/></h3>
<div>
<xsl:value-of select="sports-content/sports-event/highlight"/>
</div>
</xsl:template>
</xsl:stylesheet>

================================================================

It does not matter whether d-o-e is set on or off in the value-of: the
output is identical. The browser shows a single block of text with
literal "<P>" strings in it instead of paragraph breaks.
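
For comparison, a sketch of the case where d-o-e does take effect: the
transformer writes straight to a stream, so it controls the serialization. The
XSLT 1.0 recommendation notes that a processor can only disable output
escaping when it controls how the result tree is output, which may not apply
when the result is handed on as a tree (a JDOMResult, for instance). This uses
the JDK's bundled XSLT processor rather than the Xalan version above, and the
DoeSketch class with its cut-down input and stylesheet is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; input and stylesheet are cut-down stand-ins.
class DoeSketch {

    static final String XML = "<highlight>one &lt;P&gt; two</highlight>";

    // Build a one-template stylesheet, with or without d-o-e on value-of.
    static String stylesheet(boolean disableEscaping) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output omit-xml-declaration='yes'/>"
             + "<xsl:template match='/'><div>"
             + "<xsl:value-of select='highlight'"
             + (disableEscaping ? " disable-output-escaping='yes'" : "")
             + "/></div></xsl:template></xsl:stylesheet>";
    }

    // Transform directly to a character stream, not to a tree.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, stylesheet(true)));
        System.out.println(transform(XML, stylesheet(false)));
    }
}
```

If the two outputs differ here but not when the result goes through a
JDOMResult and XMLOutputter, that would point at the tree-building step rather
than at the stylesheet.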

-- 
Gary Lawrence Murphy <garym at teledyn.com> TeleDynamics Communications Inc
Business Innovations Through Open Source Systems: http://www.teledyn.com
"Computers are useless.  They can only give you answers."(Pablo Picasso)



