[jdom-interest] d-o-e, rendering embedded HTML and XMLOutputter
Gary Lawrence Murphy
garym at canada.com
Sun Apr 14 14:10:15 PDT 2002
This problem has taken me across three mailing lists, and I'm not
getting any closer ... but I do have some evidence it's a problem
with the way I am using transformation on JDOM objects.
Here's a scenario: Users enter free-hand (often broken) HTML into a
webform textarea edit box. Their text is wrapped in an XML envelope,
their HTML enclosed by <![CDATA[ ]]> to escape all the <>& chars so that
the broken markup will still parse. This envelope XML document is
stored, transported, retrieved and unpacked so the original
user-entered HTML can be displayed on a weblog. This is more or less
the scenario for "disable-output-escaping".
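The envelope step can be sketched roughly as follows. This is only an illustration using the JDK's built-in DOM parser rather than JDOM; the <entry> wrapper element and the class and method names are hypothetical, and only the <highlight> element name comes from the XPath used later in this post.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataEnvelope {
    // Wrap raw (possibly broken) HTML in a CDATA section inside a
    // hypothetical <entry><highlight> envelope. A literal "]]>" in the
    // input would end the section early, so it is split across two sections.
    static String wrap(String html) {
        return "<entry><highlight><![CDATA["
            + html.replace("]]>", "]]]]><![CDATA[>")
            + "]]></highlight></entry>";
    }

    // Parse the envelope and return the text of <highlight>, which is the
    // original HTML with no escaping applied.
    static String unwrap(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("highlight").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String html = "<P>Coyotes & Wild<br>unclosed <b>bold";
        // Prints whether the HTML survives the round trip unchanged.
        System.out.println(unwrap(wrap(html)).equals(html));
    }
}
```

The point of the sketch is that the envelope parses even though the HTML inside it is not well-formed, and the text read back from the tree is the raw HTML.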
The XML gets transformed using Xalan:
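JDOMResult resultdom = new JDOMResult();
// assumes the stylesheet is preview.xsl, as in the command-line run below
Transformer transformer = TransformerFactory.newInstance()
    .newTransformer(new StreamSource("preview.xsl"));
transformer.transform(new JDOMSource(doc), resultdom);
Document newDoc = resultdom.getDocument();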
In pure Java (JDOM/Jaxen/XPath), I can xpath the text node from the
transformation result, print its contents to the servlet output stream
and the result will be the HTML I expect to see and renders correctly
in the browser:
print(((Element) new XPath("//highlight").selectSingleNode(newDoc)).getText());
and if I use Xalan (1.2.2) from the command line I also get exactly what
I expect to get, the HTML rendered as HTML:
java org.apache.xalan.xslt.Process -in preview.xml -xsl preview.xsl
But when I use XMLOutputter on newDoc, the <>& chars remain escaped:
the user-entered HTML appears as entity codes, while all the other XSL-introduced
markup renders correctly. It is as if the Transformer had ignored d-o-e,
except that the XPath test proves that it did not.
XMLOutputter xml = new XMLOutputter("\t", true);
xml.output((Document) getDocument(), out);
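The symptom can be reproduced in miniature without JDOM. The sketch below is an assumption-laden stand-in, not a diagnosis: it uses the JDK's DOM and identity Transformer in place of JDOM and XMLOutputter, and the class and method names are hypothetical. It shows that the text held in a tree node is the raw markup, while serializing that same tree escapes it.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TextVersusSerialized {
    // Build a one-element tree whose text node holds raw HTML characters.
    static Document build(String html) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();
        Element highlight = doc.createElement("highlight");
        highlight.setTextContent(html);
        doc.appendChild(highlight);
        return doc;
    }

    // Serialize the tree as XML with an identity transform (a stand-in
    // for XMLOutputter in this sketch).
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = build("<P>first<P>second");
        // Reading the node's text gives the markup as entered.
        System.out.println(doc.getDocumentElement().getTextContent());
        // Serializing the tree writes the same characters as entities.
        System.out.println(serialize(doc));
    }
}
```

If something similar holds for JDOM, the XPath test and the XMLOutputter output would not contradict each other: both start from the same unescaped text node, and only the serializer escapes it on the way out.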
Is it the setTextNormalize(true) or some unspecified option that is causing
the <xsl:value-of select="//highlight"/> contents to be re-escaped?
Any guesses what might be going on here? Even wild guesses are welcome.
Here's the sample data and the problem transform:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sports-content SYSTEM "dtds/sportsml-core.dtd">
<sports-metadata date-time=" April 12, 2002, at 10:52 AM ET " language="en-US" fixture-key="event-preview" fixture-source="iptc.org" fixture-name="Event Preview">
<sports-title>Recap: Phoenix vs. Minnesota</sports-title>
<event-metadata event-key="82900" site-key="" site-name="" site-source="iptc.org" event-status="pre-event" />
(Sports Network) - The Phoenix Coyotes hope to wrap up a playoff spot tonight
when they welcome the Minnesota Wild to America West Arena.
With 91 points, the Coyotes are eighth in the Western Conference -- one ahead
of the Edmonton Oilers and one behind the Vancouver Canucks. Should they top
the Wild this evening and the Oilers fall in regulation to Calgary, the
Coyotes would end their one-year playoff drought.
And the simplest possible XSL:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
It does not matter whether d-o-e is set on or off in the value-of: the output
is identical, and the browser sees a single block of text with
literal "<P>" strings in it instead of paragraph breaks.
Gary Lawrence Murphy <garym at teledyn.com> TeleDynamics Communications Inc
Business Innovations Through Open Source Systems: http://www.teledyn.com
"Computers are useless. They can only give you answers."(Pablo Picasso)