[jdom-interest] CDATA and XMLOutputter

rick beton richard.beton at dsl.pipex.com
Fri Apr 2 03:38:58 PST 2004


Hi All,

I have a question about outputting CDATA using XMLOutputter. (I'm using b10.)

In XMLOutputter, there is this method:

    protected void printCDATA(Writer out, CDATA cdata) throws IOException {
        String str = (currentFormat.mode == Format.TextMode.NORMALIZE)
                     ? cdata.getTextNormalize()
                     : ((currentFormat.mode == Format.TextMode.TRIM) ?
                             cdata.getText().trim() : cdata.getText());
        out.write("<![CDATA[");
        out.write(str);
        out.write("]]>");
    }

I'm no XML expert, so please advise me why this method formats the CDATA text.
According to my naive understanding of what a CDATA section is, it's just
'stuff' and it shouldn't be formatted or normalised at all.

If my understanding is right, then the method should be simplified to:

    protected void printCDATA(Writer out, CDATA cdata) throws IOException {
        out.write("<![CDATA[");
        out.write(cdata.getText());
        out.write("]]>");
    }

I came across this while trying to generate XHTML with Javascript nodes.
The Javascript may happen to work when normalised. However, if it contains
'//' comments (for which the line ending is significant, because it terminates
the comment), then the normalisation could well break the Javascript.
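
To illustrate the '//' problem, here is a minimal, self-contained sketch. It
does not use JDOM: the normalize() helper is a hypothetical stand-in that
assumes normalisation means trimming the text and collapsing each run of
whitespace (newlines included) to a single space. The class and method names
are made up for the illustration.

```java
public class CdataNormalizeDemo {

    // Hypothetical stand-in for whitespace normalisation (an assumption, not
    // JDOM code): trim the text, then collapse every run of whitespace,
    // including line endings, to a single space.
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // Wrap the text in CDATA delimiters without touching it, as in the
    // simplified printCDATA proposed above.
    static String cdataVerbatim(String text) {
        return "<![CDATA[" + text + "]]>";
    }

    // Wrap the normalised text in CDATA delimiters.
    static String cdataNormalized(String text) {
        return "<![CDATA[" + normalize(text) + "]]>";
    }

    public static void main(String[] args) {
        // A small script whose first line is a '//' comment.
        String script = "\n// set up the counter\nvar count = 0;\ncount++;\n";

        System.out.println("Verbatim:");
        System.out.println(cdataVerbatim(script));

        System.out.println("Normalised:");
        System.out.println(cdataNormalized(script));
    }
}
```

Under that assumption, the normalised output has no line endings left, so the
'//' comment on the first line swallows the statements that used to follow it.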


On a more general note, I noticed that org.jdom.CDATA extends org.jdom.Text.
What is the rationale for CDATA extending Text?

Text represents XML 'character data' and there isn't really an 'is-a'
relationship between this and CDATA. Indeed, the spec
(http://www.w3.org/TR/2004/REC-xml-20040204/#syntax) specifically distinguishes
between CDATA and general (parsed) character data. Two relevant sentences are:

    Definition: Markup takes the form of start-tags, end-tags, empty-element
    tags, entity references, character references, comments, CDATA section
    delimiters, document type declarations, processing instructions, XML
    declarations, text declarations, and any white space that is at the top
    level of the document entity (that is, outside the document element and
    not inside any other markup).

    Definition: All text that is not markup constitutes the character data of
    the document.

Regards,
Rick :-)


