[jdom-interest] setIgnoringElementContentWhitespace inoperant ?

Wed Dec 8 15:47:16 PST 2004

Eric VERGNAUD wrote:
> Hi,
> 
> I'm using jdom 1.0. I have the following code:
> 
>     public static Document ReadDocument(File inFile)
>         throws JDOMException, IOException
>     {
>         SAXBuilder sax = new SAXBuilder();
>         sax.setIgnoringElementContentWhitespace(true);
>         return sax.build(inFile);
>     }
> 
> I use this code to parse a document that was serialized in pretty-printed
> form. There are plenty of 0x0D, 0x0A and 0x20 characters between the
> elements. I was hoping sax.setIgnoringElementContentWhitespace would
> clean that up, but it doesn't.
> 
> Am I missing something?

Yes!

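First, this feature depends on validation being turned on: the parser
needs a DTD to know which elements have element-only content, and
therefore which whitespace is ignorable. Second, it does not apply to
mixed content (elements declared to contain character data as well as
child elements). Let's see if I can find the stuff in the spec ....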

Yes, here it is: http://www.w3.org/TR/2004/REC-xml-20040204/#sec-element-content
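
To make the two conditions concrete, here is a minimal sketch. Note that it
does not use JDOM: it uses the JDK's own DOM parser (DocumentBuilderFactory),
which happens to have an option with the same name and the same documented
dependency on validating mode. The class name, the sample document and the
two DTDs are made up for illustration. I am assuming JDOM's SAXBuilder
behaves the same way once validation is enabled (setValidation(true)) and
the document carries a DTD, but I have not run that here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK DOM parser, not JDOM.
class IgnorableWhitespaceDemo {

    // DTD declaring <root> with element-only content.
    static final String ELEMENT_ONLY_DTD =
        "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item+)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n";

    // DTD declaring <root> with mixed content.
    static final String MIXED_DTD =
        "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (#PCDATA|item)*>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n";

    // Pretty-printed body: line breaks and indentation between elements.
    static final String BODY =
        "<root>\r\n    <item>a</item>\r\n    <item>b</item>\r\n</root>";

    // Parses (dtd + BODY) and returns the number of child nodes of <root>,
    // counting text nodes as well as elements.
    static int countRootChildren(String dtd, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new InputSource(new StringReader(dtd + BODY)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // No DTD, no validation: whitespace text nodes are kept.
        System.out.println(countRootChildren("", false));              // 5
        // Element-only content, validation on: whitespace is dropped.
        System.out.println(countRootChildren(ELEMENT_ONLY_DTD, true)); // 2
        // Mixed content, validation on: whitespace is kept.
        System.out.println(countRootChildren(MIXED_DTD, true));        // 5
    }
}
```

Without a DTD the parser has no content model to consult, so every run of
whitespace between elements comes through as a text node. Only the
element-only declaration combined with validation lets the parser discard it.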

<wishful-thinking>
It would've been nice if there were a special whitespace character set that
could only be used for indentation and line-breaking purposes on
output; a character set that would always be skipped on input.
</wishful-thinking>

/pmn