[jdom-interest] don't validate comments
cpeter at rostock.igd.fhg.de
Thu Dec 5 07:05:37 PST 2002
Elliotte Rusty Harold wrote:
> At 11:36 AM +0100 12/5/02, Christian Peter wrote:
>> I get an org.jdom.IllegalDataException telling me that "Comments cannot
>> contain double hyphens (--)", which doubtlessly is true
>> (if you are interested, http://www.nasa.org causes the exception).
>> However, I need to parse this document and since I'm not interested in
>> the comments, I would like JDOM to simply ignore the content of a
>> comment. I thought I could achieve this by setting
>> DOMBuilder.setValidation to false, but I still get this Exception.
> You are confusing validation with well-formedness checking. Validation
> is optional. Well-formedness checking is not. The document you are
> trying to parse is malformed. JDOM correctly refuses to work with it.
> You must fix the document before JDOM or any other XML tool will accept it.
Well, you are right that I don't quite know the difference
between validation and well-formedness checking (I thought the
latter was part of the former).
However, I think it should be possible to take an HTML document with
some incorrect comment content and extract the content of the
document, ignoring the comments. Isn't it the content of the
document which is of interest, not the comments? And as you can see,
even official government sites have malformed HTML comments.
In my opinion we should provide an option to ignore the content of
comments. Don't you agree?
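
To make the distinction discussed above concrete, here is a minimal
sketch. It uses the JDK's built-in JAXP parser rather than JDOM, so that
it runs without extra libraries. The class name CommentDemo, the helpers
parses and stripComments, and the sample input are all made up for
illustration. It assumes the input is otherwise well-formed XML, which
real-world HTML often is not. It shows that switching validation off
does not switch off well-formedness checking, and that stripping
comments from the raw text before parsing is one possible workaround.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class CommentDemo {
    // Matches "<!--" up to the nearest "-->"; DOTALL lets comments span lines.
    // Naive assumption: no "<!--" appears inside CDATA sections or attributes.
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    // Hypothetical helper: removes comments from the raw markup before parsing.
    static String stripComments(String markup) {
        return COMMENT.matcher(markup).replaceAll("");
    }

    // Returns true if a NON-validating parser accepts the text as well-formed.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // validation off; well-formedness is still checked
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser quiet on stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed input, e.g. "--" inside a comment
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String bad = "<root><!-- a -- b --><item/></root>";
        System.out.println(parses(bad));                // rejected despite validation being off
        System.out.println(parses(stripComments(bad))); // accepted once the comment is gone
    }
}
```

The regular expression is deliberately simple. A more careful filter
would have to skip CDATA sections and attribute values, and this
approach changes the input, so it only fits cases where the comments
really are of no interest.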