[jdom-interest] don't validate comments

Christian Peter cpeter at rostock.igd.fhg.de
Thu Dec 5 07:05:37 PST 2002


Elliotte Rusty Harold wrote:
> At 11:36 AM +0100 12/5/02, Christian Peter wrote:
> 
>> Hi,
>>
>> I get an org.jdom.IllegalDataException telling me that "Comments cannot 
>> contain double hyphens (--)", which is doubtless true
>> (if you are interested, http://www.nasa.org causes the exception).
>>
>> However, I need to parse this document and since I'm not interested in 
>> the comments, I would like JDOM to simply ignore the content of a 
>> comment. I thought I could achieve this by setting 
>> DOMBuilder.setValidation to false, but I still get this exception.
>>
> 
> You are confusing validation with well-formedness checking. Validation 
> is optional. Well-formedness checking is not. The document you are 
> trying to parse is malformed. JDOM correctly refuses to work with it. 
> You must fix the document before JDOM or any other XML tool will accept it.
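
The distinction can be seen without JDOM at all. The following is a
minimal sketch, assuming the JAXP parser bundled with the JDK rather
than JDOM's own builders; the class and method names are made up for
illustration. Validation is switched off, yet a comment containing
"--" is still rejected, because that is a well-formedness error and
not a validity error.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of JDOM.
public class WellFormednessDemo {

    /** Returns true if the text parses with validation switched OFF. */
    static boolean parsesWithoutValidation(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no DTD validation requested
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // Well-formedness violations are reported even without validation.
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Comment with a double hyphen: malformed, so the parse fails.
        System.out.println(parsesWithoutValidation("<doc><!-- a -- b --></doc>"));
        // Comment without a double hyphen: well-formed, so the parse succeeds.
        System.out.println(parsesWithoutValidation("<doc><!-- a - b --></doc>"));
    }
}
```

Turning validation on or off only changes whether the document is
checked against a DTD; it has no effect on the first result.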

Well, you are right that I did not really know the difference 
between validation and well-formedness checking (I thought the latter 
was part of the former).
However, I think it should be possible to take an HTML document whose 
comments are malformed and still extract the content of the 
document, ignoring the comments. Isn't the content of the document 
what is of interest, not the comments? And as you can see, even 
official government sites have malformed HTML comments.
In my opinion we should provide an option to disregard the content 
of comments. Don't you agree?
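
Until such an option exists, one workaround is to remove the comments 
from the text before it reaches the parser. What follows is only a 
sketch under stated assumptions: CommentStripper and stripComments 
are hypothetical names, not JDOM API, and the regular expression is 
naive (it would also remove comment-like text inside CDATA sections 
or scripts). It helps only when the comments are the sole thing 
wrong with the document.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM.
public class CommentStripper {

    // Non-greedy match from "<!--" to the first "-->", across line breaks.
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /** Returns the markup with every comment removed. */
    static String stripComments(String markup) {
        return COMMENT.matcher(markup).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<p>text<!-- bad -- comment --></p>";
        System.out.println(stripComments(input)); // prints <p>text</p>
    }
}
```

The cleaned string could then be handed to a builder through a 
java.io.StringReader.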

Christian



