If the comments in the HTML are of no concern to you, then you might
want to do a little preprocessing and simply get them out of the
document before you parse.
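
Something along these lines might do as a first pass. It is only a
sketch: strip_comments is a name I made up, and the pattern is naive.
It removes everything between "<!--" and the next "-->", so it will
also eat comment-like text inside a script block, and it leaves an
unterminated comment alone.

```python
import re

# Naive pattern: everything from "<!--" up to the next "-->", across lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html: str) -> str:
    """Return html with every <!-- ... --> span removed (hypothetical helper)."""
    return COMMENT_RE.sub("", html)


page = "<p>kept</p><!-- bad -- comment --><p>also kept</p>"
print(strip_comments(page))  # <p>kept</p><p>also kept</p>
```

Whether the result then parses depends on the rest of the page; this
only takes the comments out of the picture.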

= eg

>Well, you are right that I don't quite know about the difference
>between validation and well-formedness check (I thought the latter 
>is part of the first).

Well-formedness is a prerequisite for validity, but it is not the 
same thing. A document can be invalid but still well-formed.
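
To illustrate (a made-up document, not one from this thread): the DTD
below says <a> must contain a <b>, but the instance contains a <c>.
That makes it invalid, yet Python's standard ElementTree parser, which
checks well-formedness only and does not validate, should accept it.

```python
import xml.etree.ElementTree as ET

# Invalid against its own DTD (<a> must contain <b>), but well-formed.
doc = """<!DOCTYPE a [
  <!ELEMENT a (b)>
  <!ELEMENT b EMPTY>
]>
<a><c/></a>"""

# ElementTree does not validate, so the content-model violation is ignored.
root = ET.fromstring(doc)
print(root.tag, [child.tag for child in root])
```

A validating parser would be expected to report the <c> element as an
error here.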

>However, I think it should be possible to take a HTML document with
>some incorrect comment content and extract the content of the 
>document, ignoring the comments. Isn't it the content of the 
>document which is of interest, not the comments? And as you can see, 
>even such official governmental sites have non-valid HTML comments.
>In my opinion we should provide the option not to regard the 
>comment's content. Don't you agree?

No. I don't. If it's not well-formed it isn't an XML document, 
period. In a malformed document there is no way to tell what is and 
is not a comment. All well-formedness rules must be adhered to without 
exception. Short of that you don't have an XML document.
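
As a rough illustration, assuming Python's standard ElementTree parser
(is_well_formed is a hypothetical helper name): a double hyphen inside
a comment is a well-formedness error in XML, so a conforming parser
should refuse the second document outright rather than skip the
comment.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False on any parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# An ordinary comment is fine.
print(is_well_formed("<doc><!-- fine --><p>text</p></doc>"))
# "--" inside a comment is not allowed, so the parse fails.
print(is_well_formed("<doc><!-- not -- fine --><p>text</p></doc>"))
```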

