[jdom-interest] don't validate comments

Grinvald, Edward Edward.Grinvald at ca.com
Thu Dec 5 08:04:01 PST 2002

If the comments in the HTML are of no concern to you, then you might
want to do a little preprocessing and simply strip them out of the
document before you parse.
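
A minimal sketch of that preprocessing step, in Java since JDOM is a
Java library. The class name CommentStripper and the regular expression
are assumptions made for illustration and are not part of JDOM. The
pattern treats everything from "<!--" up to the next "-->" as a comment,
which is only a heuristic when the input is malformed.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM.
final class CommentStripper {
    // Lazy match from "<!--" to the nearest "-->"; DOTALL lets it span lines.
    private static final Pattern COMMENT =
        Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /** Returns the markup with every "<!-- ... -->" span removed. */
    static String stripComments(String markup) {
        return COMMENT.matcher(markup).replaceAll("");
    }

    public static void main(String[] args) {
        // The "--" inside this comment is what an XML parser would reject.
        String input = "<p>text<!-- bad -- comment --></p>";
        System.out.println(stripComments(input));
    }
}
```

The cleaned string could then be handed to a parser, for example through
JDOM's SAXBuilder.build(Reader) with a StringReader. The rest of the
document still has to be well-formed for that parse to succeed.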

= eg

-----Original Message-----
From: Elliotte Rusty Harold [mailto:elharo at metalab.unc.edu] 
Sent: Thursday, December 05, 2002 10:53 AM
To: Christian Peter
Cc: jdom-interest at jdom.org
Subject: Re: [jdom-interest] don't validate comments

At 4:05 PM +0100 12/5/02, Christian Peter wrote:

>Well, you are right that I don't quite know about the difference
>between validation and well-formedness check (I thought the latter 
>is part of the first).

Well-formedness is a prerequisite for validity, but it is not the 
same thing. A document can be invalid but still well-formed.
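
To make the distinction concrete, here is a small sketch using the JAXP
parser that ships with the JDK. The class name, the helper methods and
the sample documents are assumptions chosen for illustration. A
non-validating parse checks well-formedness only; a validating parse
additionally reports violations of the DTD through the error handler.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative only.
final class WellFormedVsValid {
    // Well-formed, but invalid: the DTD allows only text inside <greeting>.
    static final String INVALID_BUT_WELL_FORMED =
        "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>"
        + "<greeting><b>hello</b></greeting>";

    // Not well-formed: <b> is never closed.
    static final String MALFORMED = "<greeting><b>hello</greeting>";

    private static boolean parse(String xml, boolean validating) {
        final boolean[] sawValidityError = {false};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal (well-formedness) errors;
            // recoverable errors are validity errors, so record them.
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    sawValidityError[0] = true;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return !sawValidityError[0];
        } catch (SAXException e) {
            return false; // fatal error: the text is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isWellFormed(String xml) { return parse(xml, false); }

    static boolean isValid(String xml) { return parse(xml, true); }

    public static void main(String[] args) {
        System.out.println(isWellFormed(INVALID_BUT_WELL_FORMED));
        System.out.println(isValid(INVALID_BUT_WELL_FORMED));
        System.out.println(isWellFormed(MALFORMED));
    }
}
```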

>However, I think it should be possible to take an HTML document with
>some incorrect comment content and extract the content of the 
>document, ignoring the comments. Isn't it the content of the 
>document which is of interest, not the comments? And as you can see, 
>even such official governmental sites have non-valid HTML comments.
>In my opinion we should provide the option not to regard the 
>comment's content. Don't you agree?

No. I don't. If it's not well-formed it isn't an XML document, 
period. In a malformed document there is no way to tell what is and 
is not a comment. All well-formedness rules must be adhered to without 
exception. Short of that you don't have an XML document.
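
As an illustration of one such rule, XML 1.0 forbids the string "--"
inside a comment. The sketch below, again using the JDK's JAXP parser
with a hypothetical class and method name, shows the parser accepting a
clean comment and rejecting one that contains a double hyphen.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative only.
final class MalformedComment {
    /** Returns true if the parser accepts the text as well-formed XML. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // well-formedness violation
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<doc><!-- fine --></doc>"));
        System.out.println(parses("<doc><!-- not -- fine --></doc>"));
    }
}
```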

| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |