[jdom-interest] don't validate comments

Todd O'Bryan toddobryan at mac.com
Thu Dec 5 11:58:27 PST 2002


After being bitten in the butt by bad entity names (undefined entities
are well-formedness violations, not validity violations), I can
appreciate Christian's point, but in the end I have to side with the
XML community.
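
The parenthetical above can be checked with a short Java sketch. It uses the JDK's built-in JAXP DOM parser as a stand-in for whatever SAX parser JDOM is configured with (an assumption; JDOM itself is not used here), and WellFormednessCheck and isWellFormed are hypothetical names. With validation turned off, only well-formedness errors are fatal. So an undeclared &nbsp; and a "--" inside a comment are both rejected, while a document whose elements are never declared in its DTD (invalid, but well-formed) still parses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: reports whether a string is a well-formed XML document.
public class WellFormednessCheck {

    public static boolean isWellFormed(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // check well-formedness only, never validity
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            // The default handler prints fatal errors to stderr; rethrow quietly instead.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { } // non-fatal: ignored
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // fatal error: the input is not well-formed XML
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No DTD declares nbsp, so this is a well-formedness error, not a validity error.
        System.out.println(isWellFormed("<p>a&nbsp;b</p>"));            // false
        // "--" is not allowed inside an XML comment.
        System.out.println(isWellFormed("<p><!-- a -- b --></p>"));     // false
        // &amp; is one of the five predefined entities.
        System.out.println(isWellFormed("<p>a&amp;b</p>"));             // true
        // nbsp is declared, but <p> has no element declaration:
        // invalid against the DTD, yet still well-formed.
        System.out.println(isWellFormed(
            "<!DOCTYPE p [<!ENTITY nbsp \"&#160;\">]><p>a&nbsp;b</p>")); // true
    }
}
```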

It's easy enough to write a subclass of BufferedReader that, as soon as
it reads <!--, eats everything until it sees -->, and then to pass an
instance of that RemoveCommentReader to your XML parser. Now, it may be
that certain things happen often enough (someone mentioned <p> and <br>  
in (X)HTML) that JDOM should have some Readers pre-built to handle  
those cases, but they definitely should be considered utility classes,  
not part of the core.
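
A minimal sketch of the RemoveCommentReader idea follows; it is an illustration, not a JDOM class. One deliberate swap: it extends java.io.FilterReader rather than BufferedReader, since FilterReader is the usual base class for a Reader that transforms the characters passing through it. It drops everything from <!-- through the next -->, so comments containing "--" disappear before the parser sees them. It is naive by design: it does not recognize CDATA sections, attribute values, or script content, so a <!-- inside any of those is stripped too.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical RemoveCommentReader: drops everything from "<!--" through the next "-->".
// Naive: CDATA sections, attribute values, and script content are not recognized.
public class RemoveCommentReader extends FilterReader {
    private static final char[] OPEN = "<!--".toCharArray();

    public RemoveCommentReader(Reader source) {
        // Pushback lets us return lookahead characters after a false alarm such as "<p".
        super(new PushbackReader(source, OPEN.length));
    }

    private PushbackReader src() {
        return (PushbackReader) in;
    }

    @Override
    public int read() throws IOException {
        while (true) {
            int c = src().read();
            if (c != '<') {
                return c; // ordinary character, or -1 at end of stream
            }
            // Check whether this '<' starts "<!--".
            char[] look = new char[OPEN.length - 1];
            int n = 0;
            boolean isComment = true;
            while (n < look.length) {
                int d = src().read();
                if (d == -1) { isComment = false; break; }
                look[n++] = (char) d;
                if (d != OPEN[n]) { isComment = false; break; }
            }
            if (!isComment) {
                src().unread(look, 0, n); // not a comment: give the lookahead back
                return '<';
            }
            skipPastCommentEnd(); // then loop to read the character after "-->"
        }
    }

    // Consumes input up to and including the next "-->" (or to end of stream).
    private void skipPastCommentEnd() throws IOException {
        int dashes = 0;
        int d;
        while ((d = src().read()) != -1) {
            if (d == '-') {
                dashes++;
            } else if (d == '>' && dashes >= 2) {
                return;
            } else {
                dashes = 0;
            }
        }
    }

    // FilterReader's bulk read and skip go straight to the source; route them through read().
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int i = 0;
        while (i < len) {
            int c = read();
            if (c == -1) break;
            cbuf[off + i++] = (char) c;
        }
        return i == 0 ? -1 : i;
    }

    @Override
    public long skip(long n) throws IOException {
        long k = 0;
        while (k < n && read() != -1) k++;
        return k;
    }

    public static void main(String[] args) throws IOException {
        // "--" inside a comment makes this malformed XML; the filter removes the comment first.
        String html = "<p>before<!-- bad -- comment -->after</p>";
        StringBuilder out = new StringBuilder();
        try (Reader r = new RemoveCommentReader(new StringReader(html))) {
            int c;
            while ((c = r.read()) != -1) out.append((char) c);
        }
        System.out.println(out); // <p>beforeafter</p>
    }
}
```

Because it is an ordinary Reader, it can be handed to any parser entry point that accepts a character stream. It only removes comments; everything else in the input must already be well-formed.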

Todd

On Thursday, December 5, 2002, at 10:53  AM, Elliotte Rusty Harold  
wrote:

> At 4:05 PM +0100 12/5/02, Christian Peter wrote:
>
>
>> Well, you are right that I don't quite know the difference
>> between validation and a well-formedness check (I thought the latter
>> was part of the former).
>
> Well-formedness is a prerequisite for validity, but it is not the same  
> thing. A document can be invalid but still well-formed.
>
>> However, I think it should be possible to take an HTML document with
>> some incorrect comment content and extract the content of the
>> document, ignoring the comments. Isn't it the content of the document
>> that is of interest, not the comments? And as you can see, even such
>> official government sites have invalid HTML comments.
>> In my opinion we should provide an option to ignore the comments'
>> content. Don't you agree?
>
> No. I don't. If it's not well-formed it isn't an XML document, period.  
> In a malformed document there is no way to tell what is and is not a  
> comment. All well-formedness rules must be adhered to without
> exception. Short of that you don't have an XML document.