Non well-formedness checking parser (was Re: [jdom-interest] don't validate comments)
Robert (Jamie) Munro
rjmunro at arjam.net
Thu Dec 5 08:59:09 PST 2002
Sorry, I misposted something to Elliotte that I meant for the list. Here it
is, along with his and my follow-ups:
----- Original Message -----
From: "Elliotte Rusty Harold" <elharo at metalab.unc.edu>
To: "Robert (Jamie) Munro" <rjmunro at arjam.net>
Sent: Thursday, December 05, 2002 3:50 PM
Subject: Re: [jdom-interest] don't validate comments
> At 2:01 PM +0000 12/5/02, Robert (Jamie) Munro wrote:
> >At what level is this well-formedness checking happening? Could it be
> >ignored (after setting something to say ignore this type of error), yet
> >still try to build a JDOM object, so that when you output it with XMLOutputter,
> >you get a corrected document? It could be useful to have an XML correction tool.
> >HTML-Tidy does something similar with HTML and XHTML.
> No, it cannot be ignored. To do so would be completely non-conformant
> to the XML spec. JDOM is right. The document is wrong. The document
> needs to be fixed. You might be able to fix this in a preprocessing step
> using something like HTML Tidy before passing the stream to JDOM as
> you suggest, but nothing in JDOM needs to change.
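Harold's suggested route, repairing the stream before a conformant parser ever sees it, can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not part of JDOM or HTML Tidy: the class `CommentFixer` and its methods are hypothetical, it repairs only one kind of error (a `--` inside a comment, or a comment ending in `--->`, both of which the XML spec forbids), and it uses the JDK's built-in JAXP parser in place of JDOM's `SAXBuilder` so it runs without extra libraries. The regex is naive: it would also rewrite text that merely looks like a comment, for example inside a CDATA section.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical preprocessing step; not part of JDOM or HTML Tidy.
final class CommentFixer {
    // A comment and its body; DOTALL lets comments span lines.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    /** Rewrites comment bodies so they contain no "--" and do not end in '-'. */
    static String fixComments(String xml) {
        Matcher m = COMMENT.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            // Repeat until no run of hyphens remains ("---" needs two passes).
            while (body.contains("--")) {
                body = body.replace("--", "- -");
            }
            // A trailing '-' would form the illegal sequence "--->".
            if (body.endsWith("-")) {
                body = body + " ";
            }
            m.appendReplacement(out, Matcher.quoteReplacement("<!--" + body + "-->"));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** True if the JDK's built-in JAXP parser accepts the input as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler throws on fatal errors and stays silent otherwise.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String bad = "<doc><!-- see -- here --><p/></doc>";
        System.out.println(isWellFormed(bad));   // false: "--" inside a comment
        String fixed = fixComments(bad);
        System.out.println(fixed);               // <doc><!-- see - - here --><p/></doc>
        System.out.println(isWellFormed(fixed)); // true
    }
}
```

The parser itself is never relaxed here: the repair happens entirely on the text, and the parser still rejects anything the repair missed.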
How can ResultSetBuilder exist, then? That can't possibly be compliant with
the XML spec. What I am suggesting is, in effect, an alternative parser.
It's just that it may be so similar to the existing parser that it may as
well be the existing parser with a method added (something like
"setWellFormedErrorsIgnore()"). People using this method (or a whole new
parser codebase, if that's necessary) would know that what comes out isn't
necessarily exactly what went in, because of the errors. To be really clear
about it, it could add comments of its own everywhere to make sure people
notice that they shouldn't be using it.
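The lenient mode proposed above, one that repairs what it can and then announces that it has done so, might look roughly like the following. Again this is only a sketch under assumptions: `LenientRepair` and `repair` are invented names (JDOM has no such method, and `setWellFormedErrorsIgnore()` above is itself only a proposal), only bare `&` characters are fixed, and the warning is a single comment placed after any leading `<?xml ...?>`, since an XML declaration must come first in the document. The pass does not understand CDATA sections, comments, or processing instructions, where a literal `&` is legal, so it may "repair" text that was already fine.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical "lenient mode"; JDOM has no such API.
final class LenientRepair {
    // An '&' that does not start a named, decimal, or hexadecimal reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** Escapes bare '&' characters and, if anything changed, flags the output with a comment. */
    static String repair(String xml) {
        Matcher m = BARE_AMP.matcher(xml);
        StringBuffer out = new StringBuffer();
        int fixes = 0;
        while (m.find()) {
            m.appendReplacement(out, "&amp;");
            fixes++;
        }
        m.appendTail(out);
        if (fixes == 0) {
            return xml; // nothing repaired, nothing to flag
        }
        String warning = "<!-- WARNING: " + fixes
            + " bare ampersand(s) escaped; this is not the original document -->";
        // If the input starts with <?xml ...?>, keep that first: an XML
        // declaration must be the very first thing in the document.
        int insertAt = out.indexOf("<?xml") == 0 ? out.indexOf("?>") + 2 : 0;
        out.insert(insertAt, warning);
        return out.toString();
    }

    /** True if the JDK's built-in JAXP parser accepts the input as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler throws on fatal errors and stays silent otherwise.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String bad = "<?xml version=\"1.0\"?><menu>Fish & Chips &amp; Peas</menu>";
        System.out.println(isWellFormed(bad));   // false: bare '&' before " Chips"
        String fixed = repair(bad);
        System.out.println(fixed);
        System.out.println(isWellFormed(fixed)); // true
    }
}
```

The existing `&amp;` is left alone because the negative lookahead recognises it as a complete entity reference; only the stray `&` is rewritten and counted.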
It just seems that having something like HTML-Tidy build its own object
model, then serialise it all again, only for JDOM to turn it into a new
object model, is a waste of time.
It just seems, judging by this list, that people often want to import malformed XML.