[jdom-interest] Cannot close an XML file used for parsing
cowtowncoder at yahoo.com
Tue Oct 28 09:29:17 PDT 2008
--- On Tue, 10/28/08, Jack Bush <netbeansfan at yahoo.com.au> wrote:
> From: Jack Bush <netbeansfan at yahoo.com.au>
> Subject: [jdom-interest] Cannot close an XML file used for parsing
> To: jdom-interest at jdom.org
> Date: Tuesday, October 28, 2008, 7:03 AM
> Hi All,
> I appear to have difficulty closing (and possibly flushing
> first) an XML file that is subsequently parsed
> without success. The error generated is:
> org.jdom.input.JDOMParseException: Error on line 23: The
> element type "form" must be terminated by the
> matching end-tag "</form>".
> Below are the code snippets from readData(), which retrieves (HTML)
> data from a website, saves it to a file, then converts it to XML
> format before returning the new filename:
But XML parsers do not convert HTML: either the content is well-formed XML, or it is not. Based on the error message, it looks like it is not. In HTML you can omit all kinds of things (end tags, attribute quotes) without problems; not so in XML.
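To illustrate the difference, here is a minimal JDK-only sketch (the class and method names are illustrative, not part of JDOM) that asks the standard JAXP parser whether a string is well-formed XML. An `<input>` with no end tag is acceptable HTML but is rejected as XML, the same kind of error JDOM reports for the unterminated "form" element:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler (which prints fatal errors to stderr)
            // with one that simply rethrows fatal errors.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) {}
                public void error(SAXParseException e) {}
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXParseException e) {
            // Not well-formed, e.g. a missing end tag.
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not a verdict on the markup.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Legal in HTML (no end tag for <input>), but not well-formed XML.
        System.out.println(isWellFormed("<form><input type=\"text\"></form>"));  // false
        // Same content with the element explicitly closed: well-formed XML.
        System.out.println(isWellFormed("<form><input type=\"text\"/></form>")); // true
    }
}
```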
If you need to process HTML, use an HTML parser that can expose the content as if it were XML. My favorite is TagSoup, but there are many alternatives, such as JTidy and Neko.
After this step, you can use JDOM to build a tree model and process the content.
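The two steps connect through SAX: TagSoup's `org.ccil.cowan.tagsoup.Parser` implements `org.xml.sax.XMLReader`, so any tree builder that consumes SAX events can sit on top of it (JDOM 1.x's `SAXBuilder` also has a constructor taking a SAX driver class name for this purpose). The sketch below is a JDK-only illustration of that plug-in point, under stated assumptions: it builds a W3C DOM rather than a JDOM tree, and the JDK's own XML parser stands in for TagSoup, so the sample input must already be well-formed. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ReaderToTree {

    /**
     * Step 2: feed the SAX events produced by any XMLReader into a tree builder.
     * Here the JDK's identity Transformer builds a W3C DOM Document.
     */
    static Document buildTree(XMLReader reader, String content) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(); // no node given: a new Document is created
        identity.transform(new SAXSource(reader, new InputSource(new StringReader(content))), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        // Step 1: choose the parser. The JDK's XML parser stands in here; to accept
        // raw HTML you would pass an HTML-aware XMLReader instead, e.g.
        // new org.ccil.cowan.tagsoup.Parser() (TagSoup is not part of the JDK).
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Document doc = buildTree(reader, "<html><body><form action=\"x\"/></body></html>");
        System.out.println(doc.getDocumentElement().getTagName()); // html
    }
}
```

Keeping the parser choice separate from tree building means switching from an XML parser to an HTML-tolerant one changes only the `XMLReader` passed in.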
Hope this helps,
-+ Tatu +-