[jdom-interest] DOCTYPE still giving me the worst headache!
jason at jmlie.com
Tue Jan 29 17:33:15 PST 2002
I have been using JDOM for several months and have only one problem:
DOCTYPE declarations always give me trouble. Most of my applications
collect information from various web directories and sites and consolidate
them into nice xml files. My latest application collected 5,300+ web pages,
ran them through JTidy and then used XMLOutputter to write the results to
files on my system. I am now trying to parse them with JDOM so I can strip
the information I want and store it all in one XML file.
I am getting the following error:
org.jdom.JDOMException: Error on line 2 of document
file:/G:/www.che.com/companylistings/10.html: White space is required
between the public identifier and the system identifier.
This is the line JDOM is complaining about (I am using Beta 7):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
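The likely cause is the XML 1.0 grammar for external IDs: `ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral`. In XML a PUBLIC identifier must be followed by a system literal, whereas SGML-based HTML lets the system identifier be omitted, so the line above is acceptable HTML but not well-formed XML. The error text appears to come from the underlying SAX parser rather than from JDOM itself. The sketch below uses the JDK's built-in parser instead of JDOM and is meant to reproduce that behaviour: the public-only declaration is rejected, while the same declaration with the standard HTML 4.01 Transitional system identifier (`http://www.w3.org/TR/html4/loose.dtd`) is accepted. The class and method names are made up for illustration, and the feature URI used to skip loading the external DTD assumes a Xerces-based parser such as the one bundled with the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DoctypeSyntaxCheck {

    // The declaration from the post: a PUBLIC literal with no system literal.
    static final String PUBLIC_ONLY =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
        + "<html><body/></html>";

    // The same declaration followed by the HTML 4.01 Transitional system identifier.
    static final String PUBLIC_AND_SYSTEM =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n"
        + "  \"http://www.w3.org/TR/html4/loose.dtd\">\n"
        + "<html><body/></html>";

    // Returns true if the text parses as well-formed XML, false on a parse error.
    static boolean parses(String xml) {
        DocumentBuilder builder;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: a Xerces-based parser (as bundled with the JDK). This feature
            // stops it fetching the external DTD, which for HTML 4.01 is an SGML DTD
            // that an XML parser could not read anyway; only the syntax is checked.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            builder = factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser does not support the feature", e);
        }
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("PUBLIC only:        " + parses(PUBLIC_ONLY));
        System.out.println("PUBLIC plus SYSTEM: " + parses(PUBLIC_AND_SYSTEM));
    }
}
```

On a standard JDK the first check should come back false (the parser also logs a fatal error to stderr) and the second true. If that holds, getting a system literal into each DOCTYPE addresses the cause rather than hiding the declaration.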
JDOM had no problem writing this to disk. I cannot understand why it cannot
read it back again. I would appreciate any help with this matter. I have
posted this problem before and have seen others' postings as well, but I have yet
to see anything that helps. In the past I used a regex to strip this
out of the file before sending it to JDOM, but I do not like this approach at all.
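If some preprocessing step is unavoidable, a narrower alternative to deleting the declaration is to add the missing system literal and leave the rest of the DOCTYPE intact. A minimal Java sketch, assuming each file contains at most one DOCTYPE with uppercase keywords (as the JTidy output above does) and that the system identifier passed in contains no double quotes; `DoctypeRepair` and `addSystemId` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoctypeRepair {

    // A DOCTYPE whose PUBLIC literal is followed directly by '>' (no system literal).
    // Group 1 is the root element name, group 2 the public literal with its quotes.
    private static final Pattern PUBLIC_ONLY = Pattern.compile(
        "<!DOCTYPE\\s+([^\\s>\\[]+)\\s+PUBLIC\\s+(\"[^\"]*\"|'[^']*')\\s*>");

    // Rewrites the first public-only DOCTYPE so it carries the given system
    // identifier. Text without such a DOCTYPE is returned unchanged.
    static String addSystemId(String text, String systemId) {
        Matcher m = PUBLIC_ONLY.matcher(text);
        if (!m.find()) {
            return text;
        }
        String fixed = "<!DOCTYPE " + m.group(1) + " PUBLIC " + m.group(2)
            + " \"" + systemId + "\">";
        return text.substring(0, m.start()) + fixed + text.substring(m.end());
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
            + "<html><body/></html>";
        System.out.println(addSystemId(page, "http://www.w3.org/TR/html4/loose.dtd"));
    }
}
```

Once a system literal is present, a non-validating parser may still try to fetch that DTD, and the HTML 4.01 DTD is SGML rather than XML, so in practice this would be paired with turning off external-DTD loading or installing an entity resolver in the parser.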
Jason Long - President
Supernova Software - supernovasoftware.com
BS Physics, MS Chemical Engineering
From: jdom-interest-admin at jdom.org
[mailto:jdom-interest-admin at jdom.org]On Behalf Of Carsten Zerbst
Sent: Tuesday, November 27, 2001 9:52 AM
To: jdom-interest at jdom.org
Subject: [jdom-interest] Still struggling with doctype
Despite some tinkering, I still struggle when reading files with a
doctype declaration. How could I convince JDOM to read them silently or
The doctypes in question are valid and can be read with other
XML tools; only JDOM has problems with them.
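If the trouble is with the DTD a DOCTYPE points to (for example one that cannot be fetched), a common approach with SAX-based parsers is an `org.xml.sax.EntityResolver` that hands back an empty document instead of the real DTD. The sketch below shows the idea with the JDK's DOM parser. JDOM's `SAXBuilder` has a `setEntityResolver(EntityResolver)` method in its 1.0 releases, so the same resolver should work there, but that is an assumption for Beta 7. The class name and the `http://example.invalid/page.dtd` system ID are made up for illustration. This does not help with a DOCTYPE that is itself malformed, such as a PUBLIC identifier with no system literal; that is a syntax error the resolver cannot work around.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class IgnoreExternalDtds {

    // Returns an empty input in place of ANY external entity, including the
    // external DTD subset, so the parser neither fetches nor reads the real DTD.
    // Note that this also blanks out external general entities.
    static final EntityResolver EMPTY_RESOLVER =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(EMPTY_RESOLVER);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, unreachable DTD location: the resolver means it is never requested.
        String xml = "<!DOCTYPE page SYSTEM \"http://example.invalid/page.dtd\">\n"
            + "<page><title>hello</title></page>";
        System.out.println(parse(xml).getDocumentElement().getTagName());
    }
}
```

Returning an empty DTD keeps parsing silent and offline, at the cost of losing any entity or default-attribute definitions the real DTD would have supplied.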
Dipl. Ing. Carsten Zerbst | carsten.zerbst at atlantec-es.com