[jdom-interest] Verbose XHTML 1.1 Doctype

Fri Mar 26 23:31:52 PST 2004

On Thu, 2004-03-25 at 08:07, Stein Erik Berget wrote: 
> On Wed, 24 Mar 2004 18:47:47 +0000, David Dorward <david at dorward.me.uk> 
> wrote:
> > I have a number of XHTML 1.1 documents, all conforming to the same
> > template, which I want to extract some data from and then insert that
> > data into different XHTML 1.1 documents.
> >
> > As a first step I am trying to read in a document and then print it out
> > again without any modification. I've run into two issues:
> >
> > 1. It appears to be downloading the DTD from the w3c website - this
> > takes time and bandwidth.

Thanks to Mr Berget this issue is now resolved, and it's lightning fast
(Thanks!).

> > 2. It seems to be expanding the Doctype line (example below).

This one, unfortunately, is still a problem. Does anybody have a solution?

> > Is there any way to stop this? I'd like to leave the Doctype alone and
> > save time on reading the DTD (I don't care about validation - that is
> > handled elsewhere). I couldn't find anything looking at the docs, but I
> > suspect this is due to not knowing what to look for.

My code now looks like this:

import org.apache.xerces.util.XMLCatalogResolver;
import org.jdom.*;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;

public class Parse {

public static void main (String [] args) {
	//path to find the catalog.xml file
	String cat[] = {"file:///home/david/prog/cms/java/catalog.xml"};
	XMLCatalogResolver resolver = new XMLCatalogResolver();
	resolver.setPreferPublic(true);
	resolver.setCatalogList(cat);

	SAXBuilder builder = new SAXBuilder(true);
	builder.setProperty(
		"http://apache.org/xml/properties/internal/entity-resolver",
		resolver);

	Document doc;
	XMLOutputter outputter = new XMLOutputter();
	try {
		doc = builder.build("/home/david/prog/cms/dorward.me.uk/about/index.html");
		try {
			outputter.output(doc, System.out);
		} catch (IOException e) {
			System.err.println(e);
		}
	} catch (JDOMException e) {
		// indicates a well-formedness or other error
		System.out.println(" is not well formed: " + e.getMessage());
	} catch (IOException e) {
		System.out.println("Could not check ");
		System.out.println(" because " + e.getMessage());
	}
}
}
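
For readers unfamiliar with what the catalog resolver in the code above
is doing, here is a minimal sketch of the idea using only the SAX
interfaces that ship with the JDK. It is not how XMLCatalogResolver is
implemented; the class name LocalDtdResolver, its add() method and the
file:///tmp/dtd/ path are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical mini-catalog: maps DTD public identifiers to local copies. */
class LocalDtdResolver implements EntityResolver {
    private final Map<String, String> publicIdToUri = new HashMap<>();

    /** Registers a local URI to use instead of the remote system identifier. */
    void add(String publicId, String localUri) {
        publicIdToUri.put(publicId, localUri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : publicIdToUri.get(publicId);
        if (local == null) {
            // Unknown entity: returning null lets the parser use its default
            // behaviour, which may mean fetching systemId over the network.
            return null;
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }

    public static void main(String[] args) {
        LocalDtdResolver resolver = new LocalDtdResolver();
        // Assumed location of a local copy of the DTD (illustrative only).
        resolver.add("-//W3C//DTD XHTML 1.1//EN", "file:///tmp/dtd/xhtml11.dtd");
        InputSource hit = resolver.resolveEntity(
                "-//W3C//DTD XHTML 1.1//EN",
                "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd");
        System.out.println(hit.getSystemId());
    }
}
```

An object like this could be handed to any SAX-based parser that accepts
an EntityResolver. Note that the XHTML 1.1 DTD is modular, so each module
it pulls in would need its own entry; a real catalog file handles that
more conveniently.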

The input document starts:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>

But the output document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
  <!NOTATION w3c-xml PUBLIC "ISO 8879//NOTATION Extensible Markup Language (XML) 1.0//EN">
  <!NOTATION cdata PUBLIC "-//W3C//NOTATION XML 1.0: CDATA//EN">
  <!NOTATION fpi PUBLIC "ISO 8879:1986//NOTATION Formal Public Identifier//EN">
  <!NOTATION length PUBLIC "-//W3C//NOTATION XHTML Datatype: Length//EN">
  <!NOTATION linkTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: LinkTypes//EN">
  <!NOTATION mediaDesc PUBLIC "-//W3C//NOTATION XHTML Datatype: MediaDesc//EN">
  <!NOTATION multiLength PUBLIC "-//W3C//NOTATION XHTML Datatype: MultiLength//EN">
  <!NOTATION number PUBLIC "-//W3C//NOTATION XHTML Datatype: Number//EN">
  <!NOTATION pixels PUBLIC "-//W3C//NOTATION XHTML Datatype: Pixels//EN">
  <!NOTATION script PUBLIC "-//W3C//NOTATION XHTML Datatype: Script//EN">
  <!NOTATION text PUBLIC "-//W3C//NOTATION XHTML Datatype: Text//EN">
  <!NOTATION character PUBLIC "-//W3C//NOTATION XHTML Datatype: Character//EN">
  <!NOTATION charset PUBLIC "-//W3C//NOTATION XHTML Datatype: Charset//EN">
  <!NOTATION charsets PUBLIC "-//W3C//NOTATION XHTML Datatype: Charsets//EN">
  <!NOTATION contentType PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentType//EN">
  <!NOTATION contentTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentTypes//EN">
  <!NOTATION datetime PUBLIC "-//W3C//NOTATION XHTML Datatype: Datetime//EN">
  <!NOTATION languageCode PUBLIC "-//W3C//NOTATION XHTML Datatype: LanguageCode//EN">
  <!NOTATION uri PUBLIC "-//W3C//NOTATION XHTML Datatype: URI//EN">
  <!NOTATION uris PUBLIC "-//W3C//NOTATION XHTML Datatype: URIs//EN">
]>
<?doc type="doctype" role="title" { XHTML 1.1 } ?><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="-//W3C//DTD XHTML 1.1//EN">
<head profile="">
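
The NOTATION lines, the <?doc ...?> instruction and the extra version and
profile attributes in this output all look like material pulled in from
the DTD itself. For comparison only, the sketch below does the same
read-then-write round trip with the JAXP classes bundled in the JDK
rather than JDOM, so it is not a fix for the code above. It assumes the
document uses no entities defined in the DTD (such as &nbsp;), because it
answers every external entity request with an empty stream. The class and
method names (CompactDoctype, roundTrip) are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

/** Hypothetical helper: parse XML without reading the DTD, then re-serialize it. */
class CompactDoctype {

    static String roundTrip(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Answer every external entity (including the DTD) with an empty
            // stream, so nothing is downloaded and no DTD content is read.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            // Copy the original identifiers so the serializer writes a DOCTYPE
            // containing only the public and system identifiers.
            DocumentType doctype = doc.getDoctype();
            if (doctype != null && doctype.getSystemId() != null) {
                if (doctype.getPublicId() != null) {
                    transformer.setOutputProperty(
                            OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
                }
                transformer.setOutputProperty(
                        OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String sample =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">"
                + "<head><title>Test</title></head><body><p>Hello</p></body></html>";
        System.out.println(roundTrip(sample));
    }
}
```

Because the DTD is never read here, attribute defaults it declares are
not added either. Within JDOM, a comparable effect might be had by
clearing the internal subset on the document's DocType before output
(DocType.setInternalSubset, if the JDOM version in use provides it), but
that is an untested suggestion.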

-- 
David Dorward                                 <http://dorward.me.uk/>