[jdom-interest] Verbose XHTML 1.1 Doctype

Stein Erik Berget seberget at escenic.com
Thu Mar 25 00:07:26 PST 2004


On Wed, 24 Mar 2004 18:47:47 +0000, David Dorward <david at dorward.me.uk> 
wrote:

> I have a number of XHTML 1.1 documents, all conforming to the same
> template, which I want to extract some data from and then insert that
> data into different XHTML 1.1 documents.
>
> As a first step I am trying to read in a document and then print it out
> again without any modification. I've run into two issues:
>
> 1. It appears to be downloading the DTD from the w3c website - this
> takes time and bandwidth.
>
> 2. It seems to be expanding the Doctype line (example below).
>
> Is there any way to stop this? I'd like to leave the Doctype alone and
> save time on reading the DTD (I don't care about validation - that is
> handled elsewhere). I couldn't find anything looking at the docs, but I
> suspect this is due to not knowing what to look for.
Been there, done that:

// Location of the catalog.xml file (here: the filesystem root)
String[] cat = {"file:///catalog.xml"};
XMLCatalogResolver resolver = new XMLCatalogResolver();
resolver.setPreferPublic(true);
resolver.setCatalogList(cat);

// true = validating parser; the DTD is now read via the catalog instead of from w3.org
SAXBuilder builder = new SAXBuilder(true);
builder.setProperty("http://apache.org/xml/properties/internal/entity-resolver",
        resolver);

// Build the document ('method' is the HTTP request object from the surrounding code, not shown)
Document document = builder.build(
        new BufferedInputStream(method.getResponseBodyAsStream()));

You will need the following imports as well:
import java.io.BufferedInputStream;
import org.apache.xerces.util.XMLCatalogResolver;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
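If you only need to redirect one or two DTDs and would rather not rely on a Xerces-internal property, a plain SAX EntityResolver can do a similar job. The code below is a minimal sketch that uses only the JDK and is not the catalog-based code above. The class name LocalDtdResolver, its map/getHits/parseText methods, and the stub DTD are all hypothetical. It serves any registered public ID from a local file. For every other ID it returns null, so the parser falls back to its normal behaviour.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: serves DTDs from local files, keyed by public ID. */
public class LocalDtdResolver implements EntityResolver {
    // public ID -> file: URI of the local copy
    private final Map<String, String> publicToLocal = new HashMap<>();
    private int hits = 0;

    /** Registers a local copy for the given public ID. */
    public void map(String publicId, File localCopy) {
        publicToLocal.put(publicId, localCopy.toURI().toString());
    }

    /** Number of entity requests answered from a local copy. */
    public int getHits() {
        return hits;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : publicToLocal.get(publicId);
        if (local == null) {
            // Unknown entity: null tells the parser to use the system ID as usual.
            return null;
        }
        hits++;
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }

    /** Parses the XML string (non-validating) with the resolver and returns its character data. */
    public static String parseText(String xml, EntityResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for xhtml11-flat.dtd; the "marker" entity exists only in this local copy.
        File dtd = File.createTempFile("xhtml11-stub", ".dtd");
        dtd.deleteOnExit();
        Files.write(dtd.toPath(),
                "<!ELEMENT html ANY>\n<!ENTITY marker \"from local DTD\">\n"
                        .getBytes(StandardCharsets.UTF_8));

        LocalDtdResolver resolver = new LocalDtdResolver();
        resolver.map("-//W3C//DTD XHTML 1.1//EN", dtd);

        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
                + "<html>&marker;</html>";
        System.out.println(parseText(xml, resolver) + ", hits=" + resolver.getHits());
    }
}
```

Because the marker entity is defined only in the local stub, its expansion shows that the DTD came from disk and not from w3.org. JDOM's SAXBuilder also takes a standard SAX resolver through setEntityResolver(...), so an object like this could be handed to the builder directly.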

This solution uses the catalog feature of Xerces. My catalog.xml file looks
like this:

<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="public">
   <public publicId="-//W3C//DTD XHTML 1.1//EN" uri="xhtml11-flat.dtd" />
</catalog>

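You can download xhtml11-flat.dtd from the W3C site at this URL:
http://www.w3.org/TR/xhtml11/DTD/xhtml11-flat.dtd
The relative uri in the catalog entry is resolved against the location of
catalog.xml, so put the DTD in the same directory as the catalog.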

By using the 'flat' variant you don't have to add all the other referenced
DTDs and modules.
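On Java 9 or later, the JDK has its own OASIS catalog support in javax.xml.catalog, and it can read a catalog like the one above without the Xerces XMLCatalogResolver. This is a different technique from the one in the post, shown as a sketch under a few assumptions. The class and helper names are hypothetical, and the DTD is a small stand-in for xhtml11-flat.dtd. The generated catalog also leaves out the DOCTYPE line for brevity. The example demonstrates that the relative uri is resolved against the directory that holds catalog.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class JdkCatalogDemo {
    static final String XHTML11_PUBLIC_ID = "-//W3C//DTD XHTML 1.1//EN";

    /** Writes a stub DTD and a catalog like the one above into dir; returns the catalog path. */
    static Path writeDemoCatalog(Path dir) throws IOException {
        // Stand-in for xhtml11-flat.dtd; the "marker" entity exists only in this local copy.
        Files.write(dir.resolve("xhtml11-flat.dtd"),
                "<!ELEMENT html ANY>\n<!ENTITY marker \"from local DTD\">\n"
                        .getBytes(StandardCharsets.UTF_8));
        // Same entry as the catalog above; the relative uri points next to catalog.xml.
        String catalog = "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\""
                + " prefer=\"public\">\n"
                + "  <public publicId=\"" + XHTML11_PUBLIC_ID + "\" uri=\"xhtml11-flat.dtd\"/>\n"
                + "</catalog>\n";
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, catalog.getBytes(StandardCharsets.UTF_8));
        return catalogFile;
    }

    /** Builds a resolver from one catalog file, preferring public IDs. */
    static CatalogResolver resolverFor(Path catalogFile) {
        CatalogFeatures features = CatalogFeatures.builder()
                .with(CatalogFeatures.Feature.PREFER, "public")
                // "continue": IDs not in the catalog fall through to normal resolution
                .with(CatalogFeatures.Feature.RESOLVE, "continue")
                .build();
        return CatalogManager.catalogResolver(features, catalogFile.toUri());
    }

    /** Parses the XML string (non-validating) and returns its character data. */
    static String parseText(String xml, EntityResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        Path catalogFile = writeDemoCatalog(Files.createTempDirectory("catalog-demo"));
        String xml = "<!DOCTYPE html PUBLIC \"" + XHTML11_PUBLIC_ID + "\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
                + "<html>&marker;</html>";
        System.out.println(parseText(xml, resolverFor(catalogFile)));
    }
}
```

RESOLVE is set to "continue" here because the JDK default ("strict") throws an exception for any entity that has no catalog entry. That default may be preferable if every external entity is supposed to come from local copies.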

With something similar to this you still get a validated document, and
parsing is much faster because the DTD is no longer fetched over the network.

-- 
Stein Erik Berget


