[jdom-interest] JDOM Issue #5 - DTD-aware Attribute output

Rolf Lear jdom at tuis.net
Fri Mar 23 07:32:56 PDT 2012


Hi Paul, all.

So, I will have to 'eat crow'... the Xerces parser will apply defaulted
values even if the DTD is completely broken, which is odd (note the
xxx/yyy discrepancy, and that 'yyy' is not even declared as an element!).

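	// builder is a default (non-validating) SAXBuilder; xout is an XMLOutputter
	SAXBuilder builder = new SAXBuilder();
	String xml = "<!DOCTYPE xxx [ <!ATTLIST yyy dodef CDATA \"mydef\" > ] ><yyy />";
	Document doc = builder.build(new StringReader(xml));
	xout.output(doc, System.out);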

gives

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xxx>
<yyy dodef="mydef" />

... but when I test it, the new JDOM Format option
(Format.setSpecifiedAttributesOnly) works:

         Format speconly = Format.getPrettyFormat();
         speconly.setSpecifiedAttributesOnly(true);
         XMLOutputter xout = new XMLOutputter(speconly);
         xout.output(doc, System.out);

gives

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xxx>
<yyy />


So now I am more confused. Xerces will apply a completely broken DTD
to a document (even the root element name is wrong). Further, it will
apply the default attribute values, and it sets the right 'flags' on
those values when it reports them to JDOM; JDOM then marks the
attribute as 'not specified' and omits it when outputting the XML
(provided the correct flag is set on the Format instance).
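At the SAX level, the 'flags' described above come from
org.xml.sax.ext.Attributes2.isSpecified(int), which is false for an
attribute the parser filled in from a DTD default. A minimal JDK-only
sketch of that, assuming the JDK's bundled (Xerces-derived) SAX parser
rather than JDOM; the class SpecifiedFlagDemo and the method
defaultedAttributes are hypothetical names:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

public class SpecifiedFlagDemo {

    // Parses xml with a non-validating SAX parser and returns one
    // "element/@attr=value" entry per attribute reported as NOT specified,
    // i.e. supplied from a DTD default rather than written in the document.
    public static List<String> defaultedAttributes(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // no validation, but the DTD is still read
        SAXParser parser = factory.newSAXParser();
        List<String> defaulted = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Only Attributes2 exposes the 'specified' flag
                if (!(atts instanceof Attributes2)) {
                    return;
                }
                Attributes2 atts2 = (Attributes2) atts;
                for (int i = 0; i < atts2.getLength(); i++) {
                    if (!atts2.isSpecified(i)) {
                        defaulted.add(qName + "/@" + atts2.getQName(i) + "=" + atts2.getValue(i));
                    }
                }
            }
        });
        return defaulted;
    }

    public static void main(String[] args) throws Exception {
        // Same broken-DTD document as in the first example
        String xml = "<!DOCTYPE xxx [ <!ATTLIST yyy dodef CDATA \"mydef\" > ] ><yyy />";
        System.out.println(defaultedAttributes(xml));
    }
}
```

Run on the broken-DTD document, this should list dodef on yyy as
defaulted even though the DTD never declares yyy and names xxx as the
root; if the attribute is written explicitly in the document, it is
reported as specified and the list stays empty.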

In fact, going back to the original example, I have the following code:

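import java.io.IOException;

import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

public class DefaultedAttributes { // hypothetical class name
	public static void main(String[] args) throws JDOMException, IOException {
		String xml =
			"http://svn.activemath.org/LeAM-calculus/LeAM_calculus/oqmath/contin.oqmath";
		SAXBuilder builder = new SAXBuilder();
		Document doc = builder.build(xml);
		// Report every attribute that came from a DTD default
		for (Element e : doc.getDescendants(Filters.element())) {
			if (e.hasAttributes()) {
				for (Attribute a : e.getAttributes()) {
					if (!a.isSpecified()) {
						System.out.println("Attribute was defaulted " + a);
					}
				}
			}
		}
		// Output only the attributes that were present in the source
		Format speconly = Format.getPrettyFormat();
		speconly.setSpecifiedAttributesOnly(true);
		XMLOutputter xout = new XMLOutputter(speconly);
		xout.output(doc, System.out);
	}
}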

And it does exactly what you want (except for the namespaces).

Rolf

On 23/03/2012 9:40 AM, Paul Libbrecht wrote:
> Rolf,
>
> I think your assumption is wrong: I remember Michael Kay had a long FAQ entry justifying why a DTD was read even though validation was not activated (for Saxon Aelfred, which we have used extensively), and indeed it is my experience that any parser, Xerces included, parses the DTD completely (including included entities, as is the case here) and injects all default attribute values (including namespaces) without validating.
>
> Validating implies breaking somehow after an error (the first or the last?).
>
> To summarize I see the following modes:
> - ignore the DTD completely (no parser does this unless explicitly told to)
> - use DTD (and inclusions) for all default values
> - use DTD and report all errors but keep doing
> - use DTD and break at first error
>
> My understanding is that my SAXBuilder.build call was throwing an exception when I activated DTD validation (so the last two possibilities), making it impossible to obtain a good JDOM Document object from a slightly invalid document.
>
> paul
>
> PS: sorry for the mailing fuss; I thought I had sent it to the list, and only realized a bit later that jdom at tuis.net was not... the list...
>
>
> On 23 March 2012, at 14:21, Rolf Lear wrote:
>
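Paul's last two modes differ only in what the SAX ErrorHandler does
with a validity error: rethrowing it stops the parse, while recording
it and returning normally lets the parser continue (well-formedness
errors stay fatal either way). A minimal JDK-only sketch of both,
assuming the JDK's bundled SAX parser; the class ValidationModes, the
method validate and its failFast parameter are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationModes {

    // Validates xml against its DTD. With failFast, the first validity
    // error is rethrown and parsing stops; otherwise every validity error
    // is recorded and parsing continues to the end of the document.
    public static List<String> validate(String xml, boolean failFast) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        List<String> errors = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                errors.add(e.getMessage());
                if (failFast) {
                    throw e; // "break at first error"
                }
                // returning normally means "report all errors but keep going"
            }
        });
        return errors;
    }
}
```

In JDOM, SAXBuilder takes an ErrorHandler through setErrorHandler, and
its default handler rethrows errors, which would match the exception
Paul saw; installing a non-throwing handler there should give the
"report all errors but keep going" mode (not verified against this
document).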

