[jdom-interest] XMLOutputter/SAXBuilder

Elliotte Rusty Harold elharo at metalab.unc.edu
Sun Sep 16 08:10:40 PDT 2001

At 9:00 AM -0500 9/16/01, philip.nelson at omniresources.com wrote:
>> If we want to provide an isSpecified() method on Attribute 
>> then this would be separate from the getType() method. 
>I guess I have been thinking of this in terms of DocType providing the
>answer rather than Attribute.  This eliminates the need for extra, often
>dead references to an attribute type.
>AttributeType type = aDoctype.getType(anAttribute);

It's not part of the Doctype though. It's a quality of the attribute. I suppose you could get this information from the DTD (NOT the same thing as the DOCTYPE declaration) but we don't model the DTD and we aren't planning to. JDOM models the instance document. It needs to expose what's in the instance document's infoset, and that includes attribute types. 

>In this case the programmer could determine if the attribute *could* have
>been produced from a default in the dtd.  That would have been enough to
>help the programmer who started this thread, though it would have still
>required a XMLFilter or SAXHandler subclass to get it done.  The DeclHandler
>and subsequent reporting of attributes in the document will not give us
>enough to say if a particular attribute came from the dtd or not. 

Again, there's no way to definitively determine this in SAX2. In SAX 2.1 the parser will tell us exactly this for each attribute so it's very easy to apply to each attribute. 
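
To make the limitation concrete, here is a minimal sketch (not JDOM code) of roughly the best a SAX2 client can do today: record the declared defaults through a DeclHandler, then flag every reported attribute whose value matches the declared default. The names DefaultGuessDemo and possiblyDefaulted are made up for illustration, and the sketch assumes the parser supports the optional declaration-handler property, which I believe the parser bundled with the JDK does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper for illustration only.
class DefaultGuessDemo {

    // Returns one flag per reported attribute, in document order.
    // true means "the value equals the DTD default", i.e. the attribute
    // *could* have been defaulted. It does not prove that it was.
    public static List<Boolean> possiblyDefaulted(String xml) {
        final Map<String, String> declaredDefaults = new HashMap<>();
        final List<Boolean> flags = new ArrayList<>();

        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void attributeDecl(String eName, String aName,
                                      String type, String mode, String value) {
                // value is null when the declaration carries no default.
                if (value != null) {
                    declaredDefaults.put(eName + "/" + aName, value);
                }
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    String key = qName + "/" + atts.getQName(i);
                    flags.add(atts.getValue(i).equals(declaredDefaults.get(key)));
                }
            }
        };

        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Standard SAX2 property id for registering a DeclHandler.
            parser.setProperty(
                "http://xml.org/sax/properties/declaration-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return flags;
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE doc [\n"
          + "  <!ELEMENT doc (item*)>\n"
          + "  <!ELEMENT item EMPTY>\n"
          + "  <!ATTLIST item lang CDATA 'en'>\n"
          + "]>\n"
          + "<doc><item/><item lang='en'/><item lang='fr'/></doc>";
        System.out.println(possiblyDefaulted(xml));
    }
}
```

The first item really is defaulted and the second spells out lang='en' by hand, yet both get the same flag. That ambiguity is exactly why this approach cannot be definitive.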

>> You'd actually need a DeclHandler and then some to report 
>> whether an attribute is or is not defaulted. I don't think 
>> SAX2 exposes this in any consistent way. This is scheduled 
>> to be added in SAX 2.1. However, attribute types are 
>> available for all reported attributes in standard SAX.  
>I glossed over this because the real point is this.  We chose (and I
>implemented) an api where the internal subset is exposed as a string.  To
>support some sort of getType method we have to instead model the dtd.  There
>may be good reasons to do this, including supporting SAXOutputter. At the
>time we made this choice, it was because even with the completely broken
>entity api and no access to the dtd whatsoever, we got very few comments
>about it.  

This is not about modeling the DTD. This is about modeling the infoset of the document. One of the infoset characteristics is attribute type, which is a property of the Attribute information item. If we model this at all, then we need to include it as accessor and perhaps mutator methods on the Attribute class. It does not belong in the DocType class. 
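
As a rough illustration that the type travels with each attribute and not with the DOCTYPE, the sketch below reads it straight from the standard SAX Attributes interface during startElement. AttributeTypeDemo and attributeTypes are names invented for this example; the parser is whatever JAXP supplies by default, running non-validating.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
class AttributeTypeDemo {

    // Maps each attribute's qualified name to the type string SAX reports.
    public static Map<String, String> attributeTypes(String xml) {
        final Map<String, String> types = new LinkedHashMap<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    // getType is part of the core SAX Attributes interface.
                    types.put(atts.getQName(i), atts.getType(i));
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return types;
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE doc [\n"
          + "  <!ELEMENT doc EMPTY>\n"
          + "  <!ATTLIST doc id   ID      #IMPLIED\n"
          + "                lang NMTOKEN 'en'>\n"
          + "]>\n"
          + "<doc id='d1' note='undeclared'/>";
        System.out.println(attributeTypes(xml));
    }
}
```

The undeclared note attribute should still come back with a type, namely CDATA, which is why the type is always available for any reported attribute.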

>If it is a good idea, we need somebody to code it.  The points about dtd vs
>schema had to do with another idea I have been basing assumptions on.  I
>think the dtd should be modeled as xml in the doctype, or more specifically
>as jdom elements and attributes.  This is how schema would be modeled I
>assume, since it's already xml.  In both cases, modeling constraints would
>require classes and an api.  Though I have worked my way through a few
>schemas by now, I do not know the spec well enough to say if there is enough
>in schema in common with dtd to try and model them the same way.  If
>everything in a dtd could be modeled as a schema, well enough to turn around
>and write the internal subset again, sharing a model makes sense, but I have
>to think this is unlikely.  If I am right, the schema effort is separate
>from the dtd effort and who will want it badly enough to do it?  Granted,
>since the handlers have already been implemented, it won't be that hard.

DTDs and schemas overlap, but neither is a subset of the other. There are things a DTD can do a schema cannot do and vice versa. 

I agree that there may be reason to expose the types of attributes and perhaps elements as more than just a fixed string or a list of enumerated constants, and I've been thinking about that. However, I'm realizing just now, as I type this, that I was wrong: we can't merge the two. An attribute can have both a schema type and a DTD type. Furthermore, while an attribute always has exactly one DTD type (even if there is no DTD), an attribute can theoretically have multiple schema types, one for each schema applied to the document. 

Therefore I resubmit my original proposal: we need to add a getType() (or perhaps getDTDType()) method to the Attribute class which returns the type of the attribute as declared by the DTD, or, in the event there is no DTD, CDATA. (This is the default type assigned for DTD-less documents by the XML 1.0 spec.) If we later need to add schema types, then those would be returned by a separate method. However, schema types would not obviate the need for knowing the DTD type. Thus there's no reason to hold off adding the methods to return the DTD type now. 
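
A minimal sketch of what this could look like follows. It is not the existing org.jdom.Attribute class; TypedAttribute, the DtdType enum, and the choice of an enum over int constants are all assumptions made for the sake of the example.

```java
import java.util.Objects;

// Hypothetical sketch only: this is not the real org.jdom.Attribute class.
final class TypedAttribute {

    // The attribute types a DTD can declare under XML 1.0.
    enum DtdType {
        CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES,
        NMTOKEN, NMTOKENS, NOTATION, ENUMERATION
    }

    private final String name;
    private final String value;
    private DtdType type;

    // No declared type (for example, no DTD at all): fall back to CDATA.
    public TypedAttribute(String name, String value) {
        this(name, value, DtdType.CDATA);
    }

    public TypedAttribute(String name, String value, DtdType type) {
        this.name = Objects.requireNonNull(name, "name");
        this.value = Objects.requireNonNull(value, "value");
        this.type = Objects.requireNonNull(type, "type");
    }

    public String getName() { return name; }

    public String getValue() { return value; }

    // Accessor for the DTD type; schema types would need a separate method.
    public DtdType getType() { return type; }

    // Mutator, so a builder can set the type the parser reported.
    public void setType(DtdType type) {
        this.type = Objects.requireNonNull(type, "type");
    }

    public static void main(String[] args) {
        TypedAttribute plain = new TypedAttribute("note", "hello");
        TypedAttribute id = new TypedAttribute("id", "d1", DtdType.ID);
        System.out.println(plain.getName() + " is " + plain.getType());
        System.out.println(id.getName() + " is " + id.getType());
    }
}
```

A builder would set the type from whatever the parser reports for each attribute, and code that constructs attributes by hand would get CDATA unless it says otherwise.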

| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      | 
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
