[jdom-interest] META: Children of a lesser spec

Fri Sep 29 15:36:48 PDT 2000

A little follow-up on the getChild/getChildElement debate.  :-)

Brett McLaughlin wrote:
> 
> > >Now, if it was only correct. But it isn't, and a cursory 
> > >reading of any XML specification makes it confusing, 
> > > especially getChildren()...

> Even on these grounds, you will find
> that XML Infoset clearly defines content, and children.
> 
> From that spec:
> 
> <<<<<<<<<<<<<
> 
> An element information item has the following properties:
> 
> ...
> 
> 3.[children] An ordered list of child information items, in document
> order. This list contains element, processing instruction,
> reference to skipped entity, character, and comment information
> items, one for each element, processing instruction,
> reference to an unprocessed external entity,
> data character, and comment appearing immediately within
> the current element, .....

I think it's good to ask which specs we want to use for terminology. 
For example, looking at the XML 1.0 spec I see this production rule:

  [46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children 
  [47] children ::= (choice | seq) ('?' | '*' | '+')?

You could see this as an example where the XML spec uses "children",
not "childElements", for a set made up exclusively of subelements.
What does XML call "children"?  See production rule 47: choice and seq
are built only from element names, so it's only elements.

Also, in Infoset I see this:

   2.3.1. Attributes: Core Properties

   An attribute information item must have the following 
   properties available in some form:

   1.[namespace URI] The URI part, if any, of the attribute's name. 
   2.[local name] The local part of the attribute's name. This does 
   not include any namespace prefix or following colon. 
   3.[children] An ordered list of references to character 
   information items, one for each character appearing in the 
   normalized attribute value. 

This makes me think that if we went with 100% Infoset terminology then
instead of Attribute.getValue() it would need to be
Attribute.getChildren() -- returning the characters, which by this
terminology are called children.  I think we can all agree (hopefully)
that that's ridiculous.
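To make the contrast concrete, here is a minimal sketch.  The Attribute
class below is hypothetical, written only for this illustration (it is
not JDOM's class), and it uses Java chars as a stand-in for Infoset
character information items.

```java
import java.util.ArrayList;
import java.util.List;

public class AttributeNaming {
    /** Hypothetical attribute type for illustration; not JDOM's Attribute. */
    static final class Attribute {
        private final String name;
        private final String value;

        Attribute(String name, String value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }

        /** The conventional accessor: the attribute value as one string. */
        String getValue() { return value; }

        /**
         * What a literal reading of Infoset terminology would suggest:
         * the "children" of an attribute are its individual characters.
         */
        List<Character> getChildren() {
            List<Character> chars = new ArrayList<>();
            for (char c : value.toCharArray()) {
                chars.add(c);
            }
            return chars;
        }
    }

    public static void main(String[] args) {
        Attribute id = new Attribute("id", "abc");
        System.out.println(id.getValue());     // the value as a string
        System.out.println(id.getChildren());  // the same value, one character at a time
    }
}
```

Both methods expose the same data; the second just makes the caller
reassemble it.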

For a final example, let's look at the XPath spec:

 child::* selects all element children of the context node
 child::text() selects all text node children of the context node
 child::node() selects all the children of the context node, whatever
 their node type

Pretty clear that child::* returns only *element* children.  You use
child::node() to return a list of all types.  So in XPath too, the
plain child::* form means element children.
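The difference between the three expressions can be checked with the
XPath support in the JDK (javax.xml.xpath).  This is only a sketch: the
sample document and the count() helper are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildAxisDemo {
    // Hypothetical sample: root contains text, <a/>, a comment, <b/>, text.
    static final String XML = "<root>text<a/><!-- note --><b/>more</root>";

    /** Evaluates expr against the sample document and counts the matches. */
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr,
                    new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/root/child::*"));       // element children only
        System.out.println(count("/root/child::text()"));  // text node children only
        System.out.println(count("/root/child::node()"));  // children of every type
    }
}
```

For the sample above, child::* should match the two elements,
child::text() the two text nodes, and child::node() all five children.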

So the XML 1.0 spec has production rules 46/47 which refer to "children"
to mean only "direct subelements".  The Infoset spec is clear as Brett
said, but because of its odd naming conventions regarding children in
the attribute case, I could easily argue we're not honor bound to follow
its terminology exactly.  And XPath clearly has child::* refer to
elements only.

Bottom line: as I stated earlier, I prefer the elegance of getChild /
getChildren over getChildElement / getChildElements.  It matches nicely
with getParent.  It seems less redundant.  It's easier to differentiate
singular from plural.  The problem has always been spec terminology.
Having looked at three XML specs today, I don't think we'd be in gross
violation using getChild instead of getChildElement.  The simpler name
definitely "fits" better with the API, and I see that as more important
than being robotically precise and including the return type within the
method name.
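As a rough sketch of how the shorter names read next to getParent, here
is a stripped-down element class.  Everything in it is hypothetical and
written for this illustration; it is not JDOM's Element, and
getMixedContent is assumed here simply as the name for "all content,
text included".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ChildNamingSketch {
    /** Hypothetical element type for illustration; not JDOM's Element. */
    static final class Element {
        private final String name;
        private Element parent;
        // Content in document order: each entry is a String (text) or an Element.
        private final List<Object> content = new ArrayList<>();

        Element(String name) { this.name = name; }

        String getName() { return name; }

        Element getParent() { return parent; }

        Element addContent(String text) {
            content.add(text);
            return this;
        }

        Element addContent(Element child) {
            child.parent = this;
            content.add(child);
            return this;
        }

        /** Everything directly inside this element, text included. */
        List<Object> getMixedContent() {
            return Collections.unmodifiableList(content);
        }

        /** Only the child elements, in document order. */
        List<Element> getChildren() {
            List<Element> result = new ArrayList<>();
            for (Object item : content) {
                if (item instanceof Element) {
                    result.add((Element) item);
                }
            }
            return result;
        }

        /** The first child element with the given name, or null if none. */
        Element getChild(String childName) {
            for (Element child : getChildren()) {
                if (child.getName().equals(childName)) {
                    return child;
                }
            }
            return null;
        }
    }

    /** Builds a small sample tree: text, <title>, <author>. */
    static Element sample() {
        return new Element("book")
                .addContent("\n  ")
                .addContent(new Element("title").addContent("Example"))
                .addContent(new Element("author"));
    }

    public static void main(String[] args) {
        Element book = sample();
        System.out.println(book.getChildren().size());      // child elements only
        System.out.println(book.getMixedContent().size());  // text plus elements
        System.out.println(book.getChild("title").getParent().getName());
    }
}
```

The singular/plural pair getChild / getChildren sits naturally beside
getParent, and the method that returns non-element content gets the
longer, more explicit name.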

OK, back to real work.

-jh-