[jdom-interest] Accessing Child Elements

Thu Sep 7 19:55:06 PDT 2000

[pvg]. Elliotte has said that
[pvg]        - having getChild return anything other than a child in the
[pvg]          empty namespace is confusing.

>No, he's said that it is confusing in light of the XML Namespace
>specification, which I think is different.

Yes, it is different - perhaps the confusion arises from the fact that
there were two parallel threads under different subjects on the same
topic. I've mostly been participating in the second one, so if
Elliotte was talking about the namespace _spec_ I must have
simply missed it - the spec is not explicitly mentioned in the email I
was responding to. So let me define how I see the relationship between
JDOM and the namespace (or any XML-related) spec:

    JDOM provides a consistent, simple (as far as possible) OO model
that allows the parsing, manipulation and generation of XML documents
that are always specification compliant. The specs describe both the
general semantics of some feature or facility and its encoded
(serialized) representation - whenever possible, in the interests of
simplicity, the JDOM object representation provides identical or
equivalent semantics but _need not_ model details that are entirely in
the realm of the serialized representation. The JDOM _serialization
facilities_ (i.e. facilities that use the object model but are not part of it)
ensure that a correct, spec-compliant XML textual representation can
always be parsed and generated. JDOM _guarantees_ spec-compliant
serialization/deserialization but the focus of the model is a good,
coherent, simple Java OO API - not a Java representation of every
detail of the spec.

The above is my interpretation of JDOM's philosophy - I'm purposefully
emphasising the Java OO API aspect because I believe such APIs are
both higher-level and easier to understand than just about any
XML-related specification. In other words, I see JDOM as a facility
for Java programmers to manipulate and generate compliant XML
_without highly detailed knowledge of the specs_. JDOM bridges the gap
from Java to XML for those of us who know and write more Java than
XML.

>First, we have no constructor that absorbs its parent's namespace, 
>which means a getChild that took that into account would begin to
>build non-uniformity into our method behaviors.

Again, I'll approach this from the Java/OO standpoint, from which, I
believe, there is no non-uniformity. First, a constructor is not a
method. There is indeed no cloning of a parent's namespace because
there is no parent. I did propose eliminating the 'name only'
constructor for Element as an option. And if my full proposal were
implemented (the representation of expanded element types), there
would be no name-only constructor, the constructor would simply take
an element type object that fully describes and encapsulates the name
and the namespace URI.

>Second, it is clear from the XML Namespace rec that a namespace is
>intrinsic to an Element; although it may make sense for an Element to 
>be referenced implicitly by its parent namespace, it is wrong in 
>regards to the spec, and thus I don't like to encourage it.

At no time, in my proposal, is an element's namespace ever defined in
terms of its parent or some other element. Every element has an
intrinsic, immutable namespace which can be queried by the
getNamespace method - conceptually, this represents the spec verbatim.
The namespace spec is about static definition - it does not define or
describe traversal. The getChild method is a traversal method; a
getChild that returns a child from its parent's namespace is both
better and simpler OO behaviour and more practical - and it is a
behaviour outside the scope of the namespace spec. Why is it better
and non-confusing OOP? Some of the comments I've seen almost seem to
imply that an object using its internal state to 'fill in the blank'
in an underspecified, parametrized method call is in itself confusing.
I tend to disagree - this is very much classic OO behaviour -
encapsulation. I'd argue that it's both intuitive and sensible for an
instance to use instance rather than global data in such a case.
Please consider the call outside of the XML context for a moment.
I call a method on an instance and omit an optional parameter. Which is
more intuitive?

  - the instance will use instance data to respond to my message,
    giving me the _most specific behaviour possible_
 or
  - the instance will use some global, externally defined constant.

The first is clearly far more sensible OO design - I'm not providing a
piece of external data, but I am calling a specific instance, and the
instance responds in the most instance-specific way possible.
Using an external constant is much more of a 'special case',
one-off behaviour. It also introduces some additional, unnecessary
coupling - why should an Element object have to know what an empty
namespace is and treat it specially? Under my proposal, an element
treats all namespaces uniformly; there are no distinguished namespaces
at all. The argument is centered on the idea that org.jdom.Element is
not an XML serialized element - an XML serialized element is a data
structure, org.jdom.Element is an object - so it makes sense for it to
behave like one, and for that behaviour to be defined in terms of typical
OO behaviour and not in terms of serialized XML structures as described
by the specification.
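To make the contrast concrete, here is a minimal Java sketch of the two ways a name-only convenience lookup can fill in the missing namespace. `SketchElement` and its method names are hypothetical, not JDOM's actual classes, and namespaces are reduced to plain URI strings with `""` standing for the empty namespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, minimal element model for illustration only; not JDOM's classes.
final class SketchElement {
    /** The empty (no) namespace, represented here by the empty URI string. */
    static final String NO_NAMESPACE = "";

    private final String name;
    private final String namespaceUri; // fixed at construction, never changes
    private final List<SketchElement> children = new ArrayList<>();

    SketchElement(String name, String namespaceUri) {
        this.name = Objects.requireNonNull(name);
        this.namespaceUri = Objects.requireNonNull(namespaceUri);
    }

    String getName() { return name; }
    String getNamespaceUri() { return namespaceUri; }

    SketchElement addContent(SketchElement child) {
        children.add(child);
        return this;
    }

    /** General traversal: the caller always supplies the full name/namespace pair. */
    SketchElement getChild(String name, String namespaceUri) {
        for (SketchElement c : children) {
            if (c.name.equals(name) && c.namespaceUri.equals(namespaceUri)) {
                return c;
            }
        }
        return null;
    }

    /** Convenience variant A: fill the missing namespace with a global constant. */
    SketchElement getChildInEmptyNamespace(String name) {
        return getChild(name, NO_NAMESPACE);
    }

    /** Convenience variant B (the proposal): fill it from this instance's own state. */
    SketchElement getChildInOwnNamespace(String name) {
        return getChild(name, this.namespaceUri);
    }
}
```

With a parent and child both in `http://someURI`, `getChildInOwnNamespace("child")` finds the child while `getChildInEmptyNamespace("child")` returns null; the explicit two-argument `getChild` behaves the same under either convention.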

>Third, it is not at all unusual for the default namespace, and the no
>namespace, to be used. Almost every XSLT document around uses both. So
>your claim that they aren't common uses is way off. XHTML has the same
>case.

I didn't make any claims regarding the default namespace. When I spoke
of eliminating the default namespace I did not mean removing support
for it from serialization and deserialization - I meant hiding the
fact that it exists from the core object model. The default namespace
is what introduces the complex scoping rules which are confusing. Both
the concept of a default namespace and the associated scoping are a matter
of the serialized representation - hiding them in the in-memory object
model makes things vastly simpler. One way to look at it is to view
the getNamespace method on Element as the equivalent of an explicit NS
declaration in the XML serialized representation. The getNamespace
method is there on every single element; you don't need an
explicit namespace declaration on every element when you serialize, you
leave that to the serializer. The API already presents a view of
'explicit annotation', so having a default namespace (but no namespace
scoping as in the XML spec - here the current implementation has no
problem going against the spec, I think correctly) just muddies things
up.

As to the empty namespace, no, I don't think it's likely to remain
all that common, particularly as all standard and app-specific element
definitions become part of one namespace or another. And we were
discussing a convenience method - nothing in the current
implementation or the proposal prevents you from explicitly retrieving
a child from the empty namespace or a child in a namespace different
from the parent's - the implementation of the intrinsic method is
identical in both cases. In my model, the general traversal mechanism
is identical to the current one - it requires an explicit
name/namespace pair. The convenience traversal mechanism is context-specific,
and that's intuitive because that's how OOP works - every
instance method call is 'context-specific' with respect to the instance
you are calling. In the general traversal mechanism, both approaches
allow you to retrieve any sub-element in any namespace with equal
ease. For the convenience case, my model emphasises the ease of
traversing a structure in the same, context-specific namespace;
yours emphasises the convenience of retrieving a sub-element in some
special-cased, distinguished namespace.
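A rough sketch of the 'leave it to the serializer' idea, assuming a hypothetical model (`NsNode`) in which every element simply carries its full namespace URI: the default namespace appears only at output time, where a hypothetical `DefaultNsSerializer` emits an `xmlns` declaration only when an element's namespace differs from the one in scope. Attribute values are not escaped, so URIs are assumed to need no escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: each node carries its full namespace URI; there is no
// default-namespace or scoping concept in the object model itself.
final class NsNode {
    final String name;
    final String uri; // "" means the empty namespace
    final List<NsNode> children = new ArrayList<>();

    NsNode(String name, String uri) { this.name = name; this.uri = uri; }

    NsNode add(NsNode child) { children.add(child); return this; }
}

// Hypothetical serializer: the default namespace exists only here, as an
// output-encoding decision. A declaration is written only when an element's
// namespace differs from the default namespace in scope at its parent.
final class DefaultNsSerializer {
    static String write(NsNode root) {
        StringBuilder out = new StringBuilder();
        write(root, "", out); // nothing is declared above the root
        return out.toString();
    }

    private static void write(NsNode n, String inScope, StringBuilder out) {
        out.append('<').append(n.name);
        if (!n.uri.equals(inScope)) {
            // Assumes the URI needs no attribute-value escaping.
            out.append(" xmlns=\"").append(n.uri).append('"');
        }
        if (n.children.isEmpty()) {
            out.append("/>");
            return;
        }
        out.append('>');
        for (NsNode c : n.children) {
            write(c, n.uri, out); // this element's namespace is now the default
        }
        out.append("</").append(n.name).append('>');
    }
}
```

For a root and child in `http://a` with a grandchild in the empty namespace, this produces `<root xmlns="http://a"><child><leaf xmlns=""/></child></root>`: the child needs no declaration, and `xmlns=""` undeclares the default namespace for the grandchild.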

>All the time. XSLT, SVG, and XML Schemas are two very common cases where
>I'm constantly having children in a different namespace than the parent.
>Well designed XSLT templates require this.

Sure, and if the namespaces are different, you use the general
traversal method which is identical to the one in the current
implementation. Neither the current convenience traversal method nor
the one I'm proposing is applicable in such a situation - you wouldn't
use it now, you wouldn't use it if it were changed.

[pvg] Concise, granted but easy and intuitive? I don't see the notion of a

>I think so, given an understanding of the XML Namespace spec. You have
>to remember, we are intuitive as long as that intuitiveness is correct.
>We went through this a long time ago - if something is intuitive but
>WRONG, we won't put it in; we don't encourage bad use-patterns. That's
>why getText() no longer returns trimmed content. People expected it, but
>they were WRONG, so we made sure to make them be right.

Ah, but that's my point - _too much_ of the detail of the namespace
spec has bubbled up to the core model API, making it complex and
counter-intuitive in an object system. The gist of my proposal is to
remove the extraneous detail, either partially (by taking out the
scoping/defaulting rules) or fully (by relying on object-encapsulated,
stable, unique element and attribute types as described in Appendix A
of the spec), and move its complexities out of the core model and
into the implementation of the serialization facilities.

[pvg] prefixed or unprefixed names being directly represented in the core
[pvg] API - certainly every method on Element that has anything to do with
[pvg] names (besides getQualifiedName) deals with local names, always. To

>Not true - it deals with local names AND a Namespace. There is never a
>use in the core of a local name alone (even when it's only a local name
>passed in, a Namespace object is added and that's how the Element is
>dealt with). It's always a local name and Namespace.

Same in mine, the only difference being _what_ namespace is added in
the case of the convenience traversal method. In yours it's a
special-cased, distinguished namespace; in mine, it's the namespace of the
instance you're calling. I've argued mine is more practical and more
OO-intuitive, but even if you disagree about the convenience method,
both are really a confusing mess. My full proposal (the long thing in
response to Elliotte this morning) wants to bind the namespace URI and
local name together in a single, specific object representing the
Element or Attribute type, and to make both the creation and traversal
APIs rely on that. This is both much safer and much simpler
than both the current state of affairs and my 'change getChild()/take
out the default namespace' proposal. The only reason I started with the
getChild thing is because it came up, and the removal of the default
namespace seemed like a sensible partial step in the right direction.
If you'd rather discuss the more general, prefix-less, element-type
based proposal I'd happily shut up about the getChild thing
immediately :) I think the encapsulated, stable element-type approach
is by far and away The Right Thing, it cuts away 90% of this debate
and I'm really much more interested in talking about it, hopefully
refining and agreeing on it and helping implement it. 

>Element parent = new Element("parent", myNamespace);
>parent.addChild(new Element("child"));

>is not the same as

>Element child = new Element("child");
>Element parent = new Element("parent", myNamespace);
>parent.addChild(child);

>Unless you change the whole API. That's a very confusing situation! I
>think if you add getChild(), you have to change addChild(), and on down
>the line... feels like a slippery slope.

Hang on, I don't want this! In my partial proposal both of these would
do exactly what they do now; I completely agree with the contract on
Element that makes namespaces immutable after instantiation. Once
you've set the namespace, no amount of moving and reparenting will
change it. The primary change in my partial proposal was to change the
behaviour of getChild(name) - that's about it. In my full proposal the
above would look like this, in full verbosity and assuming you know the
elements and namespaces you're interested in in advance (the 80% case;
it's not true in generic metadata or transformation/set operation code
[schema/xslt/xpath], which one typically does not implement day to day):

[... somewhere in a globally accessible context ..]
NameSpace   MY_NS             = NameSpaceFactory.get("http://someURI");
ElementType MY_PAREN_ELEMTYPE = new ElementType("parent", MY_NS);
ElementType MY_CHILD_ELEMTYPE = new ElementType("child", MY_NS);

[... anywhere else, with access to the above definitions ...]

Element parent = new Element(MY_PAREN_ELEMTYPE);
Element child  = new Element(MY_CHILD_ELEMTYPE);
parent.addContent(child);

Done. Look ma, no prefix! How about traversal? Same thing.

child = parent.getChild(MY_CHILD_ELEMTYPE);

Transfer to a different doc?

child = parent.removeChild(MY_CHILD_ELEMTYPE);
newParent.addContent(child);
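The pseudocode above can be made runnable with a few small classes. This is a sketch under assumptions: `NameSpace`, `ElementType` and `Element` are hypothetical stand-ins named after the proposal (not JDOM's real classes), a plain constructor stands in for `NameSpaceFactory`, and types are compared by identity, which is only safe because each type is created once and shared through constants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical classes named after the proposal; not an existing JDOM API.
final class NameSpace {
    final String uri;
    NameSpace(String uri) { this.uri = Objects.requireNonNull(uri); }
}

// Binds local name and namespace into one object: the expanded element type.
final class ElementType {
    final String localName;
    final NameSpace namespace;
    ElementType(String localName, NameSpace namespace) {
        this.localName = Objects.requireNonNull(localName);
        this.namespace = Objects.requireNonNull(namespace);
    }
}

final class Element {
    private final ElementType type; // immutable: reparenting never changes it
    private Element parent;
    private final List<Element> content = new ArrayList<>();

    Element(ElementType type) { this.type = Objects.requireNonNull(type); }

    ElementType getType() { return type; }
    Element getParent() { return parent; }

    Element addContent(Element child) {
        if (child.parent != null) {
            throw new IllegalStateException("child already has a parent");
        }
        child.parent = this;
        content.add(child);
        return this;
    }

    /** First child of exactly this type; types are compared by identity. */
    Element getChild(ElementType t) {
        for (Element e : content) {
            if (e.type == t) return e;
        }
        return null;
    }

    /** Detaches and returns the first child of this type, or null if none. */
    Element removeChild(ElementType t) {
        Element e = getChild(t);
        if (e != null) {
            content.remove(e);
            e.parent = null;
        }
        return e;
    }
}
```

Moving a child with `removeChild` followed by `addContent` on another parent leaves its `ElementType`, and hence its namespace, untouched, which matches the immutability contract discussed above.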

The nice thing about it is that the element and attribute types are
stable - they are independent of both documents in general and
serialized instances of the same document. To make it even more
consistent and performant, Element and Attribute types can also be
globally interned (requires a factory) since they are globally unique.
This simplifies and speeds up all comparison operations since they can
be done by identity comparison, e.g.

if (someElement.getType() == otherElement.getType())
{
   ...
}

It also encourages users to stay away from literal constants like
element.getChild("foobar"), a practice far worse and more fragile than
any variant of getChild behaviour we've discussed so far. (Yes, literal
string constants could be moved to static fields of an external class or
interface, but that is only a syntactic change - as compile-time constants
they end up literally embedded in every class file that uses them anyway.)
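Identity comparison only works if there is exactly one instance per namespace URI and per namespace/local-name pair. A minimal interning factory, assuming hypothetical class names `Ns` and `ExpandedType` and using `ConcurrentHashMap.computeIfAbsent` (which applies its function at most once per key) so that concurrent callers still get a single canonical instance, might look like this:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interned namespace: one canonical instance per URI.
final class Ns {
    private static final Map<String, Ns> POOL = new ConcurrentHashMap<>();

    final String uri;

    private Ns(String uri) { this.uri = uri; }

    static Ns get(String uri) {
        return POOL.computeIfAbsent(Objects.requireNonNull(uri), Ns::new);
    }
}

// Hypothetical interned expanded type: one canonical instance per
// (namespace, local name) pair, so == is a valid equality test.
final class ExpandedType {
    // Ns instances are canonical, so identity-based hashing of Ns keys is correct.
    private static final Map<Ns, Map<String, ExpandedType>> POOL = new ConcurrentHashMap<>();

    final Ns namespace;
    final String localName;

    private ExpandedType(Ns namespace, String localName) {
        this.namespace = namespace;
        this.localName = localName;
    }

    static ExpandedType get(Ns namespace, String localName) {
        Objects.requireNonNull(namespace);
        Objects.requireNonNull(localName);
        return POOL
            .computeIfAbsent(namespace, k -> new ConcurrentHashMap<>())
            .computeIfAbsent(localName, n -> new ExpandedType(namespace, n));
    }
}
```

Because lookups go through `equals` on the URI and local-name strings, two equal but distinct `String` objects still map to the same canonical type; after that, every comparison is a single `==`.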

The prefix mapping would be managed externally to the core model
classes. You might want to offer a facility for prefix
'hinting' for a particular context (most likely a document) at namespace
creation time. Applying the mapping and producing well-formed,
namespace-spec-compliant XML is then the job of the serializer. The
element and attribute type constructs are perfectly intuitive and
natural, and if you're worried about spec compliance, the concept is
described in Appendix A of the namespace spec.
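Here is one way the externally managed prefix mapping could work, as a sketch only: `PrefixAssigner` is a hypothetical serializer-side helper that takes URI-to-prefix hints, falls back to generated `nsN` prefixes when there is no hint or the hinted prefix is already taken, and produces the matching `xmlns:` declarations. It assumes non-empty namespace URIs (the empty namespace cannot be bound to a prefix) and hints that are legal, non-reserved prefixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical serializer-side helper: the object model stores only namespace
// URIs; prefixes are chosen here, at output time, from optional hints.
final class PrefixAssigner {
    private final Map<String, String> hints;                            // URI -> preferred prefix
    private final Map<String, String> assigned = new LinkedHashMap<>(); // URI -> prefix in use
    private int counter = 0;

    PrefixAssigner(Map<String, String> hints) { this.hints = Map.copyOf(hints); }

    /** Prefix for a URI: the hint if free, otherwise a generated nsN; stable once chosen. */
    String prefixFor(String uri) {
        String existing = assigned.get(uri);
        if (existing != null) return existing;
        String p = hints.get(uri);
        if (p == null || assigned.containsValue(p)) {
            do {
                p = "ns" + counter++;
            } while (assigned.containsValue(p));
        }
        assigned.put(uri, p);
        return p;
    }

    /** Qualified name as it would appear in the serialized XML. */
    String qualifiedName(String uri, String localName) {
        return prefixFor(uri) + ":" + localName;
    }

    /** xmlns declarations for every prefix handed out so far, in assignment order. */
    String declarations() {
        StringBuilder sb = new StringBuilder();
        assigned.forEach((uri, prefix) ->
            sb.append(" xmlns:").append(prefix).append("=\"").append(uri).append('"'));
        return sb.toString();
    }
}
```

With a hint mapping the XSLT namespace `http://www.w3.org/1999/XSL/Transform` to `xsl`, `qualifiedName(..., "template")` yields `xsl:template`, while an unhinted URI such as `http://someURI` gets `ns0`. The core model never sees any of this.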

>Why? By definition, that really blows out a major component of the
>specification. I actually really like default namespaces, as they make
>life... well... easy...
[...]
>Again, though, prefixes are an intrinsic part of XML Namespaces. Or did
>I misunderstand you here?
[...]
>I think you are exaggerating here - what is so complex about it? In my
>book, I use namespaces in every example, and have not received a single
>question on them - and I get a ton of questions!

Prefix handling itself is massively complex. The way namespaces are
represented in the current object model goes against the concept in
the spec in exchange for modelling a part of the spec that has to
do with encoding, not concepts. A namespace is supposed to be unique;
JDOM models not the XML namespace concept but the XML namespace
encoding specification - probably worth modelling for serialization
but certainly not in the core (conceptual) API. Conceptually, a
namespace is stable, globally unique and fully specified by a URI.
Conceptually, an expanded element or attribute type is stable,
globally unique and fully specified by a namespace/local-name pair.
I think we should model these concepts in the core API directly and
explicitly; they make the whole thing more elegant, clean and
conceptually consistent than any other Java XML API out there - a true
_object_ model. We're getting bogged down in the whole getChild debate,
which is really not all that important. I'd like to hear if anyone is
interested in implementing what I've just described (also outlined in
the last part of my response to Elliotte). In fact, I'm interested in
hearing any feedback, even if it is 'very nice, but not for JDOM' - it
helps JDOM because it defines more concretely what JDOM's goals are,
and it helps me because then I know to go implement this on top of
something else :)

-pvg