[jdom-interest] Philosophical XML (was API Inertia)

Paul Philion philion at acmerocket.com
Wed May 2 08:39:24 PDT 2001

Rusty -

The case below is very simple. Element <p> contains three elements:

- a text element with the value "This is..."
- an element <strong>, containing one text element "word"
- a text element containing "in the middle."
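
To make that concrete, here is a rough Java sketch of the "only elements"
model applied to the quoted paragraph. The class names (Node, TextElement,
Element, MixedContentSketch) are made up for illustration and are not JDOM's
API, and the whitespace in the text values is simplified to single spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for illustration only; none of this is JDOM's API.
abstract class Node {
    abstract String getValue();
}

// A leaf "text element": it has a value and no children.
final class TextElement extends Node {
    private final String text;

    TextElement(String text) { this.text = text; }

    @Override
    String getValue() { return text; }
}

// An ordinary element: an ordered list of children, text elements included.
final class Element extends Node {
    private final String name;
    private final List<Node> children = new ArrayList<>();

    Element(String name) { this.name = name; }

    Element add(Node child) {
        children.add(child);
        return this; // return this so calls can be chained
    }

    String getName() { return name; }

    List<Node> getChildren() { return children; }

    // The value of an element is the concatenated value of its children.
    @Override
    String getValue() {
        StringBuilder sb = new StringBuilder();
        for (Node child : children) {
            sb.append(child.getValue());
        }
        return sb.toString();
    }
}

final class MixedContentSketch {
    // Builds the quoted paragraph as three children in document order.
    static Element buildParagraph() {
        return new Element("P")
            .add(new TextElement("This is a sentence with one really important "))
            .add(new Element("strong").add(new TextElement("word")))
            .add(new TextElement(" in the middle."));
    }
}
```

With this shape, buildParagraph().getChildren() has three entries, and the
order of text and child elements is preserved without a separate notion of
mixed content.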

In reality, whitespace has to be dealt with, and it winds up in the text
elements. The default implementation of the text element returns the
whitespace in the getValue() call; to strip it you would use
getValue().trim(). It might be advisable to have a getTrimmedValue() on the
text element that caches the trimmed value. Another option is a special
version (or setting) of the builder that trims values for you, if you know
ahead of time that you don't need (or want) the surrounding whitespace.
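
A minimal sketch of the caching idea, assuming an immutable text element. The
class name CachingTextElement is hypothetical; only getValue() and
getTrimmedValue() are names taken from the paragraph above.

```java
// Hypothetical text element for illustration; not part of JDOM.
final class CachingTextElement {
    private final String value; // raw text, whitespace included
    private String trimmed;     // filled in lazily on first request

    CachingTextElement(String value) { this.value = value; }

    // Returns the text exactly as the parser delivered it.
    String getValue() { return value; }

    // Trims once and reuses the result on later calls.
    String getTrimmedValue() {
        if (trimmed == null) {
            trimmed = value.trim();
        }
        return trimmed;
    }
}
```

Because the value never changes here, the cache never needs to be
invalidated. A mutable text element would have to clear the cached field
whenever the value is set.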

Personally, I have never understood the need for "mixed content". There are
only elements.

Processing instructions are a little tricky. I have never used a PI outside
of a header (the prolog, before the root element), but I don't know how
others might use them. They seem like leaf elements, like the text element
(no children).

Comments are also tricky. They are not elements, but they can show up

- Paul

> -----Original Message-----
> From: jdom-interest-admin at jdom.org
> [mailto:jdom-interest-admin at jdom.org]On Behalf Of Elliotte Rusty Harold
> Sent: Wednesday, May 02, 2001 9:58 AM
> To: jdom-interest at jdom.org
> Subject: Re: [jdom-interest] Philosophical XML (was API Inertia)
> At 8:04 PM -0400 5/1/01, Paul Philion wrote:
> >In my mind, XML is very simple: It is a tree of elements; elements that
> >contain other elements and elements that contain text.
> Unfortunately XML is more complicated than this. Your mental model
> ignores mixed content. How does it handle this common case?
> <P>
>    This is a sentence with one really important
>    <strong>word</strong>
>    in the middle.
> </P>
> The P element contains text AND it contains a child element. In
> essence the P element contains two text nodes and one element node.
> The text nodes cannot be reasonably represented as child elements.
> Forgetting this is a very common mistake of people coming to XML from
> database and programming communities as opposed to the document
> communities where mixed content is much more common. I regret that I
> made and even encouraged this mistaken thinking in my first two books
> about XML. On the other hand, by the time I wrote XML in a Nutshell
> I'd realized the errors of my ways, and explaining the difference
> between narrative-centric and data-centric documents became a major
> focus of that book.
> JDOM must be able to support XML 1.0 in its full complexity. We
> cannot limit the API to only documents without any mixed content.
> --
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |                  The XML Bible (IDG Books, 1999)                   |
> |              http://metalab.unc.edu/xml/books/bible/               |
> |   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
> |  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
> +----------------------------------+---------------------------------+
