[jdom-interest] Re: XMLOutputter = SAXOutputter + XMLFilter + XMLWriter

Joseph Bowbeer jozart at csi.com
Tue Oct 10 06:04:28 PDT 2000

Date: Mon, 09 Oct 2000 16:14:03 -0500
From: Brett McLaughlin <brett.mclaughlin at lutris.com>
Organization: Lutris Technologies
CC: jdom-interest at jdom.org
Subject: Re: XMLOutputter = SAXOutputter + XMLFilter + XMLWriter

Joseph Bowbeer wrote:
> I've been thinking about SAX event streams and SAX filters, and how
> JDOM should interface with these.  I've also been thinking about
> the kind of support JDOM should provide for data documents
> (no mixed content).
> The use case leading to these thoughts was the necessity to write JDOM
> elements onto a SAX event stream.  For example: outputting an unknown
> number of entries (child elements) to a log file.
> After experimenting with JDOM's XMLOutputter and looking at David
> Megginson's XMLWriter and DataWriter, I'd like to suggest the
> following refactoring of XMLOutputter.
>   XMLOutputter = SAXOutputter + XMLFilter + XMLWriter
> 1. SAXOutputter should be the cornerstone of JDOM output.  (When it is
> implemented, SAXOutputter will convert JDOM pieces into SAX events.)
> Why make SAXOutputter the cornerstone?  Because that's where the best
> leverage is.  If we can generate SAX events correctly, we can do
> anything related to output.  The addition (or removal) of indentation
> and newlines can be viewed as a filter acting on the SAX event stream.

BM> I'm not sure of this - generating SAX events from a JDOM Document is
BM> something we need in SAXOutputter, granted. But do we want to
BM> hardwire this into everything? For example, would it not be slower
BM> to go from JDOM Document -> SAX events -> Output as opposed
BM> to JDOM Document -> output? I think in most cases, yes.
BM> I do admit later that I think /allowing/ XMLOutputter to be chained
BM> onto a SAXOutputter makes sense, in the case where we do have
BM> more of a flexible pipeline model; but I'm not sure requiring that
BM> makes sense. It seems like an extra step that adds time to output.

I doubt the extra layer of method calls will have a significant (or even
measurable) impact on outputter performance.  I think the most
wasteful aspect of generating SAX events is the conversion from JDOM
objects to SAX objects.  Attributes, for example.  Actually, Attributes
may be the only example because it looks like all the other SAX
parameters are Strings...

How significant is this overhead?  I don't know.  One way to reduce the
overhead would be to have JDOM provide its own implementation of the SAX
Attributes interface.  Maybe a simple Attributes adapter/wrapper would
perform OK.
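
A minimal sketch of such an adapter, assuming a hypothetical
SimpleAttribute class (name plus value) standing in for JDOM's own
attribute type, and ignoring namespaces.  The adapter is a read-only
view over an existing list, so no per-element copy is made:

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;

/** Hypothetical stand-in for a JDOM attribute: just a name and a value. */
final class SimpleAttribute {
    final String name;
    final String value;

    SimpleAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/** Read-only SAX Attributes view over an existing list; nothing is copied. */
final class AttributesAdapter implements Attributes {
    private final List<SimpleAttribute> attrs;

    AttributesAdapter(List<SimpleAttribute> attrs) { this.attrs = attrs; }

    private boolean inRange(int i) { return i >= 0 && i < attrs.size(); }

    @Override public int getLength() { return attrs.size(); }

    // No namespace support in this sketch: URI is empty, local name == qName.
    @Override public String getURI(int i) { return inRange(i) ? "" : null; }
    @Override public String getLocalName(int i) { return inRange(i) ? attrs.get(i).name : null; }
    @Override public String getQName(int i) { return inRange(i) ? attrs.get(i).name : null; }
    @Override public String getType(int i) { return inRange(i) ? "CDATA" : null; }
    @Override public String getValue(int i) { return inRange(i) ? attrs.get(i).value : null; }

    @Override public int getIndex(String uri, String localName) {
        return "".equals(uri) ? getIndex(localName) : -1;
    }

    @Override public int getIndex(String qName) {
        for (int i = 0; i < attrs.size(); i++) {
            if (attrs.get(i).name.equals(qName)) return i;
        }
        return -1;
    }

    // Name-based lookups delegate to the index-based ones (null when absent).
    @Override public String getType(String uri, String localName) { return getType(getIndex(uri, localName)); }
    @Override public String getType(String qName) { return getType(getIndex(qName)); }
    @Override public String getValue(String uri, String localName) { return getValue(getIndex(uri, localName)); }
    @Override public String getValue(String qName) { return getValue(getIndex(qName)); }
}

class AttributesAdapterDemo {
    public static void main(String[] args) {
        List<SimpleAttribute> list = new ArrayList<>();
        list.add(new SimpleAttribute("id", "42"));
        Attributes atts = new AttributesAdapter(list);
        System.out.println(atts.getQName(0) + "=" + atts.getValue("id"));
    }
}
```

Whether this is actually cheaper than filling a SAX AttributesImpl per
element is something I haven't measured.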

> 2. Take most of what is currently in XMLOutputter and move it into
> something along the lines of Megginson's XMLWriter.
> Note: XMLWriter takes a SAX event stream and writes out XML.
> XMLWriter should (for example) provide options for customizing
> the appearance of the XML header.  It should not, however, provide
> options for adding newlines or indentation, because all whitespace
> is potentially significant (but see #3 below).
> As with Megginson's version, our XMLWriter should implement XMLFilter.
> Since an XMLFilter is an event source as well as an event sink, an
> XMLWriter can be inserted into the middle of an event stream without
> interrupting the flow, and the XML output can be sluiced out the side.

BM> Now this is more intriguing to me. I'd have to see more examples of
BM> pipelining before the work became worth it, but I definitely see the
BM> potential. I'm curious...

I'll send along the XMLFilter versions of data formatter and unformatter
when I finish them. They should be very simple.
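
In the meantime, here is a rough sketch of how the formatter could
look as a pure filter built on SAX2's XMLFilterImpl.  The class name,
the constructor parameters, and the toy text sink in the demo are
assumptions made for illustration; this is not Megginson's code or
anything in JDOM.  It only handles data documents (no mixed content):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Inserts a line ending plus indentation around child elements. */
class DataFormatFilter extends XMLFilterImpl {
    private final String indentUnit;
    private final String lineEnding;
    // One flag per open element: has it contained a child element yet?
    private final Deque<Boolean> hasChild = new ArrayDeque<>();

    DataFormatFilter(String indentUnit, String lineEnding) {
        this.indentUnit = indentUnit;
        this.lineEnding = lineEnding;
    }

    @Override
    public void startDocument() throws SAXException {
        hasChild.clear();
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (!hasChild.isEmpty()) {
            // Mark the parent as having children, then break before this child.
            hasChild.pop();
            hasChild.push(Boolean.TRUE);
            emitBreak(hasChild.size());
        }
        hasChild.push(Boolean.FALSE);
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        boolean hadChild = hasChild.pop();
        // Only break before the end tag if the element held child elements.
        if (hadChild) emitBreak(hasChild.size());
        super.endElement(uri, local, qName);
    }

    private void emitBreak(int depth) throws SAXException {
        StringBuilder sb = new StringBuilder(lineEnding);
        for (int i = 0; i < depth; i++) sb.append(indentUnit);
        char[] ch = sb.toString().toCharArray();
        super.characters(ch, 0, ch.length);
    }
}

class DataFormatDemo {
    /** Pushes a two-element log document through the filter and returns the text. */
    static String render() {
        final StringBuilder out = new StringBuilder();
        DataFormatFilter filter = new DataFormatFilter("  ", "\n");
        // Toy sink: no escaping, no attributes; just enough to show the whitespace.
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                out.append('<').append(q).append('>');
            }
            @Override public void endElement(String u, String l, String q) {
                out.append("</").append(q).append('>');
            }
            @Override public void characters(char[] ch, int s, int n) {
                out.append(ch, s, n);
            }
        });
        try {
            Attributes none = new AttributesImpl();
            filter.startDocument();
            filter.startElement("", "log", "log", none);
            filter.startElement("", "entry", "entry", none);
            filter.characters("a".toCharArray(), 0, 1);
            filter.endElement("", "entry", "entry");
            filter.endElement("", "log", "log");
            filter.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Because the filter only adds characters() events, any downstream
writer (or another filter) sees an ordinary SAX stream.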

> 3. For formatting "data documents", implement a special XMLFilter.
> Call it DataFormatFilter.
> Note the term "data documents".  In Megginson's terminology, these are
> documents that contain only fielded content (no mixed content).  This
> data format filter inserts the additional newlines and indentation
> that are needed to "pretty-print" the data document.  The indent
> width, indent character, and line ending should all be customizable.
> Note: The DataFormatFilter is similar to Megginson's DataWriter,
> except it should be implemented as a pure filter rather than as a
> subclass of XMLWriter.  (Filter composition rocks; subclassing
> XMLWriter is fragile and unnecessary in this case.)
> (I plan to implement the DataFormatFilter, and the related
> DataUnformatFilter described later.)
> 4. Finally, XMLOutputter becomes a convenience class that provides the
> same top-level "output" methods it does now.
> XMLOutputter is responsible for creating the constituent components,
> hooking them up to form an output pipeline, and delegating to them.
> Comments?  XMLFilter is a SAX2 thing, which was released 5/2000.
> Does this matter?

BM> Not that it's SAX 2, but that it's SAX. I'm not convinced that there
BM> is any advantage in tying all XML output, re: XMLOutputter,
BM> to SAX.
BM> Right now, if you don't need to parse XML documents, but just
BM> create and output them, you don't need xerces.jar, or anything
BM> other than the very small jdom.jar. I'm not sure that it makes
BM> sense to change that, and introduce a SAX dependence,
BM> which can (almost surprisingly) get very big.
BM> That said, I don't have any problems with
BM> (1) Moving SAXOutputter to use SAX and XMLFilters a lot better
BM> (2) enabling XMLOutputter to use something like that for a feed.
BM> In other words, I'm not at all against allowing XMLOutputter to work
BM> with SAX filters and XMLFilter, but I'm not convinced that we want
BM> all output hard-wired to SAX.

sax2.jar, which contains the SAX2 interfaces plus a set of helper
implementations, is only 29K.

> Here are some related ideas for pipelining the input side:
>   XMLReader + XMLFilter + SAXBuilder = JDOM document
> 1. Add an optional DataUnformatFilter to remove newlines and
> indentation from data documents.  This filter reads the SAX event
> stream from the reader/parser and passes it through to the
> SAXBuilder, removing the extra formatting along the way.

BM> This is something we have talked about (allowing up-front stripping
BM> of, for example, whitespace). It would be optional, and I think
BM> a good idea. It's along the lines of what JAXP 1.1 is doing,
BM> by the way.

Opening up SAXBuilder's 'build' method to accept an XMLReader would
allow users to install their own filters.
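
As a sketch of what a user-installed filter might look like: the
filter below wraps any XMLReader and drops whitespace-only character
events.  WhitespaceDropFilter and UnformatDemo are hypothetical names,
and a plain DefaultHandler stands in for SAXBuilder, since a 'build'
method taking an XMLReader is only a proposal at this point.  Dropping
every whitespace-only chunk is crude and only reasonable for data
documents:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Passes every event through except whitespace-only character chunks. */
class WhitespaceDropFilter extends XMLFilterImpl {
    WhitespaceDropFilter(XMLReader parent) { super(parent); }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                // Real content: forward the chunk untouched.
                super.characters(ch, start, length);
                return;
            }
        }
        // Whitespace-only chunk: treat it as formatting and drop it.
    }
}

class UnformatDemo {
    /** Parses xml through the filter and returns the character data that survives. */
    static String parse(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            // The filter is itself an XMLReader, so a consumer cannot tell the difference.
            XMLReader filtered = new WhitespaceDropFilter(parser);
            filtered.setContentHandler(new DefaultHandler() {
                @Override public void characters(char[] ch, int s, int n) {
                    text.append(ch, s, n);
                }
            });
            filtered.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse("<log>\n  <entry>a</entry>\n  <entry>b</entry>\n</log>"));
    }
}
```

With a 'build' method that accepts an XMLReader, the 'filtered' object
above is what would be handed to SAXBuilder.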

> 2. For added convenience, the SAXBuilder should implement XMLFilter.

BM> Convince me - I'm not against this, but I'm not sure I see the
BM> advantage. Perhaps for pipelining?

Yes, for pipelining, though it's not essential because a "T" filter can
be used to duplicate the SAX event stream and send one copy to
SAXBuilder -- provided, that is, that SAXBuilder's 'build' method
accepts an XMLReader as a parameter.
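
A "T" filter along these lines could be sketched as follows.  TeeFilter
and the NameLog handler are hypothetical names, and for brevity the
sketch duplicates only the five core content events; prefix mappings,
ignorable whitespace, and processing instructions go to the main
branch only:

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Forwards events downstream as usual and copies them to a side handler. */
class TeeFilter extends XMLFilterImpl {
    private final ContentHandler side;

    TeeFilter(ContentHandler side) { this.side = side; }

    @Override public void startDocument() throws SAXException {
        side.startDocument();
        super.startDocument();
    }

    @Override public void endDocument() throws SAXException {
        side.endDocument();
        super.endDocument();
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        side.startElement(uri, local, qName, atts);
        super.startElement(uri, local, qName, atts);
    }

    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        side.endElement(uri, local, qName);
        super.endElement(uri, local, qName);
    }

    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        side.characters(ch, start, length);
        super.characters(ch, start, length);
    }
}

class TeeDemo {
    /** Records the names of the elements a handler sees. */
    static final class NameLog extends DefaultHandler {
        final StringBuilder names = new StringBuilder();
        @Override public void startElement(String u, String l, String q, Attributes a) {
            names.append(q).append(' ');
        }
    }

    /** Returns what the main branch and the side branch each recorded. */
    static String[] run() {
        NameLog main = new NameLog();
        NameLog side = new NameLog();
        TeeFilter tee = new TeeFilter(side);
        tee.setContentHandler(main);
        try {
            Attributes none = new AttributesImpl();
            tee.startDocument();
            tee.startElement("", "log", "log", none);
            tee.startElement("", "entry", "entry", none);
            tee.endElement("", "entry", "entry");
            tee.endElement("", "log", "log");
            tee.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return new String[] { main.names.toString(), side.names.toString() };
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("main: " + seen[0] + "| side: " + seen[1]);
    }
}
```

In the JDOM case the side handler would be whatever content handler
feeds the builder, which assumes the builder can be driven that way.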

BM> Joe, this is some really excellent work (even though I'm not
BM> 100% sure on it all yet). I really like the idea of building better
BM> pipelines - I think we are going to need things like
BM> HTMLOutputter and HTMLBuilder, as well as some other cool
BM> variations. Building a more robust pipeline makes a lot of sense;
BM> however, I'm not yet convinced that hard-wiring it to SAX
BM> makes sense.
BM> I'm curious - is it SAX you want to use, or the functionality that
BM> these SAX-based components (XMLFilter and XMLWriter) provide?
BM> It reads like it is the functionality; if that is the case, it might
BM> make sense to decouple the filters from SAX.

I want the functionality mostly, though interfacing well with the
endorsed standards is certainly a good thing.

It seems to me that if you want to create a pipeline for input and
output, you'll need an event stream.  SAX is the standard for XML event
streams, and SAX is simple and efficient.  If you want to pipeline
without SAX, I think you'll need to create your own event interface just
for JDOM.

Joe Bowbeer
