[jdom-interest] StAX support

Rolf jdom at tuis.net
Tue Nov 15 20:53:26 PST 2011


Thanks for the feedback.

I will investigate the DTD issues further, but it is good to hear that 
you think this is reasonable. I think it makes sense to make it as 
Woodstox-friendly as possible, so I will put in the effort on that.

Right now the implementation is relatively complete, with a reasonable 
set of JUnit tests. It is nice that I have been able to get almost 
identical test coverage for the StAX and XML outputters. This makes it 
feel stable even though it is really new.

Anyway, thanks for your input.

Rolf

On 15/11/2011 2:04 PM, Tatu Saloranta wrote:
> (apologies for the messed-up formatting -- the Yahoo mail editor is a bit odd)
>
>
> ----- Original Message -----
>
>> hi Tatu
>
>> In my mind I think it would be reasonable to support StAX source/sink in JDOM with the following conditions:
>
>> Input:
>> - JDOM will ignore DTD events unless explicitly configured to receive them. If they are expected, they must be a full DTD ("<!DOCTYPE root ....>"), not just an 'internal subset' or other invalid value. (I think this eliminates Woodstox as a 'supported' parser, as it only provides the internal subset; the internal Java 6 implementation creates a partial/truncated doctype, "<!DOCTYPE emt ", unless there's an internal subset, at which point it produces everything.)
>
> - JDOM will treat SPACE and CHARACTERS events identically
> - the XML*Reader must be configured to provide CDATA events, otherwise
> JDOM will never know.
> - JDOM can process fragments from partially processed XML*Readers as
> long as they are positioned on a logical event (i.e. excluding END_ELEMENT,
> END_DOCUMENT, and the like).
>
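
A minimal sketch of the input conditions listed above, using only the
standard javax.xml.stream API rather than any JDOM class. The class and
method names are hypothetical, and whether CDATA sections actually arrive
as separate CDATA events depends on the parser and its configuration; the
loop only assumes that CHARACTERS, SPACE and CDATA can all be treated as
plain text and that DTD events can be skipped.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of JDOM.
class StaxInputSketch {

    /** Concatenates all text in the document, ignoring DTD events. */
    static String collectText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // With coalescing off, a parser is free to report CDATA separately.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.SPACE:
                    case XMLStreamConstants.CDATA:
                        // SPACE and CHARACTERS (and CDATA) are handled alike.
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.DTD:
                        // Ignored unless the caller explicitly asks for it.
                        break;
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(collectText("<a>x<![CDATA[y]]>z</a>"));
    }
}
```
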
> Output:
> - JDOM can output all of its content types, but special handling for
> EntityRef is required (not sure of all the details yet).
> - DocType content will be output as a single String "<!DOCTYPE element
> SYSTEM .... [ ... ]>"
> - JDOM content can be output as fragments to partially written XML*Writers
> on the condition that the writer is in an appropriate state before the JDOM
> write happens.
>
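
A rough sketch of the fragment-output condition above, again with the
plain javax.xml.stream API only. The helper writeItem is hypothetical and
stands in for whatever would write JDOM content; the point is that the
caller has already opened an element, so the writer is in a state where
child content is legal before the fragment is written.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration class; not part of JDOM.
class StaxFragmentOutputSketch {

    /** Writes one element as a fragment into an already-open writer. */
    static void writeItem(XMLStreamWriter writer, String text)
            throws XMLStreamException {
        writer.writeStartElement("item");
        writer.writeCharacters(text); // the writer escapes markup characters
        writer.writeEndElement();
    }

    /** Opens a root element, writes a fragment inside it, and closes it. */
    static String render() {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("root"); // writer is now partially written
            writeItem(writer, "a & b");       // fragment goes inside <root>
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```
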
>
> I think the behaviour of the various StAX parsers/libraries is consistent
> enough to provide a reasonable base for the above restrictions....
>
> Any 'expert' observations/criticisms/suggestions?
>
> Rolf
> -----
>
>
> Looks reasonable to me. The only question I have is wrt DTD. Javadocs for XMLStreamReader.getText() state:
>
> "Returns the current value of the parse event as a string,
>   this returns the string value of a CHARACTERS event,
>   returns the value of a COMMENT, the replacement value
>   for an ENTITY_REFERENCE, the string value of a CDATA section,
>   the string value for a SPACE event,
>   or the String value of the internal subset of the DTD."
>
> which is why Woodstox returns the internal DTD subset, as per the specification. But as Stax 1.0 DTD handling is basically crippled, I don't really care deeply either way. I did specify the Stax2 extension API (see http://woodstox.codehaus.org/4.1.0/javadoc/index.html under 'Stax2'), which is implemented by Woodstox and Aalto, and it patches all the issues I found with the 'vanilla' Stax API.
>
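
To see what a given parser hands back for a DTD event, something like the
following sketch can be used. It relies only on XMLStreamReader.getText()
as described in the Javadoc quoted above; the class name, method name and
the wantDtd flag are hypothetical, and the returned string is expected to
differ between implementations, so no particular output is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of JDOM.
class StaxDtdSketch {

    /** Returns getText() for each DTD event, or nothing if wantDtd is false. */
    static List<String> dtdTexts(String xml, boolean wantDtd) {
        List<String> found = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.DTD && wantDtd) {
                    // Implementation-dependent: may be only the internal subset.
                    found.add(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE root [<!ELEMENT root (#PCDATA)>]><root>x</root>";
        System.out.println(dtdTexts(xml, true));
    }
}
```
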
> I think it is a good idea to support fragment handling, and fine to just drop the CHARACTERS/SPACE distinction.
>
> -+ Tatu +-
>


