[jdom-interest] Questions regarding implementation of DocType.internalSubset[eg]

Jason Hunter jhunter at collab.net
Wed Jun 13 11:44:32 PDT 2001

philip.nelson at omniresources.com wrote:
> * what to do (if anything) about character entities in the source doc like
>   <!ENTITY Ouml '&#214;'>
> The parser turns this into a String from the parsed entity and that is what
> gets output.

Try to create a string as close to the original as possible.
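
A minimal sketch of one way to do that. It assumes the builder receives the already-expanded replacement text from the SAX2 DeclHandler.internalEntityDecl callback. Non-ASCII characters are written back as decimal character references, so the parsed "Ö" becomes &#214; again. The characters ", % and & are also escaped, because they would otherwise end the literal or start a reference. The class name EntityValueEscaper is hypothetical and not part of JDOM. The result is equivalent to the source declaration but not necessarily byte-identical: single quotes in the original come back as double quotes, and a hex reference such as &#xD6; comes back as &#214;.

```java
/**
 * Hypothetical helper (not part of JDOM): writes an internal entity's
 * replacement text, as reported by SAX, back out as an <!ENTITY ...> declaration.
 */
final class EntityValueEscaper {
    private EntityValueEscaper() {}

    /** Wraps the value in double quotes, escaping unsafe and non-ASCII characters. */
    static String toLiteral(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 2).append('"');
        value.codePoints().forEach(cp -> {
            // '"' would end the literal, '%' would start a parameter-entity
            // reference and '&' a general one; non-ASCII code points are
            // written as decimal references to mirror sources like '&#214;'.
            if (cp == '"' || cp == '%' || cp == '&' || cp > 0x7F) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.append('"').toString();
    }

    /** Builds a general-entity declaration such as <!ENTITY Ouml "&#214;">. */
    static String declaration(String name, String value) {
        return "<!ENTITY " + name + " " + toLiteral(value) + ">";
    }
}
```

Escaping every non-ASCII character, rather than only those the output encoding cannot represent, is a judgment call. It keeps the output safe for any encoding and matches the style of the example above.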

> * NOTATIONs and ndata.  I think we have to implement DTDHandler and include
> notations in the internal subset.
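
If a DTDHandler is registered with XMLReader.setDTDHandler, its two callbacks, notationDecl and unparsedEntityDecl, carry everything needed to write these declarations back. The class below is a hypothetical sketch, not JDOM code. It uses the same two-space indent and trailing newline as the rest of the subset. SAX asks parsers to fully resolve system identifiers that are URLs, so the SYSTEM literal written back may be an absolute URI rather than the text in the source.

```java
import org.xml.sax.DTDHandler;

/**
 * Hypothetical sketch: records NOTATION and unparsed (NDATA) entity
 * declarations reported through SAX's DTDHandler as internal-subset text.
 */
class NotationCollector implements DTDHandler {
    private final StringBuilder subset = new StringBuilder();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        // A notation may have a public ID alone, a system ID alone, or both.
        subset.append("  <!NOTATION ").append(name)
              .append(externalId(publicId, systemId)).append(">\n");
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        // An unparsed entity always has a system ID plus an NDATA notation name.
        subset.append("  <!ENTITY ").append(name)
              .append(externalId(publicId, systemId))
              .append(" NDATA ").append(notationName).append(">\n");
    }

    /** Formats ' PUBLIC "p" "s"', ' PUBLIC "p"' or ' SYSTEM "s"'. */
    private static String externalId(String publicId, String systemId) {
        if (publicId == null) {
            return " SYSTEM \"" + systemId + "\"";
        }
        String result = " PUBLIC \"" + publicId + "\"";
        return systemId == null ? result : result + " \"" + systemId + "\"";
    }

    @Override
    public String toString() {
        return subset.toString();
    }
}
```

In a full builder these callbacks would append to the same buffer as the element, attribute and entity declarations, so that source order is preserved.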


> * Whitespace handling is pretty arbitrary.  I don't think it is preserved at
> all and I have done a simple 2 space indent with trailing "\n"
> * is <!ENTITY % e "foo"> equivalent to <!ENTITY %e "foo">?
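
On the last question: no. XML 1.0 production [72], PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>', requires whitespace between % and the name, so <!ENTITY %e "foo"> is not well-formed. The %e most likely comes from SAX itself. DeclHandler reports parameter-entity names with a leading %, so appending the reported name directly after "<!ENTITY " produces exactly that output. SAX2 also reports no events for the whitespace between declarations, so a fixed indent is about the best a SAX-based builder can do. Below is a minimal sketch of the fix, using a hypothetical helper name, the two-space indent, and a trailing "\n":

```java
/**
 * Hypothetical helper: turns an entity name as reported by SAX2's DeclHandler
 * (parameter entities arrive as "%name") into a well-formed declaration.
 */
final class EntityDeclFormat {
    private EntityDeclFormat() {}

    /** Opening of an entity declaration, e.g. "<!ENTITY % e" for "%e". */
    static String open(String saxName) {
        if (saxName.startsWith("%")) {
            // XML 1.0 [72] PEDecl: whitespace is required after '%'.
            return "<!ENTITY % " + saxName.substring(1);
        }
        return "<!ENTITY " + saxName;
    }

    /** External entity declaration, indented two spaces and ending in "\n". */
    static String external(String saxName, String publicId, String systemId) {
        String id = publicId == null
                ? " SYSTEM \"" + systemId + "\""
                : " PUBLIC \"" + publicId + "\" \"" + systemId + "\"";
        return "  " + open(saxName) + id + ">\n";
    }
}
```

The space before > in the output quoted below is allowed by the S? in that production. Only the missing space after % is an error.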


> * what to do about external parameter entities because the feature
> http://xml.org/sax/features/external-parameter-entities is also not
> supported, at least by xerces 1.3.1 (Andy?)
> <!DOCTYPE doc [
> <!ELEMENT doc (#PCDATA)>
> <!ENTITY % e SYSTEM "097.ent">
> <!ATTLIST doc a1 CDATA "v1">
> %e;
> <!ATTLIST doc a2 CDATA "v2">
> ]>
> <doc></doc>
> gets turned into
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE doc [
>   <!ELEMENT doc (#PCDATA)>
>   <!ENTITY %e SYSTEM "097.ent" >
>   <!ATTLIST doc a1 CDATA "v1">
>   <!ATTLIST doc a2 CDATA "v2">
> ]>
> <doc a1="v1" />
> I can handle this manually I think with start and endEntity calls and
> assuming that while in the dtd and in an entity, the next element or
> attribute or comment decl will be from the entity.  So should skipping
> parameter entity expansion be the normal behaviour in all cases or
> configurable?

I'd be OK with always staying true to the original.
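
The start/endEntity approach described above can keep the internal subset true to the original. SAX2's LexicalHandler reports parameter entities with a % prefix and reports the external subset as [dtd]. Drivers may choose not to report these; the http://xml.org/sax/features/lexical-handler/parameter-entities feature controls this. The hypothetical InternalSubsetBuilder below writes a reference such as %e; when a parameter entity starts at the top level of the internal subset. It then drops every declaration the parser reports until the matching endEntity, and drops everything inside [dtd] as well. This is a sketch, not JDOM's implementation, and notations are left out for brevity.

```java
import org.xml.sax.ext.DeclHandler;
import org.xml.sax.ext.LexicalHandler;

/**
 * Hypothetical sketch: rebuilds a DocType internal subset from SAX2 events,
 * keeping parameter-entity references instead of their expansions.
 */
class InternalSubsetBuilder implements DeclHandler, LexicalHandler {
    private final StringBuilder subset = new StringBuilder();
    private boolean inDtd;
    private int depth; // > 0 while inside a parameter entity or the external subset

    /** Appends one indented line, but only at the top level of the internal subset. */
    private void line(String text) {
        if (inDtd && depth == 0) subset.append("  ").append(text).append('\n');
    }

    /** Quotes a literal; &#N; references are valid in both EntityValue and AttValue. */
    private static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '%' || c == '&' || c == '<') sb.append("&#").append((int) c).append(';');
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    /** SAX reports parameter-entity names as "%name"; XML needs "% name". */
    private static String open(String name) {
        return name.startsWith("%") ? "<!ENTITY % " + name.substring(1) : "<!ENTITY " + name;
    }

    // LexicalHandler
    @Override public void startDTD(String name, String publicId, String systemId) { inDtd = true; }
    @Override public void endDTD() { inDtd = false; }

    @Override public void startEntity(String name) {
        if (!inDtd) return; // general entities in content are not our concern here
        if (name.startsWith("%")) {
            line(name + ";"); // written only when the reference is at top level
            depth++;
        } else if (name.equals("[dtd]")) {
            depth++;          // external subset: never copied into the internal one
        }
    }

    @Override public void endEntity(String name) {
        if (inDtd && (name.startsWith("%") || name.equals("[dtd]"))) depth--;
    }

    @Override public void comment(char[] ch, int start, int length) {
        line("<!--" + new String(ch, start, length) + "-->");
    }

    @Override public void startCDATA() {}
    @Override public void endCDATA() {}

    // DeclHandler
    @Override public void elementDecl(String name, String model) {
        line("<!ELEMENT " + name + " " + model + ">");
    }

    @Override public void attributeDecl(String eName, String aName, String type,
                                        String mode, String value) {
        StringBuilder sb = new StringBuilder("<!ATTLIST ").append(eName).append(' ')
                .append(aName).append(' ').append(type);
        if (mode != null) sb.append(' ').append(mode);   // #IMPLIED, #REQUIRED or #FIXED
        if (value != null) sb.append(' ').append(quote(value));
        line(sb.append('>').toString());
    }

    @Override public void internalEntityDecl(String name, String value) {
        line(open(name) + " " + quote(value) + ">");
    }

    @Override public void externalEntityDecl(String name, String publicId, String systemId) {
        String id = publicId == null ? " SYSTEM \"" + systemId + "\""
                : " PUBLIC \"" + publicId + "\" \"" + systemId + "\"";
        line(open(name) + id + ">");
    }

    @Override public String toString() { return subset.toString(); }
}
```

The same object would be registered for both roles with reader.setProperty("http://xml.org/sax/properties/lexical-handler", builder) and reader.setProperty("http://xml.org/sax/properties/declaration-handler", builder). If a parser skips the external entity instead of reading it, no startEntity arrives. SAX reports that case through ContentHandler.skippedEntity, with the same % prefix, which would be the place to write the reference instead.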

Hope you and Harry can check each other's work on this.  Comparing
approaches will probably be insightful.

