[jdom-interest] Questions regarding implementation of DocType.internalSubset[eg]

philip.nelson at omniresources.com philip.nelson at omniresources.com
Tue Jun 5 10:59:21 PDT 2001


Since I recently talked up the value of the plan to put entity definitions
in DocType's internal subset, I took a little time last night to code it.
There are a few gotchas to work through, and I need some XML-guru help
with some of them. I'm pretty sure I'm in the right place.

* What to do (if anything) about character entities in the source doc, like
  <!ENTITY Ouml '&#214;'>

The parser resolves the character reference, so the entity's value arrives
as a String holding the character itself, and that is what gets output
instead of the original '&#214;'.

* NOTATIONs and NDATA.  I think we have to implement DTDHandler (the SAX
interface that reports notations and unparsed entities) and include
notations in the internal subset.
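
To make the DTDHandler idea concrete, here is a minimal sketch, not the actual
JDOM code. It uses the standard SAX callbacks notationDecl and
unparsedEntityDecl to rebuild the declarations as text. The class name
NotationCollector, the helper externalId, and the example.com URLs are
made up for illustration, and the quoting is simplified (it assumes the
identifiers contain no double quotes).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects NOTATION and NDATA entity declarations
// reported through the SAX DTDHandler callbacks.
class NotationCollector extends DefaultHandler {
    private final List<String> decls = new ArrayList<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        decls.add("<!NOTATION " + name + externalId(publicId, systemId) + ">");
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        decls.add("<!ENTITY " + name + externalId(publicId, systemId)
                + " NDATA " + notationName + ">");
    }

    // Simplified: assumes the identifiers contain no double quotes.
    private static String externalId(String publicId, String systemId) {
        if (publicId != null) {
            return " PUBLIC \"" + publicId + "\""
                    + (systemId != null ? " \"" + systemId + "\"" : "");
        }
        return " SYSTEM \"" + systemId + "\"";
    }

    // Parses the document and returns the declarations in document order.
    static List<String> collect(String xml) {
        try {
            NotationCollector handler = new NotationCollector();
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.decls;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The example.com identifiers are placeholders, never fetched.
        String xml = "<!DOCTYPE doc [\n"
                + "<!ELEMENT doc EMPTY>\n"
                + "<!NOTATION gif SYSTEM \"http://example.com/gif-viewer\">\n"
                + "<!ENTITY logo SYSTEM \"http://example.com/logo.gif\" NDATA gif>\n"
                + "]>\n<doc/>";
        for (String decl : collect(xml)) {
            System.out.println(decl);
        }
    }
}
```

One thing to watch: SAX parsers may resolve relative system identifiers
to absolute URIs before calling these methods, so the text written back
out is not guaranteed to match the source for relative identifiers.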

* Whitespace handling is pretty arbitrary.  I don't think the original
whitespace is preserved at all, so I have done a simple two-space indent
with a trailing "\n" after each declaration.

* Is <!ENTITY % e "foo"> equivalent to <!ENTITY %e "foo">?

* What to do about external parameter entities, because the feature
http://xml.org/sax/features/external-parameter-entities is also not
supported, at least by Xerces 1.3.1 (Andy?).  For example:

<!DOCTYPE doc [
<!ELEMENT doc (#PCDATA)>
<!ENTITY % e SYSTEM "097.ent">
<!ATTLIST doc a1 CDATA "v1">
%e;
<!ATTLIST doc a2 CDATA "v2">
]>
<doc></doc>

gets turned into
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
  <!ENTITY %e SYSTEM "097.ent" >
  <!ATTLIST doc a1 CDATA "v1">
  <!ATTLIST doc a2 CDATA #IMPLIED>
  <!ATTLIST doc a2 CDATA "v2">
]>
<doc a1="v1" />

I think I can handle this manually with the startEntity and endEntity
calls, assuming that while we are in the DTD and inside an entity, the next
element, attribute, or comment declaration comes from that entity.  So
should skipping parameter entity expansion be the normal behaviour in all
cases, or should it be configurable?
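
The startEntity/endEntity approach could look roughly like the sketch
below. It is an illustration under assumptions, not the code I wrote:
SubsetRecorder, record, emit and declName are made-up names, the quoting
of values is simplified, and the sample input uses an internal parameter
entity so it runs without a 097.ent file. It relies on the SAX2 extension
interfaces LexicalHandler and DeclHandler (via DefaultHandler2), and on
the parser reporting parameter entity names with a leading '%'.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical recorder: keeps %name; references in the internal subset
// and drops declarations that were expanded from inside an entity.
class SubsetRecorder extends DefaultHandler2 {
    private final StringBuilder subset = new StringBuilder();
    private boolean inDtd = false;
    private int entityDepth = 0; // > 0 while inside an entity in the DTD

    @Override public void startDTD(String name, String publicId, String systemId) {
        inDtd = true;
    }

    @Override public void endDTD() {
        inDtd = false;
    }

    @Override public void startEntity(String name) {
        if (!inDtd) return;
        // Write the reference itself, once, at the top level only.
        if (entityDepth == 0 && name.startsWith("%")) {
            subset.append("  ").append(name).append(";\n");
        }
        entityDepth++;
    }

    @Override public void endEntity(String name) {
        if (inDtd) entityDepth--;
    }

    // Records a declaration unless it came from inside an entity.
    private void emit(String decl) {
        if (entityDepth == 0) subset.append("  ").append(decl).append('\n');
    }

    @Override public void elementDecl(String name, String model) {
        emit("<!ELEMENT " + name + " " + model + ">");
    }

    @Override public void attributeDecl(String eName, String aName,
                                        String type, String mode, String value) {
        StringBuilder d = new StringBuilder("<!ATTLIST ")
                .append(eName).append(' ').append(aName).append(' ').append(type);
        if (mode != null) d.append(' ').append(mode);
        if (value != null) d.append(" \"").append(value).append('"');
        emit(d.append('>').toString());
    }

    @Override public void internalEntityDecl(String name, String value) {
        // Simplified: assumes the value contains no double quotes.
        emit("<!ENTITY " + declName(name) + " \"" + value + "\">");
    }

    @Override public void externalEntityDecl(String name, String publicId,
                                             String systemId) {
        String id = publicId != null
                ? "PUBLIC \"" + publicId + "\" \"" + systemId + "\""
                : "SYSTEM \"" + systemId + "\"";
        emit("<!ENTITY " + declName(name) + " " + id + ">");
    }

    // Parameter entity names arrive as "%e"; a declaration needs "% e".
    private static String declName(String name) {
        return name.startsWith("%") ? "% " + name.substring(1) : name;
    }

    static String record(String xml) {
        try {
            SubsetRecorder h = new SubsetRecorder();
            XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            r.setProperty("http://xml.org/sax/properties/lexical-handler", h);
            r.setProperty("http://xml.org/sax/properties/declaration-handler", h);
            r.parse(new InputSource(new StringReader(xml)));
            return h.subset.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(record("<!DOCTYPE doc [\n"
                + "<!ELEMENT doc (#PCDATA)>\n"
                + "<!ENTITY % e \"<!ATTLIST doc a2 CDATA #IMPLIED>\">\n"
                + "<!ATTLIST doc a1 CDATA \"v1\">\n"
                + "%e;\n"
                + "<!ATTLIST doc a2 CDATA \"v2\">\n"
                + "]>\n<doc></doc>"));
    }
}
```

With this, the recorded subset keeps %e; as a reference and leaves out
the ATTLIST that came from inside it. Making it configurable would be a
matter of a flag that switches emit to ignore entityDepth and startEntity
to stop writing the reference.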






