[jdom-interest] ID/IDREF

Mon Sep 25 13:08:13 PDT 2000

bob wrote:
> 
> Yah, I figured something like that would work.
> 
> My only current sticking point is then having to parse
> the DTD/schema to get the information as to which attributes
> are indeed ID attributes.  Any hints? ;)
> 
> I really don't want to write a DTD parser, and XML-schema
> is too much in flux for me to bother with at the moment. ;)
> Especially since both are theoretically parsed already by
> Xerces (or whichever XML parser is in use to generate the
> JDOM, if any.)

For me, the tough part of writing even a partial DTD parser was
getting conditional sections right, especially nested ones. You
also need to parse parameter entity declarations to catch the
condsect keywords (INCLUDE and IGNORE). You can't merely search
through for ATTLISTs, since you don't know which ones are active
without correctly processing parameter entities and conditional
sections. *But* once you handle PEs, conditional sections, and
ATTLISTs, you have everything you need to grab the ID information.
ATTLISTs on their own are pretty simple. You could take the code
for the PEs and condsects from an existing parser, say the Sun or
Xerces parser. It's in there.
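
As a rough sketch of the alternative bob alludes to (letting the parser
that is already in use report the declarations), the code below assumes
the underlying parser supports the optional SAX2 DeclHandler extension.
The class name IdAttributeCollector and its collect() method are
hypothetical names used only for illustration. The parser does the PE
and conditional-section processing; the handler just records the
declared type of each attribute.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical helper: records declared attribute types via SAX2 DeclHandler. */
class IdAttributeCollector extends DefaultHandler2 {
    // Key is "element/attribute", value is the declared type (e.g. "ID", "CDATA").
    private final Map<String, String> types = new LinkedHashMap<>();

    @Override
    public void attributeDecl(String eName, String aName,
                              String type, String mode, String value) {
        // Keep only the first declaration seen for an element/attribute pair.
        types.putIfAbsent(eName + "/" + aName, type);
    }

    /** Parses the document and returns the declared attribute types. */
    static Map<String, String> collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        IdAttributeCollector handler = new IdAttributeCollector();
        reader.setContentHandler(handler);
        // Optional SAX2 property; a parser without support throws
        // SAXNotRecognizedException here.
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.types;
    }

    public static void main(String[] args) throws Exception {
        // The ATTLIST arrives through a parameter entity, so a plain text
        // search of the declarations would have to expand %foo.atts; first.
        String xml =
              "<!DOCTYPE blah [\n"
            + "  <!ENTITY % foo.atts '<!ATTLIST foo id ID #IMPLIED>'>\n"
            + "  <!ELEMENT blah (foo)*>\n"
            + "  <!ELEMENT foo EMPTY>\n"
            + "  %foo.atts;\n"
            + "]>\n"
            + "<blah><foo id='burger'/></blah>";
        collect(xml).forEach((key, type) -> {
            if ("ID".equals(type)) {
                System.out.println(key + " is an ID attribute");
            }
        });
    }
}
```

Whether this is usable from JDOM depends on being able to reach the
underlying XMLReader, which this sketch does not address.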

> Anyhow, id() will probably be amongst the last things I
> implement.  (I'm working on following[-sibling] and
> preceding[-sibling] axes currently, which will make
> it almost completely done.)

Once I got the basic DTD parser written, I actually enjoyed the
challenge of the XPath stuff.

> As far as duplicates, I don't think it's exceptional if there
> isn't uniqueness.  Doesn't The Standard simply say first-one-wins?

No. Within a document instance, all ID values *must* be unique for the
document to be valid. A document that is merely well-formed has no IDs
whatsoever, since attribute types come from the DTD. If you mean
duplicate attribute declarations in the DTD, then yes, the first one
wins:

   <!ATTLIST  foo
       id   CDATA   #IMPLIED
   >
   <!ATTLIST  foo
       id   ID     #IMPLIED
   >

The type of 'id' above would be CDATA. The following, however, is a
validity error:

   <!DOCTYPE blah [
   <!ELEMENT blah  ( foo )*>
   <!ELEMENT foo  EMPTY >
   <!ATTLIST foo
        id   ID    #IMPLIED
   >
   ]>
   <blah>
     <foo id="burger"/>
     <foo id="burger"/>
   </blah>
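
Both cases can be checked against a validating parser. The following is
a minimal sketch using the JAXP SAX parser that ships with the JDK; the
class name DuplicateIdCheck and the method validityErrors() are
hypothetical. It collects recoverable (validity) errors for the
duplicate-ID document, and for a variant where the first declaration
makes 'id' CDATA, so the repeated value is no longer an ID clash.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical check: collects validity errors reported for a document. */
class DuplicateIdCheck {
    static final String BODY = "<blah><foo id='burger'/><foo id='burger'/></blah>";

    // 'id' is declared as ID, and the same value appears twice.
    static final String DUPLICATE_IDS =
          "<!DOCTYPE blah [\n"
        + "<!ELEMENT blah (foo)*>\n"
        + "<!ELEMENT foo EMPTY>\n"
        + "<!ATTLIST foo id ID #IMPLIED>\n"
        + "]>\n" + BODY;

    // The first declaration wins, so 'id' is CDATA here.
    static final String CDATA_FIRST =
          "<!DOCTYPE blah [\n"
        + "<!ELEMENT blah (foo)*>\n"
        + "<!ELEMENT foo EMPTY>\n"
        + "<!ATTLIST foo id CDATA #IMPLIED>\n"
        + "<!ATTLIST foo id ID #IMPLIED>\n"
        + "]>\n" + BODY;

    static List<String> validityErrors(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation
        final List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are reported as recoverable errors.
                errors.add(e.getMessage());
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ID declared:  " + validityErrors(DUPLICATE_IDS));
        System.out.println("CDATA first:  " + validityErrors(CDATA_FIRST));
    }
}
```

The exact error text is parser-specific, so the sketch only records the
messages rather than matching on them.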

> btw, Murray, is your XPath implementation built on JDOM?
> Regardless, is it available as source for perusal?

The XML parser project I wrote goes back several years and has been
used internally for various projects I've been involved with, mostly
DTD analysis (such as comparing SGML and XML DTDs, e.g., HTML 4 and
XHTML). Since it wasn't written for an outside audience, isn't
bulletproof, isn't i18n-compatible (it only needs to parse DTDs in
US-ASCII), and there's no way I could support it, it's unavailable at
this time. I wrote in support for about 3/4 of an earlier XPath draft,
and it involves a bunch of SAX API extensions I needed to truly
analyse, alter, document and re-constitute DTDs from the parse. If I'd
had time I probably could have provided some input into SAX2 based on
this experience, but like so many things I simply haven't had the
cycles.

Murray

...........................................................................
Murray Altheim, SGML/XML Grease Monkey     <mailto:altheim@eng.sun.com>
XML Technology Center
Sun Microsystems, 1601 Willow Rd., MS UMPK17-102, Menlo Park, CA 94025

      In the evening
      The rice leaves in the garden
      Rustle in the autumn wind
      That blows through my reed hut.  -- Minamoto no Tsunenobu