[jdom-interest] Parsing a MODS-document with validation fails

Thomas Scheffler thomas.scheffler at uni-jena.de
Wed Aug 10 00:37:19 PDT 2011


Hi,

right after sending a mail saying that I had not received this mail, I got it ;-)

> I've put together a test case for this. See attached files. The XML and
> XSD files go in junit-test/resources
> The TestSAXComplicatedSchema.java goes in
> junit-test/src/java/org/jdom/tes/cases/input
>
> Whatever fix we decide on can be run through this.... currently it just
> reproduces the problem.
>
> The 'bonus' is that the XML/schema/imports are much simpler than the
> MODS stuff.

I was thinking of providing a test case, too, but currently I have 
a lot on my ToDo list. Thanks for your work.

> Thomas, I've looked at your latest patch, and I think it is too
> heavy-weight... in the sense that it carries a lot of data through the
> hierarchy... two maps, a list, it all seems like too much. I struggled
> to follow some of the logic. I think there's a simpler option.

You are right, it is extra work to maintain these structures for a case 
that no one has hit before. One can find arguments for either 
solution. The amount of additional memory should be negligible, but my 
patch introduces a bit of work while parsing every document, while your 
suggested change seems to produce more work in the rare case that:

  1. an attribute is present with a namespace and the QName does not 
     have a prefix.
  2. information on the prefix is held way up in the document hierarchy.

> In fact, when I looked more closely, the data is all available. If you
> encounter an attribute with the same qName and localname, but with a
> URI, then hunt up the Element hierarchy for a prefixed declaration of
> that namespace.

You are right about that. The data should be there in most instances of 
this rare case. I found your code a bit hard to understand, especially 
your do-while with those override prefix checks. If we do not use the SAX 
events here (and those are well documented) and write our own code 
instead, that code should still be easy to understand in a few years.
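
To check that I read the lookup correctly, here is a minimal sketch of 
"hunt up the Element hierarchy for a prefixed declaration". It is not 
JDOM code: Node, declare and findPrefix are hypothetical names that only 
model a parent link and the prefix declarations on each element, and the 
XLink namespace is just an illustrative choice. I am assuming that a 
prefix redeclared closer to the attribute hides the same prefix further 
up, and that the default (empty) prefix never applies to attributes.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class PrefixLookupSketch {

    /** Hypothetical stand-in for an element: a parent link plus its own prefix declarations. */
    static final class Node {
        final Node parent;
        final Map<String, String> declared = new LinkedHashMap<>(); // prefix -> namespace URI

        Node(Node parent) {
            this.parent = parent;
        }

        Node declare(String prefix, String uri) {
            declared.put(prefix, uri);
            return this;
        }
    }

    /**
     * Walks from start towards the root and returns the first non-empty
     * prefix that is bound to uri and not hidden by a nearer redeclaration.
     * Returns null if no such prefix is in scope.
     */
    static String findPrefix(Node start, String uri) {
        Set<String> seen = new HashSet<>();
        for (Node n = start; n != null; n = n.parent) {
            for (Map.Entry<String, String> e : n.declared.entrySet()) {
                String prefix = e.getKey();
                if (!seen.add(prefix)) {
                    continue; // same prefix was already declared nearer to start
                }
                // The default namespace (empty prefix) does not apply to attributes.
                if (!prefix.isEmpty() && uri.equals(e.getValue())) {
                    return prefix;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xlink = "http://www.w3.org/1999/xlink";
        Node root = new Node(null).declare("xlink", xlink);
        Node child = new Node(new Node(root));
        System.out.println(findPrefix(child, xlink));
    }
}
```

The walk only happens when an unprefixed attribute arrives with a 
namespace URI, so ordinary documents would not pay for it.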

Maybe I will find the time in the coming days to set up a benchmark 
to compare both solutions. And maybe the results will settle any 
further discussion ;-)

I have been using JDOM as a library since its beta stages, and I was 
quite impressed by how quickly I was able to understand the code and 
provide a fix. I think not every library out there is of that quality. 
This is something that had to be said.

regards,

Thomas

