[jdom-interest] Entity Resolver Cache/Catalog

Rolf Lear jdom at tuis.net
Mon Aug 29 07:58:49 PDT 2011


Hi all.

As I work through the JUnit tests, I am now at the point of testing
the SAX and DOM builders. The issue I am having is that I do a lot of
my work on the train as I commute... and I don't have a network
connection there.

This is a problem because the validating parsers need to fetch some DTDs
and XML Schemas from the web (if they are web-referenced resources).

This is an age-old problem, but I can't think of a great solution. The
ideal would be to run junit tests without having to have a network
connection at all.

Of course, I could just use input documents that only reference local
resources... (and I have) but, in the spirit of JDOM, is there an option
for making this process easy in a general sense?

This is further compounded by there being some restrictions on some
documents too, like the w3.org 'ban' on default Java user-agents:
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

My experimentation indicates that w3.org has put a blanket 'tarpit' of 30
seconds on any connection, regardless of which User-Agent you send. A
30-second delay on every fetch is significant.

Typical solutions to this problem are things like OASIS catalogs, but
that feels heavyweight... or is it?

So, what options are there? Any ideas?

I think the following are key issues (and OASIS does not solve them all):

* access to local copies of unavailable resources (no network
connection?).
* general performance improvements by caching entities that have an
appropriate 'expires' timeout... no network access for 'cached' resources.
* improved 'internet-friendliness' reducing unnecessary bandwidth to
places like w3.org
* reduce the amount of 'expertise' a JDOM user needs to do 'the right
thing'.
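
To make the first point concrete, here is a minimal sketch of a resolver
that serves entities from local copies only, written against the standard
SAX EntityResolver interface. The class name OfflineEntityResolver and its
register method are hypothetical (they are not existing JDOM API), and
failing fast on unknown system IDs is an assumption that suits offline
tests rather than a general recommendation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of JDOM): serves entities from local copies only. */
final class OfflineEntityResolver implements EntityResolver {

    // Maps a system ID (usually a URL) to the text of its local copy.
    private final Map<String, String> localCopies = new ConcurrentHashMap<>();

    /** Registers the local text to serve for the given system ID. */
    OfflineEntityResolver register(String systemId, String content) {
        localCopies.put(systemId, content);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        String content = (systemId == null) ? null : localCopies.get(systemId);
        if (content == null) {
            // Fail fast instead of returning null: a null result would let the
            // parser fall back to opening a connection for the entity itself.
            throw new IOException("No local copy registered for " + systemId);
        }
        InputSource source = new InputSource(new StringReader(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId); // keeps relative references resolvable
        return source;
    }
}
```

A test could register the DTD text it needs and hand the resolver to the
builder, for example through SAXBuilder.setEntityResolver(...). The caching
and 'expires' handling from the second point are deliberately left out of
this sketch.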

Can JDOM be easily configured to become a good netizen? Should it be done
by default?

You can also comment on the issue I raised at:
https://github.com/hunterhacker/jdom/issues/26

I'll also summarize or copy the discussion from this list onto that
issue.

Rolf
