[jdom-interest] Entity resolving - design problem

Todd O'Bryan toddobryan at mac.com
Thu Oct 23 18:17:13 PDT 2003


On Thursday, October 23, 2003, at 07:01  AM, Robert J Munro wrote:

> Todd O'Bryan wrote:
>
>> There is, in fact, a way to do this. You can subclass a Reader and
>> intercept the character stream on the way into the Parser. If you get
>> an ampersand followed by one of the entities you don't want to
>> expand, you pass it on as &amp;entity;, if not, you just pass them
>> on.
>>
>> When you write the file back through the Writer you'll have to be
>> sure that you intercept again and change &amp;entity; back to
>> &entity; on the way out.
>>
>> All in all, it's about twenty lines of code overriding read() and
>> write() in subclasses of Reader and Writer.
>>
>> Email me if you need more specifics,
>> Todd
>
> That sounds like a horrendously bad idea. It goes completely against 
> the whole principle of JDOM (i.e. that you deal with the data, not 
> with the XML).
Until XML can do a round-trip with entities, this will continue to be a 
problem. I was dealing with XML documents created by a client that 
included entities which were nowhere defined. Yes, I realize undefined 
entities lead to malformed XML (not just invalid), but the funny thing 
is, the client was not terribly open to the idea that they should have 
to fix up their bad XML before I would process it. And I could not 
afford to wait and see which new undefined entity would crash my 
program in a batch of data they hadn't sent me yet. Got a less 
horrendously bad idea now?

>
> I think the best solution in this case is to use an extra attribute in 
> your own namespace (something like <img my:file="name.jpg" />) to say 
> what the image filename is without a directory while it is XML, then 
> generate the real src attribute with a URL later.
You're probably right. When you're defining the format, a hack like the 
one above is not the best choice. It is, however, doable. And if the 
things that people called entities are data and not just entities, then 
you have to deal with them.

A good example of this would be something like &date; which presumably 
prints out the current date. If you resolve that on your parse, fiddle 
with it and then want to re-write the original document with your 
changes, you're screwed. The fact that "October 23, 2003" was once 
"&date;" is just lost information. Fine if XML were only intended to go 
one way, but it's not.
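
To make the round-trip concrete, here is a rough Java sketch of the 
Reader/Writer trick described at the top of this thread. It is not the 
original code. The class names (EntityGuard, EntityEscapingReader, 
EntityRestoringWriter) and the idea of passing in a set of protected 
entity names are assumptions made for illustration, and for brevity it 
buffers the whole document instead of filtering character by character.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: escapes or restores references to chosen entity names. */
final class EntityGuard {
    // Simplified entity-name syntax; real XML names allow more characters.
    private static final String NAME = "([A-Za-z_][A-Za-z0-9._-]*);";
    private static final Pattern REF = Pattern.compile("&" + NAME);
    private static final Pattern ESCAPED_REF = Pattern.compile("&amp;" + NAME);

    private EntityGuard() {}

    /** Turns &name; into &amp;name; for every protected name. */
    static String escape(String xml, Set<String> names) {
        return rewrite(REF, xml, names, "&amp;");
    }

    /** Turns &amp;name; back into &name; for every protected name. */
    static String restore(String xml, Set<String> names) {
        return rewrite(ESCAPED_REF, xml, names, "&");
    }

    private static String rewrite(Pattern p, String text, Set<String> names, String prefix) {
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            // References to names outside the set are copied through untouched.
            String replacement = names.contains(name) ? prefix + name + ";" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}

/** Reader that escapes the protected references before a parser sees them. */
final class EntityEscapingReader extends Reader {
    private final Reader source;
    private final Set<String> names;
    private Reader escaped; // built lazily on the first read

    EntityEscapingReader(Reader source, Set<String> names) {
        this.source = source;
        this.names = names;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (escaped == null) {
            // Simplification: read everything up front rather than streaming.
            StringBuilder all = new StringBuilder();
            char[] chunk = new char[4096];
            for (int n; (n = source.read(chunk)) != -1; ) {
                all.append(chunk, 0, n);
            }
            escaped = new StringReader(EntityGuard.escape(all.toString(), names));
        }
        return escaped.read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}

/** Writer that restores the protected references on the way out. */
final class EntityRestoringWriter extends Writer {
    private final Writer target;
    private final Set<String> names;
    private final StringBuilder pending = new StringBuilder();

    EntityRestoringWriter(Writer target, Set<String> names) {
        this.target = target;
        this.names = names;
    }

    @Override
    public void write(char[] buf, int off, int len) {
        pending.append(buf, off, len);
    }

    @Override
    public void flush() throws IOException {
        // Text is held until close(), since a reference could straddle two writes.
        target.flush();
    }

    @Override
    public void close() throws IOException {
        target.write(EntityGuard.restore(pending.toString(), names));
        pending.setLength(0);
        target.close();
    }
}
```

The idea would be to wrap the input in an EntityEscapingReader before 
handing it to the parser, and to wrap the output Writer in an 
EntityRestoringWriter. One known limitation of this sketch: text that 
already contained a literal &amp;date; in the source cannot be told 
apart from an escaped &date; and would be restored as a reference.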

In the spec, they made it possible to do things with entities that are 
just a really bad idea, and some of the documentation even suggested 
doing these things. Then people do them, and tie themselves in knots, 
and get annoyed.

>
> Javascript sections could be fixed by defining an image directory in 
> .js files on each location, then changing:
> document.blah.src="/path/another.gif"
> to
> document.blah.src= imagedirectory + "another.gif"
>
> The solution I would use, however, is to put the images in the same 
> location on both servers, either relative to the root of the server, 
> or relative to the documents that reference them. If both those 
> options really are impossible, then I'd put the images on a public 
> server, and have them both point to them with absolute URLs.
>
Umm, how would you do this if you don't have access rights to the same 
directory structures on the two servers? And wouldn't it be a 
horrendously bad idea to make someone viewing a file on a local server 
wait while the images are fetched from another server just so you don't 
have to deal with resolving different file prefixes?

Todd



