[jdom-interest] Feature Request

John Cowan cowan at ccil.org
Sat Feb 21 11:11:06 PST 2004

Dennis Sosnoski scripsit:

> As I understand the TagSoup code, it tries to force <script> elements
> to be in the <head> of the HTML document, which caused problems when
> used with a site that uses <script> elements within the <body> of the
> document. Perhaps John can clarify if I'm misunderstanding the TagSoup
> code, or if there's some reason TagSoup adds this restriction.

There are three possible issues AFAICT:

1) Currently the style element is only allowed within the head element,
   per HTML 4.01 Transitional.  It's trivial to lift this:  just change
   the line beginning with "style" in src/definitions/html/elements so that
   the fourth field says "%head+%inline" instead of "%head".
   I'll make this change for 0.9.2.

2) If a script element is embedded (directly or indirectly) within a body,
   and TagSoup is forcing it to be in a bogus head element, that's a bug.
   Can someone send me a test case?

3) If a script element is not within either head or body (directly within
   the html element or no element at all) then TagSoup has to guess whether
   to insert a head or a body element, and currently it guesses head.
   I could change this to body, but I'm not sure if that would make things
   better or worse on average.  Since the parser doesn't look ahead, there's
   no way to figure out which is more appropriate in a particular case.
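
To make the difference between cases 2 and 3 concrete, here is a minimal
sketch in Java. It is an illustration only, not TagSoup's actual code: the
class, enum, and method names are all hypothetical, and the open-element
stack is modeled as a plain list of element names. Because there is no
lookahead, the only inputs are the elements already open and a fixed
fallback guess.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names come from TagSoup.
final class ImpliedParentSketch {

    // Which element, if any, must be implied before a script start-tag.
    enum Parent { NONE, HEAD, BODY }

    // openElements: names of the currently open elements, outermost first.
    // fallback: the fixed guess used when neither head nor body is open.
    static Parent impliedParentForScript(List<String> openElements, Parent fallback) {
        // Case 2: script is already inside head or body (directly or
        // indirectly), so nothing should be implied.
        if (openElements.contains("head") || openElements.contains("body")) {
            return Parent.NONE;
        }
        // Case 3: script sits directly in html or in no element at all.
        // Without lookahead the parser can only apply a fixed guess.
        return fallback;
    }

    public static void main(String[] args) {
        // Script nested inside body: nothing is implied.
        System.out.println(impliedParentForScript(
                Arrays.asList("html", "body", "div"), Parent.HEAD));
        // Script directly inside html: the fallback guess is used.
        System.out.println(impliedParentForScript(
                Arrays.asList("html"), Parent.HEAD));
    }
}
```

Switching the guess from head to body in this sketch is just a matter of
passing a different fallback; which value works better on real documents
is exactly the open question in case 3.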

John Cowan   <jcowan at reutershealth.com>   http://www.ccil.org/~cowan
"One time I called in to the central system and started working on a big
thick 'sed' and 'awk' heavy duty data bashing script.  One of the geologists
came by, looked over my shoulder and said 'Oh, that happens to me too.
Try hanging up and phoning in again.'"  --Beverly Erlebacher
