[jdom-interest] Feature Request

Dennis Sosnoski dms at sosnoski.com
Sat Feb 21 11:15:34 PST 2004

John Cowan wrote:

>Dennis Sosnoski scripsit:
>>       schema.elementType("span", Schema.M_ANY, Schema.M_ANY, 0);
>>       schema.elementType("div", Schema.M_ANY, Schema.M_ANY, 0);
>>       schema.elementType("table", Schema.M_ANY, Schema.M_ANY, 0);
>>       schema.elementType("br", Schema.M_EMPTY, Schema.M_ANY, 0);
>I'd be interested in knowing why these particular ones were important.
>I understand the issue with script and style.
I ran into some cases where these elements were being misused in the
HTML pages I was looking at, so I patched them in this manner to allow
arbitrary nesting. I'm not even sure all of these are necessary for my
purposes; I just hacked as I went to muddle through the pages. AFAIK
the only element definitions that are actually incorrect in your
current content model are those for the script and style elements.

>>>>The only downside I've noticed is that the handling it uses to 
>>>>turn HTML into XHTML can go berserk in some cases of real-world HTML, 
>>>>such as <script> and <style> elements within the <body> (it properly 
>>>>tries to force them into a <head> element, so you end up with multiple 
>>>><head>s and <body>s).
>TagSoup's content models are implicitly of the form (A|B|C|...)*, so
>it thinks the content model of the html element is (head|body)*.
>I may do some special-casery to fix this, but probably not for 0.9.2
>unless I see a very easy way to do it.
With the <script> and <style> element containment fixed, I don't think
this will be a big deal.
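
To make the (head|body)* point concrete, here is a toy sketch of a
bitmask content model. It is not TagSoup's actual code: the class
ContentModelSketch, the bit values, the "default parent" rule and the
five-element schema are all assumptions made up for illustration. It
only shows how a <script> that is restricted to <head> can force a
second <head> and <body> once the parser is already inside <body>,
and how letting <script> nest anywhere avoids that.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of bitmask content models; NOT TagSoup's actual code. */
public class ContentModelSketch {
    // Hypothetical model bits, invented for this sketch.
    static final int M_HTML = 1, M_HEAD = 2, M_BODY = 4, M_INLINE = 8;
    static final int M_ANY = ~0; // member of every model

    static final class Type {
        final String name, parent;
        final int model, memberOf;
        Type(String name, int model, int memberOf, String parent) {
            this.name = name; this.model = model;
            this.memberOf = memberOf; this.parent = parent;
        }
        // A parent accepts a child when their bit sets overlap, i.e. (A|B|...)*.
        boolean canContain(Type child) { return (model & child.memberOf) != 0; }
    }

    private final Map<String, Type> schema = new HashMap<>();
    private final Deque<Type> open = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    ContentModelSketch(int scriptMemberOf) {
        define("html", M_HTML, 0, null);
        define("head", M_HEAD, M_HTML, "html");
        define("body", M_BODY, M_HTML, "html");
        define("p", M_INLINE, M_BODY, "body");
        define("script", 0, scriptMemberOf, "head"); // model 0: contains nothing
        push(schema.get("html"));
    }

    private void define(String name, int model, int memberOf, String parent) {
        schema.put(name, new Type(name, model, memberOf, parent));
    }
    private void push(Type t) { open.push(t); out.append('<').append(t.name).append('>'); }
    private void pop() { out.append("</").append(open.pop().name).append('>'); }

    void start(String name) {
        Type t = schema.get(name);
        // Close open elements (never the root) until one accepts the new element.
        while (open.size() > 1 && !open.peek().canContain(t)) pop();
        // If even the root rejects it, open its default parent first.
        if (!open.peek().canContain(t)) start(t.parent);
        push(t);
    }

    /** Feeds a sequence of start tags and returns the resulting markup. */
    static String parse(int scriptMemberOf, String... startTags) {
        ContentModelSketch p = new ContentModelSketch(scriptMemberOf);
        for (String tag : startTags) p.start(tag);
        while (!p.open.isEmpty()) p.pop();
        return p.out.toString();
    }

    public static void main(String[] args) {
        String[] page = {"head", "body", "p", "script", "p"};
        System.out.println(parse(M_HEAD, page)); // script allowed only in head
        System.out.println(parse(M_ANY, page));  // script allowed anywhere
    }
}
```

With script restricted to M_HEAD, the first line printed contains two
<head> and two <body> elements. With M_ANY the script simply stays
inside the <p> where it was found, and there is one of each.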

  - Dennis
