[jdom-interest] bad boy tag

Kyle F. Downey kdowney at amberarcher.com
Thu Oct 11 11:48:12 PDT 2001


> Hi All,
>
> <book>
> 	<content>
> 		<html><head></head><body><h2>Introduction</h2><p><img alt="bad boy
> tag">There is a .</p></body></html>
> 	</content>
> </book>
>
> What is the best way to handle 'not well formatted html' within an xml
> document?

This is not really a JDOM question.

1) Wrap it all in a CDATA section; that's the safest for arbitrary HTML.
2) Make all empty-element tags (img, br, etc.) well-formed by closing them
with a trailing slash, and keep them "passable" in most browsers by adding
a space before that slash (e.g. <br />).
3) Use XSLT to do what you're trying to do. I think the root problem here
is that you're mixing content and exact presentation. Better to keep the
content well-formed (you'll still need to write <br />, as in option 2)
and transform it using an HTML outputter, which will convert things like
<br /> to <br>.
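
To make option 3 concrete, here is a rough sketch, not a definitive
implementation. It uses the JAXP transform API that ships with the JDK
rather than JDOM's own classes, and the class name HtmlOutputSketch, the
method toHtml, and the inline stylesheet are all made up for illustration.
The stylesheet just copies the <html> subtree and asks for the "html"
output method, so empty elements written as <br /> should come out as <br>.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: serializes the embedded, well-formed <html> subtree as HTML.
final class HtmlOutputSketch {

    // Illustrative stylesheet: copy book/content/html and serialize it with method="html".
    private static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/'>"
      + "<xsl:copy-of select='book/content/html'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String toHtml(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked exceptions so callers of this sketch stay simple.
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // The input must already be well-formed XML, so img and br are self-closed.
        String xml = "<book><content><html><head></head><body>"
            + "<h2>Introduction</h2>"
            + "<p><img alt=\"bad boy tag\" />There is a .<br /></p>"
            + "</body></html></content></book>";
        System.out.println(toHtml(xml));
    }
}
```

Note that the processor may add things of its own under the html output
method (a META content-type tag in <head>, for instance), so treat the
exact output as processor-dependent.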

A CDATA section is preferable for containing arbitrary HTML, because there
are lots of things in run-of-the-mill HTML that can cause problems
(entities, script fragments with "<" in them, etc.). But I'd strongly
recommend option #3. Or look into a technology like DocBook/XML
(http://www.docbook.org), where all the XSLT work has been done for you.
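
If you do go the CDATA route, a minimal sketch might look like the
following. Again this uses the JDK's built-in DOM and transform APIs as a
stand-in for JDOM, and the names CdataWrapSketch, wrap, and unwrap are
hypothetical. The point is only that the parser hands the HTML back as
plain character data, unclosed <img> and all, without trying to parse it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: stores raw HTML inside <book><content> as a CDATA section.
final class CdataWrapSketch {

    // Builds <book><content><![CDATA[ ...raw HTML... ]]></content></book> as a string.
    static String wrap(String rawHtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element book = doc.createElement("book");
            Element content = doc.createElement("content");
            content.appendChild(doc.createCDATASection(rawHtml));
            book.appendChild(content);
            doc.appendChild(book);

            // Identity transform, used here only to serialize the DOM tree.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build document", e);
        }
    }

    // Parses the XML again and returns the text held by the first <content> element.
    static String unwrap(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("content").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String html = "<p><img alt=\"bad boy tag\">There is a .</p>";
        String xml = wrap(html);
        System.out.println(xml);
        System.out.println(unwrap(xml));
    }
}
```

The trade-off is that the HTML is now opaque text as far as XML tools are
concerned, so you can't address the <img> with XPath or XSLT. Also keep in
mind that a CDATA section cannot itself contain the sequence "]]>".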

--kd


> My first choice is to cheat and just wrap the content as a comment <!-- -->
>
> Or should I use a dtd to tell the parser that the element <content> contains
> not well formatted html ?
>
> Any example anyone.
>
> If this has been covered before <please forgive="me">
>
> Mark