[jdom-interest] Trying to use jdom with TagSoup

Paul Reeves p_a_reeves at hotmail.com
Wed Dec 31 02:18:52 PST 2003


Try using NekoHTML (http://www.apache.org/~andyc/neko/doc/html/) by Andy 
Clark. I use it instead of JTidy now, because I found I had to perform too 
many hacks to get JTidy to parse pages with custom tags and the like.


>From: eric at shipek.com
>To: jdom-interest at jdom.org
>Subject: [jdom-interest] Trying to use jdom with TagSoup
>Date: Tue, 30 Dec 2003 12:02:32 -0800 (PST)
>
>I also posted this to comp.lang.java.programmer and at
>www.javaworld.com, but I thought you guys might know best on this one:
>
>I'm trying to convert HTML pages to XML and I'm having some difficulty
>with the following:
>
>1.  I try to use Tidy (with -asxml), but the HTML that I'm trying to
>    convert to XHTML has too many errors, so I spend a lot of time
>    trying to "fix" the HTML before running it through Tidy.
>2.  I've tried using TagSoup with JDOM, but the SAXBuilder internally
>    tries to set the namespace prefixes feature, and TagSoup does not
>    support that feature.
>
>I would really appreciate help from someone who has dealt with having
>to crank out lots of XML (XHTML) from poorly formatted HTML. I appreciate
>any help! ;)
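
On point 2, one possible workaround is to wrap the parser in a SAX filter 
that quietly ignores the rejected feature request. The sketch below uses 
only the JDK's org.xml.sax classes. LenientFeatureFilter, StrictReader and 
LenientFeatureDemo are hypothetical names of my own, not part of JDOM or 
TagSoup, and StrictReader is just a stand-in for a parser that refuses the 
feature. In real use the parent would presumably be the TagSoup parser 
itself. How the wrapped reader is handed to JDOM depends on the JDOM 
version, so that wiring is left out.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper (not part of JDOM or TagSoup): wraps a reader and
// ignores a rejected request to set the namespace-prefixes feature.
class LenientFeatureFilter extends XMLFilterImpl {
    static final String NS_PREFIXES =
            "http://xml.org/sax/features/namespace-prefixes";

    private final List<String> ignored = new ArrayList<String>();

    LenientFeatureFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void setFeature(String name, boolean value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        try {
            super.setFeature(name, value);   // delegate to the wrapped reader
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            if (!NS_PREFIXES.equals(name)) {
                throw e;                     // other features still fail loudly
            }
            ignored.add(name);               // remember what was skipped
        }
    }

    List<String> ignoredFeatures() {
        return ignored;
    }
}

// Stand-in for a parser that rejects the feature; used only for the demo.
class StrictReader extends XMLFilterImpl {
    @Override
    public void setFeature(String name, boolean value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (LenientFeatureFilter.NS_PREFIXES.equals(name)) {
            throw new SAXNotSupportedException(name);
        }
        // Every other feature is accepted silently in this stub.
    }
}

class LenientFeatureDemo {
    public static void main(String[] args) throws Exception {
        LenientFeatureFilter reader =
                new LenientFeatureFilter(new StrictReader());
        // Would throw on StrictReader directly; the filter absorbs it.
        reader.setFeature(LenientFeatureFilter.NS_PREFIXES, true);
        System.out.println("ignored: " + reader.ignoredFeatures());
    }
}
```

The filter only swallows the one feature named in the question. Any other 
unsupported feature still raises its exception, so real configuration 
mistakes are not hidden. Whether the resulting documents are acceptable 
without that feature is something you would have to check against your own 
pages.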
