[jdom-interest] Parsing Microsoft Word Documents

Paul Reeves p_a_reeves at hotmail.com
Sat Dec 18 03:14:11 PST 2004


This isn't technically a JDOM question...

Get hold of JTidy http://sourceforge.net/projects/jtidy or, even better,
NekoHTML http://www.apache.org/~andyc/neko/doc/html/

Both will fix your unquoted-attribute problem and will also attempt to correct
unbalanced tags. JTidy also has a "clean word" facility, which is rather
useful.
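If you want to see the difference before pulling in another jar, here is a rough sketch that sticks to the JDK only. It is not JTidy or NekoHTML: as a stand-in for a lenient HTML parser it uses the JDK's own javax.swing.text.html.parser.ParserDelegator (my substitution). The strict XML parser (the same kind of parser SAXBuilder drives) rejects Word-style unquoted attributes, while the HTML parser reads them without complaint. The HTML fragment, the class name WordHtmlDemo and its method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WordHtmlDemo {

    // Hypothetical fragment in the style of Word's HTML output: several
    // attribute values are unquoted, which HTML allows but XML does not.
    static final String WORD_HTML =
            "<html><head>"
            + "<meta name=ProgId content=Word.Document>"
            + "<meta name=Generator content=\"Microsoft Word 10\">"
            + "</head><body><p class=MsoNormal>Hello</p></body></html>";

    /** Strict XML parse: returns false on a well-formedness error. */
    static boolean parsesAsXml(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // unquoted attribute values are a fatal well-formedness error in XML
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lenient HTML parse: collects the content= value of every <meta> tag. */
    static List<String> metaContents(String html) {
        List<String> contents = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <meta> is an empty element, so the parser reports it as a "simple" tag
                if (tag == HTML.Tag.META) {
                    Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                    if (content != null) {
                        contents.add(content.toString());
                    }
                }
            }
        };
        try {
            // true = ignore any charset declared in a <meta http-equiv> tag
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents;
    }

    public static void main(String[] args) {
        System.out.println("strict XML parse ok? " + parsesAsXml(WORD_HTML));
        System.out.println("meta content via lenient HTML parser: " + metaContents(WORD_HTML));
    }
}
```

With JDOM itself, the usual route is to hand SAXBuilder a lenient SAX driver instead, e.g. new SAXBuilder("org.cyberneko.html.parsers.SAXParser") with NekoHTML on the classpath. Check the NekoHTML docs for its feature settings, since element names may come back upper-cased by default.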

Paul

>From: Hugo Garcia <hugo.a.garcia at gmail.com>
>Reply-To: Hugo Garcia <hugo.a.garcia at gmail.com>
>To: jdom-interest at jdom.org
>Subject: [jdom-interest] Parsing Microsoft Word Documents
>Date: Fri, 17 Dec 2004 11:56:57 -0500
>
>Hi
>
>I am trying to parse a Microsoft Word document with the SAXBuilder but
>I get an error that attributes must be quoted. When I look at the
>document I see that some attributes, especially in various meta
>tags, are indeed not quoted. I wonder if anyone has run into this problem
>and, if so, whether you have a workaround or solution.
>
>thanks
>
>-H
