[jdom-interest] Parsing Microsoft Word Documents

Per Norrman per.norrman at austers.se
Sat Dec 18 03:26:33 PST 2004


Hugo Garcia wrote:
> Hi
> 
> I am trying to parse a Microsoft Word document with the SAXBuilder but
> I get an error that attributes must be quoted. When I look at the
> document I see that indeed some attributes, especially in various meta
> tags, are not quoted. I wonder if anyone has run into this problem and,
> if so, whether you have a workaround or solution.
> 

Then it's not XML, but probably HTML produced by saving a Word doc
in HTML format. You can always try using the TagSoup parser:

http://mercury.ccil.org/~cowan/XML/tagsoup/

Then, just create the SAXBuilder like so:

   SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
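To see why a strict XML parser rejects this kind of file in the first place, here is a minimal sketch that uses only the SAX parser bundled with the JDK (no JDOM or TagSoup involved). The class name, the method name and the sample markup are made up for illustration; the markup only imitates the unquoted meta attributes described above and is not taken from a real Word export.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class UnquotedAttributeCheck {

    // Returns null when the text is well-formed XML, otherwise the
    // message of the fatal error reported by the JDK's SAX parser.
    public static String wellFormednessError(String text) throws Exception {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal errors, so we end up here.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample: an unquoted attribute value, as in HTML saved from Word.
        String htmlStyle = "<html><head><meta name=Generator content=\"Microsoft Word\">"
                + "</head><body></body></html>";
        // The same element written as well-formed XML.
        String xmlStyle = "<html><head><meta name=\"Generator\" content=\"Microsoft Word\"/>"
                + "</head><body></body></html>";

        System.out.println("HTML-style input: " + wellFormednessError(htmlStyle));
        System.out.println("XML-style input:  " + wellFormednessError(xmlStyle));
    }
}
```

The first input should come back with a parser error and the second with null. A lenient HTML parser such as TagSoup is meant to accept the first form as well, which is why plugging it into the SAXBuilder works around the problem.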

/pmn

