[jdom-interest] Parsing XML file

Laurent Bihanic laurent.bihanic at atosorigin.com
Wed Mar 19 01:35:56 PST 2003


Jun, Yong wrote:
> I have an XML file that is saved as UTF-8 (using Notepad).  When I try to parse it using SAXBuilder.build(), I get an error message saying that my XML file is not well-formed.  However, when I save the XML file as ANSI (using Notepad again) and parse it, everything's fine.
>  
> Is there a way I can parse the XML file that is saved as UTF-8?

When saving a file as UTF-8, Notepad inserts a (legal) 3-byte byte order mark
(BOM), the bytes EF BB BF, at the start of the file.
Unfortunately, the Crimson parser delivered with JAXP 1.x does not recognize
this header and reports an error. This should have been fixed in JDK 1.4 with
the introduction of the New I/O (NIO) APIs (but I've never verified this, as I
no longer use Crimson).
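
If switching parsers is not an option, another workaround is to strip the BOM
yourself before the bytes reach the parser. Below is a minimal sketch of that
idea; the class name BomSkipper and the method skipUtf8Bom are hypothetical,
and it assumes the document really is UTF-8 (which is also the XML default
once the BOM is gone). It uses only java.io, so it does not depend on which
parser is installed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class BomSkipper {
    // The UTF-8 byte order mark that Notepad writes: EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Hypothetical helper: returns a stream positioned after a leading UTF-8 BOM, if present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // Read up to three bytes; a very short file may contain fewer.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length
                && head[0] == UTF8_BOM[0]
                && head[1] == UTF8_BOM[1]
                && head[2] == UTF8_BOM[2];
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the parser still sees them.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        // "\uFEFF" encodes to EF BB BF in UTF-8, i.e. what Notepad prepends.
        byte[] withBom = "\uFEFF<root/>".getBytes(StandardCharsets.UTF_8);
        InputStream stripped = skipUtf8Bom(new ByteArrayInputStream(withBom));
        // After stripping, the first byte is the '<' of the root element.
        System.out.println((char) stripped.read());
    }
}
```

With JDOM the wrapped stream can then be handed to the builder, e.g.
new SAXBuilder().build(BomSkipper.skipUtf8Bom(new FileInputStream(file))).
Files without a BOM pass through unchanged, so the wrapper is safe to apply
unconditionally.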

The easiest way to work around this problem is to use Xerces instead of
Crimson. An old version (1.4.4) of Xerces is part of the JDOM distribution.
Just make sure you place xerces.jar at the beginning of your classpath.
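
To check which parser actually gets picked up after changing the classpath,
one option is to ask JAXP which SAX implementation it hands out. This is a
small diagnostic sketch (the class ParserCheck is hypothetical), and it
assumes JDOM obtains its parser through JAXP, as SAXBuilder typically does
when no driver class is given. Crimson classes live under org.apache.crimson
and Xerces classes under org.apache.xerces (the copy bundled with later JDKs
uses com.sun.org.apache.xerces.internal).

```java
import javax.xml.parsers.SAXParserFactory;

public class ParserCheck {
    /** Returns the class name of the XMLReader that JAXP provides on the current classpath. */
    public static String currentSaxParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.newSAXParser().getXMLReader().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // The factory class shows which JAXP implementation won the classpath lookup.
        System.out.println("Factory:   " + SAXParserFactory.newInstance().getClass().getName());
        System.out.println("XMLReader: " + currentSaxParser());
    }
}
```

If JAXP still reports Crimson, JDOM's SAXBuilder can also be pointed at
Xerces explicitly through its driver-class constructor, for example
new SAXBuilder("org.apache.xerces.parsers.SAXParser").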

Laurent



