[jdom-interest] SaxBuilder.build(url) and encoding

Rodrigo Alvarez ralvarez at dybox.cl
Thu Dec 12 07:14:46 PST 2002


I have now done further testing, and the file contents cannot be the
problem, because my workaround is to open the URL, read the stream contents
into a StringBuffer, and then use the SAXBuilder.build(String) method to
parse the XML. This works fine.
I use JDOM with Xerces and Xalan. Does Xerces get the encoding part right?
Does anyone know?
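
For reference, here is a minimal sketch of that kind of workaround. It is an
illustration under assumptions, not the exact code in use: it relies on the
JDK's own JAXP parser rather than JDOM so that it runs without extra jars,
and the class and method names (DecodedXmlLoader, readAll, parse) are
hypothetical. The point it shows is that once the bytes are decoded by the
caller and handed to the parser as characters, the parser no longer has to
work out the encoding itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: decode the bytes ourselves, then parse characters.
final class DecodedXmlLoader {

    // Reads the whole stream and decodes it with the charset the caller chose.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Parses already-decoded text. Because the InputSource wraps a character
    // stream, the parser does not use the encoding declaration to decode it.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        InputSource source = new InputSource(new StringReader(xml));
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
    }
}
```

With a URL, the stream would come from url.openStream(). As far as I know,
JDOM's SAXBuilder has build(Reader) and build(InputSource) overloads that
could take the place of the JAXP call here.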


At 09:31 12-12-2002 -0500, you wrote:
>At 9:34 PM -0800 12/11/02, Jason Hunter wrote:
>>When you use a URL the underlying parser determines the encoding,
>>typically by looking at the declaration.
>Not necessarily. In an HTTP environment, the encoding specified by the 
>MIME type takes precedence over the encoding specified by the XML document 
>(though not all parsers get this right). If the HTTP header says the 
>document is UTF-8 and the encoding declaration says ISO 8859-1, then the 
>parser uses UTF-8. I have to double check this, but I also think that if 
>the HTTP header says the document is text/xml without any encoding, then 
>the parser picks US-ASCII regardless of what the encoding declaration 
>says. Again, only some parsers correctly implement the spec here.
>| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
>|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
>|              http://www.cafeconleche.org/books/xian2/              |
>|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
>|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
>|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
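
The precedence described in the quoted message can be sketched roughly as
follows. This is only an illustration: the class and method names are
hypothetical, and the US-ASCII default for text/xml without a charset is
the rule the quoted message itself is unsure about, so treat that branch
as an assumption. A null result means "no override, let the parser use the
XML declaration".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper that picks a charset from an HTTP Content-Type value.
final class HttpXmlCharset {

    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charsetParameter(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Returns the charset to force on the document, or null for no override.
    // Charset.forName throws an unchecked exception for unknown names.
    static Charset effectiveCharset(String contentType) {
        String name = charsetParameter(contentType);
        if (name != null) {
            return Charset.forName(name); // the HTTP header wins
        }
        if (contentType != null) {
            int semi = contentType.indexOf(';');
            String mediaType = (semi < 0 ? contentType : contentType.substring(0, semi))
                    .trim().toLowerCase(Locale.ROOT);
            if (mediaType.equals("text/xml")) {
                return StandardCharsets.US_ASCII; // assumed default, see note above
            }
        }
        return null;
    }
}
```

The value passed in would typically be what URLConnection.getContentType()
returns. A non-null result could then be used to wrap the stream in an
InputStreamReader before handing it to the parser.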

Rodrigo Alvarez
DyBOX Consulting and Development

Hernando de Aguirre 906 Providencia.
Santiago, Chile.
(562) 231 7840
