[jdom-interest] SAXBuilder.build(url) and encoding

Elliotte Rusty Harold elharo at metalab.unc.edu
Thu Dec 12 06:31:12 PST 2002


At 9:34 PM -0800 12/11/02, Jason Hunter wrote:
>When you use a URL the underlying parser determines the encoding,
>typically by looking at the declaration.

Not necessarily. Over HTTP, the encoding specified by the charset 
parameter of the MIME type takes precedence over the encoding 
specified in the XML document itself (though not all parsers get this 
right). If the HTTP header says the document is UTF-8 and the encoding 
declaration says ISO-8859-1, then the parser uses UTF-8. I have to 
double-check this, but I also think that if the HTTP header says the 
document is text/xml without any charset, then the parser picks 
US-ASCII regardless of what the encoding declaration says. Again, only 
some parsers correctly implement the spec here.
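
To make the precedence rule concrete, here is a minimal sketch in plain 
Java of how a caller might work out the encoding from the Content-Type 
header before the parser ever sees the document. The class and method 
names (HttpXmlCharset, fromContentType) are hypothetical, made up for 
this illustration, and are not part of JDOM or any parser. It assumes 
the rules described above: an explicit charset parameter wins, text/xml 
with no charset means US-ASCII, and any other type leaves the decision 
to the XML declaration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class HttpXmlCharset {

    /**
     * Returns the charset dictated by the HTTP Content-Type header, or
     * null when the header does not decide and the parser should fall
     * back to the document's own encoding declaration.
     * Hypothetical helper: not part of JDOM or any parser API.
     */
    public static Charset fromContentType(String contentType) {
        if (contentType == null) {
            return null; // no header at all: let the parser autodetect
        }
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);

        // An explicit charset parameter takes precedence over everything.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes: charset="utf-8"
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                // Throws an unchecked exception if the name is illegal
                // or the JVM does not support the charset.
                return Charset.forName(value);
            }
        }

        // Assumed rule from the message above: text/xml without a
        // charset parameter is treated as US-ASCII.
        if (mediaType.equals("text/xml")) {
            return StandardCharsets.US_ASCII;
        }
        return null; // e.g. application/xml: defer to the XML declaration
    }
}
```

With something like this, a non-null result could be used to wrap the 
response stream in an InputStreamReader and hand that to 
SAXBuilder.build(Reader), so the header's encoding is applied no matter 
what the parser does; a null result would mean passing the raw 
InputStream and letting the parser read the declaration itself.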
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+


