[jdom-interest] SaxBuilder.build(url) and encoding

Elliotte Rusty Harold elharo at metalab.unc.edu
Sat Dec 14 03:42:55 PST 2002

Section F.2 of the XML spec states:

    The second possible case occurs when the XML entity is accompanied by
    encoding information, as in some file systems and some network
    protocols. When multiple sources of information are available, their
    relative priority and the preferred method of handling conflict
    should be specified as part of the higher-level protocol used to
    deliver XML. In particular, please refer to [IETF RFC 2376] or its
    successor, which defines the text/xml and application/xml MIME types
    and provides some useful guidance. In the interests of
    interoperability, however, the following rule is recommended.

RFC 2376 states:

    Conformant with [RFC-2046], if a text/xml entity is received with
    the charset parameter omitted, MIME processors and XML processors
    MUST use the default charset value of "us-ascii".  In cases where
    the XML entity is transmitted via HTTP, the default charset value
    is still "us-ascii".

RFC 3023 explains further:

    The top-level media type "text" has some restrictions on MIME
    entities and they are described in [RFC2045] and [RFC2046].  In
    particular, the UTF-16 family, UCS-4, and UTF-32 are not allowed
    (except over HTTP[RFC2616], which uses a MIME-like mechanism).  Thus,
    if an XML document or external parsed entity is encoded in such
    character encoding schemes, it cannot be labeled as text/xml or
    text/xml-external-parsed-entity (except for HTTP).
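
The text/xml default quoted from RFC 2376 above can be sketched in Java roughly as follows. This is only an illustration, not anything JDOM or a parser actually does: the class name CharsetRule and the method effectiveCharset are made up for this example, the header parsing is deliberately naive (it splits on ";" and ignores quoted semicolons), and returning null for anything other than text/xml is my shorthand for "let the XML processor detect the encoding itself", which is how I read the application/xml rule in RFC 3023.

```java
import java.util.Locale;

public class CharsetRule {

    /**
     * Returns the charset implied by a Content-Type header value, or null
     * when no charset is given and the media type is not text/xml (assumed
     * here to mean the XML processor should use the byte order mark or the
     * encoding declaration instead).
     */
    public static String effectiveCharset(String contentType) {
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);

        // An explicit charset parameter wins over any default.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }

        // RFC 2376: text/xml with the charset parameter omitted is us-ascii,
        // even when the entity arrives over HTTP.
        if (mediaType.equals("text/xml")) {
            return "us-ascii";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset("text/xml"));
        System.out.println(effectiveCharset("text/xml; charset=UTF-8"));
        System.out.println(effectiveCharset("application/xml"));
    }
}
```

The point of the sketch is that a text/xml response with no charset parameter is us-ascii as far as the RFC is concerned, whatever the encoding declaration inside the document says.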


| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
