[jdom-interest] Parsing files starting with UTF-8 Byte Order Mark

Alastair Rodgers alastair.rodgers at phocis.com
Tue Jul 1 02:12:41 PDT 2003


I forgot to mention: you check for 0xFEFF because that is the Unicode character (U+FEFF) you get when the UTF-8 byte order mark (bytes EF BB BF) is decoded.
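
To make the EF BB BF / FEFF relationship concrete, here is a minimal sketch that decodes the three BOM bytes as UTF-8 and prints the first resulting character. The class and method names (BomDecodeDemo, firstDecodedChar) are made up for illustration, and the sample bytes are invented; it also assumes a JVM recent enough to have java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class BomDecodeDemo {

    // Decodes the bytes as UTF-8 and returns the first char.
    // Hypothetical helper; assumes the input is not empty.
    static char firstDecodedChar(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8).charAt(0);
    }

    public static void main(String[] args) {
        // EF BB BF followed by the start of an XML document
        byte[] withBom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>' };
        char first = firstDecodedChar(withBom);
        // The three BOM bytes come out as one char, U+FEFF
        System.out.println("U+" + Integer.toHexString(first).toUpperCase());
    }
}
```

The decoder hands the BOM back as an ordinary character instead of dropping it, which is why the check has to be done on the decoded character rather than on the raw bytes.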


> -----Original Message-----
> From: Alastair Rodgers 
> Sent: 01 July 2003 10:10
> To: 'Peter Eriksson'; jdom-interest at jdom.org
> Subject: RE: [jdom-interest] Parsing files starting with 
> UTF-8 Byte Order Mark
> 
> 
> Hi Peter, 
> 
> The UTF-8 byte order mark is supposedly optional, but 
> unfortunately there is a known bug in Sun JVMs which means 
> the UTF-8 decoder does not strip it; so if it's present, it 
> shows up as the first character you read (Sun JVM bug #4508058, 
> http://developer.java.sun.com/developer/bugParade/bugs/4508058.html). 
> 
> The typical workaround is to do the check yourself when 
> reading the input stream, for example: 
> 
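> 	InputStream in = ...
> 	// Decode as UTF-8 so the BOM bytes EF BB BF arrive as the single char 0xFEFF
> 	Reader reader = new InputStreamReader(in, "UTF-8");
> 	StringBuffer buf = new StringBuffer();
> 	int first = reader.read();
> 	// Keep the first char unless the stream is empty or it is the BOM
> 	if ((first != -1) && (first != 0xFEFF))
> 		buf.append((char) first);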
> 
> 	... Read the rest of the stream ...
> 
> I haven't needed to use this with JDOM, but I expect you 
> could get round the problem by using a 
> java.io.PushbackReader. This wraps another Reader and allows 
> you to read the first char, and if it is anything other than 
> 0xFEFF, "push it back" into the Reader before passing the 
> PushbackReader to SAXBuilder().build(). There may be more 
> elegant ways round the problem too. 
> 
> Al.
> 
> 
> > -----Original Message-----
> > From: jdom-interest-admin at jdom.org
> > [mailto:jdom-interest-admin at jdom.org] On Behalf Of Peter Eriksson
> > Sent: 01 July 2003 06:46
> > To: jdom-interest at jdom.org
> > Subject: [jdom-interest] Parsing files starting with UTF-8 
> > Byte Order Mark
> > 
> > 
> > Hello Everybody,
> > 
> > I have a problem with parsing some XML files generated from
> > .Net. It seems that the file starts with the Byte Order Mark 
> > for UTF-8 (EF BB BF). If I try to load the file using jdom-b8 
> > I get an exception. Is there some way that I can load files 
> > with or without this Byte Order Mark transparently, i.e. 
> > without an exception being thrown?
> > 
> > Anybody have a solution to the problem?
> > 
> > /Peter
> > 
> > 
> > 
> > 
> > 
> 
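
The PushbackReader suggestion in the quoted message above could look roughly like the sketch below. It is untested against JDOM itself: the class and method names (BomSkipper, skipBom, readAll, stripBom) are invented for illustration, and a StringReader stands in for the real file so the example runs with the JDK alone.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BomSkipper {

    // Wraps 'source' so that a leading U+FEFF (the decoded BOM) is consumed.
    // Any other first character is pushed back and will be read again normally.
    static Reader skipBom(Reader source) throws IOException {
        PushbackReader reader = new PushbackReader(source, 1);
        int first = reader.read();
        if (first != -1 && first != 0xFEFF) {
            reader.unread(first);
        }
        return reader;
    }

    // Demo helper: drains a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            out.append((char) c);
        }
        return out.toString();
    }

    // Demo helper: runs skipBom over an in-memory string.
    static String stripBom(String text) {
        try {
            return readAll(skipBom(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The BOM is dropped; a document without one passes through unchanged
        System.out.println(stripBom("\uFEFF<root/>"));
        System.out.println(stripBom("<root/>"));
    }
}
```

With JDOM on the classpath, the Reader returned by skipBom would be handed to new SAXBuilder().build(reader) instead of being drained into a String. Note that the underlying Reader has to be created with the correct encoding (for example an InputStreamReader using UTF-8), since the parser then sees characters, not bytes.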