[jdom-interest] special characters breaking parse??

Matthew MacKenzie matt at xmlglobal.com
Fri Jan 26 00:15:40 PST 2001


Jason,

I am using SAXBuilder; maybe I am using it wrong? My parse code is:

Document d = new SAXBuilder().build(inStream);

Am I doing something wrong?

The XML is coming to me from emusic.com, and it obviously has problems: I
already have to call inStream.skip(1) before passing the stream to
SAXBuilder because there is a '\n' before the XML declaration :-P  It seems
I have found a good case for always declaring your encoding when authoring
XML :-)
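
Since I cannot fix the feed itself, here is a rough sketch of the workaround I
have in mind: decode the bytes as ISO-8859-1 myself, drop any whitespace in
front of the XML declaration, and hand the parser a Reader instead of a raw
stream. It uses only the JDK's own parser so it runs without JDOM on the
classpath; the class and method names (Latin1Parse, latin1Reader, parse) are
made up for illustration, and I am assuming the feed really is ISO-8859-1.
As far as I know SAXBuilder also has a build(Reader) overload, so the same
Reader should work there too.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JDOM.
class Latin1Parse {

    // Decode the bytes as ISO-8859-1 (an assumption about the feed) and
    // skip any whitespace that precedes the XML declaration.
    static Reader latin1Reader(InputStream in) throws IOException {
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        int c;
        while ((c = reader.read()) != -1 && Character.isWhitespace(c)) {
            // discard leading whitespace such as the stray '\n'
        }
        if (c != -1) {
            reader.unread(c); // put back the first real character
        }
        return reader;
    }

    // Parse from the Reader, so the parser never has to guess the encoding.
    static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(latin1Reader(in)));
    }

    public static void main(String[] args) throws Exception {
        // Sample bytes shaped like the feed: leading newline, no encoding
        // declared, accented characters stored as single Latin-1 bytes.
        byte[] bytes = "\n<?xml version=\"1.0\"?><TITLE>Tannh\u00e4user / Deriv\u00e8</TITLE>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document d = parse(new ByteArrayInputStream(bytes));
        System.out.println(d.getDocumentElement().getTextContent());
    }
}
```

This replaces the hard-coded skip(1), which would eat a real character if the
feed ever stopped sending the leading newline.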

Thanks for the info.

-matt

<<| message from: Jason Hunter <jhunter at collab.net> |>>
> It works OK if you specify in the decl:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> 
> When files look ASCII, I believe the parser defaults to UTF-8 unless you
> have an encoding to say differently.  See
> http://www.w3.org/TR/REC-xml#sec-guessing.
> 
> For the record, I saw the same error with DOMBuilder (why are you using
> DOMBuilder?).  In SAXBuilder you get a better description:
> 
> org.jdom.JDOMException: Error on line 3: An invalid XML character
> (Unicode: 0x84) was found in the element content of the document.
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:348)
> 
> BTW, make sure you outputter.setEncoding("ISO-8859-1") on output.
> 
> -jh-
> 
> 
> Matthew MacKenzie wrote:
> > 
> > Hello,
> > 
> > I am parsing an XML file,  and when characters with accents and such are
> > encountered,
> > the following stack trace is thrown.  I tried changing the encoding to
> > UTF-8, but that didn't work.
> > 
> > Has anyone else had this problem?
> > 
> > <stackTrace>
> > 
> > org.jdom.JDOMException: The element type "TITLE" must be terminated by the
> > matching end-tag "</TITLE>".: Error on line 180: The element type "TITLE"
> > must be terminated by the matching end-tag "</TITLE>".
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:315)
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:337)
> > </stackTrace>
> > 
> > Relevant Data:
> > 
> > 169      <TRACK>
> > 170       <TRACKID>41676</TRACKID>
> > 171        <TITLE>Tannhäuser / Derivè</TITLE>
> > 172        <ALBUM>The Shape Of Punk To Come</ALBUM>
> > 173       <ARTIST>Refused</ARTIST>
> > 174       <GENRE></GENRE>
> > 175
> > 176       <FILENAME>Refused-The_Shape_Of_Punk_To_Come-11-Tannhäuser_Derivè.mp3</FILENAME>
> > 177        <SIZE>7797864</SIZE>
> > 178       <FORMAT>.mp3</FORMAT>
> > 179       <QUALITY>128000</QUALITY>
> > 180        <CHANNELS>2</CHANNELS>
> > 181        <DURATION>489</DURATION>
> > 182      </TRACK>
> > 
> > --
> > Matthew MacKenzie
> > 
> 
> 
<<| end message from Jason Hunter <jhunter at collab.net> |>>
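
To convince myself of the UTF-8 default Jason describes, here is a small
JDK-only check (the class and method names are hypothetical). It takes the
title from my data, encodes it as ISO-8859-1, and asks a strict decoder
whether those bytes are valid in a given charset. The sample string is my
assumption about what the feed contains; I have not checked the raw bytes
from emusic.com.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking how a byte sequence decodes.
class EncodingCheck {

    // Returns true only if every byte sequence is valid in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Accented characters become single bytes above 0x7F in ISO-8859-1.
        byte[] latin1 = "Tannh\u00e4user / Deriv\u00e8"
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("valid as UTF-8:      "
                + decodesCleanly(latin1, StandardCharsets.UTF_8));
        System.out.println("valid as ISO-8859-1: "
                + decodesCleanly(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

A lone byte above 0x7F followed by plain ASCII is not a legal UTF-8 sequence,
which fits the parser choking once it assumes UTF-8.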

--
Matthew MacKenzie
VP Research & Development, Founder
XML Global Technologies, Inc.


