[jdom-interest] String length shorten after .getChild().getText() is being used.

Jacques wong jacques_wong at hotmail.com
Wed Apr 2 06:29:44 PDT 2008

Tatu made a good point that my question was not asked very clearly. I'm quite new to both JDOM and Java, so please forgive me if anything here is hard to understand.
My XML is very simple. I declared the encoding as UTF-8 because XMLOutputter cannot display the text correctly when I change the declared encoding to "big5".
<?xml version="1.0" encoding="UTF-8"?>
<?dsd href="zurich.dsd"?>
<DB>
  <Record>
    <ThxRegTxt>Dollar Money Market 基金</ThxRegTxt>
    <NxtRegTxt>Japanese Yen Money Market</NxtRegTxt>
    <InnerReg>
      <beg loop="2">SIZE=-2&gt;</beg>
      <end loop="3">&lt;/TD&gt;</end>
    </InnerReg>
  </Record>
</DB>
This is the code I used to display my XML on the console; it works without any problem.
   try {
       Document docXML = new SAXBuilder().build(new File(xmlPath));
       XMLOutputter outputter = new XMLOutputter(Format.getPrettyFormat());
       Format format = outputter.getFormat();
       format.setEncoding("big5");
       outputter.setFormat(format);
       outputter.output(docXML, System.out);
   } catch (IOException e) {
       e.printStackTrace();
   }
Then I tried to use JDOM to load the records into a Vector.
        Vector xmlRecVector = null;
        xmlRecVector = new Vector();
        Document docXML = new SAXBuilder().build(new File(xmlPath)); // xmlPath is the path of the XML
        Element rootElementList = docXML.getRootElement();
        List recDBList = rootElementList.getChildren("Record");
        Iterator i = recDBList.iterator();
        int idxOfList = 0;
        while (i.hasNext()) {
              Element recElement = (Element) i.next();
              idxOfList = recDBList.indexOf(recElement);
              DbXmlHandlerBean recDBObj = new DbXmlHandlerBean(); // DbXmlHandlerBean is an external data type.
              recDBObj.setRecIndex(idxOfList);
              String s = recElement.getChild("ThxRegTxt").getText();
              System.out.println(s + " : " + s.length() + "\n"); // I used this to count number of character stored.
              // Store into my object, it works fine, you can ignore these codes.
              recDBObj.setNxtRegTxt(recElement.getChild("NxtRegTxt").getText());
              recDBObj.setInnerRegBeg(recElement.getChild("InnerReg").getChild("beg").getText());
              recDBObj.setInnerBegLoop(Integer.parseInt(recElement.getChild("InnerReg").getChild("beg").getAttributeValue("loop")));
              recDBObj.setInnerRegEnd(recElement.getChild("InnerReg").getChild("end").getText());
              recDBObj.setInnerEndLoop(Integer.parseInt(recElement.getChild("InnerReg").getChild("end").getAttributeValue("loop")));
              // Put store XML object into Vector
              xmlRecVector.add(recDBObj);
        } // end of while loop
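Since the console may not be able to show Big5 text, the printed string and its length can be hard to judge by eye. The following is a minimal sketch, not part of the original code, that lists the Unicode code points of a string so the check does not depend on the console encoding. The class name CodePointDump and the method toCodePoints are hypothetical; in the loop above it could be called on the value returned by getText().

```java
public class CodePointDump {

    /** Returns the code points of s as text, e.g. "U+0041 U+57FA". */
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for recElement.getChild("ThxRegTxt").getText()
        String s = "Market \u57FA\u91D1";
        System.out.println(s.length() + " chars: " + toCodePoints(s));
    }
}
```

If the two Chinese characters come out as U+57FA and U+91D1, the parser decoded the text correctly and only the console output is at fault.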
After storing the objects, I display the whole Vector again.
   DbXmlHandlerBean recDBObj = new DbXmlHandlerBean();
   System.out.println("Print Stored Record...");
   for (int i=0; i<recVector.size(); i++) {
     recDBObj = (DbXmlHandlerBean) recVector.elementAt(i);
     System.out.println("Record: " + recDBObj.getRecIndex());
     System.out.println("Thx: " + recDBObj.getThxRegTxt() + "  Nxt: " + recDBObj.getNxtRegTxt());
     System.out.println("InnerBeg: " + recDBObj.getInnerRegBeg() + " loop: " + recDBObj.getInnerBegLoop());
     System.out.println("InnerEnd: " + recDBObj.getInnerRegEnd() + " loop: " + recDBObj.getInnerEndLoop() + "\n");
   }
But this time my stored text is not displayed with the correct Big5 characters; English-only text works fine.
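One likely difference between the two outputs: XMLOutputter writes bytes to the stream in the encoding set on its Format, while System.out.println encodes with the platform default encoding. A minimal sketch, assuming the console expects Big5 and the JDK in use supports the "Big5" charset, is to wrap System.out in a PrintStream with an explicit encoding. The class name Big5ConsoleDemo and the helper printWithEncoding are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Big5ConsoleDemo {

    /** Returns the bytes a PrintStream with the given encoding writes for text. */
    static byte[] printWithEncoding(String text, String encoding)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, encoding);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String text = "\u57FA\u91D1"; // two Chinese characters

        // Console output encoded explicitly as Big5 instead of the platform default.
        PrintStream big5Out = new PrintStream(System.out, true, "Big5");
        big5Out.println("Thx: " + text + " : " + text.length());

        // The same text occupies two bytes per character in Big5.
        System.out.println(printWithEncoding(text, "Big5").length + " bytes in Big5");
    }
}
```

The String itself is unchanged here; only the bytes sent to the console differ.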
I believe that if XMLOutputter can display the Big5 text, then reading the same Document built by SAXBuilder should give the same result, so I think I have missed something. I have no idea how Xerces works with JDOM under JDK 1.5. I hope someone with more experience can help me solve this problem. Thanks.
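To illustrate how an encoding mismatch changes both the characters and the string length, here is a minimal sketch using only the JDK (String.getBytes(Charset) needs Java 6 or later, and the "Big5" charset is assumed to be available). It decodes the same Big5 bytes with two different charsets and then applies a getBytes("UTF-8") to "big5" style conversion to an already correct string. The class name EncodingMismatchDemo and the method decode are hypothetical.

```java
import java.nio.charset.Charset;

public class EncodingMismatchDemo {

    /** Decodes raw bytes with the named charset. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String original = "\u57FA\u91D1"; // two Chinese characters
        byte[] big5Bytes = original.getBytes(Charset.forName("Big5"));

        // Same bytes, decoded with the right and with a wrong charset.
        String right = decode(big5Bytes, "Big5");
        String wrong = decode(big5Bytes, "ISO-8859-1");
        System.out.println(right.length()); // 2
        System.out.println(wrong.length()); // 4

        // Re-encoding a correct String as UTF-8 and decoding it as Big5
        // does not convert anything; it produces different text.
        byte[] utf8Bytes = original.getBytes(Charset.forName("UTF-8"));
        String reencoded = decode(utf8Bytes, "Big5");
        System.out.println(reencoded.equals(original)); // false
    }
}
```

A Java String holds decoded characters, so a charset only matters where bytes are read or written: when the parser reads the file and when text is sent to the console or a file.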

> Date: Tue, 1 Apr 2008 09:52:11 -0700
> From: cowtowncoder at yahoo.com
> Subject: Re: [jdom-interest] String length shorten after .getChild().getText() is being used.
> To: jdom-interest at jdom.org
>
> --- Jacques wong <jacques_wong at hotmail.com> wrote:
>
> > Hi,
> > I'm using JDOM v1.1. Basically, I can use most of
> > the function of the JDOM, but I found some stranges
> > when I use Element.getChild().getText(); I've an XML
> > that contain some big5 characters (externally
> > created XML file), both using XMLOutputter for
> > outputting screen and XML file have no affection on
> > the big5 codeset displays. However, when I tried to
> > query each text one by one by using
> > Element.getChild().getText(), the String returned
> > always is shorter than the original in XMLfile, and
> > the Big5 characters are displayed incorrectly. I
> > tried to use the conversion. String s = new
>
> Shorter as measured by... ? Number of characters in
> it? Since JDOM is not a parser, encoding/decoding
> issues are dealt with by the underlying parser;
> default being Xerces when using JDK 1.5+.
>
> I doubt JDOM has anything to do with the problem. By
> the time it gets data from parser, it's all in java
> chars/Strings, decoded from input (byte stream
> usually) as necessary.
> But without a sample document it is impossible to know
> what exactly goes wrong.
>
> The most common error is that the encoding declaration
> in the xml document is wrong, and contents are encoded
> using some other encoding.
> Second common problem is developers printing out text
> to console, and console being unable to display it
> properly.
>
> > String(recElement.getChild("ThxRegTxt").getText().getBytes("UTF-8"),"big5");
> > but seems it's not displaying correctly also. My
>
> No kidding, that's about worst piece of code anyone
> could write. I wish compilers would refuse to compile
> it. :-p
> If it worked as expected, your input data was broken,
> and you were just lucky that 2 wrongs made right.
>
> -+ Tatu +-