[jdom-interest] Problem getting my XML in

Per Norrman per.norrman at austers.se
Wed Aug 18 16:18:05 PDT 2004


Hi,

You have a few problems, but first a general Java tip: it pays
to print exception stack traces.

1) The hostname in your URL in the attached program is wrong.
    If you print the stack trace here

>        Document doc = null;
>        try {
>            doc = builder.build(urlObj);
>        }
>        catch(Exception ex) {
              ex.printStackTrace();
>            return "Error on making xml returned SAXable" + ex.getMessage();
>        }

    you'll see that the exception is java.net.UnknownHostException.
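Here is a short sketch of why the stack trace matters in this case. It uses the plain JDK, no JDOM, and `ExceptionReport.describe` is a hypothetical helper, not part of any library. For an UnknownHostException the message is often little more than the host name. So "Error on making xml returned SAXable" + ex.getMessage() shows a bare hostname with no hint of what failed. Printing the exception itself, or its cause chain, keeps the class name:

```java
import java.net.UnknownHostException;

class ExceptionReport {
    // Hypothetical helper: one line per exception in the cause chain.
    // Throwable.toString() yields "fully.qualified.ClassName: message",
    // so the exception type is kept, unlike getMessage() on its own.
    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (sb.length() > 0) {
                sb.append("\nCaused by: ");
            }
            sb.append(cur);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Constructed by hand to stand in for a failed DNS lookup.
        Exception ex = new UnknownHostException("no-such-host.invalid");
        System.out.println("getMessage(): " + ex.getMessage());
        System.out.println("describe():   " + describe(ex));
    }
}
```

In the quoted catch block, ex.printStackTrace() gives the same information plus line numbers. A compact one-liner like describe(ex) is handier when the text has to go into a returned error string.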

2) But fixing that reveals another, more serious, problem. This service
(I wouldn't call it a web service, btw) does *not* return XML. It returns
HTML with the XML document escaped within a <pre> element!!!
Do a "view page source" on the full URL and you'll see for yourself. I have
no idea why they do it like that--the point is kind of lost.

Of course, your program fails miserably at this point:

> 
>        String geneTrackGeneId = doc.getRootElement().getChild("Entrezgene_track-info").getChild("Gene-track").getChild("Gene-track_geneid").getTextTrim();
>

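since the root element is the only element in the document. Everything inside
it is escaped text, so the first getChild() returns null and the chained call
throws a NullPointerException.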
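To see the failure concretely, here is a small JDK-only sketch (no JDOM). The class name is hypothetical, and the response string is an invented, drastically trimmed imitation of the real page. The sketch assumes the wrapper is well-formed enough for an XML parser, as it evidently was for SAXBuilder once the hostname was fixed. Parsing it yields exactly one element, and the Entrezgene markup comes back as that element's text:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapedPayloadDemo {
    // Made-up, heavily trimmed stand-in for the service response: a single
    // <pre> element whose content is XML escaped as &lt; ... &gt;.
    static final String RESPONSE =
            "<pre>&lt;Entrezgene&gt;&lt;Entrezgene_track-info&gt;&lt;Gene-track&gt;"
            + "&lt;Gene-track_geneid&gt;1234&lt;/Gene-track_geneid&gt;"
            + "&lt;/Gene-track&gt;&lt;/Entrezgene_track-info&gt;&lt;/Entrezgene&gt;</pre>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(RESPONSE);
        // "*" matches every element in the document: only <pre> is there.
        System.out.println(doc.getElementsByTagName("*").getLength()); // prints 1
        // The parser has unescaped &lt; and &gt;, but the result is just a string.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Any getChild()-style navigation into that document therefore finds nothing below the root.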

Now, perhaps the guys at NCBI provide a method for obtaining *real* XML -- if so,
use that. If they don't, you can always build a new document from the text
of the <pre> element. It's awkward and not very robust, but it works. A
sample program is attached.
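The two-pass workaround can also be sketched with the JDK's own DOM and XPath APIs (javax.xml.parsers, javax.xml.xpath), for anyone not using JDOM. This is a sketch under assumptions, not the attached program. `TwoPassParse` and `geneId` are hypothetical names, and the sample response is invented. It also assumes the escaped XML carries no DOCTYPE that the parser would try to fetch; the attached program handles that case with an EntityResolver.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TwoPassParse {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: extract Gene-track_geneid from a response whose
    // Entrezgene XML is escaped inside a <pre> element.
    static String geneId(String response) throws Exception {
        // Pass 1: parse the wrapper. The parser turns &lt; and &gt; back into
        // < and >, so the <pre> text content is the XML source as a string.
        NodeList pres = parse(response).getElementsByTagName("pre");
        if (pres.getLength() == 0) {
            throw new IllegalStateException("no <pre> element in response");
        }
        String xml = pres.item(0).getTextContent().trim();

        // Pass 2: parse that string as the real document and query it.
        Document doc = parse(xml);
        String id = XPathFactory.newInstance().newXPath().evaluate(
                "/Entrezgene/Entrezgene_track-info/Gene-track/Gene-track_geneid", doc);
        if (id.isEmpty()) {
            throw new IllegalStateException("Gene-track_geneid not found");
        }
        return id.trim();
    }

    public static void main(String[] args) throws Exception {
        // Invented sample response, not real NCBI output.
        String response = "<html><body><pre>\n"
                + "&lt;Entrezgene&gt;&lt;Entrezgene_track-info&gt;&lt;Gene-track&gt;"
                + "&lt;Gene-track_geneid&gt;1234&lt;/Gene-track_geneid&gt;"
                + "&lt;/Gene-track&gt;&lt;/Entrezgene_track-info&gt;&lt;/Entrezgene&gt;\n"
                + "</pre></body></html>";
        System.out.println(geneId(response)); // prints 1234
    }
}
```

It is exactly as fragile as described above. Any change in how the page wraps or escapes the payload breaks it, which is why a real XML endpoint is preferable if one exists.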

/pmn

PS. That was one hell of a noisy XML document. I wonder what the
markup-to-data ratio is. Is there a term for this? DS.



-------------- next part --------------
package gene;

import java.io.IOException;
import java.io.StringReader;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * @author Per Norrman
 *  
 */
public class GetGene {
    String _urlPrefix = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=gene&dopt=xml&uid=";
    
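    // One-line DTD handed to the parser in place of the real Entrezgene DTD
    // (it is DTD content, not a DOCTYPE declaration, despite the field name).
    String _docType = "<!ELEMENT Entrezgene ANY>";
    // Public ID whose DTD requests are answered with _docType.
    String _entrezPublicID = "-//NCBI//NCBI Entrezgene/EN";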

    public String getGene(String uid) {
        try {
            SAXBuilder builder = new SAXBuilder();
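            // Answer requests for the Entrezgene DTD with the stub above
            // instead of letting the parser fetch the real one.
            builder.setEntityResolver(new EntityResolver() {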
                public InputSource resolveEntity(String publicId,
                        String systemId) throws SAXException, IOException
                {
                    if (publicId != null && publicId.equals(_entrezPublicID)) {
                        return new InputSource(new StringReader(_docType));
                    }
                    return null;
                }
            });
            String url = _urlPrefix + uid;
            System.out.println("URL=" + url);
            System.out.println("load");
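            // The response parses as a single element whose text is the
            // escaped Entrezgene XML; take that text...
            Document bogus = builder.build(url);
            String xml = bogus.getRootElement().getText();
            // ...and parse it again to get the real document.
            Document doc = builder.build(new StringReader(xml));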
            XPath xpath = XPath
                    .newInstance("/Entrezgene/Entrezgene_track-info/Gene-track/Gene-track_geneid");
            Element node = (Element) xpath.selectSingleNode(doc);
            if (node != null) {
                return node.getText();
            } else {
                throw new RuntimeException("Gene-track_geneid not found for uid " + uid);
            }
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }
    
    public static void main(String[] args) throws Exception {
        System.out.println(new GetGene().getGene("4537"));
    }
}
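The attached program also installs an EntityResolver. Presumably the unescaped document starts with a DOCTYPE using the public ID -//NCBI//NCBI Entrezgene/EN. The resolver answers that request with the one-line declaration in _docType, so the parser never goes looking for the real DTD. Below is a JDK-only sketch of the same trick, using javax.xml.parsers instead of JDOM. `StubDtd` and the system ID in the example are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StubDtd {
    // Public ID checked by the attached program.
    static final String ENTREZ_PUBLIC_ID = "-//NCBI//NCBI Entrezgene/EN";
    // Minimal external subset: declares the root element and nothing else.
    // Fine for a non-validating parse; no entities or defaults come from it.
    static final String STUB_DTD = "<!ELEMENT Entrezgene ANY>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Serve the Entrezgene DTD from memory; returning null for any other
        // entity lets the parser fall back to its default resolution.
        builder.setEntityResolver((publicId, systemId) ->
                ENTREZ_PUBLIC_ID.equals(publicId)
                        ? new InputSource(new StringReader(STUB_DTD))
                        : null);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The system ID is a hypothetical, unreachable URL: with the resolver
        // in place it is never fetched.
        String xml = "<!DOCTYPE Entrezgene PUBLIC \"-//NCBI//NCBI Entrezgene/EN\""
                + " \"http://example.invalid/NCBI_Entrezgene.dtd\">"
                + "<Entrezgene><Entrezgene_track-info/></Entrezgene>";
        System.out.println(parse(xml).getDocumentElement().getTagName()); // prints Entrezgene
    }
}
```

Returning null for other IDs, as the attached program also does, means any other external entity is still resolved normally. Return an InputSource over an empty string instead if the parse must never touch the network.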

