[jdom-interest] [tagsoup-friends] How to parse XML document with default namespace with JDOM XPath

Jack Bush netbeansfan at yahoo.com.au
Tue Nov 4 14:32:45 PST 2008


Hi All,
 
I am having difficulty using Saxon and the TagSoup parser to parse a namespaced HTML document. The relevant content of this document is as follows:
 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
……..
</head>
<body>
    <div id="container">
        <div id="content">
            <table class="sresults">
                <tr>
                    <td>
                        <a href="http://www.abc.com/areas" title="Hollywood, CA"> hollywood </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Jose, CA"> san jose </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Francisco, CA"> san francisco </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Diego, CA"> San diego </a>
                    </td>
              </tr>
……….
</body>
</html>
 
Below is the relevant code snippet, which illustrates how I have attempted to retrieve the contents (the value of <a>):
 
             import java.io.*;
             import java.util.*;
             import org.jdom.*;
             import org.jdom.input.SAXBuilder;
             import org.jdom.xpath.*;
             import org.saxpath.*;
             import org.ccil.cowan.tagsoup.Parser;

( 1 )       FileReader frInHtml = new FileReader("C:\\Tmp\\ABC.html");
( 2 )       BufferedReader brInHtml = new BufferedReader(frInHtml);
( 3 ) //    SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
( 4 )       SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
( 5 )       org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
( 6 )       XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
( 7 )       xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
( 8 )       java.util.List list = (java.util.List) (xpath.selectNodes(jdomDocument));
( 9 )       Iterator iterator = list.iterator();
( 10 )     while (iterator.hasNext())
( 11 )     {
( 12 )            Object object = iterator.next();
( 13 ) //         if (object instanceof Element)
( 14 ) //               System.out.println(((Element) object).getTextNormalize());
( 15 )             if (object instanceof Content)
( 16 )                   System.out.println(((Content) object).getValue());
              }
….
 
This program works on the same document without the default namespace, in which case the “ns” prefix in the XPath statements (lines 6-7) is not needed either. Moreover, I have previously used “org.apache.xerces.parsers.SAXParser” to successfully retrieve the content of <a> from the same document without the default namespace.
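
For readers following along, the difference between the two cases can be reproduced in isolation. The sketch below is not the JDOM code from this post: it swaps in the JDK's built-in javax.xml.xpath API so that it runs without extra jars, and the class name NamespaceXPathSketch, the SAMPLE string, and the countMatches helper are made up for illustration. It assumes a namespace-aware parse, in which every element of the document lives in the XHTML namespace, so an unprefixed XPath step matches nothing while a prefix bound to that namespace (or a local-name() test) does.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of JDOM, Saxon, or TagSoup.
final class NamespaceXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Trimmed-down stand-in for the document in the post (DOCTYPE omitted).
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML_NS + "\"><body>"
        + "<div id=\"container\"><div id=\"content\"><table class=\"sresults\"><tr>"
        + "<td><a href=\"http://www.abc.com/areas\">hollywood</a></td>"
        + "<td><a href=\"http://www.abc.com/areas\">san jose</a></td>"
        + "</tr></table></div></div></body></html>";

    // Returns how many nodes the given XPath 1.0 expression selects in SAMPLE.
    static int countMatches(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP factories are not namespace-aware by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the prefix "ns" to the XHTML namespace; the prefix name itself is arbitrary.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed steps look for elements in no namespace: prints 0
        System.out.println(countMatches("/html/body//a"));
        // The path from the post, with the prefix bound: prints 2
        System.out.println(countMatches(
            "/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']"
            + "/ns:table[@class='sresults']/ns:tr/ns:td/ns:a"));
        // Namespace-agnostic alternative: prints 2
        System.out.println(countMatches("//*[local-name()='a']"));
    }
}
```

The prefix used in the expression does not have to match anything in the document; only the namespace URI it is bound to matters. The local-name() form avoids prefixes entirely, at the cost of also matching same-named elements from any other namespace.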
 
I would like to achieve the following objectives if possible:
 
( i ) Exclude the DTD and namespace in order to simplify the parsing process. How could this be done?
( ii ) If this is not possible, how should the namespace be included in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?
( iii ) Would changing from “org.apache.xerces.parsers.SAXParser” to “org.ccil.cowan.tagsoup.Parser” make any difference as far as using XPath is concerned?
( iv ) Failing to exclude the DTD, how can the lookup of a PUBLIC DTD be changed to a local SYSTEM one, so that a local DTD is used for reference?
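
As an illustration of what objective (iv) could look like, here is a minimal sketch built on the standard SAX EntityResolver interface, again using the JDK's own parser rather than JDOM so that it runs on its own. The class name LocalDtdSketch, the SAMPLE document, and the linkText helper are hypothetical, and the "local DTD" is passed in as a string purely to keep the example self-contained; a real program would more likely open a local copy of the DTD file. JDOM's SAXBuilder also has a setEntityResolver method, so the same resolver object should be usable there, though that is not tested here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of JDOM, Saxon, or TagSoup.
final class LocalDtdSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // Stand-in document that uses an entity declared only in the DTD.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"" + PUBLIC_ID + "\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"" + XHTML_NS + "\"><head><title>areas</title></head>"
        + "<body><a href=\"http://www.abc.com/areas\">san&nbsp;jose</a></body></html>";

    // Parses SAMPLE, serving localDtdText in place of the remote DTD,
    // and returns the text of the first <a> element.
    static String linkText(final String localDtdText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    if (PUBLIC_ID.equals(publicId)) {
                        // Assumption: in real use this would read a local DTD file instead.
                        return new InputSource(new StringReader(localDtdText));
                    }
                    return null; // anything else: let the parser do its default lookup
                }
            });
            Document doc = builder.parse(new InputSource(new StringReader(SAMPLE)));
            return doc.getElementsByTagNameNS(XHTML_NS, "a").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A one-line "local DTD" that declares just the entity the sample needs.
        String text = linkText("<!ENTITY nbsp \"&#160;\">");
        System.out.println(text.equals("san\u00A0jose"));
    }
}
```

The resolver is keyed on the PUBLIC identifier, so the SYSTEM URL in the DOCTYPE is never fetched for that DTD. Whether dropping the DTD altogether is acceptable depends on whether the document relies on entities such as &nbsp; that only the DTD declares.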
 
I am running JDK 1.6.0_06, NetBeans 6.1, JDOM 1.1, Saxon 6.5.5, and TagSoup 1.2 on Windows XP.
 
Any assistance would be appreciated.
 
Thanks in advance,
 
Jack
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.jdom.org/pipermail/jdom-interest/attachments/20081104/5dc20e53/attachment.htm

