From larsen007 at web.de Wed Nov 7 08:41:04 2012
From: larsen007 at web.de (Larsen)
Date: Wed, 07 Nov 2012 17:41:04 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
Message-ID:

Hi,

I am quite a JDOM2 newbie and noticed strange/incorrect behaviour when converting a W3C-Element to a JDOM-Element. However, I can't imagine that this really is a bug, as I guess that somebody else would have noticed and probably fixed it before. Also, I couldn't find other people having this problem. So, at the moment I would rather think that I am doing something wrong here.

This is the part that I want to convert from W3C to JDOM, stored in the variable "table":

RTEmagicC_pdf_icon.png.png

This is my code:

DOMBuilder domBuilder = new DOMBuilder();
Element jdomTable = domBuilder.build(table).detach();

After this conversion, I have a JDOM element with the correct structure, but the content from the img-tag is missing. I fixed this problem by importing the JDOM source into my project and changing this method:

public org.jdom2.Text build(org.w3c.dom.Text text) {
    // BUG ???
    // return factory.text(text.getTextContent());
    return factory.text(text.getNodeValue());
}

Could anyone please shed some light on whether this is a bug or a mistake on my side?

Lars

From larsen007 at web.de Wed Nov 7 08:45:57 2012
From: larsen007 at web.de (Larsen)
Date: Wed, 07 Nov 2012 17:45:57 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To:
References:
Message-ID:

> I am quite a JDOM2 newbie and noticed strange/incorrect behaviour when
> converting a W3C-Element to a JDOM-Element. (snip)

PS: Using latest JDOM 2.0.3 and Java 7 ("1.7.0_09")

From jdom at tuis.net Wed Nov 7 10:12:30 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 07 Nov 2012 13:12:30 -0500
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
Message-ID:

Hi Larsen.

It does look odd. There's a couple of questions...
like, how do the DOMBuilder tests pass? I am only going to be able to look into this in a few hours' time.

Rolf

_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From jdom at tuis.net Wed Nov 7 10:31:09 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 07 Nov 2012 13:31:09 -0500
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
Message-ID:

Hi (again).

Based on some double-checking, I suspect that you have a buggy DOM implementation?

getTextContent() returns nodeValue for Text nodes... Node.getTextContent says it should anyway.

I will check it out some more later.

Rolf

From larsen007 at web.de Wed Nov 7 13:23:21 2012
From: larsen007 at web.de (Larsen)
Date: Wed, 07 Nov 2012 22:23:21 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To:
References:
Message-ID:

Hi Rolf,

I haven't used unit tests so far and would need some instructions on how to run them in case this becomes necessary.

How can I check for a buggy DOM implementation?

Lars

From jdom at tuis.net Wed Nov 7 13:48:48 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 07 Nov 2012 16:48:48 -0500
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
Message-ID:

Hi

Pull the JDOM code from github, set it up as an eclipse project (if you use eclipse...), then right-click the build.xml file and run the eclipse target. If you use eclipse you can then right-click the project and run all tests, or you can run the ant junit target.

As for which DOM you use, run your project with the java option -Djaxp.debug=1 to see which DOM is found.

Rolf

From larsen007 at web.de Wed Nov 7 14:48:17 2012
From: larsen007 at web.de (Larsen)
Date: Wed, 07 Nov 2012 23:48:17 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To:
References:
Message-ID:

I will try to test this tomorrow at my company.

Lars

From jdom at tuis.net Wed Nov 7 16:07:29 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 07 Nov 2012 19:07:29 -0500
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To:
References:
Message-ID: <509AF7C1.4050608@tuis.net>

Hi Lars.

I am back home, and I have my JDOM code in front of me.

I have just gone through the code, and 'it works for me'.

What this means is that if:
- you load a Document using a DOM DocumentBuilderFactory supplied by Xerces
- and you pass that document to JDOM to build a JDOM document
- and that document contains Text nodes
then:
- JDOM will correctly translate those DOM Text nodes into JDOM Text nodes.

Now, I am not saying that getTextContent() is the 'right' method to call. It is possible that I would be better off using getNodeValue(). In fact, JDOM versions before 2.x used getNodeValue(). I can't think of why I decided to use getTextContent() instead, other than the fact that that part of the code was refactored significantly, and I used the documentation carefully, and perhaps there was something that used getTextContent() and I chose to do it that way.

I have just run the entire test suite with the code changed to use getNodeValue() and it still works fine for me.

On checking the DOM specification, the getTextContent() method was added in DOM Level 3.

The Java API documentation is a mess in this area....
JDK 1.5 package information indicates that the org.w3c.dom API supports DOM Level 2:
http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/package-summary.html

Yet, the Node class indicates it implements Level 3..... and it exposes all the Level 3 changes. In fact, the Java 5 new-features page indicates that JAXP implements the Level 3 specification:
http://docs.oracle.com/javase/1.5.0/docs/guide/xml/jaxp/index.html

So, what this means is that:
- JDOM is doing the right thing
- it uses functionality supported since Java 5
- it is probably your particular DOM library that has a broken implementation of the new-to-DOM3 method getTextContent()
- JDOM does not need to use getTextContent() because the old method getNodeValue() will work just fine.

What would be useful is if you could determine the library that you are using. Since you have already 'hacked' the code, why don't you temporarily add the line:

System.out.println(text.getClass());

to the method. This will tell you the concrete implementation of DOM that's broken.

I will also change the text() method to use getNodeValue() instead... it makes sense to do it if there's a broken library, and it's no big deal for JDOM....

Also, I am planning a release imminently, so the timing is good.

If you could get back to me on what library you are using, I will dig into it and we can see if there's a fix for your library (I imagine that there is....).

I did google for "getTextContent bug Text Node" but I found no seemingly relevant hits.

Created issue #100 for this: https://github.com/hunterhacker/jdom/issues/100

Rolf

From larsen007 at web.de Thu Nov 8 01:20:52 2012
From: larsen007 at web.de (Larsen)
Date: Thu, 08 Nov 2012 10:20:52 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To: <509AF7C1.4050608@tuis.net>
References: <509AF7C1.4050608@tuis.net>
Message-ID:

Hi Rolf,

first of all, thanks for your extensive help!

> The Java API documentation is a mess in this area....
> JDK 1.5 package information indicates that the org.w3c.dom API supports DOM Level 2:
> http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/package-summary.html

That's nice to hear. I was already wondering whether my English is too bad or if the javadoc is so crudely written that I can't understand it.

> What would be useful is if you could determine the library that you are
> using. Since you have already 'hacked' the code, why don't you
> temporarily add the line: System.out.println(text.getClass()); to the
> method. This will tell you the concrete implementation of DOM that's
> broken.

It's "org.w3c.tidy.DOMTextImpl". I use JTidy to bring HTML code I obtain from a customer's database into Java objects.
So, should I file a bug against JTidy?

My code in that area in case it helps:

private org.w3c.dom.Document getDocFromTidy(String html) {

    Tidy tidy = new Tidy();
    tidy.setShowWarnings(false);
    tidy.setQuiet(true);
    tidy.setXHTML(true);
    tidy.setDocType("omit");

    // convert text representation to Document
    InputStream bais = new ByteArrayInputStream(html.getBytes());

    try {
        bais.close();
    } catch (IOException e) {
        log.error("Exception on closing the InputStream", e);
    }

    return tidy.parseDOM(bais, null);
}

Lars

From jdom at tuis.net Thu Nov 8 03:35:45 2012
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 08 Nov 2012 06:35:45 -0500
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To:
References: <509AF7C1.4050608@tuis.net>
Message-ID: <509B9911.3050209@tuis.net>

Hi Lars.

Indeed, file a bug against JTidy.

Here's the offending lines of code in the DOMNodeImpl class from JTidy:

/**
 * @todo DOM level 3 getTextContent() Not implemented. Returns null.
 * @see org.w3c.dom.Node#getTextContent()
 */
public String getTextContent() throws DOMException
{
    return null;
}

That's from line 523 of:
http://jtidy.svn.sourceforge.net/viewvc/jtidy/trunk/jtidy/src/main/java/org/w3c/tidy/DOMNodeImpl.java?revision=1132&view=markup

Rolf

From larsen007 at web.de Thu Nov 8 03:54:12 2012
From: larsen007 at web.de (Larsen)
Date: Thu, 08 Nov 2012 12:54:12 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To: <509B9911.3050209@tuis.net>
References: <509AF7C1.4050608@tuis.net> <509B9911.3050209@tuis.net>
Message-ID:

hi Rolf,

> Indeed, file a bug against JTidy.

I already checked the project but it seems quite dead, so I guess this won't be fixed in a timely manner if at all...
Perhaps you could also add a remark in the FAQs regarding JTidy or whatever you think is appropriate.

> Here's the offending lines of code in the DOMNodeImpl class from JTidy:

Thanks, I will include this in my bug report. Surprising that such (as I would guess) vital code has not been implemented yet.

Again, thx for your help!
Lars

From larsen007 at web.de Thu Nov 8 04:08:36 2012
From: larsen007 at web.de (Larsen)
Date: Thu, 08 Nov 2012 13:08:36 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To: <509B9911.3050209@tuis.net>
References: <509AF7C1.4050608@tuis.net> <509B9911.3050209@tuis.net>
Message-ID:

> Indeed, file a bug against JTidy.
filed as https://sourceforge.net/p/jtidy/bugs/259/

From garydgregory at gmail.com Thu Nov 8 04:46:44 2012
From: garydgregory at gmail.com (Gary Gregory)
Date: Thu, 8 Nov 2012 07:46:44 -0500
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To:
References: <509AF7C1.4050608@tuis.net> <509B9911.3050209@tuis.net>
Message-ID: <-3000111064427025659@unknownmsgid>

On Nov 8, 2012, at 7:06, Larsen wrote:

> hi Rolf,
>
>> Indeed, file a bug against JTidy.
>
> I already checked the project but it seems quite dead, so I guess this won't be fixed in a timely manner if at all...
> Perhaps you could also add a remark in the FAQs regarding JTidy or whatever you think is appropriate.

You could fork it and fix it (on github for example)

Gary

From larsen007 at web.de Thu Nov 8 05:08:26 2012
From: larsen007 at web.de (Larsen)
Date: Thu, 08 Nov 2012 14:08:26 +0100
Subject: [jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element
In-Reply-To: <-3000111064427025659@unknownmsgid>
References: <509AF7C1.4050608@tuis.net> <509B9911.3050209@tuis.net> <-3000111064427025659@unknownmsgid>
Message-ID:

> You could fork it and fix it (on github for example)

That's true, but not very practical. I don't have time to get to know the jtidy code and fix it. Apart from that, what would I do with the patch? If I sent it to the JTidy project it would probably be waiting for implementation like the open bug. And there are many more methods returning null that shouldn't.
Lars

From jdom at tuis.net Thu Nov 8 19:51:32 2012
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 08 Nov 2012 22:51:32 -0500
Subject: [jdom-interest] JDOM 2.0.4 released
Message-ID: <509C7DC4.8020801@tuis.net>

Hi all.

JDOM 2.0.4 is now available from the regular locations.

The changes for 2.0.4 are as follows:

Bugs:
Fixes Issue 93 - Some Java containers (e.g. Applets) have security restrictions on accessing System properties.
Fixes Issue 94 - Improve the exception-handling in the initializers for the XMLSchema validating singletons in XMLReaders.
Fixes Issue 97 - Update to using/packaging Jaxen 1.1.4
Fixes Issue 98 - Improve the XPathHelper class so that it can build queries to all nodes, including at document level.
Fixes Issue 100 - Use the functionally equivalent 'older' DOM methods instead of DOM 3 methods which may not be implemented completely in all DOM libraries.

Please download the package from:
https://github.com/downloads/hunterhacker/jdom/jdom-2.0.4.zip

Reminder for Maven Users
========================

Please note that this release was not made to the jdom artifact, but to the jdom2 artifact. The details should be available here (it takes a few hours for maven-central to be synchronized):
http://search.maven.org/#artifactdetails|org.jdom|jdom2|2.0.4|jar

Happy Coding

Rolf

From paul at hoplahup.net Tue Nov 20 08:33:42 2012
From: paul at hoplahup.net (Paul Libbrecht)
Date: Tue, 20 Nov 2012 17:33:42 +0100
Subject: [jdom-interest] Parsing HTML elements
Message-ID: <87D0958A-9178-4594-BA28-FC30D2CAE517@hoplahup.net>

Hello JDOM experts,

I'm hitting a wall here and I am not sure who is responsible.
Just like the previous series of posts, I am trying to parse an HTML document.
In this case I use the CyberNeko HTML parser http://nekohtml.sourceforge.net/ which creates a SAX stream hence is easily convertible to a JDOM document.
Now, my big issue is that the document I have (which I cannot easily change right now) contains undeclared namespace-prefixed attribute-names!

Do I have a way to predefine the namespace somewhere?

thanks in advance

Paul

From jdom at tuis.net Tue Nov 20 09:14:02 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 20 Nov 2012 12:14:02 -0500
Subject: [jdom-interest] Parsing HTML elements
Message-ID:

Hmmm, not using the default API.

JDOM expects the getURI() method to have a value if there is a prefix for the attribute. This is reasonable... ;)

This indicates the SAX stream is broken. JDOM should be throwing "Namespace URIs must be non-null and non-empty Strings".

If you cannot fix the SAX stream code, you can maybe write a proxy class that fixes the URIs as the events pass through.

Rolf

From jdom at tuis.net Tue Nov 20 15:08:49 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 20 Nov 2012 18:08:49 -0500
Subject: [jdom-interest] Parsing HTML elements
In-Reply-To:
References:
Message-ID: <50AC0D81.2060209@tuis.net>

Hi Paul.

In my previous mail I suggested using a parsing proxy. The term I meant to use is a 'Filter'.
See this article here:

http://www.ibm.com/developerworks/xml/library/x-tipsaxfilter/

You can do some magic with http://www.jdom.org/docs/apidocs/org/jdom2/input/SAXBuilder.html#setXMLFilter(org.xml.sax.XMLFilter)

For example, your filter could extend http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html

and then override the method http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html#startElement(java.lang.String,%20java.lang.String,%20java.lang.String,%20org.xml.sax.Attributes)

to set the 'attrs' URIs correctly, and then call super.startElement(....).

Rolf

From paul at hoplahup.net Wed Nov 21 12:59:44 2012
From: paul at hoplahup.net (Paul Libbrecht)
Date: Wed, 21 Nov 2012 21:59:44 +0100
Subject: [jdom-interest] Parsing HTML elements
In-Reply-To: <50AC0D81.2060209@tuis.net>
References: <50AC0D81.2060209@tuis.net>
Message-ID: <9C431D31-3107-4B60-B0BB-992B97BFA9FB@hoplahup.net>

Thanks Rolf,

that'd be the right thing indeed, which I did not think of.
For now, I've implemented a replacement of the raw data... that is simpler.

I sure agree JDOM should refuse to do anything with undeclared prefixes.
I had tried to add namespace declarations within the factory but that has not been taken into account.

thanks.

Paul
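
The filter Rolf describes earlier in this thread (extend XMLFilterImpl, override startElement, fix the attribute URIs, call super.startElement) could look roughly like the sketch below. It is not code from the thread and has not been run against NekoHTML: the class name PrefixFixingFilter and the caller-supplied prefix-to-URI map are hypothetical, and it assumes the broken stream reports prefixed attributes with an empty (or null) namespace URI.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical SAX filter: assigns a namespace URI to prefixed attributes
 * that arrive without one, then forwards the event unchanged otherwise.
 */
class PrefixFixingFilter extends XMLFilterImpl {

    // prefix -> namespace URI, supplied by the caller (an assumption of this sketch)
    private final Map<String, String> prefixToUri;

    PrefixFixingFilter(Map<String, String> prefixToUri) {
        this.prefixToUri = new HashMap<String, String>(prefixToUri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        // Work on a copy so the parser's own Attributes object is left alone.
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attQName = fixed.getQName(i);
            int colon = attQName.indexOf(':');
            String attUri = fixed.getURI(i);
            boolean missingUri = attUri == null || attUri.isEmpty();
            // Only touch prefixed attributes with no URI; leave xmlns:* declarations alone.
            if (colon > 0 && missingUri && !attQName.startsWith("xmlns:")) {
                String known = prefixToUri.get(attQName.substring(0, colon));
                if (known != null) {
                    fixed.setURI(i, known);
                    fixed.setLocalName(i, attQName.substring(colon + 1));
                }
            }
        }
        // Pass the (possibly corrected) attributes down the chain.
        super.startElement(uri, localName, qName, fixed);
    }
}
```

With JDOM 2 such a filter would presumably be handed to the builder through the SAXBuilder.setXMLFilter(org.xml.sax.XMLFilter) method linked above. Attributes whose prefix is not in the map are passed through untouched, so JDOM would still reject them; that seems preferable to inventing a URI.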