[jdom-interest] Parsing HTML elements

Rolf Lear jdom at tuis.net
Tue Nov 20 09:14:02 PST 2012


Hmmm, so you are not using the default API.

JDOM expects the getURI() method to return a value if the attribute has a prefix. This is reasonable... ;)

This indicates that the SAX stream is broken. JDOM should be throwing "Namespace URIs must be non-null and non-empty Strings".

If you cannot fix the SAX stream code, you may be able to write a proxy class that fixes the URIs as the events pass through.
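A minimal sketch of such a proxy, written as a standard SAX XMLFilterImpl subclass using only JDK classes. The class name AttributeUriFixFilter and the prefix-to-URI map are hypothetical and supplied by you; nothing here is JDOM or NekoHTML API. It assumes the parser reports attribute qNames (e.g. "foo:bar") and leaves the URI empty for undeclared prefixes.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

import java.util.Map;

/**
 * Hypothetical pass-through SAX filter: for every attribute whose qName has a
 * prefix but whose namespace URI is null or empty, it fills in the URI from a
 * caller-supplied prefix map before forwarding the event downstream.
 */
public class AttributeUriFixFilter extends XMLFilterImpl {
    private final Map<String, String> prefixToUri;

    public AttributeUriFixFilter(Map<String, String> prefixToUri) {
        this.prefixToUri = prefixToUri;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Copy the attributes so they can be modified; the originals may be read-only.
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attQName = fixed.getQName(i);
            String attUri = fixed.getURI(i);
            // qName may be empty if the parser does not report qualified names.
            int colon = attQName == null ? -1 : attQName.indexOf(':');
            if (colon > 0 && (attUri == null || attUri.isEmpty())) {
                String prefix = attQName.substring(0, colon);
                // Leave namespace declarations (xmlns:foo) untouched.
                if (prefix.equals("xmlns")) {
                    continue;
                }
                String mapped = prefixToUri.get(prefix);
                if (mapped != null) {
                    fixed.setURI(i, mapped);
                    fixed.setLocalName(i, attQName.substring(colon + 1));
                }
            }
        }
        // Forward the (possibly repaired) event to the next handler in the chain.
        super.startElement(uri, localName, qName, fixed);
    }
}
```

Prefixes missing from the map are passed through unchanged, so JDOM will still complain about those. With JDOM's SAXBuilder, a filter like this can be plugged in via setXMLFilter(); how that combines with the NekoHTML parser configuration is untested here.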

Rolf


Paul Libbrecht <paul at hoplahup.net> wrote:
Hello JDOM experts,

I'm hitting a wall here and I am not sure who is responsible.
As in the previous series of posts, I am trying to parse an HTML document.
In this case I use the CyberNeko HTML parser http://nekohtml.sourceforge.net/, which produces a SAX stream and is therefore easily converted to a JDOM document.

Now, my big issue is that the document I have (which I cannot easily change right now) contains attribute names with undeclared namespace prefixes!

Do I have a way to predefine the namespace somewhere?

thanks in advance

Paul


