[jdom-interest] Problem with xml:lang attribute

Peter Whitham Peter.Whitham at Kpsol.com
Tue Jul 1 02:35:02 PDT 2003


I am using the NekoHTML parser to read in web pages so that I can process them
with JDOM. I am currently using JDOM Beta 9 and NekoHTML 0.7.6.

On some pages I get an exception:
org.jdom.IllegalNameException: The name "xml:lang" is not legal for JDOM/XML
attributes: Attribute names cannot contain colons.
	at org.jdom.Attribute.setName(Attribute.java:360)
	at org.jdom.Attribute.<init>(Attribute.java:228)
	at org.jdom.Attribute.<init>(Attribute.java:276)
	at org.jdom.input.SAXHandler.startElement(SAXHandler.java:517)
	at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown
	at org.cyberneko.html.HTMLTagBalancer.callStartElement(Unknown
	at org.cyberneko.html.HTMLTagBalancer.startElement(Unknown Source)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(Unknown
	at org.cyberneko.html.HTMLScanner.scanDocument(Unknown Source)
	at org.cyberneko.html.HTMLConfiguration.parse(Unknown Source)
	at org.cyberneko.html.HTMLConfiguration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
	at org.jdom.input.SAXBuilder.build(SAXBuilder.java:724)
	at com.kpsol.solutionprocessor.spider.WebSpider.processURL...

The offending line in the HTML appears to be:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 

I believe xml:lang is a valid attribute, and should be accepted by JDOM.
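
To illustrate why I expect this to be legal, here is a minimal sketch of how a namespace-aware SAX parser reports the attribute. It assumes the JDK's built-in JAXP parser rather than NekoHTML, so it does not reproduce the exception above, and the names XmlLangProbe and describeAttributes are hypothetical. The xml prefix is bound by definition to http://www.w3.org/XML/1998/namespace, so I would expect the attribute to arrive as the local name "lang" in that namespace, not as a single name containing a colon:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JDOM or NekoHTML.
final class XmlLangProbe {

    /** Returns one "namespaceURI|localName|qName" line per attribute seen. */
    public static String describeAttributes(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Ask the parser to split prefixed names into namespace URI + local name.
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        StringBuilder out = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    out.append(atts.getURI(i)).append('|')
                       .append(atts.getLocalName(i)).append('|')
                       .append(atts.getQName(i)).append('\n');
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same start tag as the offending line, closed so it is well-formed XML.
        System.out.print(describeAttributes(
            "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\"></html>"));
    }
}
```

If the parser feeding JDOM instead hands over "xml:lang" as the local name, that might explain why the colon check is triggered, but that is only a guess on my part.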

I have seen that there was discussion of the xml:lang attribute back in 2000,
but jdom-interest seems to have gone quiet about it since then.
XML in a Nutshell, 2nd Edition, says that xml:lang is OK, and the discussions
here seem to agree.

Is this a known limitation of JDOM, i.e. is JDOM not intended to handle such
attributes?
Is this a bug that has recently come back?
Is there a workaround?
Or have I just totally misunderstood the specs, and been hit by an oddity on
these pages, written by someone else who doesn't understand them either?

P G B Whitham

