<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Michael.<br>
<br>
Feel free to butt-in!<br>
<br>
As a 'summary' for you: I believe there are 'valid' and 'specified'
cases where the SAX parser *may* report an attribute that is in a
namespace even though the attribute's qName has no prefix.<br>
<br>
It would be great if you could look into it and help decide whether
it is an issue that needs fixing in JDOM, something that needs a
work-around, a bug report to Xerces/SAX, or whatever.<br>
<br>
Of course, I've read the relevant parts of the specs, and these
particular use cases seem to fall through the cracks a bit, so it is
very hard to assign 'blame' for where the parsing 'fails'.<br>
<br>
Specifically, I believe there are three broad conditions for
"attributes in a namespace":<br>
1. The normal condition, where an attribute in a namespace is passed
to the startElement() method with the correct (prefixed) qName.<br>
2. The 'unusual' condition, where an attribute is in a namespace and
that namespace is declared with a prefix, but the SAX parser (Xerces)
does not put that prefix on the qName (maybe a Xerces bug?).<br>
3. The even more unusual condition, where an attribute is in a
namespace, but no prefix is declared for that namespace, and the SAX
parser reports an unprefixed qName.<br>
<br>
<br>
You should be able to test these conditions 'easily'.<br>
<br>
<h1>Situation 1</h1>
<br>
The normal situation:<br>
<br>
<doc xmlns:attns="myns" attns:att="value" /><br>
<br>
In this case, the coordinates of the attribute as supplied to
startElement() are: localname=att qName=attns:att value=value
URI=myns<br>
<br>
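To check these coordinates yourself, here is a minimal sketch of a SAX handler that records the localname, qName, value and URI of every attribute passed to startElement(). It assumes the JDK's built-in JAXP parser (a repackaged Xerces) rather than a standalone Xerces jar, and the class name AttributeCoordinates is hypothetical.<br>
<pre wrap="">
```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: records the SAX coordinates of every attribute passed to startElement(). */
class AttributeCoordinates extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            out.append("localname=").append(atts.getLocalName(i))
               .append(" qName=").append(atts.getQName(i))
               .append(" value=").append(atts.getValue(i))
               .append(" URI=").append(atts.getURI(i))
               .append('\n');
        }
    }

    /** Parses xml with a namespace-aware JAXP parser: namespaces=true, namespace-prefixes=false. */
    static String of(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            AttributeCoordinates handler = new AttributeCoordinates();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```
</pre>
For example, System.out.print(AttributeCoordinates.of("&lt;doc xmlns:attns=\"myns\" attns:att=\"value\"/&gt;")) should print the coordinates listed above. The xmlns:attns declaration itself is not reported as an attribute, because a namespace-aware JAXP parser leaves namespace-prefixes off.<br>
<br>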
<h1>Situation 2</h1>
<br>
The second and third examples both rely on a 'default' or 'fixed'
attribute with form="qualified" being declared in an XML Schema. For
example, with this schema:<br>
<br>
<pre wrap=""><?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns="myns"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="myns"
           elementFormDefault="qualified">
  <xs:element name="doc">
    <xs:complexType>
      <font color="#330033"><b><xs:attribute name="att" default="defval" form="qualified"/></b></font>
    </xs:complexType>
  </xs:element>
</xs:schema></pre>
<br>
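The effect of form="qualified" can be seen by validating a few instances against this schema. Below is a minimal sketch using the standard javax.xml.validation API; the schema is inlined as a string (an assumption, in place of ./MySchema.xsd), and QualifiedAttributeSchema is a hypothetical name.<br>
<pre wrap="">
```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper: validates instances against the schema above (inlined here). */
class QualifiedAttributeSchema {
    static final String XSD =
        "<xs:schema xmlns='myns' xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='myns' elementFormDefault='qualified'>"
        + "<xs:element name='doc'><xs:complexType>"
        + "<xs:attribute name='att' default='defval' form='qualified'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    static final Schema SCHEMA = load();

    private static Schema load() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** With no ErrorHandler set, validate() throws on the first validation error. */
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```
</pre>
With this schema, ns:att="x" is accepted while an unprefixed att="x" is rejected: a qualified attribute must be in the 'myns' namespace, and in an XML document only a prefixed attribute name can put it there.<br>
<br>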
With the above schema, here are two representative documents. The
first document is a representation of situation 2, where the SAX
parser 'should' know the prefix for the attribute:<br>
<br>
<ns:doc xmlns:ns="myns"<br>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"<br>
xsi:schemaLocation="myns ./MySchema.xsd"><br>
<br>
<br>
This should be parsed by the SAXParser, and it should add in the
'default' attribute 'att' as part of the startElement() call. The
resulting 'parsed' document 'should' look like:<br>
<br>
<ns:doc <font color="#33cc00"><b>ns:att="defval"</b></font><br>
xmlns:ns="myns"<br>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"<br>
xsi:schemaLocation="myns ./MySchema.xsd"><br>
<br>
<br>
But with Xerces, the SAX parser gives the 'coordinates' of the 'att'
attribute as localname=att qName=att value=defval URI=myns<br>
<br>
This is situation 2, where it would 'make sense' for the parser to
specify the qName as 'ns:att' instead of just 'att'.<br>
<br>
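A sketch for reproducing situation 2: a namespace-aware JAXP SAXParser with the schema attached through SAXParserFactory.setSchema(), reporting the coordinates of the first attribute it sees. The schema is again inlined (an assumption), DefaultedAttributeProbe is a hypothetical name, and whether the qName comes back as 'att' or 'ns:att' depends on the parser version, so treat the output as something to check rather than a given.<br>
<pre wrap="">
```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical probe: which coordinates does the parser give a schema-defaulted attribute? */
class DefaultedAttributeProbe {
    static final String XSD =
        "<xs:schema xmlns='myns' xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='myns' elementFormDefault='qualified'>"
        + "<xs:element name='doc'><xs:complexType>"
        + "<xs:attribute name='att' default='defval' form='qualified'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    /** Returns {URI, localname, qName, value} of the first reported attribute, or null. */
    static String[] rootAttribute(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // namespaces=true, namespace-prefixes=false
            factory.setSchema(SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD))));
            final String[][] found = new String[1][];
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    // Keep the first element that has attributes; the schema validator adds defaults.
                    if (found[0] == null && atts.getLength() > 0) {
                        found[0] = new String[] {
                            atts.getURI(0), atts.getLocalName(0), atts.getQName(0), atts.getValue(0)};
                    }
                }
            });
            return found[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```
</pre>
Calling rootAttribute("&lt;ns:doc xmlns:ns='myns'/&gt;") should report URI=myns, localname=att and value=defval; the qName element of the result is the part this thread is about.<br>
<br>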
<h1>Situation 3</h1>
<br>
Situation 3 is like situation 2, but gives the SAX parser less
information to go on. It uses exactly the same XML Schema as
situation 2, except that the document declares only a 'default'
namespace, not a 'prefixed' one.<br>
<br>
<doc xmlns="myns"<br>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"<br>
xsi:schemaLocation="myns ./MySchema.xsd"><br>
<br>
In this case, we expect the default attribute 'att' to be added to
the doc element, in the 'myns' namespace. Unfortunately, there is
*NO* prefixed declaration for that namespace.<br>
<br>
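The difference between situations 2 and 3 shows up directly in SAX's own NamespaceSupport helper: getPrefix() returns a declared prefix for a URI, but never the empty (default) prefix. A minimal sketch; PrefixLookup is a hypothetical name.<br>
<pre wrap="">
```java
import org.xml.sax.helpers.NamespaceSupport;

/** Hypothetical helper: which prefix (if any) is in scope for a namespace URI? */
class PrefixLookup {
    /** prefixUriPairs are the declarations on the root element, e.g. "ns", "myns". */
    static String prefixFor(String uri, String... prefixUriPairs) {
        NamespaceSupport ns = new NamespaceSupport();
        ns.pushContext(); // one context for the root element
        for (int i = 0; i + 1 < prefixUriPairs.length; i += 2) {
            ns.declarePrefix(prefixUriPairs[i], prefixUriPairs[i + 1]);
        }
        // Returns null when the URI is only bound to the default namespace.
        return ns.getPrefix(uri);
    }
}
```
</pre>
For the situation 2 declarations, prefixFor("myns", "ns", "myns") gives "ns"; for situation 3, prefixFor("myns", "", "myns") gives null, so there is simply no prefix to put on the qName.<br>
<br>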
The following is the *wrong* result (the att attribute should be in
the 'myns' namespace!):<br>
<br>
<doc <font color="#cc0000"><b>att="defval"</b></font><br>
xmlns="myns"<br>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"<br>
xsi:schemaLocation="myns ./MySchema.xsd"><br>
<br>
The real question is: what is the correct result? Is the following
'correct'?<br>
<br>
<doc <font color="#cc0000"><b>attns0:att="defval"
xmlns:attns0="myns"</b></font> <br>
xmlns="myns"<br>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"<br>
xsi:schemaLocation="myns ./MySchema.xsd"><br>
<br>
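If the answer is 'yes', whoever builds the output has to invent a prefix. A minimal sketch of how that could work: reuse a prefix already in scope for the URI, otherwise declare the first unused attnsN prefix. The helper and the attnsN naming scheme are hypothetical, not existing JDOM behaviour.<br>
<pre wrap="">
```java
import org.xml.sax.helpers.NamespaceSupport;

/** Hypothetical helper: builds a prefixed qName for {uri}localName. */
class PrefixSynthesizer {
    static String qualify(NamespaceSupport ns, String uri, String localName) {
        String prefix = ns.getPrefix(uri); // never returns the default (empty) prefix
        if (prefix == null) {
            // No prefixed declaration exists: pick the first attnsN not already bound.
            int n = 0;
            while (ns.getURI("attns" + n) != null) {
                n++;
            }
            prefix = "attns" + n;
            // Record it in the current context; the caller must also output xmlns:attnsN="uri".
            ns.declarePrefix(prefix, uri);
        }
        return prefix + ":" + localName;
    }
}
```
</pre>
With only xmlns="myns" in scope this yields attns0:att, matching the proposed output; with xmlns:ns="myns" in scope it yields ns:att and declares nothing new.<br>
<br>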
<br>
Not sure how that affects Saxon, but it should be easy to get some
idea.<br>
<br>
Rolf<br>
<br>
<br>
<br>
On 10/08/2011 5:03 AM, Michael Kay wrote:
<blockquote cite="mid:4E424969.5070908@saxonica.com" type="cite">
<br>
<blockquote type="cite">You are right, it is extra work to
maintain these structures for a case that no one hit before. One
can find arguments for either solution. The amount of additional
memory should be negligible, but my patch introduced a bit of work
while parsing every document, while your suggested changes seem to
produce more work in the rare case that...
<br>
</blockquote>
<br>
<br>
Forgive me for butting into a thread that I've only been
skim-reading until now, but I thought I would look at what Saxon
does about this problem.
<br>
<br>
Firstly, Saxon states in its documentation that it expects the
stream of ContentHandler events to correspond to those that come
from a parser that has been configured with namespaces="true" and
namespace-prefixes="false". It has no way of checking this in
general (though it does so on paths where it has access to the
XMLReader).
<br>
<br>
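For reference, a minimal sketch of requesting that configuration explicitly through the standard SAX feature URIs on a JAXP-created XMLReader. This is not Saxon code, and NamespaceConfiguredReader is a hypothetical name.<br>
<pre wrap="">
```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

/** Hypothetical helper: an XMLReader with namespaces=true and namespace-prefixes=false. */
class NamespaceConfiguredReader {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    static XMLReader create() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES, true);          // report namespace URIs and local names
            reader.setFeature(NAMESPACE_PREFIXES, false); // no xmlns* attributes; qNames become optional
            return reader;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```
</pre>
<br>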
Saxon does a few checks on the consistency of the event stream
where these can be done cheaply. For example, it checks for the
attribute names "xmlns" and "xmlns:*" and ignores them if they
appear, even though they shouldn't appear in theory.
<br>
<br>
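A cheap check of that kind might look like the following sketch (hypothetical names, not Saxon's implementation): it drops any attribute whose qName is "xmlns" or starts with "xmlns:".<br>
<pre wrap="">
```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical helper: strips namespace declarations that slipped into the attribute list. */
class NamespaceDeclarationFilter {
    /** True for "xmlns" and "xmlns:*" qNames, which should not appear when namespace-prefixes=false. */
    static boolean isNamespaceDeclaration(String qName) {
        return qName.equals("xmlns") || qName.startsWith("xmlns:");
    }

    /** Returns a copy of atts without any namespace-declaration attributes. */
    static Attributes withoutDeclarations(Attributes atts) {
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            if (!isNamespaceDeclaration(atts.getQName(i))) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), atts.getQName(i),
                        atts.getType(i), atts.getValue(i));
            }
        }
        return kept;
    }
}
```
</pre>
<br>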
But there's one area where Saxon relies on something that isn't
guaranteed by the SAX spec: it assumes that the QName will be
present and correct, even though it is optional when
namespace-prefixes="false". I made this decision because all known
XML parsers supply the QName, and because coping with its absence
would incur significant cost on a performance-critical path. I've
reasoned in the past that if someone needs to work with a source
of SAX events that doesn't supply the QName, a filter could be
added to the pipeline to make good the deficiency.
<br>
<br>
In this particular case, if I've understood the thread correctly,
the QName is present but doesn't contain a legitimate prefix. In
Saxon (where I'm sure the same sequence of SAX events might be
received) I think I would have similar problems in dealing with
this input. My response to a bug report on this would be that the
input is invalid according to the SAX spec and should be corrected
by inserting a filter: there is an implicit constraint that the
stream of SAX events represents a well-formed XML document, and in
a well-formed XML document, if an attribute is in a namespace then
it must have a prefix. I wouldn't be prepared to add a performance
penalty into the mainstream document building path in order to
detect or repair this rare anomaly.
<br>
<br>
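A minimal sketch of such a filter, built on XMLFilterImpl and NamespaceSupport: it gives a namespaced attribute whose qName is empty or unprefixed a prefix already in scope for its URI (situation 2), and rejects the event when no such prefix exists (situation 3). QNameRepairFilter is hypothetical and not part of JDOM, Saxon or Xerces.<br>
<pre wrap="">
```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.NamespaceSupport;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical repair filter for namespaced attributes reported without a prefix. */
class QNameRepairFilter extends XMLFilterImpl {
    private final NamespaceSupport namespaces = new NamespaceSupport();
    private boolean contextOpen = false; // true once the next element's context is pushed

    @Override
    public void startDocument() throws SAXException {
        namespaces.reset();
        contextOpen = false;
        super.startDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        // Mappings arrive before the startElement() they belong to.
        if (!contextOpen) {
            namespaces.pushContext();
            contextOpen = true;
        }
        namespaces.declarePrefix(prefix, uri);
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (!contextOpen) {
            namespaces.pushContext();
        }
        contextOpen = false;
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attUri = fixed.getURI(i);
            String attQName = fixed.getQName(i);
            boolean namespaced = attUri != null && !attUri.isEmpty();
            if (namespaced && (attQName == null || attQName.indexOf(':') < 0)) {
                String prefix = namespaces.getPrefix(attUri); // never the default prefix
                if (prefix == null) {
                    // Situation 3: no prefixed declaration exists for this URI.
                    throw new SAXException("No prefix in scope for attribute {"
                            + attUri + "}" + fixed.getLocalName(i));
                }
                fixed.setQName(i, prefix + ":" + fixed.getLocalName(i));
            }
        }
        super.startElement(uri, localName, qName, fixed);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        namespaces.popContext();
    }
}
```
</pre>
In a pipeline it would sit between the XMLReader and the builder (filter.setParent(reader); filter.setContentHandler(handler); filter.parse(input)), so the extra per-attribute check is only paid by applications that opt in. Situation 3 could instead be repaired by declaring a synthesized prefix, at the cost of emitting extra startPrefixMapping()/endPrefixMapping() events.<br>
<br>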
Michael Kay
<br>
Saxonica
<br>
<br>
_______________________________________________
<br>
To control your jdom-interest membership:
<br>
<a class="moz-txt-link-freetext" href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</a>
<br>
<br>
</blockquote>
<br>
</body>
</html>