[jdom-interest] Is JDOM schema checking when it shouldn't be?

Elliotte Rusty Harold elharo at metalab.unc.edu
Wed Jun 14 15:38:15 PDT 2000


Here's a weird one I encountered while trying to track down a bug. It's
almost certainly a problem inherited from the xerces.jar JDOM bundles.
Consider this simple well-formed but invalid document:

<test xmlns="http://www.jdom.org/">
</test>

When I tried to parse this with a SAXBuilder, JDOM actually attempted to
connect to http://www.jdom.org/ and parse the document it found there.
Naturally, since that document is HTML and not XML I got errors:

D:\speaking\xmldevcon\jdom\examples>java Validator test.xml
[Error] :1:7: Element type "html" must be declared.
[Error] :2:7: Element type "head" must be declared.
[Error] :3:8: Element type "title" must be declared.
[Error] :4:18: Attribute "http-equiv" must be declared for element type
"meta".
[Error] :4:41: Attribute "content" must be declared for element type
"meta".
[Error] :4:73: Element type "meta" must be declared.
[Fatal Error] :5:7: The element type "meta" must be terminated by the
matching end-tag "</meta>".
test.xml is not valid.
null: null

There's no reason for the parser to try to download the document at a
namespace URI, near as I can figure, unless perhaps it's some weird
Xerces behavior with regard to schemas.

Here's the class that tried to parse the file:

import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;


public class Validator {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java Validator URL1 URL2...");
      return;
    }
      
    SAXBuilder builder = new SAXBuilder(true);
                             /*         ^^^^       */
                             /* Turn on validation */
    // start parsing...
    for (int i = 0; i < args.length; i++) {
      
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness or validity errors,
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (JDOMException e) { // indicates an error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
      
    }   
  
  }

} 

I'm still trying to track down the details, but using the default
namespace on the root element seems to be a fruitful source of bugs. 

This occurs with Xerces 1.0.3 and with whichever version of Xerces is
distributed with JDOM b4. Upgrading to Xerces 1.1.0 fixes the problem.
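
One way to check whether a given parser asks for anything external while
reading this document is to install a SAX EntityResolver that records each
request and hands back an empty stream instead of touching the network.
What follows is only a rough sketch, assuming a JAXP-capable parser is on
the classpath; the class name ResolutionLogger and the method
requestedSystemIds are made up for illustration and are not part of JDOM
or Xerces. It only sees requests that the parser routes through the
EntityResolver, so a parser that fetches a namespace URI by some other
path would not show up here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JDOM or Xerces.
class ResolutionLogger {

  // Parses xml with validation on and returns every system ID
  // the parser asked the EntityResolver to resolve.
  static List<String> requestedSystemIds(String xml)
      throws ParserConfigurationException, SAXException, IOException {

    final List<String> requested = new ArrayList<String>();

    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    XMLReader reader = factory.newSAXParser().getXMLReader();

    DefaultHandler handler = new DefaultHandler() {
      @Override
      public InputSource resolveEntity(String publicId, String systemId) {
        // Record the request and return an empty document
        // so nothing is downloaded
        requested.add(systemId);
        return new InputSource(new StringReader(""));
      }

      @Override
      public void error(SAXParseException e) {
        // Validity errors are expected for this document; ignore them
      }
    };
    reader.setEntityResolver(handler);
    reader.setErrorHandler(handler);

    reader.parse(new InputSource(new StringReader(xml)));
    return requested;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<test xmlns=\"http://www.jdom.org/\">\n</test>";
    System.out.println("Resolution requests: " + requestedSystemIds(xml));
  }

}
```

If the list comes back empty, the parser did not ask the resolver for
anything, which is what I would expect for a document with no DOCTYPE and
no external entities.").isEmpty();
assert ResolutionLogger.requestedSystemIds("<!DOCTYPE test SYSTEM \"http://example.invalid/test.dtd\"><test/>").contains("http://example.invalid/test.dtd");
</test>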


-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+ 
|               Java I/O (O'Reilly & Associates, 1999)               |
|            http://metalab.unc.edu/javafaq/books/javaio/            |
|   http://www.amazon.com/exec/obidos/ISBN=1565924851/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ | 
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+


