Rolf, it's a little more involved than reading the XSI refs. I need to look at the nodes.<br><br>&lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; are considered the same structure for what I am doing because repeating nodes aren't considered a difference.<br>

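&lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; are also considered the same structure for what I am doing because repeating groups aren't considered a difference.<br><br>&lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and &lt;a&gt; &lt;b&gt; &lt;/b&gt; &lt;/a&gt; are not the same because &lt;c&gt; is missing in the 2nd case, so it does not contain all the elements of the 1st case.<br>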
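<br>A minimal sketch of that comparison rule in Java, under one possible reading of it: two documents have the same structure when they contain the same set of distinct element paths, so repeated nodes and repeated groups collapse into one entry. The class and method names (StructureSignature, signature, countStructures) are made up for illustration, and the path-set rule itself is an assumption about what should count as "the same"; attributes, text and element order are ignored.<br>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of JDOM or any other library.
class StructureSignature {

    /** Returns the sorted, space-separated set of distinct element paths. */
    static String signature(String xml) {
        final SortedSet<String> paths = new TreeSet<>();
        final Deque<String> stack = new ArrayDeque<>(); // path of each open element
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String parent = stack.isEmpty() ? "" : stack.peek();
                String path = parent + "/" + qName;
                stack.push(path);
                paths.add(path); // a repeated element adds nothing new to the set
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.pop();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return String.join(" ", paths);
    }

    /** Counts how many documents share each signature. */
    static Map<String, Integer> countStructures(Iterable<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            counts.merge(signature(doc), 1, Integer::sum);
        }
        return counts;
    }
}
```

With this sketch, both pairs described above as "the same" produce the signature "/a /a/b /a/b/c", while &lt;a&gt; &lt;b&gt; &lt;/b&gt; &lt;/a&gt; produces "/a /a/b" and so lands in a different bucket.<br>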

<br>I had hoped (apparently it's just a hope) that JDOM could generate an XSD from an XML DOM object.<br><br>More ideas are welcome :)<br><br>Cliff<br><br><br><div class="gmail_quote">On Wed, Jan 4, 2012 at 2:53 PM, Rolf Lear <span dir="ltr">&lt;<a href="mailto:jdom@tuis.net">jdom@tuis.net</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000">

    Hi Cliff.<br>

    <br>

    I can't think of any magic 'short cut'.... and certainly, I do not

    think JDOM will be the fastest/best way to 'classify' each document.<br>

    <br>

    Things you should consider though:<br>

    - Using a plain SAX Parser (xmlreader) with a clever 'Entity

    Resolver' may help you quickly see which external URLs

    (probably XML Schemas) are needed to resolve the document (although

    there is no concept of an 'order' of schemas). This could help

    'identify' the document.<br>

    - Cutting short the parser (throw a SAX exception) would speed

    things up once you have entered the main part of the document

    (startElement()) because you probably do not need to parse the whole

    document, just the xsi schema-location references.<br>

    - Finally, depending on your database, you may already have a JRE

    available in the database server ('big-brand' databases mostly

    already do, like DB2, Oracle, Sybase, etc.), in which case you can

    build a 'clever' Java function that evaluates the document *inside*

    the database, and avoid creating a lot of external traffic.... for

    example, you may be able to create a custom java-backed function

    'xmlschemas()' which returns the list of schemas in use in a

    document, and then you can do something like:<br>

    <br>

    select xmlschemas(xmldatacol) as schemas, count(*) from table group

    by schemas<br>

    <br>
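    <br>
    The second point above (cutting the parse short) might look roughly like the sketch below. It assumes the schema hints sit on the root element as an xsi:schemaLocation attribute, which is common but not guaranteed, and it ignores xsi:noNamespaceSchemaLocation. The names SchemaLocationPeek and rootSchemaLocation are hypothetical.<br>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads only as far as the root element, then stops.
class SchemaLocationPeek {
    private static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns the root element's xsi:schemaLocation value, or null if absent. */
    static String rootSchemaLocation(String xml) {
        final String[] location = new String[1];
        final boolean[] seenRoot = new boolean[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) throws SAXException {
                seenRoot[0] = true;
                location[0] = atts.getValue(XSI, "schemaLocation");
                // Abort here: the rest of the document is never read.
                throw new SAXException("stop: root element reached");
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look the attribute up by namespace
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected once the root has been seen; anything earlier is a real error.
            if (!seenRoot[0]) {
                throw new IllegalArgumentException("could not parse XML", e);
            }
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return location[0];
    }
}
```

    Because parsing stops at the first startElement() call, the cost per document stays small even when the stored XML is large.<br>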

    Rolf<div><div class="h5"><br>

    <br>

    On 04/01/2012 2:11 PM, cliff palmer wrote:

    </div></div><blockquote type="cite"><div><div class="h5">I need to examine XML documents contained in multiple

      columns in a database table with over a million rows and identify

      each of the different structures used for the XML data, producing

      a count of the number of instances that use each structure.<br>

      <br>

      I thought of using the SAXParser then creating a list of the XML

      headers in the order used and storing each unique list and

      accumulating a count based on matching an already encountered list

      object, but I am hoping there is a less cumbersome approach.<br>

      <br>

      I would appreciate any and all suggestions.<br>

      <br>

      Thanks!<br>

      Cliff<br>

      <br>


      <br>

      </div></div><pre>_______________________________________________

To control your jdom-interest membership:

<a href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com" target="_blank">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</a></pre>

    </blockquote>

    <br>

  </div>

</blockquote></div><br>