<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi Cliff.<br>

    <br>

    JDOM cannot generate an XSD for a document. It is an interesting idea,

    but very complicated: how would it set maxOccurs, for example? In

    your use case that would be significant.<br>

    <br>

    The best I can suggest is that you will need to do a 'deep

    inspection' of the XML, create your own sort of 'fingerprint' for

    the document, and then use that.<br>

    <br>

    JDOM could be useful because it makes the inspection part a

    whole lot easier than building a SAX ContentHandler, etc. (but at the

    price of some speed and some memory). Once you have built the JDOM

    document you can run all sorts of functions on the data to create

    the 'fingerprint'.<br>

    <br>
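    To make the 'fingerprint' idea concrete, here is a minimal sketch, not
    a definitive implementation. It assumes that a fingerprint is the sorted
    set of distinct element paths, so that repeated nodes and repeated groups
    collapse to the same value while a missing element changes it. To stay
    self-contained it uses the DOM parser that ships with the JDK rather than
    JDOM, and the names StructureFingerprint, fingerprint and
    countByFingerprint are made up for illustration.<br>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: the fingerprint definition is an assumption, not a JDOM feature.
final class StructureFingerprint {

    /** Returns the sorted, de-duplicated element paths of the document, one per line. */
    static String fingerprint(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so that namespace URIs become part of each path
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            TreeSet<String> paths = new TreeSet<>();
            collect(root, "", paths);
            return String.join("\n", paths);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Adds the path of this element and, recursively, of all its child elements. */
    private static void collect(Element element, String parentPath, TreeSet<String> paths) {
        String ns = element.getNamespaceURI();
        String name = (ns == null ? "" : "{" + ns + "}") + element.getLocalName();
        String path = parentPath + "/" + name;
        paths.add(path); // a set, so repeated nodes and groups add nothing new
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                collect((Element) child, path, paths);
            }
        }
    }

    /** Counts how many documents share each fingerprint. */
    static Map<String, Integer> countByFingerprint(List<String> documents) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String xml : documents) {
            counts.merge(fingerprint(xml), 1, Integer::sum);
        }
        return counts;
    }
}
```

    With this definition, attributes, text and element order are ignored, and
    a group that lacks a child still counts as the same structure as long as
    another group of the same name has that child. Whether that is right
    depends on your own rules.<br>

    <br>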

    Again, this could potentially be done inside the database to be more

    efficient.<br>

    <br>

    Unfortunately (for you), I do not think there is an easy or

    preexisting solution for this (nothing comes to mind).<br>

    <br>

    Also, as Michael says, you need to build up your 'taxonomical' (nice

    word, Michael) rules, and in a 'real world' instance you should be

    namespace aware, etc. Again, JDOM can help with that, but only as

    part of a bigger solution.<br>

    <br>

    Rolf<br>

    <br>

    <br>

    <br>


    <br>

    On 04/01/2012 4:00 PM, cliff palmer wrote:

    <blockquote

cite="mid:CABhr9SvABYbapTjaRMpQQVVv6Yowkb0XyBmvWCiesFu7aFBVzg@mail.gmail.com"

      type="cite">Rolf, it's a little more involved than reading the XSI

      refs.  I need to look at the nodes.<br>

      <br>

      &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and

      &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;c&gt; &lt;/c&gt;

      &lt;/b&gt; &lt;/a&gt; are considered the same structure for what I

      am doing because repeating nodes aren't considered a difference.<br>

      &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and

      &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;b&gt;

      &lt;c&gt; &lt;/c&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt;

      are considered the same structure for what I am doing because

      repeating groups aren't considered a difference.<br>

      &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and

      &lt;a&gt; &lt;b&gt; &lt;/b&gt; &lt;/a&gt; are not the same because

      &lt;c&gt; is missing in the 2nd case so it does not contain all

      the elements as the 1st case.<br>

      <br>

      I had hoped (apparently it's just a hope) that JDOM could generate

      an XSD from an XML DOM object.<br>

      <br>

      More ideas are welcome :)<br>

      <br>

      Cliff<br>

      <br>

      <br>

      <div class="gmail_quote">On Wed, Jan 4, 2012 at 2:53 PM, Rolf Lear

        <span dir="ltr">&lt;<a moz-do-not-send="true"

            href="mailto:jdom@tuis.net">jdom@tuis.net</a>&gt;</span>

        wrote:<br>

        <blockquote class="gmail_quote" style="margin:0pt 0pt 0pt

          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

          <div bgcolor="#FFFFFF" text="#000000"> Hi Cliff.<br>

            <br>

            I can't think of any magic 'short cut'.... and certainly, I

            do not think JDOM will be the fastest/best way to 'classify'

            each document.<br>

            <br>

            Things you should consider though:<br>

            - Using a plain SAX parser (XMLReader) with a clever

            EntityResolver may help you to quickly find out which external

            URLs (probably XML Schemas) are needed to resolve the document

            (although there is no concept of an 'order' of schemas).

            This could help 'identify' the document.<br>

            - Cutting short the parser (throw a SAX exception) would

            speed things up once you have entered the main part of the

            document (startElement()) because you probably do not need

            to parse the whole document, just the xsi schema-location

            references.<br>

            - Finally, depending on your database, you may already have

            a JRE available in the database server ('big-brand' databases

            mostly already do, like DB2, Oracle, Sybase, etc.), in which

            case you can build a 'clever' Java function that evaluates

            the document *inside* the database, and avoid creating a lot

            of external traffic.... for example, you may be able to

            create a custom java-backed function 'xmlschemas()' which

            returns the list of schemas in use in a document, and then

            you can do something like:<br>

            <br>

            select xmlschemas(xmldatacol) as schemas, count(*) from

            table group by schemas<br>

            <br>
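            The second point (cutting the parse short) could look roughly
            like the sketch below. It is only an illustration under a few
            assumptions: it reads the xsi:schemaLocation and
            xsi:noNamespaceSchemaLocation attributes of the root element
            only, it stops the parser by throwing a SAXException from
            startElement(), and the names SchemaLocationSniffer and
            schemaLocations are hypothetical.<br>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads schema hints from the root element and then stops parsing.
final class SchemaLocationSniffer {

    private static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns the schema locations declared on the root element (possibly empty). */
    static List<String> schemaLocations(String xml) {
        final List<String> locations = new ArrayList<>();
        final boolean[] rootSeen = {false};

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts)
                    throws SAXException {
                rootSeen[0] = true;
                // xsi:schemaLocation holds whitespace-separated (namespace, location) pairs.
                String pairs = atts.getValue(XSI_NS, "schemaLocation");
                if (pairs != null) {
                    String[] tokens = pairs.trim().split("\\s+");
                    for (int i = 1; i < tokens.length; i += 2) {
                        locations.add(tokens[i]);
                    }
                }
                String noNamespace = atts.getValue(XSI_NS, "noNamespaceSchemaLocation");
                if (noNamespace != null) {
                    locations.add(noNamespace.trim());
                }
                // Deliberately abort: nothing after the root start tag is needed.
                throw new SAXException("stop after root element");
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required to look attributes up by namespace
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Expected once the root has been seen; anything earlier is a real parse error.
            if (!rootSeen[0]) {
                throw new IllegalArgumentException("Could not parse XML", e);
            }
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return locations;
    }
}
```

            Schema locations declared on nested elements are missed by this
            sketch, so it would only help where documents declare their
            schemas on the root.<br>

            <br>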

            Rolf

            <div>

              <div class="h5"><br>

                <br>

                On 04/01/2012 2:11 PM, cliff palmer wrote: </div>

            </div>

            <blockquote type="cite">

              <div>

                <div class="h5">I need to examine XML documents

                  contained in multiple columns in a database table with

                  over a million rows and identify each of the different

                  structures used for the XML data, producing a count of

                  the number of instances that use each structure.<br>

                  <br>

                  I thought of using the SAXParser then creating a list

                  of the XML headers in the order used and storing each

                  unique list and accumulating a count based on matching

                  an already encountered list object, but I am hoping

                  there is a less cumbersome approach.<br>

                  <br>

                  I would appreciate any and all suggestions.<br>

                  <br>

                  Thanks!<br>

                  Cliff<br>

                  <br>


                  <br>

                </div>

              </div>

              <pre>_______________________________________________

To control your jdom-interest membership:

<a moz-do-not-send="true" href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com" target="_blank">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</a></pre>

            </blockquote>

            <br>

          </div>

          <br>


        </blockquote>

      </div>

      <br>

    </blockquote>

    <br>

  </body>

</html>