Rolf, it's a little more involved than reading the XSI refs. I need to look at the nodes.<br><br>&lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and &lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; are considered the same structure for what I am doing because repeating nodes aren't considered a difference.<br>
&lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and
&lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt;
&lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; are considered the same structure for what I am doing because repeating groups aren't considered a difference.<br>&lt;a&gt; &lt;b&gt; &lt;c&gt; &lt;/c&gt; &lt;/b&gt; &lt;/a&gt; and &lt;a&gt; &lt;b&gt; &lt;/b&gt; &lt;/a&gt; are not the same because &lt;c&gt; is missing in the 2nd case, so it does not contain all the elements of the 1st case.<br>
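<br>One way to express the comparison above is to reduce each document to the set of distinct element paths it contains, so that repeated nodes and repeated groups collapse to a single entry. The following is only a minimal sketch under that assumption: the class name StructureSignature and its methods are hypothetical, it uses the JDK's SAX parser rather than JDOM, and it ignores attributes, text, and element order. Note that under this definition a document with one full &lt;b&gt; group and one empty &lt;b&gt; group would also match the first case, which may or may not be what is wanted.<br>

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reduces a document to its set of distinct element paths.
class StructureSignature {

    /** Returns the sorted, space-separated set of element paths in the document. */
    static String of(String xml) throws Exception {
        final Set<String> paths = new TreeSet<>();      // sorted, no duplicates
        final Deque<String> stack = new ArrayDeque<>(); // path of each open element

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String parent = stack.isEmpty() ? "" : stack.peek();
                String path = parent + "/" + qName;
                stack.push(path);
                paths.add(path); // a repeated node adds nothing new
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.pop();
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return String.join(" ", paths);
    }

    /** Counts how many documents share each signature. */
    static Map<String, Integer> count(Iterable<String> docs) throws Exception {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            counts.merge(of(doc), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // The four example documents from this message.
        Map<String, Integer> counts = count(Arrays.asList(
                "<a><b><c></c></b></a>",
                "<a><b><c></c><c></c></b></a>",
                "<a><b><c></c></b><b><c></c><c></c></b></a>",
                "<a><b></b></a>"));
        counts.forEach((signature, n) -> System.out.println(n + "  " + signature));
    }
}
```

In a real run the strings would come from the XML columns of the table, one call to of() per column value, and the map of counts could be keyed per column if the columns hold different kinds of documents.<br>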
<br>I had hoped (apparently it's just a hope) that JDOM could generate an XSD from an XML DOM object.<br><br>More ideas are welcome :)<br><br>Cliff<br><br><br><div class="gmail_quote">On Wed, Jan 4, 2012 at 2:53 PM, Rolf Lear <span dir="ltr">&lt;<a href="mailto:jdom@tuis.net">jdom@tuis.net</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Hi Cliff.<br>
<br>
I can't think of any magic 'short cut'.... and certainly, I do not
think JDOM will be the fastest/best way to 'classify' each document.<br>
<br>
Things you should consider though:<br>
- Using a plain SAX Parser (xmlreader) with a clever 'Entity
Resolver' may help you to quickly find out which external URLs
(probably XML Schemas) are needed to resolve the document (although
there is no concept of an 'order' of schemas). This could help
'identify' the document.<br>
- Cutting short the parser (throw a SAX exception) would speed
things up once you have entered the main part of the document
(startElement()) because you probably do not need to parse the whole
document, just the xsi schema-location references.<br>
- Finally, depending on your database, you may already have a JRE
available in the database server ('big-brand' databases such as DB2,
Oracle, and Sybase mostly already do), in which case you can
build a 'clever' Java function that evaluates the document *inside*
the database, and avoid creating a lot of external traffic.... for
example, you may be able to create a custom java-backed function
'xmlschemas()' which returns the list of schemas in use in a
document, and then you can do something like:<br>
<br>
select xmlschemas(xmldatacol) as schemas, count(*) from table group
by schemas<br>
<br>
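The second suggestion above (stop the parser once the main part of the document has been entered) can be sketched roughly as follows. This is an illustration only, not a tested recipe for any particular database: the class SchemaLocationPeek and the exception StopParsing are hypothetical names, and the sketch assumes the schema references sit on the root element as xsi:schemaLocation or xsi:noNamespaceSchemaLocation.<br>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reads the xsi schema reference from the root element only.
class SchemaLocationPeek {

    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    /** Thrown on purpose to abort the parse; carries the value that was found. */
    static final class StopParsing extends SAXException {
        final String schemaLocation;

        StopParsing(String schemaLocation) {
            super("root element reached, parse aborted deliberately");
            this.schemaLocation = schemaLocation;
        }
    }

    /** Returns the root element's schema reference, or null if it has none. */
    static String schemaLocation(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to look attributes up by namespace

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) throws SAXException {
                String loc = atts.getValue(XSI, "schemaLocation");
                if (loc == null) {
                    loc = atts.getValue(XSI, "noNamespaceSchemaLocation");
                }
                // The first startElement is the root; nothing after it is needed.
                throw new StopParsing(loc);
            }
        };

        try {
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing stop) {
            return stop.schemaLocation;
        }
        return null; // not reached for a document that has a root element
    }
}
```

A function like the hypothetical xmlschemas() mentioned above could wrap schemaLocation() and return its result for each row.<br>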
Rolf<div><div class="h5"><br>
<br>
On 04/01/2012 2:11 PM, cliff palmer wrote:
</div></div><blockquote type="cite"><div><div class="h5">I need to examine XML documents contained in multiple
columns in a database table with over a million rows and identify
each of the different structures used for the XML data, producing
a count of the number of instances that use each structure.<br>
<br>
I thought of using the SAXParser then creating a list of the XML
headers in the order used and storing each unique list and
accumulating a count based on matching an already encountered list
object, but I am hoping there is a less cumbersome approach.<br>
<br>
I would appreciate any and all suggestions.<br>
<br>
Thanks!<br>
Cliff<br>
<br>
<br>
</div></div><pre>_______________________________________________
To control your jdom-interest membership:
<a href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com" target="_blank">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</a></pre>
</blockquote>
<br>
</div>
</blockquote></div><br>