<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Consolas","serif";
color:black;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body bgcolor=white lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Oracle even supports an XQuery function named “XMLQUERY” – so in the SELECT clause you can put something like this:<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>               , XMLQUERY(<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>                  'for $foo in //tag/@name<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>                  where $tag = $tagdata/@name <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>                  return $tagdata'<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>                  PASSING … as "colname" RETURNING CONTENT<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>               )<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext'> jdom-interest-bounces@jdom.org [mailto:jdom-interest-bounces@jdom.org] <b>On Behalf Of </b>Rolf Lear<br><b>Sent:</b> Wednesday, January 04, 2012 2:54 PM<br><b>To:</b> jdom-interest@jdom.org<br><b>Subject:</b> Re: [jdom-interest] XML Schema classification 
help<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Hi Cliff.<br><br>I can't think of any magic 'short cut'.... and certainly, I do not think JDOM will be the fastest/best way to 'classify' each document.<br><br>Things you should consider though:<br>- Using a plain SAX parser (XMLReader) with a clever 'Entity Resolver' may help you to quickly see which external URLs (probably XML Schemas) are needed to resolve the document (although there is no concept of an 'order' of schemas). This could help 'identify' the document.<br>- Cutting the parse short (by throwing a SAXException) once you have entered the main part of the document (startElement()) would speed things up, because you probably do not need to parse the whole document, just the xsi:schemaLocation references.<br>- Finally, depending on your database, you may already have a JRE available in the database server ('big-brand' databases such as DB2, Oracle and Sybase mostly already do), in which case you can build a 'clever' Java function that evaluates the document *inside* the database, and avoid creating a lot of external traffic.... 
for example, you may be able to create a custom Java-backed function 'xmlschemas()' which returns the list of schemas in use in a document, and then you can do something like:<br><br>select xmlschemas(xmldatacol) as schemas, count(*) from table group by xmlschemas(xmldatacol)<br><br>Rolf<br><br>On 04/01/2012 2:11 PM, cliff palmer wrote: <o:p></o:p></p><p class=MsoNormal>I need to examine XML documents contained in multiple columns in a database table with over a million rows and identify each of the different structures used for the XML data, producing a count of the number of instances that use each structure.<br><br>I thought of using the SAXParser, then creating a list of the XML headers in the order used, storing each unique list, and accumulating a count by matching against already-encountered list objects, but I am hoping there is a less cumbersome approach.<br><br>I would appreciate any and all suggestions.<br><br>Thanks!<br>Cliff<br><br><br><br><o:p></o:p></p><pre>_______________________________________________<o:p></o:p></pre><pre>To control your jdom-interest membership:<o:p></o:p></pre><pre><a href="http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com">http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com</a><o:p></o:p></pre><p class=MsoNormal><o:p> </o:p></p></div></body></html>