[jdom-interest] Which API to use? JDOM? SAX? PULL?

Duffey, Kevin KDuffey at BUYMEDIA.com
Sat Jul 20 10:14:06 PDT 2002


Hi all,

Well, I was originally using the Xerces DOM parser, reading in the doc,
parsing the elements, etc. This was a lengthy process that made for ugly
code. I switched to JDOM, which still requires the document to be fully
loaded into memory before anything can be done with it, but is far nicer
on the code, which to me is most important next to the memory issue.

A problem I found, which many of you may run into, is that it appears
you may need anywhere from 2x to 8x (or perhaps even more) the size of
the XML document in memory. My problem started when one of my XML
documents turned out to be 13MB and I got an OutOfMemoryError. After a
few posts here, some replied letting me know that it is quite normal for
larger documents to cause such problems. I take it that in most cases
documents are very small, which is why a lot of people can get away with
using JDOM, DOM and so on. So I started looking into using SAX2, and the
code was just as ugly as with normal DOM parsing, only with a different
style of handling and a lot of if() checks. Even with a SAX2 framework,
it was still a lot of code.
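
For anyone who has not written a SAX handler, here is a minimal sketch of
the kind of if() checks I mean. It uses only the JAXP classes bundled
with the JDK. The class name SaxTitles and the <title> element are made
up for illustration; they are not from my actual code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collect the text of every <title> element.
final class SaxTitles {
    static List<String> titles(String xml) {
        final List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // SAX pushes events at us, so the handler has to remember
            // where it is in the document by itself.
            private boolean inTitle;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local,
                                     String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    inTitle = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node.
                if (inTitle) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName)) {
                    out.add(text.toString());
                    inTitle = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so callers need not declare checked exceptions.
            throw new RuntimeException(e);
        }
        return out;
    }
}
```

Every extra element you care about tends to add another flag and another
pair of if() blocks, which is where the code starts to get ugly.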

So I happened across an article
(http://www.javaworld.com/javaworld/jw-02-2002/jw-0208-xmljava.html). It
is a 3-part series about using SAX or the latest pull parser technology.
The end of part 2 and all of part 3 explain the pull parser technology
and why it's as efficient and as fast as SAX2, but easier to use.
Anyway, I invite you all to read the 3 parts. Very interesting indeed.

I don't want to take away from JDOM, I think JDOM is a terrific API. But
if you need to work with larger XML files, or your app may have a chance
of running into them at some point, I advise you to consider swapping
JDOM for SAX2 or, far better, a pull parser implementation while you
still can! Reworking my JDOM code into the PullWrapper framework this
3-part article introduces is a breeze! Took very little time.

Even better, the latest XmlPull API is implemented in two current pull
parsers. One is mxp1 (or xpp3), which the document at the end offers
URLs for you to download. The other is kxml (I think that is it, again,
read the doc at the end, it will have it there). The latest mxp1 is 36KB
in size for the "full" version, and a minimal version of just 3 classes
is only 20KB in size. The kxml version is apparently less than 10KB in
size. Both adhere to the XmlPull API (www.xmlpull.org for the standard),
and while they don't support the full set of capabilities that Xerces
and the like do, they are plenty good for parsing XML files, including
those with namespaces. So unless you need a fully validating parser (and
even then you can use a small SAX parser to validate the XML, then use
the XmlPullParser of your choice to parse it), I would imagine the ease
of use, low memory requirements, and very fast speed of the latest mxp1
parser would more than meet the needs of most parsing jobs. I have no
doubt there are some circumstances where it may not be applicable. But
give it a look through anyway, and see if it fits the bill for your next
parsing jobs.
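
To give a feel for the pull style, here is a rough sketch. Note that it
does NOT use the XmlPull API or mxp1/kxml: so that it runs without any
extra jar, it substitutes StAX (javax.xml.stream), a pull-style parser
API that ships with later JDKs. The XmlPull calls are named differently,
but the shape of the loop is the same idea. The class name PullTitles
and the <title> element are again made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: collect the text of every <title> element,
// using StAX as a stand-in for an XmlPull parser.
final class PullTitles {
    static List<String> titles(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            // Our code asks for the next event, instead of the parser
            // calling back into a handler as SAX does.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and leaves the reader
                    // on the matching end tag.
                    out.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrapped so callers need not declare checked exceptions.
            throw new RuntimeException(e);
        }
        return out;
    }
}
```

Because the loop is driven by the calling code, there are no flags to
keep track of between callbacks, and the document is still streamed
rather than built up in memory.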

Hope this may help some.
 


