[jdom-interest] Re: Reading XML with JDOM

Thu Oct 2 15:49:04 PDT 2008

You're asking about how to read the data efficiently with JDOM, but I
suspect that if you are looking for performance then you might be looking in
the wrong part of your system.

XML parser start-up costs can be very high; initializing the parser for each
document could easily turn out to be the dominant cost in this application.
You can save a lot by reusing parser instances. I don't know what JDOM's
own initialization costs for building a document are, but you need to
check them too.
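
As a rough sketch of what reusing parser instances could look like, here is
a version built on the JDK's own JAXP SAX parser rather than on JDOM's
builder. The class name ReusableParser and its methods are hypothetical,
and the sketch assumes each worker thread parses one document at a time. It
keeps one SAXParser per thread, because JAXP parser instances are not
expected to be thread-safe.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: creates one SAX parser per thread and reuses it.
class ReusableParser {

    // The factory lookup and parser construction happen once per thread,
    // not once per document.
    private static final ThreadLocal<SAXParser> PARSER = new ThreadLocal<SAXParser>() {
        @Override
        protected SAXParser initialValue() {
            try {
                return SAXParserFactory.newInstance().newSAXParser();
            } catch (Exception e) {
                throw new IllegalStateException("cannot create SAX parser", e);
            }
        }
    };

    /** Returns the calling thread's parser, creating it on first use. */
    static SAXParser current() {
        return PARSER.get();
    }

    /** Parses one document with the calling thread's parser. */
    static void parse(String xml, DefaultHandler handler) {
        try {
            current().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Unchecked wrapper only to keep the sketch short.
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Whether this saves anything measurable depends on the parser in use, so it
is worth timing both variants on your own documents.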

At any rate, building a JDOM tree almost certainly takes longer than
extracting the data from the tree once built.

I find this statement a bit worrying:

>Anyway, since there is probably not much I can do with the network latency,
>I am trying to keep the Java code as skinny and efficient as possible.

That seems to be an inversion of the way performance engineering should be
done. If network latency is the dominant cost, then effort spent on tuning
your Java code is a total waste of time.

I would focus your attention on measuring performance, end to end, before
you start tuning anything.
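
One possible way to take such a measurement is to accumulate wall-clock
time per stage (for example "fetch" and "parse") and compare the totals.
The sketch below is only an illustration: the StageTimer class and the
stage names are hypothetical, and it uses nothing beyond System.nanoTime.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: accumulates elapsed time per named stage.
class StageTimer {

    // stage name -> {number of calls, total elapsed nanoseconds}
    private final Map<String, long[]> stats = new LinkedHashMap<String, long[]>();

    /** Runs the work and adds its wall-clock time to the named stage. */
    void time(String stage, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            record(stage, System.nanoTime() - start);
        }
    }

    /** Adds one call of the given duration to the named stage. */
    synchronized void record(String stage, long nanos) {
        long[] s = stats.get(stage);
        if (s == null) {
            s = new long[2];
            stats.put(stage, s);
        }
        s[0]++;
        s[1] += nanos;
    }

    synchronized long calls(String stage) {
        long[] s = stats.get(stage);
        return s == null ? 0 : s[0];
    }

    synchronized long totalNanos(String stage) {
        long[] s = stats.get(stage);
        return s == null ? 0 : s[1];
    }

    /** One line per stage: calls, total milliseconds, share of all measured time. */
    synchronized String report() {
        long all = 0;
        for (long[] s : stats.values()) {
            all += s[1];
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long[] s = e.getValue();
            double share = all == 0 ? 0.0 : 100.0 * s[1] / all;
            out.append(String.format("%-8s calls=%d total=%.1f ms share=%.0f%%%n",
                    e.getKey(), s[0], s[1] / 1e6, share));
        }
        return out.toString();
    }
}
```

Wrapping the HTTP retrieval and the XML parsing in separate time() calls
and printing report() after a run shows which stage dominates, and so
where tuning effort would pay off.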

Michael Kay
http://www.saxonica.com/

  _____  

From: jdom-interest-bounces at jdom.org [mailto:jdom-interest-bounces at jdom.org]
On Behalf Of Praveen Gattu
Sent: 02 October 2008 23:04
To: Paul Libbrecht
Cc: jdom-interest at jdom.org
Subject: Re: [jdom-interest] Re: Reading XML with JDOM

Paul,

Thanks for the response. My XML is really as simple as the one I posted. The
8,500 documents are retrieved over an HTTP URL, so network latency is added
on top, and the whole run takes longer than a minute unless my XML parsing
is extremely fast.
Anyway, since there is probably not much I can do with the network latency,
I am trying to keep the Java code as skinny and efficient as possible.

Would you be able to provide sample code for the solution you suggested?

On Thu, Oct 2, 2008 at 2:27 PM, Paul Libbrecht <paul at activemath.org> wrote:

Praveen,

In JDOM you would just parse, then take the root, then the employee, then
extract last-name and ssn.
The firstname is ignored from the point of view of your programme, but not
from the point of view of parsing: it is still parsed and built.

Where you can save is by changing the XML technology... if your document is
as simple as below then using SAX has greater performance guarantees (you
really cannot go faster) but is harder to programme with.
Another case where JDOM can take too much of your CPU is if the document
has loads of other content.
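
A minimal sketch of the SAX approach for the document quoted below, using
only the JDK's javax.xml.parsers API. The class names EmployeeSaxReader and
EmployeeHandler are hypothetical, and the sketch assumes the element names
lastname and ssn exactly as in the posted sample. The handler buffers text
only while inside one of the two wanted elements; firstname is still read
by the parser but never stored.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: extracts lastname and ssn with SAX.
class EmployeeSaxReader {

    /** Collects the text of <lastname> and <ssn>; other elements are skipped. */
    static class EmployeeHandler extends DefaultHandler {
        String lastName;
        String ssn;
        private StringBuilder text; // non-null only inside a wanted element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("lastname") || qName.equals("ssn")) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver text in several chunks, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (text == null) {
                return; // closing an element we are not interested in
            }
            if (qName.equals("lastname")) {
                lastName = text.toString();
            } else if (qName.equals("ssn")) {
                ssn = text.toString();
            }
            text = null;
        }
    }

    /** Returns {lastname, ssn}; an entry is null if the element is absent. */
    static String[] read(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EmployeeHandler handler = new EmployeeHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return new String[] { handler.lastName, handler.ssn };
        } catch (Exception e) {
            // Unchecked wrapper only to keep the sketch short.
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For the sample document, read(...) would return "Smith" and "111-11-1111".
The sketch creates a new parser on every call for brevity.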

Where JDOM would make a positive difference is in walking more elaborate XML
documents, which is the norm, and in manipulating them. The expressivity of
the library there is unbeatable for my taste.

However, your requirements sound easy: 8,500 such documents per minute?
JDOM can probably handle ten times that, so multithreading is not really
necessary.

paul 

On 02-oct.-08, at 20:29, Praveen Gattu wrote:

I have an XML document as below. There is always "only one" employee node in
the XML. So rather than iterating through the nodes, I want to read the
lastname and ssn directly, while ignoring the firstname. What is the best way
to do this in JDOM? My most important criterion is speed. We will be
processing about 8,500 such XML documents per minute (multi-threaded of
course) and need something efficient and fast. I appreciate any help you can
offer in this regard.

<response>
 <employee>
   <firstname>John</firstname>
   <lastname>Smith</lastname>
   <ssn>111-11-1111</ssn>
 </employee>
</response>

-- 
Thanks,
Praveen

