[jdom-interest] Re: Reading XML with JDOM

David Wall david.wall at myeastside.com
Fri Oct 3 09:33:07 PDT 2008


Michael has it right -- being penny-wise and pound-foolish won't result in a 
well-performing system.

If your XML is really that simple, maybe parsing it as XML is not even the 
right solution, though SAX would surely do well.  Much depends on how 
much data transformation is needed.  Your firstname, lastname and SSN 
fields likely don't need to be encoded, so simple string 
parsing may do much better, such as searching for "<lastname>" and then 
pulling the data until you find "</".
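
A minimal sketch of that string-scanning idea, assuming the document looks 
exactly like the sample in the original post (no attributes, no CDATA, no 
entity references, no nested elements inside the fields).  The class name 
TagScanner and the method textOf are made up for illustration only.

```java
/** Hypothetical helper: reads the text of a flat element by plain string search. */
final class TagScanner {

    /**
     * Returns the text between "<tag>" and the next "</", or null if the
     * opening tag or a following "</" cannot be found.
     * Assumption: the element has no attributes and no child elements.
     */
    static String textOf(String xml, String tag) {
        String open = "<" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;                      // opening tag not present
        }
        start += open.length();               // skip past "<tag>"
        int end = xml.indexOf("</", start);   // stop at the next closing tag
        if (end < 0) {
            return null;                      // no closing tag follows
        }
        return xml.substring(start, end);
    }

    public static void main(String[] args) {
        String xml = "<response><employee><firstname>John</firstname>"
                   + "<lastname>Smith</lastname><ssn>111-11-1111</ssn>"
                   + "</employee></response>";
        System.out.println(textOf(xml, "lastname") + " " + textOf(xml, "ssn"));
    }
}
```

This does no well-formedness checking at all, so it is only reasonable 
when you control or fully trust the format of the incoming documents.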

XML parsers are very general, so they are very useful.  But if your data 
is simple, you may find that doing simple string parsing yourself is fastest.
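
If you would rather keep a real parser, the SAX route mentioned above could 
look roughly like the sketch below.  It uses only the JAXP classes that ship 
with the JDK.  The class name EmployeeSax and its helper methods are 
hypothetical, and it assumes the default (non-namespace-aware) factory, so 
the element names arrive in qName.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler that keeps only lastname and ssn. */
final class EmployeeSax extends DefaultHandler {
    String lastName;
    String ssn;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                    // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);       // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("lastname".equals(qName)) {
            lastName = text.toString();
        } else if ("ssn".equals(qName)) {
            ssn = text.toString();
        }                                     // firstname and wrappers are ignored
    }

    /** Creates a parser once; a SAXParser is not thread-safe, so use one per thread. */
    static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    /** Parses one document with an existing parser instance. */
    static EmployeeSax parse(SAXParser parser, String xml) {
        try {
            EmployeeSax handler = new EmployeeSax();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        SAXParser parser = newParser();       // created once, reused for each document
        String xml = "<response><employee><firstname>John</firstname>"
                   + "<lastname>Smith</lastname><ssn>111-11-1111</ssn>"
                   + "</employee></response>";
        EmployeeSax result = parse(parser, xml);
        System.out.println(result.lastName + " " + result.ssn);
    }
}
```

Keeping one parser per worker thread, rather than creating one per document, 
is in line with Michael's point below about parser start-up costs.  Whether 
it matters for your numbers is something only measurement will tell.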

David


>
> Michael Kay wrote:
>> The point of my message is that if it's taking 1 second to get the 
>> file over the network and 1msec to process the file, then improving 
>> the processing speed to 0.9 msec is a waste of effort. It's like 
>> tuning your car's engine and leaving the handbrake on. You need to 
>> understand the overall system performance (and the extent to which it 
>> falls short of the performance requirements) before you decide which 
>> parts of it to tune.
>>  
>> Michael Kay
>> http://www.saxonica.com/
>>
>>     ------------------------------------------------------------------------
>>     *From:* Praveen Gattu [mailto:pgattu at gmail.com]
>>     *Sent:* 03 October 2008 01:32
>>     *To:* Michael Kay
>>     *Cc:* Paul Libbrecht; jdom-interest at jdom.org
>>     *Subject:* Re: [jdom-interest] Re: Reading XML with JDOM
>>
>>     Michael
>>
>>     The XML is to be obtained over a www URL. Our networks are T1
>>     speed and performing their best. There isn't a problem with the
>>     network latency, but I acknowledge that retrieving the XML file
>>     over a www URL is probably the most time consuming procedure for
>>     my application. Network latency aside, I want to ensure that
>>     whatever APIs/frameworks I use for parsing the XML are the
>>     fastest. For the purposes of measuring the performance of the
>>     parser, I am using an XML file located in the file system and
>>     removing the network latency aspect. Is this a valid method to
>>     measure the parser's performance?
>>
>>     -- Praveen
>>
>>     On Thu, Oct 2, 2008 at 3:49 PM, Michael Kay <mike at saxonica.com> wrote:
>>
>>         You're asking about how to read the data efficiently with
>>         JDOM, but I suspect that if you are looking for performance
>>         then you might be looking in the wrong part of your system.
>>          
>>         XML parser start-up costs can be very high; initializing the
>>         parser for each document could easily turn out to be the
>>         dominant cost in this application. You can get a lot of
>>         saving by reusing parser instances. I don't know what JDOM's
>>         initialization costs for building a document are, but you
>>         need to check them too.
>>          
>>         At any rate, building a JDOM tree almost certainly takes
>>         longer than extracting the data from the tree once built.
>>          
>>         I find this statement a bit worrying:
>>          
>>         >Anyway, since there is probably not much I can do with the
>>         network latency, I am trying to keep the Java code as skinny
>>         and efficient as possible.
>>         That seems to be an inversion of the way performance
>>         engineering should be done. If network latency is the
>>         dominant cost, then effort spent on tuning your Java code is
>>         a total waste of time.
>>          
>>         I would focus your attention on measuring performance, end to
>>         end, before you start tuning anything.
>>          
>>         Michael Kay
>>         http://www.saxonica.com/
>>
>>             ------------------------------------------------------------------------
>>             *From:* jdom-interest-bounces at jdom.org
>>             *On Behalf Of* Praveen Gattu
>>             *Sent:* 02 October 2008 23:04
>>             *To:* Paul Libbrecht
>>             *Cc:* jdom-interest at jdom.org
>>             *Subject:* Re: [jdom-interest] Re: Reading XML with JDOM
>>
>>             Paul,
>>
>>             Thanks for the response. My XML is really as simple as
>>             the one I posted. The 8,500 documents are retrieved over
>>             an HTTP URL. So add network latency, which makes it longer
>>             than a minute, unless my XML parser is extremely fast.
>>             Anyway, since there is probably not much I can do with
>>             the network latency, I am trying to keep the Java code as
>>             skinny and efficient as possible.
>>
>>             Would you be able to provide sample code for the solution
>>             you suggested?
>>
>>             On Thu, Oct 2, 2008 at 2:27 PM, Paul Libbrecht
>>             <paul at activemath.org> wrote:
>>
>>                 Praveen,
>>
>>                 in jdom you would just parse, then take the root, then
>>                 the employee, then extract last-name and ssn.
>>                 The firstname is ignored from the point of view of your
>>                 programme but not from the point of view of parsing.
>>
>>                 Where you can save is by changing the xml
>>                 technology... if your document is as simple as below
>>                 then using sax has greater performance guarantees
>>                 (you really cannot go faster) but is harder to
>>                 programme with.
>>                 Another place where jdom can take too much of your CPU
>>                 is if this document has loads of other stuff.
>>
>>                 Where JDOM would make a positive difference is at
>>                 walking more elaborate xml documents, which is the
>>                 norm, and at manipulating them. The expressivity of
>>                 the library there is unbeatable to my taste.
>>
>>                 However, your requirements sound easy: 8500 such
>>                 documents per minute?
>>                 JDOM can probably do ten times that, with multithreading
>>                 not really being necessary.
>>
>>                 paul
>>
>>
>>                 On 02-oct.-08, at 20:29, Praveen Gattu wrote:
>>
>>                     I have an XML document as below. There is always "only one"
>>                     employee node in the XML. So rather than
>>                     iterating through the nodes, I want to read the
>>                     lastname and ssn directly, while ignoring the
>>                     firstname. What is the best way to do this in
>>                     JDOM? My most important criterion is speed. We
>>                     will be processing about 8,500 such XML
>>                     documents per minute (multi-threaded of course)
>>                     and need something efficient and fast. I
>>                     appreciate any help you can offer in this regard.
>>
>>                     <response>
>>                      <employee>
>>                        <firstname>John</firstname>
>>                        <lastname>Smith</lastname>
>>                        <ssn>111-11-1111</ssn>
>>                      </employee>
>>                     </response>
>>
>>                     -- 
>>                     Thanks,
>>                     Praveen
>>
>>                     _______________________________________________
>>                     To control your jdom-interest membership:
>>                     http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>                     <mailto:youraddr at yourhost.com>
>>
>>
>>
>>
>>

-- 
David A. E. Wall
724 17th Avenue
Kirkland, WA 98033-4206
Tel 425.822.8135    
