[jdom-interest] Thread questions regarding JDOM SAXBuiler?

Per Norrman per.norrman at austers.se
Mon Aug 30 15:58:27 PDT 2004


Hi,

I had a test program lying around that was fairly easy to
adapt to make an unscientific measurement of the "cost" of
allocating new SAXBuilders/XMLReaders vs. reusing them. This program measures
the total parse time for five threads parsing 20 identical XML sources taken
from a shared queue, i.e. 4 each on average. Each test run is set up with a
different "file" size, from ~20 KB up to over 0.5 MB.

I was surprised! As expected, the cost (in speed) of allocating a new
SAXBuilder/XMLReader shows up when parsing small files, but, rather quickly,
something else kicks in and reverses that situation, so that for sufficiently
large "files", you lose speed when reusing the parser. Telling exactly, or even
approximately, where the lines cross each other is most certainly impossible in
the general case.

So, go back and read up on the Strategy pattern and make the choice swappable
at run time for optimal tuning ;-)
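
A minimal sketch of what such a swappable strategy could look like follows. The names ParserStrategy, FreshStrategy, PerThreadStrategy and ParserProvider are made up for illustration and are not part of JDOM. To keep the example runnable without JDOM on the classpath, the JDK's DocumentBuilder stands in for SAXBuilder; with JDOM available the factory would presumably be SAXBuilder::new instead. It assumes Java 8 or later.

```java
import java.io.StringReader;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

/** Hypothetical strategy interface: decides where a parser object comes from. */
interface ParserStrategy<P> {
    P acquire();
}

/** Allocates a new parser for every request. */
final class FreshStrategy<P> implements ParserStrategy<P> {
    private final Supplier<P> factory;
    FreshStrategy(Supplier<P> factory) { this.factory = factory; }
    public P acquire() { return factory.get(); }
}

/** Keeps one parser per thread, so no instance is shared between threads. */
final class PerThreadStrategy<P> implements ParserStrategy<P> {
    private final ThreadLocal<P> cache;
    PerThreadStrategy(Supplier<P> factory) { cache = ThreadLocal.withInitial(factory); }
    public P acquire() { return cache.get(); }
}

/** Holder whose strategy can be replaced while the program is running. */
final class ParserProvider<P> {
    private volatile ParserStrategy<P> strategy;
    ParserProvider(ParserStrategy<P> initial) { strategy = initial; }
    void setStrategy(ParserStrategy<P> next) { strategy = next; }
    P acquire() { return strategy.acquire(); }
}

class StrategyDemo {
    // Stand-in factory (assumption): a JDK DocumentBuilder instead of a SAXBuilder.
    static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><day/></root>";
        ParserProvider<DocumentBuilder> provider = new ParserProvider<DocumentBuilder>(
                new FreshStrategy<DocumentBuilder>(StrategyDemo::newBuilder));
        provider.acquire().parse(new InputSource(new StringReader(xml)));

        // Swap at run time, e.g. after measuring which variant suits the workload.
        provider.setStrategy(new PerThreadStrategy<DocumentBuilder>(StrategyDemo::newBuilder));
        DocumentBuilder builder = provider.acquire();
        builder.parse(new InputSource(new StringReader(xml)));
        System.out.println("same instance on this thread: " + (builder == provider.acquire()));
    }
}
```

The calling code only ever talks to ParserProvider, so which variant is faster for a given document size can be decided by measurement rather than hard-wired.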

/pmn


David Wall wrote:
> The docs clearly say that JDOM is not threaded and that thread safety comes
> from our code.  That's fine.
> 
> But can anybody tell me if the cost of instantiating a SAXBuilder object is
> considered "expensive" or not?  In other words, should I have a pool of
> SAXBuilder objects if I plan on doing a lot of XML parsing (which is already
> expensive, though most of our docs are quite small -- such as configuration
> data) or are these objects lightweight enough to just instantiate, use and
> throw away?
> 
> Thanks,
> David
> 
> ----- Original Message ----- 
> From: "David Wall" <d.wall at computer.org>
> To: <jdom-interest at jdom.org>
> Sent: Sunday, August 29, 2004 3:32 PM
> Subject: [jdom-interest] Thread questions regarding JDOM SAXBuiler?
> 
> 
> 
>>If I expect to do a lot of parsing with JDOM, it seems that I might want to
>>create a pool of SAXBuilder objects to avoid the overhead of loading the
>>parser.  Is that true, or is the overhead of creating one quite small?
>>
>>Is a SAXBuilder thread-safe, or should I only be calling the
>>SAXBuilder.build() method in a single thread at a time for a given
>>SAXBuilder?
>>
>>And I presume it's okay to reuse a SAXBuilder object by having many
>>different threads build() documents over time without any issues that one
>>XML parsing would affect the other (assuming they have the same options,
>>like ignore whitespace, validation, etc.).
>>
>>Is that the case?  If I do a lot of parsing (at least once per HTTP
>>request), should I use a pool of SAXBuilder objects for this purpose, or is
>>the overhead small enough that I can just create a new SAXBuilder whenever I
>>want one?
>>
>>Thanks,
>>David
>>
>>_______________________________________________
>>To control your jdom-interest membership:
>>http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
> 

-------------- next part --------------
package large;

import java.io.StringReader;
import java.text.DateFormat;
import java.text.DateFormatSymbols;
import java.util.Calendar;
import java.util.Date;

import org.jdom.Comment;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import org.xml.sax.InputSource;

import EDU.oswego.cs.dl.util.concurrent.LinkedQueue;

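/**
 * Unscientific benchmark: five threads parse identical XML documents taken
 * from a shared work queue, either reusing the parser or allocating a new
 * one for each parse, and the individual parse times are summed.
 *
 * @author Per Norrman
 */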
public class ThreadedReader {
    private boolean _reuse = true;

    private String _xml = "";

    LinkedQueue _queue = new LinkedQueue();

    private long _time = 0;

    public ThreadedReader(boolean reuse) {
        _reuse = reuse;
    }

    public synchronized void addTime(long elapsed) {
        _time += elapsed;
    }

    public synchronized long getTime() {
        return _time;
    }

    public void reset() {
        _time = 0;
    }

    public void process(String start, String end, int count) throws Exception {
        reset();
        generate(start, end);
        // fill work queue
        for (int i = 1; i <= count; ++i) {
            _queue.put(new InputSource(new StringReader(_xml)));
        }

        // create threads
        Thread[] thread = new Thread[5];
        for (int i = 0; i < 5; ++i) {
            thread[i] = new ReaderThread(_reuse);
            thread[i].start();
        }

        // make them stop
        for (int i = 0; i < 5; ++i) {
            _queue.put(new Object());
        }

        for (int i = 0; i < 5; ++i) {
            thread[i].join();
        }

        // report
        System.out.println("Reuse=" + _reuse + "\tsize=" + _xml.length()
                + "\ttime: " + getTime());
    }

    public void generate(String startDate, String endDate) throws Exception {
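        // Fixed ISO pattern: DateFormat.SHORT is locale-dependent and parses
        // the "yyyy-MM-dd" arguments used in main() only in some locales.
        DateFormat df = new java.text.SimpleDateFormat("yyyy-MM-dd");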
        DateFormatSymbols dfs = new DateFormatSymbols();
        String[] weekDays = dfs.getWeekdays();

        Element root = new Element("root");
        Document doc = new Document(root);
        doc.getContent().add(0,
                new Comment(" Generated: " + df.format(new Date()) + " "));

        Calendar cal = Calendar.getInstance();
        Date start = df.parse(startDate);
        Date end = df.parse(endDate);

        cal.setTime(start);
        while (cal.getTime().before(end)) {
            Element date = new Element("day");
            date.addContent(new Element("date").setText(df
                    .format(cal.getTime())));
            root.addContent(date);
            String weekDay = weekDays[cal.get(Calendar.DAY_OF_WEEK)];
            Element day = new Element("dayname").setText(weekDay);
            date.addContent(day);
            cal.add(Calendar.DATE, 1);
        }

        XMLOutputter out = new XMLOutputter();

        _xml = out.outputString(doc);

    }

    public static void test(String start, String end) throws Exception {
        new ThreadedReader(true).process(start, end, 20);
        new ThreadedReader(false).process(start, end, 20);
    }

    public static void main(String[] args) throws Exception {
        test("2000-01-01", "2001-01-01");
        test("2000-01-01", "2001-01-01");
        test("2000-01-01", "2001-12-31");
        test("1990-01-01", "2004-12-31");
        test("1970-01-01", "2004-12-31");
    }

    private class ReaderThread extends Thread {
        private boolean _reuse = true;

        private SAXBuilder _builder = new SAXBuilder();

        public ReaderThread(boolean reuse) {
            _reuse = reuse;
            _builder.setReuseParser(reuse);
        }

        private void parse(InputSource source) {
            long elapsed = 0;
            try {
                elapsed = System.currentTimeMillis();
                if (_reuse) {
                    _builder.build(source);
                } else {
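                    // Allocate a fresh builder for this parse and actually use it
                    // (the shared _builder was being called here by mistake).
                    SAXBuilder builder = new SAXBuilder();
                    builder.build(source);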
                }
                elapsed = System.currentTimeMillis() - elapsed;
                addTime(elapsed);
            } catch (Exception e) {
                System.out.println(getName() + ": " + e.getMessage());
            }
        }

        public void run() {
            try {
                while (true) {
                    Object thing = _queue.take();
                    if (thing instanceof InputSource) {
                        parse((InputSource) thing);
                    } else {
                        break;
                    }
                }
            } catch (InterruptedException e) {
                System.out.println(getName() + ": " + e.getMessage());
            }
        }
    }

}

