JDOM
2.0.6.1

Package org.jdom2.input.sax

Support classes for building JDOM documents and content using SAX parsers.


Package org.jdom2.input.sax Description


Introduction

Skip to the Examples section for a quick bootstrap.

The SAXBuilder class parses input and produces JDOM output. It does this using three 'pillars' of functionality, which when combined constitute a 'parse'.

The three pillars are:

  1. The SAX Parser - a 'third-party' parser such as Xerces.
  2. The SAX Event Handler - which interprets the SAX events produced by the parser.
  3. The JDOMFactory - which converts the resulting data into JDOM content.
There are many ways to parse a document from its input state (non-validating, DTD-validating, XSD-validating, etc.), and there are also different ways to interpret the SAX events. Finally, there are different ways to produce JDOM content, using different implementations of the JDOMFactory.

SAXBuilder provides a central location where these three pillars are configured. Some configuration settings require coordinated changes to both the SAX parser and the SAX handler, and SAXBuilder ensures the coordination is maintained.
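The three pillars can also be passed to SAXBuilder explicitly. The sketch below assumes the JDOM 2.x three-argument SAXBuilder constructor and supplies XMLReaders.NONVALIDATING, DefaultSAXHandlerFactory and DefaultJDOMFactory, which should match what the no-argument constructor uses. The class and method names (ThreePillars, rootName) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.DefaultJDOMFactory;
import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.DefaultSAXHandlerFactory;
import org.jdom2.input.sax.XMLReaders;

class ThreePillars {

    // Builds a SAXBuilder with each pillar stated explicitly.
    static SAXBuilder explicitBuilder() {
        return new SAXBuilder(
                XMLReaders.NONVALIDATING,        // pillar 1: source of XMLReaders
                new DefaultSAXHandlerFactory(),  // pillar 2: source of SAXHandlers
                new DefaultJDOMFactory());       // pillar 3: creates JDOM content
    }

    // Parses an XML string and returns the name of its root element.
    static String rootName(String xml) {
        try {
            Document doc = explicitBuilder().build(new StringReader(xml));
            return doc.getRootElement().getName();
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<catalog><item/></catalog>"));
    }
}
```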

Setting the Pillars

SAXBuilder provides a number of different mechanisms for stipulating what the three pillars will be:

The XMLReaderJDOMFactory Pillar

A brief history of XML Parsers in Java:
XML Parsers have been available in Java from essentially 'the beginning'. There have, however, been a number of different ways to access these parsers:
  1. Create the parser directly 'by name'.
  2. Use the SAX (and later the SAX 2.0) API to locate a parser.
  3. Use the JAXP (versions 1 through 1.4) API to locate a parser.

In addition to the different ways of creating an XML parser, the way the SAX parsing API itself is exposed to Java (the Java interface) has also changed. The SAX specification was revised with version 2.0, which introduced the XMLReader interface to replace the SAX 1.0 Parser interface. The two aim to accomplish the same goal, but do it in different ways.

JDOM 2.x requires the XMLReader (SAX 2.0) interface, so your XML parser must be compatible with SAX 2.0. Ideally it should also be accessible through JAXP, which is the more modern and flexible access system.

The purpose of the XMLReaderJDOMFactory pillar is to give SAXBuilder an XMLReader instance (a SAX 2.0 parser). To get an XMLReader, SAXBuilder delegates to the XMLReaderJDOMFactory by calling XMLReaderJDOMFactory.createXMLReader().
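For illustration, the pillar can be called directly, outside SAXBuilder. This minimal sketch assumes XMLReaders.NONVALIDATING as the factory; ReaderPillar and its helpers are hypothetical names. It checks that each call returns a new reader and queries the standard SAX 2.0 'namespaces' feature on it.

```java
import org.jdom2.JDOMException;
import org.jdom2.input.sax.XMLReaderJDOMFactory;
import org.jdom2.input.sax.XMLReaders;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class ReaderPillar {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    // Asks the pillar for a new XMLReader, the same call SAXBuilder makes
    // during its setup phase.
    static XMLReader freshReader(XMLReaderJDOMFactory pillar) {
        try {
            return pillar.createXMLReader();
        } catch (JDOMException e) {
            throw new IllegalStateException("could not create an XMLReader", e);
        }
    }

    // Reports whether the SAX 2.0 'namespaces' feature is enabled.
    static boolean isNamespaceAware(XMLReader reader) {
        try {
            return reader.getFeature(NAMESPACES);
        } catch (SAXException e) { // feature not recognized or not supported
            return false;
        }
    }

    public static void main(String[] args) {
        XMLReader first = freshReader(XMLReaders.NONVALIDATING);
        XMLReader second = freshReader(XMLReaders.NONVALIDATING);
        System.out.println("new instance per call: " + (first != second));
        System.out.println("namespace-aware: " + isNamespaceAware(first));
    }
}
```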

XMLReader instances can be created in a few different ways, and they can also be set to perform the SAX parse in a number of different ways. The classes in this package are designed to make it easier and faster to locate an XMLReader suitable for the XML parsing you intend to do. At the same time, if your parsing needs fall outside the normal bounds of how JDOM is used, you can still create a completely custom mechanism for supplying the XMLReader to SAXBuilder.

There are two typical ways to specify and create an XMLReader instance: using JAXP, and using the SAX 2.0 API. If necessary you can also create instances of XMLReader implementations directly with their constructors, but each SAX implementation uses different class names for its SAX driver, so calling constructors directly is not portable and is not recommended.

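Where possible, it is recommended that you use the JAXP mechanism for obtaining XMLReaders because it is the more modern and flexible access system, it allows more portable code, and it supports a wider range of functionality (such as schema validation) out of the box.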

JAXP Factories

JDOM exposes six factories that use JAXP to source XMLReaders. These factories cover almost all conditions under which you would want a SAX parser:
  1. A simple non-validating SAX parser.
  2. A validating parser that uses the DOCTYPE references in the XML to validate against.
  3. A validating parser that uses the XML Schema (XSD) references embedded in the XML to validate against.
  4. A factory that uses a specific JAXP-based parser and can optionally validate against the DTD referenced by the DOCTYPE.
  5. A validating parser that uses an external Schema (XML Schema, Relax NG, etc.) to validate the XML against.
  6. A special case of the Schema-validating factory that specialises in XML Schema (XSD) validation and provides an easy way to create validating XMLReaders based on single or multiple input XSD documents.
The first three are all relatively simple, and are available as members of the XMLReaders enumeration. These members are 'singletons' that can be used in a multi-threaded and concurrent way to provide XMLReaders that are configured correctly for the respective behaviour.
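A sketch of how the XMLReaders members might be used concurrently, assuming a small fixed thread pool: the enum member is shared by every thread, while each task creates its own SAXBuilder (a SAXBuilder instance is not assumed to be safe for concurrent use). ReaderSingletons and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaders;

class ReaderSingletons {

    // Lists each XMLReaders member and whether the readers it creates validate.
    static void describe() {
        for (XMLReaders member : XMLReaders.values()) {
            System.out.println(member + " validating=" + member.isValidating());
        }
    }

    // Each task builds its own SAXBuilder, but all of them share the
    // XMLReaders.NONVALIDATING singleton as their XMLReader pillar.
    static List<String> rootNamesConcurrently(List<String> docs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : docs) {
                Callable<String> task = () -> new SAXBuilder(XMLReaders.NONVALIDATING)
                        .build(new StringReader(xml))
                        .getRootElement()
                        .getName();
                pending.add(pool.submit(task));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> f : pending) {
                names.add(f.get()); // collected in submission order
            }
            return names;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        describe();
        System.out.println(rootNamesConcurrently(Arrays.asList("<a/>", "<b/>", "<c/>")));
    }
}
```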

To parse with a specific (rather than the default) JAXP-based XML Parser you can use the XMLReaderJAXPFactory. This factory can optionally be set to do DTD validation during the parse.
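A minimal sketch of XMLReaderJAXPFactory, assuming its (factory class name, ClassLoader, DTD-validate flag) constructor. To keep the example runnable anywhere, it looks up the class name of the JVM's default SAXParserFactory; in practice you would substitute the factory class of the parser you actually want. The null ClassLoader is assumed to be passed through to SAXParserFactory.newInstance(String, ClassLoader), where null means the thread's context class loader. SpecificJaxpParser is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaderJAXPFactory;

class SpecificJaxpParser {

    // Illustration only: the class name of the JVM's default SAXParserFactory.
    // Real code would name the factory class of the chosen parser instead.
    static String defaultFactoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // The boolean turns DTD validation on or off for the created readers.
    static XMLReaderJAXPFactory readersFor(String factoryClassName, boolean dtdValidate) {
        return new XMLReaderJAXPFactory(factoryClassName, null, dtdValidate);
    }

    static String rootName(XMLReaderJAXPFactory readers, String xml) {
        try {
            return new SAXBuilder(readers).build(new StringReader(xml))
                    .getRootElement().getName();
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReaderJAXPFactory readers = readersFor(defaultFactoryClassName(), false);
        System.out.println(rootName(readers, "<config/>"));
    }
}
```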

To validate using an arbitrary external Schema you can use the XMLReaderSchemaFactory to create an instance for the particular Schema you want to validate against. Because this requires an input Schema, it cannot be provided as a singleton like the XMLReaders members. There are constructors that allow you to use a specific (rather than the default) JAXP-compatible parser.

XMLReaderXSDFactory is a special case of XMLReaderSchemaFactory which internally uses an efficient mechanism to compile Schema instances from one or many input XSD documents which can come from multiple sources. There are constructors that allow you to use a specific (rather than the default) JAXP-compatible parser.
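The sketch below compiles two in-memory XSD documents into one XMLReaderXSDFactory using its Source varargs constructor, then validates a few documents. The schemas, namespaces (urn:example:a, urn:example:b) and class name are hypothetical. It assumes validation errors surface as a JDOMException from build(), since SAXBuilder's default error handler treats them as fatal.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.transform.stream.StreamSource;

import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaderJDOMFactory;
import org.jdom2.input.sax.XMLReaderXSDFactory;

class MultiXsdValidation {

    // Two hypothetical schemas, each with its own target namespace and a
    // single string-typed global element.
    static final String SCHEMA_A =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example:a'>"
            + "<xs:element name='note' type='xs:string'/></xs:schema>";
    static final String SCHEMA_B =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example:b'>"
            + "<xs:element name='memo' type='xs:string'/></xs:schema>";

    // Both XSD sources are compiled into a single Schema behind the factory.
    static SAXBuilder validatingBuilder() {
        try {
            XMLReaderJDOMFactory factory = new XMLReaderXSDFactory(
                    new StreamSource(new StringReader(SCHEMA_A)),
                    new StreamSource(new StringReader(SCHEMA_B)));
            return new SAXBuilder(factory);
        } catch (JDOMException e) {
            throw new IllegalStateException("schema compilation failed", e);
        }
    }

    // True if the document parses and validates, false on a JDOMException.
    static boolean isValid(SAXBuilder builder, String xml) {
        try {
            builder.build(new StringReader(xml));
            return true;
        } catch (JDOMException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SAXBuilder sb = validatingBuilder();
        System.out.println(isValid(sb, "<note xmlns='urn:example:a'>hi</note>"));
        System.out.println(isValid(sb, "<memo xmlns='urn:example:b'>hi</memo>"));
        System.out.println(isValid(sb, "<other xmlns='urn:example:a'/>"));
    }
}
```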

SAX 2.0 Factory

JDOM supports using the SAX 2.0 API to create XMLReaders, using either the 'default' SAX 2.0 implementation or a particular SAX Driver class. SAX 2.0 support is available by creating instances of the XMLReaderSAX2Factory class.
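A minimal sketch of the SAX 2.0 route, assuming XMLReaderSAX2Factory's single-argument (validate) constructor, which leaves the choice of driver to the SAX 2.0 default lookup. Naming a particular SAX Driver class is also possible but is not shown here, because driver class names are implementation-specific. Sax2Readers is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaderSAX2Factory;

class Sax2Readers {

    // validate = false; no driver class is named, so the SAX 2.0 API's
    // default driver lookup decides which XMLReader implementation is used.
    static XMLReaderSAX2Factory defaultDriver() {
        return new XMLReaderSAX2Factory(false);
    }

    static String rootName(String xml) {
        try {
            return new SAXBuilder(defaultDriver()).build(new StringReader(xml))
                    .getRootElement().getName();
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<inventory/>"));
    }
}
```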

Note that it is preferable to use JAXP in JDOM because it is a more flexible API that allows more portable code to be written. The JAXP interface in JDOM also supports a wider array of functionality out of the box, whereas the same functionality through SAX 2.0 would require configuration specific to each SAX implementation.

JDOM does not, however, provide a pre-configured way to do XML Schema validation through the SAX 2.0 API. The SAX 2.0 API does not expose a consistent way to configure different SAX implementations, so it is up to the JDOM user to wrap the XMLReaderSAX2Factory so that it reconfigures the XMLReader appropriately for the task at hand.

Custom Factories

If your circumstances require it, you can create your own implementation of XMLReaderJDOMFactory to provide XMLReaders configured as you like. In most cases it is best to wrap an existing implementation with your custom code rather than start from scratch.

Note that the existing JDOM implementations described above all set the generated XMLReaders to be namespace-aware and to supply namespace-prefixes. Custom implementations should also ensure that this is set unless you absolutely know what you are doing.
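One way to follow that advice is a thin wrapper. The hypothetical NamespaceEnforcingFactory below delegates to any existing XMLReaderJDOMFactory (here an XMLReaderSAX2Factory, as the SAX 2.0 section suggests) and then sets the two standard SAX 2.0 namespace features on each reader; other task-specific reconfiguration would go in the same place. It is a sketch, not a drop-in component.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.XMLReaderJDOMFactory;
import org.jdom2.input.sax.XMLReaderSAX2Factory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical wrapper: delegates reader creation, then (re)applies the
// namespace features that JDOM's own factories set.
class NamespaceEnforcingFactory implements XMLReaderJDOMFactory {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES =
            "http://xml.org/sax/features/namespace-prefixes";

    private final XMLReaderJDOMFactory delegate;

    NamespaceEnforcingFactory(XMLReaderJDOMFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public XMLReader createXMLReader() throws JDOMException {
        XMLReader reader = delegate.createXMLReader();
        try {
            reader.setFeature(NAMESPACES, true);
            reader.setFeature(NAMESPACE_PREFIXES, true);
            // Further task-specific reconfiguration would go here.
        } catch (SAXException e) {
            throw new JDOMException("XMLReader rejected a namespace feature", e);
        }
        return reader;
    }

    @Override
    public boolean isValidating() {
        return delegate.isValidating();
    }

    // Convenience check: creates a reader and reports whether both features are on.
    static boolean featuresOn(XMLReaderJDOMFactory factory) {
        try {
            XMLReader r = factory.createXMLReader();
            return r.getFeature(NAMESPACES) && r.getFeature(NAMESPACE_PREFIXES);
        } catch (JDOMException | SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws JDOMException, IOException {
        XMLReaderJDOMFactory wrapped =
                new NamespaceEnforcingFactory(new XMLReaderSAX2Factory(false));
        System.out.println("namespace features on: " + featuresOn(wrapped));
        // The wrapper plugs into SAXBuilder like any other reader pillar.
        SAXBuilder sb = new SAXBuilder(wrapped);
        System.out.println(sb.build(new StringReader("<x/>")).getRootElement().getName());
    }
}
```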

The SAXHandlerFactory Pillar

The SAXHandler interprets the SAX calls and passes the information to the JDOMFactory to create JDOM content. SAXBuilder creates a SAXHandler from the SAXHandlerFactory pillar. It is unusual for a JDOM user to need to customise how this happens, but if you do, you can create a subclass of the SAXHandler class and then an implementation of SAXHandlerFactory that returns new instances of that subclass. This new factory can become a pillar in SAXBuilder and supply custom SAXHandlers to the parse process.
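A sketch of a custom handler pillar, assuming SAXHandler's public (JDOMFactory) constructor and an overridable startElement. The hypothetical CountingSAXHandler counts elements before delegating to the normal behaviour, and the hypothetical CountingHandlerFactory hands out a new one each time SAXBuilder asks, after being installed with SAXBuilder.setSAXHandlerFactory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;

import org.jdom2.JDOMException;
import org.jdom2.JDOMFactory;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.SAXHandler;
import org.jdom2.input.sax.SAXHandlerFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;

// Hypothetical SAXHandler subclass that counts elements as they are built.
class CountingSAXHandler extends SAXHandler {
    private final AtomicInteger counter;

    CountingSAXHandler(JDOMFactory factory, AtomicInteger counter) {
        super(factory);
        this.counter = counter;
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) throws SAXException {
        counter.incrementAndGet();
        super.startElement(namespaceURI, localName, qName, atts); // normal JDOM behaviour
    }
}

// Handler pillar that returns a new CountingSAXHandler on each request.
class CountingHandlerFactory implements SAXHandlerFactory {
    final AtomicInteger elements = new AtomicInteger();

    @Override
    public SAXHandler createSAXHandler(JDOMFactory jdomFactory) {
        return new CountingSAXHandler(jdomFactory, elements);
    }
}

class CustomHandlerDemo {
    static int countElements(String xml) {
        CountingHandlerFactory handlers = new CountingHandlerFactory();
        SAXBuilder sb = new SAXBuilder();
        sb.setSAXHandlerFactory(handlers);
        try {
            sb.build(new StringReader(xml));
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handlers.elements.get();
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><c><d/></c></a>"));
    }
}
```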

The JDOMFactory Pillar

There are a couple of reasons for changing the JDOMFactory pillar in SAXBuilder. The default JDOMFactory used is the DefaultJDOMFactory. This factory validates the values being used to create JDOM content. There is also the UncheckedJDOMFactory which does not validate the data, so it should only be used if you are absolutely certain that your SAX source can never provide illegal content. You may have other reasons for creating a custom JDOMFactory such as if you need to create custom versions of JDOM Content like a custom Element subclass.
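A sketch of replacing the JDOMFactory pillar with SAXBuilder.setJDOMFactory. It uses UncheckedJDOMFactory for trusted input and, as an example of a factory that produces an Element subclass, LocatedJDOMFactory from org.jdom2.located, which is assumed to be available in this JDOM version and to record line and column numbers on LocatedElement. FactoryChoices is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.JDOMFactory;
import org.jdom2.UncheckedJDOMFactory;
import org.jdom2.input.SAXBuilder;
import org.jdom2.located.LocatedElement;
import org.jdom2.located.LocatedJDOMFactory;

class FactoryChoices {

    // Parses with a default builder whose JDOMFactory pillar is replaced.
    static Document parseWith(JDOMFactory factory, String xml) {
        SAXBuilder sb = new SAXBuilder();
        sb.setJDOMFactory(factory);
        try {
            return sb.build(new StringReader(xml));
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No name/content checks: only for input you are certain is legal.
        Document trusted = parseWith(new UncheckedJDOMFactory(), "<a/>");
        System.out.println(trusted.getRootElement().getName());

        // LocatedElement is an Element subclass carrying line/column data.
        Document located = parseWith(new LocatedJDOMFactory(), "<root>\n  <child/>\n</root>");
        Element child = located.getRootElement().getChild("child");
        if (child instanceof LocatedElement) {
            System.out.println("child line: " + ((LocatedElement) child).getLine());
        }
    }
}
```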

Configuring the Pillars

The JDOMFactory pillar is not configurable; you can only replace it entirely. The other two pillars are configurable, though; inspect the getters and setters on SAXBuilder to see what can be changed easily by default. Remember, if anything needs to be customised beyond what SAXBuilder offers, you can always replace a pillar with a custom implementation.
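For example, SAXBuilder.setFeature configures the XMLReader pillar while setIgnoringBoundaryWhitespace configures the SAXHandler pillar. This sketch assumes the parser is Xerces-based (as the JDK's bundled parser is), so it recognises the Apache disallow-doctype-decl feature; another parser may reject that feature name. ConfiguredBuilder and its helpers are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

class ConfiguredBuilder {
    // Assumption: a Xerces-based parser that recognises this Apache feature.
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    static SAXBuilder configured() {
        SAXBuilder sb = new SAXBuilder();
        sb.setFeature(DISALLOW_DOCTYPE, true);  // passed on to the XMLReader
        sb.setIgnoringBoundaryWhitespace(true); // applied by the SAXHandler
        return sb;
    }

    // Number of content nodes directly under the root element.
    static int rootContentSize(SAXBuilder sb, String xml) {
        try {
            return sb.build(new StringReader(xml)).getRootElement().getContentSize();
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the document is accepted by this builder's configuration.
    static boolean parses(SAXBuilder sb, String xml) {
        try {
            sb.build(new StringReader(xml));
            return true;
        } catch (JDOMException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SAXBuilder sb = configured();
        System.out.println(rootContentSize(sb, "<a>\n  <b/>\n</a>"));
        System.out.println(parses(sb, "<!DOCTYPE a><a/>"));
    }
}
```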

Execution Model

Once all the pillars are set and configured to your satisfaction you can 'build' a JDOM Document from a source. The actual parse process consists of a 'setup', 'parse', and 'reset' phase.

The setup process involves obtaining an XMLReader from the XMLReaderJDOMFactory and a SAXHandler (configured to use the JDOMFactory) from the SAXHandlerFactory. These two instances are then configured to meet the settings specified on SAXBuilder, and once configured they are 'compiled' into a SAXBuilderEngine.

The SAXBuilderEngine is a non-configurable 'embodiment' of the SAXBuilder's configuration at the time the engine was created, and it contains the entire 'workflow' necessary to parse the input into JDOM content. Further, it is guaranteed that the XMLReader and SAXHandler instances in the SAXBuilderEngine are never shared with any other engine or entity (assuming that the respective factories never issue the same instance more than once). No guarantee is made that the JDOMFactory is unique to each SAXBuilderEngine, but JDOMFactory instances are supposed to be reentrant and thread-safe.

The 'parse' phase starts once the setup phase is complete and the SAXBuilderEngine has been created. The created engine is used to parse the input, and the resulting Document is returned to the client.

The 'reset' phase happens after the completion of the 'parse' phase, and it resets the SAXBuilderEngine to its initial state, ready to process the next parse request.
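The three phases can be made visible by asking SAXBuilder for an engine explicitly. In this sketch (EngineLifecycle is a hypothetical name), buildEngine() performs the setup once, and each call to build() on the returned SAXEngine is one parse followed by the reset.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.SAXEngine;

class EngineLifecycle {

    static List<String> rootNames(List<String> docs) {
        SAXBuilder sb = new SAXBuilder();
        try {
            // 'setup': reader and handler obtained, configured, compiled once.
            SAXEngine engine = sb.buildEngine();
            List<String> names = new ArrayList<>();
            for (String xml : docs) {
                // 'parse', then the engine resets itself for the next input.
                names.add(engine.build(new StringReader(xml)).getRootElement().getName());
            }
            return names;
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNames(Arrays.asList("<one/>", "<two/>", "<three/>")));
    }
}
```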

Parser Reuse

A large amount of the effort involved in parsing the document is actually the creation of the XMLReader and the SAXHandler instances, as well as applying the configuration to those instances (the 'setup' phase).

JDOM2 uses the new SAXBuilderEngine to represent the state of the SAXBuilder at the moment prior to the parse. SAXBuilder will then 'remember' and reuse this exact SAXBuilderEngine until something changes in the SAXBuilder configuration. As soon as the configuration changes in any way the engine will be forgotten and a new one will be created when the SAXBuilder next parses a document.

If you turn off parser reuse with SAXBuilder.setReuseParser(boolean) then SAXBuilder will immediately forget the engine, and it will also forget it after each build (i.e. SAXBuilder will create a new SAXBuilderEngine each parse).

It follows then that as long as you do not change the SAXBuilder configuration then the SAXBuilder will always reuse the same SAXBuilderEngine. This is very efficient because there is no configuration management between parses, and the procedure completely eliminates the 'setup' component for all but the first parse.
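In practice this usually means configuring a SAXBuilder once and keeping it, rather than creating or reconfiguring one per document. A minimal sketch, assuming a hypothetical single-threaded XmlLoader that owns its builder; setReuseParser(true) is shown only to make the intent explicit.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

// Hypothetical loader; intended for use from a single thread.
class XmlLoader {
    private final SAXBuilder builder = new SAXBuilder();

    XmlLoader() {
        // All configuration happens here, before the first build.
        builder.setReuseParser(true);
        builder.setIgnoringBoundaryWhitespace(true);
    }

    // No setters are called between builds, so the engine created for the
    // first parse can be reused for every later one.
    Document load(String xml) {
        try {
            return builder.build(new StringReader(xml));
        } catch (JDOMException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XmlLoader loader = new XmlLoader();
        for (int i = 0; i < 3; i++) {
            System.out.println(loader.load("<doc/>").getRootElement().getName());
        }
    }
}
```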

Parser Pooling

To facilitate parser pooling, the SAXBuilderEngine can be exported as a stand-alone, reusable parser. At any time you can call SAXBuilder.buildEngine() to get a newly created SAXBuilderEngine instance. The SAXBuilderEngine has the same 'build' methods as SAXBuilder, and these are exposed through the SAXEngine interface, which both SAXBuilder and SAXBuilderEngine implement. Thus, if you use parser pools, you can pool either SAXBuilder or SAXBuilderEngine instances in the same pool.

Most likely, though, you will want to create a single SAXBuilder that represents the configuration you want, and then call its buildEngine() method to create as many SAXEngines as the pool needs.
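A minimal pooling sketch, assuming a fixed number of engines held in a BlockingQueue: one SAXBuilder supplies the configuration, buildEngine() creates each pooled engine, and a thread borrows an engine for one parse and returns it in a finally block. EnginePool and its methods are hypothetical, not part of JDOM.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.jdom2.Document;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;
import org.jdom2.input.sax.SAXEngine;

// Hypothetical fixed-size pool; each engine is used by one thread at a time.
class EnginePool {
    private final BlockingQueue<SAXEngine> idle;

    EnginePool(SAXBuilder configuration, int size) throws JDOMException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(configuration.buildEngine()); // a new engine on each call
        }
    }

    Document parse(String xml) throws JDOMException, IOException, InterruptedException {
        SAXEngine engine = idle.take(); // wait until an engine is free
        try {
            return engine.build(new StringReader(xml));
        } finally {
            idle.add(engine); // return it even if the parse failed
        }
    }

    // Parses all documents on a thread pool that shares this engine pool.
    static List<String> rootNamesInParallel(List<String> docs, int engines, int threads) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            EnginePool pool = new EnginePool(new SAXBuilder(), engines);
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : docs) {
                Callable<String> task = () -> pool.parse(xml).getRootElement().getName();
                pending.add(exec.submit(task));
            }
            List<String> names = new ArrayList<>();
            for (Future<String> f : pending) {
                names.add(f.get());
            }
            return names;
        } catch (JDOMException | InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown();
        }
    }
}
```

With more threads than engines, the extra threads simply block in take() until an engine is returned, so no engine is ever used by two threads at once.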

Examples

Create a simple SAXBuilder and parse a document:

 SAXBuilder sb = new SAXBuilder();
 Document doc = sb.build(new File("file.xml"));
 

Create a DTD validating SAXBuilder and parse a document:

 SAXBuilder sb = new SAXBuilder(XMLReaders.DTDVALIDATING);
 Document doc = sb.build(new File("file.xml"));
 
Create an XSD (XML Schema) validating SAXBuilder using the XSD references inside the XML document and parse a document:

 SAXBuilder sb = new SAXBuilder(XMLReaders.XSDVALIDATING);
 Document doc = sb.build(new File("file.xml"));
 

Create an XSD (XML Schema) validating SAXBuilder the hard way (see the next example for an easier way) using an external XSD and parse a document (see XMLReaderSchemaFactory):

 SchemaFactory schemafac =
     SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
 Schema schema = schemafac.newSchema(new File("myschema.xsd"));
 XMLReaderJDOMFactory factory = new XMLReaderSchemaFactory(schema);
 SAXBuilder sb = new SAXBuilder(factory);
 Document doc = sb.build(new File("file.xml"));
 

Create an XSD (XML Schema) validating SAXBuilder the easy way (see XMLReaderXSDFactory):

 File xsdfile = new File("myschema.xsd");
 XMLReaderJDOMFactory factory = new XMLReaderXSDFactory(xsdfile);
 SAXBuilder sb = new SAXBuilder(factory);
 Document doc = sb.build(new File("file.xml"));
 

Copyright © 2021 Jason Hunter, Brett McLaughlin. All Rights Reserved.