[jdom-interest] suggested JDOM2 improvements

Fri Jan 20 05:56:56 PST 2012

I have looked at the Saxon API, as well as the native Java API. I have
also looked into XPath 2.0.

Mostly my 'experience' with XPath is through the current JDOM API. There
are things I like, and things I dislike, and things I have had to relearn
because the JDOM/XPath API has skewed my experience.

I think I am settling on the following model:

1. Deprecate the current XPath class entirely, but keep it fully backward
compatible with JDOM 1.x.
2. Introduce a new JDOM2 XPathFactory concept which can have different
implementation back-ends (Jaxen, Saxon, whatever).
3. XPathFactories are thread-safe and reusable from any thread.
4. Have a single 'default' XPathFactory instance obtainable with
XPathFactory.instance(). The back-end used by instance() can be changed
with a system property.
5. The default back-end will continue to be Jaxen.
6. Other back-ends can be used at will by calling the
XPathFactory.newInstance(String) method (or some direct constructor on the
Factory if it exposes one).
7. At the other end of the system will be an interface XPathCompiled<T>.
This will be immutable, but not thread-safe. Similar concept/behaviour to
javax.xml.xpath.XPathExpression.
8. XPathCompiled<T> will not have the 'special' valueOf, numberValue,
booleanValue that org.jdom.xpath.XPath has. These methods are extensions to
the basic XPath concept and make support for other types impossible (like
XPath 2.0).
9. Instead, XPathCompiled<T> has a generic type which will match the
result values from the expression. The generic type is set by the JDOM
Filter.
10. XPathCompiled<T> can return the full list of results, or alternatively
just the first result. The results will be type-cast to the specified
Filter.
11. The compiling and running methods for the new API will throw unchecked
exceptions (like the javax.xml.xpath.* API).

That will be the base model.
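
As a rough illustration of points 2 to 6, here is a minimal sketch of a
factory with a shared default instance and named back-ends. Everything in it
is an assumption made for illustration: FactorySketch, the two stub
back-ends, the short back-end names and the "sketch.xpath.backend" property
name are hypothetical and are not part of JDOM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the proposed XPathFactory; not a JDOM class.
abstract class FactorySketch {
    // Assumed property name, invented for this illustration only.
    static final String PROPERTY = "sketch.xpath.backend";

    // Registry of known back-ends, keyed by an assumed short name.
    private static final Map<String, Supplier<FactorySketch>> BACKENDS =
            new ConcurrentHashMap<String, Supplier<FactorySketch>>();
    static {
        BACKENDS.put("jaxen", JaxenLikeFactory::new);
        BACKENDS.put("saxon", SaxonLikeFactory::new);
    }

    // Holder idiom: the default is created lazily and safely on first use.
    private static final class Holder {
        static final FactorySketch DEFAULT =
                newInstance(System.getProperty(PROPERTY, "jaxen"));
    }

    // The single shared default factory (point 4).
    static FactorySketch instance() {
        return Holder.DEFAULT;
    }

    // A fresh factory for an explicitly named back-end (point 6).
    static FactorySketch newInstance(String name) {
        Supplier<FactorySketch> supplier = BACKENDS.get(name);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown back-end: " + name);
        }
        return supplier.get();
    }

    abstract String backendName();
}

// Stub back-ends; a real one would wrap the actual XPath engine.
final class JaxenLikeFactory extends FactorySketch {
    String backendName() { return "jaxen"; }
}

final class SaxonLikeFactory extends FactorySketch {
    String backendName() { return "saxon"; }
}
```

With no system property set, FactorySketch.instance() hands back the same
Jaxen-like factory on every call, while FactorySketch.newInstance("saxon")
creates a separate factory each time.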

Using this model I expect a base (comprehensive) factory method:

public <E> XPathCompiled<E> compile(String xpath, Filter<E> filter,
Map<String,Object> variables, List<Namespace> namespaces);

In addition there will be variations on the compile method that cater for
simplified conditions, like the basic no-namespace, no-variable, no-filter:

public XPathCompiled<Object> compile(String xpath);

The XPathCompiled<T> class will have:

public List<T> evaluate(Object context);
public T evaluateFirst(Object context);

The evaluateFirst method is a convenience method that will be defined to
return the first value in the evaluate() results, or null if the result is
empty. Implementations can choose to have some short-circuit logic if
possible.
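
To make the relationship between the Filter, evaluate() and evaluateFirst()
concrete, here is a minimal sketch. FilterSketch, CompiledSketch and
FixedResults are hypothetical names used only for illustration; the
filtering rule (keep a raw result only if it is an instance of the wanted
class) is an assumption about how a Filter would behave, and FixedResults
replaces a real XPath engine with a canned list of raw results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JDOM Filter: returns the typed value,
// or null when the raw result does not match.
interface FilterSketch<T> {
    T filter(Object raw);

    // Assumed helper: a filter that accepts instances of one class.
    static <T> FilterSketch<T> ofClass(Class<T> type) {
        return raw -> type.isInstance(raw) ? type.cast(raw) : null;
    }
}

// Hypothetical stand-in for XPathCompiled<T>.
abstract class CompiledSketch<T> {
    private final FilterSketch<T> filter;

    CompiledSketch(FilterSketch<T> filter) {
        this.filter = filter;
    }

    // Back-end specific: the untyped results of running the expression.
    protected abstract List<Object> evaluateRaw(Object context);

    // All raw results that pass the filter, in document order.
    public List<T> evaluate(Object context) {
        List<T> typed = new ArrayList<T>();
        for (Object raw : evaluateRaw(context)) {
            T value = filter.filter(raw);
            if (value != null) {
                typed.add(value);
            }
        }
        return typed;
    }

    // Defined as the first value of evaluate(), or null if it is empty.
    // A back-end could override this with short-circuit logic.
    public T evaluateFirst(Object context) {
        List<T> all = evaluate(context);
        return all.isEmpty() ? null : all.get(0);
    }
}

// Toy back-end for the sketch: ignores the context and returns fixed data.
final class FixedResults<T> extends CompiledSketch<T> {
    private final List<Object> raw;

    FixedResults(FilterSketch<T> filter, List<?> raw) {
        super(filter);
        this.raw = new ArrayList<Object>(raw);
    }

    @Override
    protected List<Object> evaluateRaw(Object context) {
        return raw;
    }
}
```

For example, a FixedResults<String> built with
FilterSketch.ofClass(String.class) over the raw values 1, "a", "b" returns
the two strings from evaluate() and "a" from evaluateFirst().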

To make life easier it is helpful to have an intermediate class that can
manage the variable and namespace contexts for you. Thus a helper class
XPathBuilder<T> will support managing these (getters/setters for variables,
namespaces). It will also have a compile() method to create an
XPathCompiled<T> using the state of the XPathBuilder at compile time.
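
The "state of the XPathBuilder at compile time" idea can be sketched as
follows. BuilderSketch and its nested Snapshot class are hypothetical names;
Snapshot merely stands in for the compiled expression, and namespaces are
held as plain prefix-to-URI strings to keep the sketch free of JDOM types.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for XPathBuilder: mutable holder of the static context.
final class BuilderSketch {
    private String xpath;
    private final Map<String, String> namespaces = new LinkedHashMap<String, String>();
    private final Map<String, Object> variables = new LinkedHashMap<String, Object>();

    void setXPath(String xpath) {
        this.xpath = xpath;
    }

    void addNamespace(String prefix, String uri) {
        namespaces.put(prefix, uri);
    }

    void setVariable(String name, Object value) {
        variables.put(name, value);
    }

    // Copies the current state, so later changes to the builder
    // do not affect what has already been compiled.
    Snapshot compile() {
        if (xpath == null) {
            throw new IllegalStateException("No XPath expression has been set");
        }
        return new Snapshot(xpath, namespaces, variables);
    }

    // Immutable copy of the builder state; stands in for XPathCompiled<T>.
    static final class Snapshot {
        final String xpath;
        final Map<String, String> namespaces;
        final Map<String, Object> variables;

        Snapshot(String xpath, Map<String, String> ns, Map<String, Object> vars) {
            this.xpath = xpath;
            this.namespaces = Collections.unmodifiableMap(new LinkedHashMap<String, String>(ns));
            this.variables = Collections.unmodifiableMap(new LinkedHashMap<String, Object>(vars));
        }
    }
}
```

Because compile() takes a copy, one builder can be reused to produce several
compiled expressions that differ only in a namespace or a variable value.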

Since this new API will impose a 'Filter' on top of the XPath results,
there will be times when debugging problems is a challenge. For example:
am I missing element X because it was not selected by the XPath, or
because it was eliminated by the filter? To answer that sort of question
there needs to be an XPathResult<T> object which contains the pre- and
post-filter results (as well as other useful debugging information).

Thus, XPathCompiled<T> will also have:
public XPathResult<T> evaluateResult(Object context);
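
A minimal sketch of what such a result object might hold, assuming the
filter is a simple "is an instance of this class" test. ResultSketch is a
hypothetical name, and result() and filtered() are named after the accessors
used in the diagnostic example in this message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for XPathResult<T>: keeps both what survived
// the filter and what the filter removed.
final class ResultSketch<T> {
    private final List<T> result = new ArrayList<T>();
    private final List<Object> filtered = new ArrayList<Object>();

    // Splits the raw (unfiltered) results by the assumed class-based filter.
    ResultSketch(List<?> raw, Class<T> type) {
        for (Object item : raw) {
            if (type.isInstance(item)) {
                result.add(type.cast(item));
            } else {
                filtered.add(item);
            }
        }
    }

    // Values selected by the XPath and accepted by the filter.
    List<T> result() {
        return Collections.unmodifiableList(result);
    }

    // Values selected by the XPath but removed by the filter.
    List<Object> filtered() {
        return Collections.unmodifiableList(filtered);
    }
}
```

Every raw result lands in exactly one of the two lists, so a missing element
that shows up in neither list was never selected by the XPath itself.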

Examples of the way I see it working are:

//the following two are identical:
String name = XPathFactory.instance().compile("//name/text()",
Filters.string()).evaluateFirst(document);
String name = XPathFactory.instance().evaluateFirst(document,
"//name/text()", Filters.string());

// just select the current node.
Object val = XPathFactory.instance().evaluateFirst(context, "self::node()");

// create a builder and use it to compile an XPath.
XPathBuilder<Element> builder = new XPathBuilder<Element>(Filters.element());
builder.setXPath("//ns:*");
builder.addNamespace("ns", "http://example.com/mynamespace");
XPathCompiled<Element> xpath = builder.compile(XPathFactory.instance());
List<Element> mine = xpath.evaluate(mydocument);

// Get a diagnostic
XPathResult<Element> result = XPathFactory.instance().compile("//@*",
Filters.element()).evaluateResult(context);
if (!result.filtered().isEmpty()) {
   List<Object> filtered = result.filtered();
   System.out.println("The following results were selected by the XPath "
         + "but removed by the Filter: " + filtered.toString());
   List<Element> survived = result.result();
   System.out.println("The following results were selected by the XPath "
         + "and kept by the Filter: " + survived.toString());
}

This is all taking longer than I expected. I think I will have to put a
'proof of concept' out there, and extend the ALPHA release phase.

Rolf

In essence this API shifts the onus of ensuring the return value is of
the appropriate type onto the user. They know the XPath query, so they
should know the return type.

From what I can tell, this model should be compatible with any back-end,
including XPath 2.0. It does not impose any XPath-specific logic modifiers.
If you want a 'number' back from your XPath then you need to use the XPath
number() function to get one. If you want the XPath result cast as a String
using the XPath string-conversion logic, then you should wrap your XPath
query in the XPath string() function. This same logic follows through to
XPath 2.0.
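
To show what the number() and string() wrapping looks like in practice, here
is a small sketch that uses the standard javax.xml.xpath API rather than the
proposed JDOM2 one. ConversionSketch and the sample XML are made up for
illustration. Note that JAXP also coerces results through its return-type
argument, so the return type here is simply chosen to match what the
expression already produces.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML below is made-up sample data.
final class ConversionSketch {
    static final String XML =
            "<item><name>widget</name><price>9.50</price></item>";

    // Evaluates one expression against the sample document.
    static Object eval(String expression, QName returnType) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(XML)), returnType);
        } catch (XPathExpressionException e) {
            // Rethrown unchecked to keep callers of the sketch simple.
            throw new IllegalStateException(e);
        }
    }

    // The conversion to a string is requested inside the XPath itself.
    static String name() {
        return (String) eval("string(/item/name)", XPathConstants.STRING);
    }

    // Likewise, number() asks XPath to apply its own numeric conversion.
    static double price() {
        return (Double) eval("number(/item/price)", XPathConstants.NUMBER);
    }
}
```

Here name() yields the text of the name element and price() yields the price
as a double, with the conversion rules coming from XPath rather than from
the calling API.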

On Fri, 20 Jan 2012 10:45:19 +0000, Michael Kay <mike at saxonica.com> wrote:
> > The only case where we can't compile XPath expressions is when we want
> > to use variables. Which defeats the whole purpose of compiling XPath!
> 
> Absolutely!
> 
> > Or we have to use thread-local compiled XPaths. So, I think it would
> > be great to split the XPath API in two parts.
> 
> That's definitely the way to go if you're making changes to this area. If
> you're not familiar with it, do take a look at the s9api design in Saxon:
>
> http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html
> 
> That involves three classes:
> 
> XPathCompiler contains the static context (variable and namespace 
> declarations)
> 
> XPathExecutable is the thread-safe compiled and reusable XPath expression
> 
> XPathEvaluator contains the dynamic context (variable values, context item)
> 
> You can eliminate the XPathEvaluator by having a more complex evaluate()
> method on the XPathExecutable, e.g. one that supplies the variable
> values as a Map; but this doesn't reduce the overall number of objects 
> involved, it just replaces the XPathEvaluator object with a Map object.
> 
> The other big design problem with an XPath API is the types used for 
> variable values and for the evaluation result. With the JAXP API I get 
> an enormous amount of support hassle caused by the lack of type safety 
> in the way JAXP does this. In s9api I decided, despite the complexity, 
> to introduce classes XdmValue, XdmItem, XdmAtomicValue etc to make the 
> whole thing type-safe, and I don't regret the decision. (I also have 
> XdmNode which abstracts over DOM, JDOM, XOM etc nodes.)
> 
> If you're designing a new XPath API in 2012 then I think it's essential 
> to think about how it will support XPath 2.0.
> 
> Michael Kay
> Saxonica
> 