You’re into XML? Mule now supports XPath, XSLT and XQuery 3.0


In spite of JSON’s reign as the king of API data formats, XML remains the exchange format of choice for a number of systems. Any service exposing functionality through SOAP, and many applications built years ago (or even nowadays), still depend on XML to share data. So much so that in April 2013 the W3C published a new spec for version 3.0 of the XPath, XSLT and XQuery standards. We decided it was time to update the platform’s support for these standards and fix a couple of things while we were at it.

When we released Mule 3.5.0, the situation was this:

We had only partial support for XQuery 1.0 and XSLT 2.0
Users had a very inconsistent experience when dealing with XPath:
When processing an XSLT template, we supported XPath 2.0
When using the xpath() MEL function or the xpath: expression evaluator, we only supported XPath 1.0
The xpath() function and expression evaluator are like a box of chocolates: you never know what you’re gonna get. The return type changes depending on how many results the query finds and on whether the result is a simple type or a node.
The jxpath-filter and jxpath-extractor-transformer elements, which are supposed to process only POJOs, fall back to an actual XPath 1.0 expression through dom4j if the message payload is an XML document

So, mea culpa. This was a mess, and there’s no shame in admitting it as long as we go and fix it. That’s why, for Mule 3.6, we aimed to:

Provide state-of-the-art, 100% compliant support for XPath 2.0, XSLT 2.0, and XQuery 1.0
Provide basic support for version 3.0 of the XML specs
Reuse the existing XML elements and functions (xpath-filter, xslt-transformer, xquery-transformer, etc.) so that they can be used regardless of the targeted spec version
Deprecate our current XPath support and provide a new, more usable and consistent solution that supports both XPath 2.0 and 3.0
Deprecate all JXPath support in favor of simple MEL expressions.

Before we begin, a few words on the 3.0 spec

The 3.0 versions of these XML specs have not all been approved as final W3C Recommendations yet. However, they are in “last call” status, which means they are highly unlikely to receive any substantial changes.

About XPath 3.0

XPath 3.0 is backwards compatible with 2.0. However, it’s not fully compatible with version 1.0. Although a compatibility mode exists, it doesn’t cover all cases. This is one of the main reasons why, although we provide a new API for XPath processing, we’ll keep supporting the xpath() function (which works with XPath 1.0) until Mule 4.0.

What does basic support mean?

Earlier we used the term “basic support” when referring to the 3.0 specs. By basic support we mean all features that don’t rely on:

Schema awareness
Higher-order functions
Streaming

Improvements on XPath

As previously stated, in our effort to provide the best possible experience we found that we couldn’t build on Mule’s existing XPath support, for two reasons: it was an inconsistent and unusable mixture of XPath 1.0 and 2.0, and XPath 3.0 is not backwards compatible with 1.0.

So, in the spirit of cleaning up, we decided to deprecate the following components:

xpath: expression evaluator
xpath2: expression evaluator
bean: expression evaluator
jxpath-filter
jxpath-extractor-transformer
jaxen-filter

To replace them, we introduced a new xpath3() MEL function. Some implications worth noting:

Because XPath 3.0 is completely backwards compatible with 2.0, this function also serves those wanting to use 2.0 expressions.
Support for XPath 1.0 expressions is not guaranteed. The simpler ones will work, but incompatible ones will not. Since XPath 1.0 dates all the way back to 1999, we consider it deprecated and won’t officially support it. Compatibility mode will be disabled.
Because we want this function to have predictable return types, changing the existing xpath() function was not an option. We considered adding a compatibility flag to it, but our analysis indicated that the impact was far too great for that to make sense. Therefore, a new xpath3() function was created and the existing xpath() one is deprecated.

The new xpath3() function has the following form:

xpath3(expression, input, outputType)

Let’s take a closer look at each argument:

expression (required String)

The XPath expression to be evaluated. Cannot be null or blank.

input (optional Object, defaults to the message payload)

The input data on which the expression is evaluated. This argument is optional; if it is not provided, it defaults to the message payload.

This function supports the following input types:

org.w3c.dom.Document
org.w3c.dom.Node
org.xml.sax.InputSource
OutputHandler
byte[]
InputStream
String
XMLStreamReader
DelayedResult

If the input is not of any of these types, we’ll attempt to use a registered transformer to transform it into a DOM Document or Node. If no such transformer can be found, an IllegalArgumentException is thrown.
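To make the input handling concrete, here is a minimal plain-Java sketch using only JAXP and DOM. It is not Mule’s implementation: the class and method names (XPathInputs.toNode) are hypothetical, it covers only a subset of the listed types (Node/Document, InputSource, byte[], InputStream and String), and the registered-transformer lookup is reduced to throwing the IllegalArgumentException directly.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not Mule's code): turns a few of the supported input
// types into a DOM Node that an XPath engine can evaluate against.
final class XPathInputs {

    private XPathInputs() {
    }

    static Node toNode(Object input) {
        if (input == null) {
            throw new IllegalArgumentException("Input cannot be null");
        }
        if (input instanceof Node) {
            // org.w3c.dom.Document extends Node, so documents and nodes pass through
            return (Node) input;
        }
        InputSource source;
        if (input instanceof InputSource) {
            source = (InputSource) input;
        } else if (input instanceof byte[]) {
            source = new InputSource(new ByteArrayInputStream((byte[]) input));
        } else if (input instanceof InputStream) {
            source = new InputSource((InputStream) input);
        } else if (input instanceof String) {
            source = new InputSource(new StringReader((String) input));
        } else {
            // Mule would first look for a registered transformer; this sketch gives up
            throw new IllegalArgumentException(
                    "Unsupported input type: " + input.getClass().getName());
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML input", e);
        }
    }
}
```

Whatever the input type, everything downstream only ever sees a DOM Node, which is what lets the rest of the evaluation stay type-agnostic.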

Additionally, this function checks whether the input is of a consumable type (streams, readers, etc.). Evaluating the expression over a consumable input exhausts that source, so whenever the input is the actual message payload (whether it was passed explicitly or by default), the function updates the message payload with the result obtained from consuming the input.
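The reason a consumed payload has to be replaced can be sketched in plain JAXP as well. This assumes nothing about Mule’s internals (ConsumableEvaluation and its fields are hypothetical names): once the parser has read an InputStream it cannot be read again, so the parsed Document is handed back alongside the result to serve as the new payload.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Sketch (names are hypothetical): evaluating over a stream consumes it, so the
// parsed DOM is returned alongside the result to stand in for the old payload.
final class ConsumableEvaluation {

    final String value;        // result of the XPath expression
    final Document newPayload; // re-readable replacement for the exhausted stream

    private ConsumableEvaluation(String value, Document newPayload) {
        this.value = value;
        this.newPayload = newPayload;
    }

    static ConsumableEvaluation evaluate(InputStream payload, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Parsing reads the stream; it cannot be rewound and read again
            Document doc = factory.newDocumentBuilder().parse(payload);
            String value = (String) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.STRING);
            return new ConsumableEvaluation(value, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

A caller that evaluated the message payload would then set the payload to newPayload, so later message processors still see the content instead of an empty stream.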

Output type (optional String, defaults to ‘STRING’)

When executing an XPath expression, a developer might have very different intents. Sometimes you want to retrieve actual data; sometimes you just want to verify whether a node exists. Also, the JAXP API (JSR-206) defines the standard way for a Java application to handle XML and, therefore, to execute XPath expressions. This API accounts for the different intents a developer might have and allows choosing from a list of possible output types. We consider this a really useful feature of JAXP, and we believe that many Java developers who are familiar with this API would appreciate Mule accounting for it while hiding the rest of the API’s complexity.

That is why there is a third parameter (an optional String) that lets you specify one of the following:

BOOLEAN: returns the effective boolean value of the expression, as a java.lang.Boolean. This is the same as wrapping the expression in a call of the XPath boolean() function.
STRING: returns the result of the expression converted to a string, as a java.lang.String. This is the same as wrapping the expression in a call of the XPath string() function.
NUMBER: returns the result of the expression converted to a double as a java.lang.Double. This is the same as wrapping the expression in a call of the XPath number() function.
NODE: returns the result as a DOM node object (org.w3c.dom.Node).
NODESET: returns a DOM NodeList object. Components like foreach, splitter, etc. will also be updated to support iterating over that type.
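Since these output types mirror JAXP’s, the plain JAXP calls make a handy reference for what each option returns. One caveat: the JDK’s built-in javax.xml.xpath engine implements XPath 1.0 only, so this sketch illustrates the return-type contract (Boolean, String, Double, Node, NodeList), not the XPath 3.0 engine behind xpath3(). OutputTypesDemo and its sample document are made up for the example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// The five JAXP return types that the xpath3() output type mirrors. The JDK's default
// javax.xml.xpath engine implements XPath 1.0, so this only shows the return-type contract.
final class OutputTypesDemo {

    static final String XML =
            "<cities><city name=\"milan\" pop=\"5\"/><city name=\"paris\" pop=\"7\"/></cities>";

    static Object eval(String expression, QName returnType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Boolean exists = (Boolean) eval("/cities/city[@name='paris']", XPathConstants.BOOLEAN);
        String first = (String) eval("/cities/city[1]/@name", XPathConstants.STRING);
        Double total = (Double) eval("sum(/cities/city/@pop)", XPathConstants.NUMBER);
        Node node = (Node) eval("/cities/city[2]", XPathConstants.NODE);
        NodeList all = (NodeList) eval("/cities/city", XPathConstants.NODESET);
        System.out.println(exists);             // true
        System.out.println(first);              // milan
        System.out.println(total);              // 12.0
        System.out.println(node.getNodeName()); // city
        System.out.println(all.getLength());    // 2
    }
}
```

The same document answers five different questions depending on the requested type, which is exactly the "intent" distinction described above.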

Query Parameters

Another XPath feature that is now supported is the ability to pass parameters into the query. For example, consider a query that returns all the LINE elements containing a given word, where the word is supplied as a parameter.

The $ sign marks the parameter. As for the binding, the function automatically resolves that variable against the current message’s flow variables. So, if you want to return all occurrences of the word ‘handkerchief’, all you have to do is store that word in a flow variable with the same name as the parameter before calling xpath3().
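Parameter binding works along the same lines in plain JAXP, where an XPathVariableResolver supplies the value of each $variable. In this sketch a Map stands in for the flow variables; the //LINE[contains(., $word)] query, the variable name word and the class name are illustrative assumptions, not the post’s original example. The JDK engine is XPath 1.0, which is enough for a query this simple.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Parameter binding with plain JAXP: each $name in the expression is resolved from a
// map standing in for Mule's flow variables. The query and the "word" name are illustrative.
final class XPathParameters {

    static NodeList linesContaining(String xml, Map<String, Object> flowVars) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve $word (or any other variable) by its local name
            xpath.setXPathVariableResolver((QName name) -> flowVars.get(name.getLocalPart()));
            return (NodeList) xpath.evaluate(
                    "//LINE[contains(., $word)]", doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the word out of the expression string means the query never has to be rebuilt or escaped when the value changes.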

NamespaceManager

Unlike its deprecated predecessor, the xpath3 function will be namespace-manager aware, which means that all namespaces configured through a namespace-manager component will be available during the XPath evaluation.

For example, suppose you want to do an XPath evaluation over this document:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body foo="bar">
    <ns1:echo xmlns:ns1="http://simple.component.mule.org/">
      <ns1:echo>Hello!</ns1:echo>
    </ns1:echo>
  </soap:Body>
</soap:Envelope>

As you can see, that document uses several namespaces that the XPath engine needs to be aware of in order to navigate the DOM tree. You can easily configure them like this:

<mulexml:namespace-manager includeConfigNamespaces="true">
  <mulexml:namespace prefix="soap" uri="http://schemas.xmlsoap.org/soap/envelope/"/>
  <mulexml:namespace prefix="mule" uri="http://simple.component.mule.org/"/>
</mulexml:namespace-manager>
<flow name="xpathWithNamespace">
  <expression-transformer expression="xpath3('/soap:Envelope/soap:Body/mule:echo/mule:echo')" />
</flow>

Because we aim for consistency, this also affects the xpath-filter element, which means that some applications might have issues if they were using expressions with custom namespaces without configuring the namespace manager correctly. That can be fixed either by declaring the manager or by using wildcard expressions (e.g.: use /*:title instead of /book:title).
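Under the hood, a namespace manager amounts to a prefix-to-URI mapping. The sketch below shows the plain-JAXP equivalent through a NamespaceContext, assuming JDK 9+ for Map.of; PrefixMap is a hypothetical name. Note that the expression uses the prefix mule while the document uses ns1: matching is done on the namespace URI, not on the prefix.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Plain-JAXP analogue of a namespace manager: it maps the prefixes used in the
// expression to namespace URIs, independently of the prefixes used in the document.
final class PrefixMap implements NamespaceContext {

    private final Map<String, String> uriByPrefix;

    PrefixMap(Map<String, String> uriByPrefix) {
        this.uriByPrefix = uriByPrefix;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix cannot be null");
        }
        return uriByPrefix.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                return entry.getKey();
            }
        }
        return null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        String prefix = getPrefix(namespaceURI);
        return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
    }

    static String echo(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, prefixed steps never match
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(soapXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new PrefixMap(Map.of(
                    "soap", "http://schemas.xmlsoap.org/soap/envelope/",
                    "mule", "http://simple.component.mule.org/")));
            return xpath.evaluate("/soap:Envelope/soap:Body/mule:echo/mule:echo", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing the SOAP envelope shown earlier to PrefixMap.echo yields the text of the inner echo element, Hello!.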

Improvements on XQuery

We also kept the syntax already present in the xquery-transformer element; the XQuery version is selected through the version declaration in the XQuery script. If no version is specified, it defaults to 3.0, since per the spec, all 1.0 queries are valid in 3.0 and must return the same result.

However, unlike its XSLT cousin, the xquery-transformer now has some new tricks up its sleeve. Let’s take a quick peek at what you can now do with it.

Support for multiple inputs

Before Mule 3.6, there was no way to use the xquery-transformer to evaluate an XQuery script that operates over multiple documents at the same time. This was partly due to limitations in the underlying engine, and partly due to limitations in the transformer, which only allowed passing simple-typed parameters (strings, numbers, etc.) to the script.

Now we have added support for passing DOM documents and nodes (instances of org.w3c.dom.Document or org.w3c.dom.Node). For example, consider a simple query that takes two XML documents (one with cities and one with books) and pairs the title of each book with the name of each city:

<mxml:xquery-transformer>
  <mxml:context-property key="books" value="#[flowVars['books']]" />
  <mxml:context-property key="cities" value="#[flowVars['cities']]" />
  <mxml:xquery-text>
    <![CDATA[
      xquery version "3.0";
      declare variable $document external;
      declare variable $cities external;
      declare variable $books external;
      <mixes>
      {
        for $b in $books/BOOKLIST/BOOKS/ITEM,
            $c in $cities/cities/city
        return <mix title="{$b/TITLE/text()}" city="{$c/@name}" />
      }
      </mixes>
    ]]>
  </mxml:xquery-text>
</mxml:xquery-transformer>

The $cities and $books variables hold the documents or nodes that were passed as context properties. Alternatively, the same can be achieved by passing the path to each XML document and letting the engine load and parse the documents itself:

<mxml:xquery-transformer>
  <mxml:context-property key="books" value="#[flowVars['books']]" />
  <mxml:context-property key="cities" value="#[flowVars['cities']]" />
  <mxml:xquery-text>
    <![CDATA[
      xquery version "3.0";
      declare variable $document external;
      declare variable $cities external;
      declare variable $books external;
      <mixes>
      {
        for $b in fn:doc($books)/BOOKLIST/BOOKS/ITEM,
            $c in fn:doc($cities)/cities/city
        return <mix title="{$b/TITLE/text()}" city="{$c/@name}" />
      }
      </mixes>
    ]]>
  </mxml:xquery-text>
</mxml:xquery-transformer>

In this case, the flowVars only contain the paths to the XML documents on disk, and the fn:doc function inside the query takes care of the parsing.
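The two-variable for clause in these queries produces every combination of book and city. A rough Java equivalent, assuming plain JAXP and using BookCityMix as a hypothetical name, makes the iteration order explicit: $b is the outer loop and $c the inner one. It is only an analogy for what the query computes, not how the XQuery engine runs it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// What the two-variable for clause computes: every ITEM of the books document
// paired with every city of the cities document (outer loop = $b, inner = $c).
final class BookCityMix {

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static List<String> mixes(String booksXml, String citiesXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xpath.evaluate(
                    "/BOOKLIST/BOOKS/ITEM", parse(booksXml), XPathConstants.NODESET);
            NodeList cities = (NodeList) xpath.evaluate(
                    "/cities/city", parse(citiesXml), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int b = 0; b < books.getLength(); b++) {
                String title = xpath.evaluate("TITLE", books.item(b));
                for (int c = 0; c < cities.getLength(); c++) {
                    String city = ((Element) cities.item(c)).getAttribute("name");
                    result.add(title + " @ " + city);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With two books and two cities this yields four pairs, one per mix element the query would emit.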

Try..Catch blocks

You can now use try..catch blocks on your statements. This simple example shows a script which will always fail and consistently return an error tag:

<mxml:xquery-transformer>
  <mxml:xquery-text>
    <![CDATA[
      xquery version "3.0";
      declare variable $document external;
      let $x := "Hello"
      return
        try {
          $x cast as xs:integer
        } catch * {
          <error>Caught error {$err:code}: {$err:description}</error>
        }
    ]]>
  </mxml:xquery-text>
</mxml:xquery-transformer>

Switch statements

Plain old switch blocks for everyone! The example below will always return <Quack />

<mxml:xquery-transformer>
  <mxml:xquery-text>
    <![CDATA[
      xquery version "3.0";
      declare variable $document external;
      let $animal := "Duck"
      return switch ($animal)
        case "Cow" return <Moo />
        case "Cat" return <Meow/>
        case "Duck" return <Quack />
        case "Dog"
        case "Pitbull" return <Wuff/>
        default return "What's that odd noise?"
    ]]>
  </mxml:xquery-text>
</mxml:xquery-transformer>

Group By

Just like in XSLT, grouping is now a thing:

<mxml:xquery-transformer>
  <mxml:xquery-text>
    <![CDATA[
      xquery version "3.0";
      declare variable $document external;
      for $n in 1 to 10
      group by $mod := $n mod 2
      return
        if ($mod = 0)
        then <even>{$n}</even>
        else <odd>{$n}</odd>
    ]]>
  </mxml:xquery-text>
</mxml:xquery-transformer>

The query above produces one <odd> element containing 1 3 5 7 9 and one <even> element containing 2 4 6 8 10.
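For readers more at home in Java, the group by clause behaves much like Collectors.groupingBy: after grouping, $n is no longer a single number but the sequence of numbers in its group. This is only an analogy, not how the XQuery engine works, and ModGrouping is a made-up name. A TreeMap is used so the Java output order is deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Java analogue of "for $n in 1 to 10 group by $mod := $n mod 2": after grouping,
// each key is bound to the sequence of $n values that produced it.
final class ModGrouping {

    static Map<Integer, List<Integer>> groupByMod2() {
        // TreeMap only fixes the Java output order; XQuery does not guarantee an order for groups
        return IntStream.rangeClosed(1, 10)
                .boxed()
                .collect(Collectors.groupingBy(n -> n % 2, TreeMap::new, Collectors.toList()));
    }

    static String render(Map<Integer, List<Integer>> groups) {
        StringBuilder out = new StringBuilder();
        groups.forEach((mod, values) -> {
            String tag = mod == 0 ? "even" : "odd";
            // An enclosed {$n} over a sequence of numbers joins them with single spaces
            String joined = values.stream().map(String::valueOf).collect(Collectors.joining(" "));
            out.append('<').append(tag).append('>').append(joined).append("</").append(tag).append('>');
        });
        return out.toString();
    }
}
```

render(groupByMod2()) gives the same two elements as the query, with even listed first only because of the TreeMap.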

Return type improvements

This is not an improvement to XQuery itself, but something that was wrong in our implementation, and we took the opportunity to fix it. Previously, the XQuery transformer returned only the first result by default, unless an array type was specified in the returnClass attribute, in which case it returned all the matches in an Object[] (even if the return class was set to X[]). In other words, by default the transformer did not return all results. If the user did specify a return class but no results were found, it returned NullPayload. If only one result was found, it returned that single element, even if an array had been requested.

Although this is clearly a bug and a usability pain, fixing it could break applications that rely on this bug as a feature. Thus:

By default, the xquery-transformer returns a java.util.List
That list contains all the results, even if only one was found
If no results are found, the list is empty
If the user specifies a return class, the transformer falls back to the old behaviour (array, single element, or null), giving users a quick fallback option
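The rules above can be condensed into a small model. This is not Mule’s code (XQueryResults and both method names are hypothetical); it only encodes the behaviour described in this section, with null standing in for NullPayload.

```java
import java.util.ArrayList;
import java.util.List;

// Models the result-shaping rules described above. Names are hypothetical, not Mule's API.
final class XQueryResults {

    private XQueryResults() {
    }

    // New default: a List holding every result, empty when nothing matched
    static List<Object> defaultResult(List<?> results) {
        return new ArrayList<>(results);
    }

    // Legacy fallback, applied only when a returnClass is configured
    static Object legacyResult(List<?> results, Class<?> returnClass) {
        if (results.isEmpty()) {
            return null; // surfaced in Mule as NullPayload
        }
        if (returnClass.isArray() && results.size() > 1) {
            return results.toArray(new Object[0]); // always Object[], whatever X[] was requested
        }
        return results.get(0); // single match, or a non-array return class: first result only
    }
}
```

The default path has exactly one shape for every case, which is what makes it predictable; the legacy path has three, which is why it is kept only as an opt-in fallback.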

Improvements in XSLT

The existing xslt-transformer element remains unchanged from a behaviour and syntax standpoint. However, under the hood it now supports XSLT 3.0. The XSLT version used to evaluate a stylesheet depends on the version declared in the XSL template: templates declaring version 2.0 keep their current behaviour, while those declaring 3.0 benefit from the new features. One quick example of XSLT’s newfound power is that you can now use group-by expressions when iterating over a set of nodes. For example, consider the following XML listing cities of the world:

<?xml version="1.0" encoding="UTF-8"?>
<cities>
  <city name="milan" country="italy" pop="5"/>
  <city name="paris" country="france" pop="7"/>
  <city name="munich" country="germany" pop="4"/>
  <city name="lyon" country="france" pop="2"/>
  <city name="venice" country="italy" pop="1"/>
</cities>

Suppose we want to convert that into an HTML table that shows each country, its cities as a comma-separated list, and the sum of their populations. You can do that like this:

<mulexml:xslt-transformer name="xslt">
  <mulexml:xslt-text>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:template match="/">
        <table>
          <tr>
            <th>Country</th>
            <th>City List</th>
            <th>Population</th>
          </tr>
          <xsl:for-each-group select="cities/city" group-by="@country">
            <tr>
              <td><xsl:value-of select="@country"/></td>
              <td><xsl:value-of select="current-group()/@name" separator=", "/></td>
              <td><xsl:value-of select="sum(current-group()/@pop)"/></td>
            </tr>
          </xsl:for-each-group>
        </table>
      </xsl:template>
    </xsl:stylesheet>
  </mulexml:xslt-text>
</mulexml:xslt-transformer>

The output would be something like:

<table>
  <tr>
    <th>Country</th>
    <th>City List</th>
    <th>Population</th>
  </tr>
  <tr>
    <td>italy</td>
    <td>milan, venice</td>
    <td>6</td>
  </tr>
  <tr>
    <td>france</td>
    <td>paris, lyon</td>
    <td>9</td>
  </tr>
  <tr>
    <td>germany</td>
    <td>munich</td>
    <td>4</td>
  </tr>
</table>
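As a sanity check on the numbers in the table, the same grouping can be recomputed in plain Java. This is an illustrative sketch, not how the XSLT processor evaluates xsl:for-each-group: CountryGroups is a hypothetical name, and each row is flattened into a simple "cities | population" string. A LinkedHashMap keeps the groups in order of first appearance, matching the row order above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Re-computes the for-each-group result: groups in order of first appearance,
// each with its comma-separated city names and summed population.
final class CountryGroups {

    static Map<String, String> summarize(String citiesXml) {
        try {
            NodeList cities = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(citiesXml)))
                    .getElementsByTagName("city");
            Map<String, List<String>> names = new LinkedHashMap<>();
            Map<String, Integer> population = new LinkedHashMap<>();
            for (int i = 0; i < cities.getLength(); i++) {
                Element city = (Element) cities.item(i);
                String country = city.getAttribute("country");
                names.computeIfAbsent(country, k -> new ArrayList<>()).add(city.getAttribute("name"));
                population.merge(country, Integer.parseInt(city.getAttribute("pop")), Integer::sum);
            }
            Map<String, String> rows = new LinkedHashMap<>();
            names.forEach((country, list) ->
                    rows.put(country, String.join(", ", list) + " | " + population.get(country)));
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Fed the cities document above, it returns italy, france and germany in that order, with "milan, venice | 6", "paris, lyon | 9" and "munich | 4".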

<The End/>

Well, this is the end of this post. It may sound cliché, but I really hope you like these improvements. When I first started working with Mule, I wasn’t a code contributor but just a user, and these issues with the XML support really used to bother me. So for me personally, it’s really great to have finally fixed them. I hope you enjoy them just as much, and remember that feedback is always welcome!

Thanks for reading!