A common integration scenario is one in which a single message needs to be sent through multiple routes.
Take, for example, a message announcing a new client’s onboarding. It needs to be routed to the CRM, which creates the client; to marketing, which will want to know how the client heard about the company; and finally to the provisioning and stock systems so they can work their magic as well.
In this case, the message is broadcast in a “fire and forget” fashion, meaning you don’t need a response from any of these systems to continue processing. Each of those systems is responsible for handling its own logic and its own errors. In Mule ESB, you could do it like this:
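Outside of Mule, the fire-and-forget idea can be sketched in plain Java. This is only an illustration, not how Mule implements it: `FireAndForgetBroadcaster` is a hypothetical class that hands the message to every route on a thread pool and returns immediately, and each route swallows its own errors.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical plain-Java analogue of a fire-and-forget broadcast
// (an illustration only, not Mule's implementation).
class FireAndForgetBroadcaster {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Hands the message to every route on a pool thread and returns at once.
    void broadcast(String message, List<Consumer<String>> routes) {
        for (Consumer<String> route : routes) {
            pool.execute(() -> {
                try {
                    route.accept(message);
                } catch (RuntimeException e) {
                    // Each route owns its errors; the caller never sees them.
                    System.err.println("Route failed: " + e.getMessage());
                }
            });
        }
        // No waiting here: the caller carries on with its own processing.
    }

    // Stops accepting work and waits (bounded) for in-flight routes to finish.
    boolean shutdownAndAwait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The routes never report back to the caller, which is fine here but becomes a problem as soon as you need their responses.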
There are other cases, however, in which you do need the responses from the routes. Suppose you’re building a travel booking application and somebody wants a direct flight from Buenos Aires to San Francisco. Your app needs to contact all known airline brokers, get their availability for that flight, and choose the cheapest one. The <async> scope is insufficient here because the thread processing the request has to wait for the responses to arrive. Sounds like a job for a multicasting router!
When <all> is just not enough
Mule ESB already has a multicasting router called <all>, which sends a message through several routes and continues processing once all of them have responded. Although it works, this router has several limitations:
- It uses serial processing: all routes are executed in order, one after the other, on a single thread. As a result, the total time we have to wait before we can get our hands on all the responses is the sum of all routes’ execution times.
- It doesn’t handle errors very well: suppose you have 4 routes, as in the example above. If the second route fails, routes 3 and 4 are never executed. On top of that, you only get information about route 2 failing; no information about route 1 is available.
- It’s not very customisable. When successful, the <all> router always returns a MuleMessageCollection. So, going back to the travel booking example, you can’t customise <all> to simply return the cheapest flight instead of the whole MuleMessageCollection.
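To make these three limitations concrete, here is a small plain-Java model of serial multicasting. It is a hypothetical sketch of the behaviour described in the list above, not Mule’s source code: `SerialMulticaster` runs the routes one by one on the caller’s thread, aborts on the first exception and always returns the full list of responses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plain-Java model of serial multicasting, reproducing the three
// limitations listed above; it is not Mule's <all> implementation.
class SerialMulticaster {
    // Names of the routes that actually started, so the fail-fast effect is visible.
    final List<String> executed = new ArrayList<>();

    // Routes run in the map's iteration order (pass a LinkedHashMap to keep it stable).
    List<String> route(String message, Map<String, Function<String, String>> routes) {
        List<String> responses = new ArrayList<>();
        // 1. One thread, one route at a time: total time is the sum of all routes.
        for (Map.Entry<String, Function<String, String>> entry : routes.entrySet()) {
            executed.add(entry.getKey());
            // 2. An exception propagates immediately: later routes never run and
            //    the responses collected so far are lost along with this call.
            responses.add(entry.getValue().apply(message));
        }
        // 3. Fixed result shape: always the whole collection of responses.
        return responses;
    }
}
```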
To overcome these limitations, the Mule 3.5 Early Access release introduces a new kid on the block: the <scatter-gather> router.
Introducing the Scatter-Gather
I’d love to describe this in a simpler and more concise way, but I can’t, so let’s just go through a quick list of differences from the <all> router and then jump right into an example:
- Parallel processing: Scatter-Gather uses a thread pool to execute all routes concurrently. This means that the total time the caller thread has to wait for the routes to respond is no longer the sum of all the routes’ times, but roughly that of the slowest one (provided the pool has enough threads to run every route at once).
- Better error handling: because all routes execute in parallel, one or more failing routes do not prevent the other routes from executing. Also, if any route fails, you get a CompositeRoutingException, which contains not only information about all the failed routes but also the responses from the successful ones.
- Configurability: as the EIP (Enterprise Integration Patterns) definition says, there’s an aggregator that combines the responses. By default, Mule’s implementation of scatter-gather returns a MuleMessageCollection so that it’s consistent with ye olde <all> router, which makes it easier for existing users to migrate and take advantage of the improved performance. However, you can replace this with your own aggregation strategy, but we’ll get to that soon enough…
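Here is a minimal plain-Java sketch of the same three ideas. It is not Mule’s implementation: `ScatterGather` and `CompositeRouteFailure` are hypothetical names (the latter only loosely mimics `CompositeRoutingException`), and the routes are simple `Function<String, String>` lambdas rather than Mule message processors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical stand-in for a composite routing exception: it carries both
// the failures and the responses of the routes that succeeded.
class CompositeRouteFailure extends RuntimeException {
    final Map<String, Throwable> failures;
    final Map<String, String> successes;

    CompositeRouteFailure(Map<String, Throwable> failures, Map<String, String> successes) {
        super(failures.size() + " route(s) failed: " + failures.keySet());
        this.failures = failures;
        this.successes = successes;
    }
}

// Plain-Java sketch of scatter-gather: all routes run concurrently on a pool,
// the caller waits for every one of them, and errors are collected, not fail-fast.
class ScatterGather implements AutoCloseable {
    private final ExecutorService pool;

    ScatterGather(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    List<String> route(String message, Map<String, Function<String, String>> routes) {
        // Scatter: submit every route before waiting on any of them.
        Map<String, Future<String>> pending = new LinkedHashMap<>();
        routes.forEach((name, route) -> pending.put(name, pool.submit(() -> route.apply(message))));

        // Gather: the total wait is roughly that of the slowest route.
        Map<String, String> successes = new LinkedHashMap<>();
        Map<String, Throwable> failures = new LinkedHashMap<>();
        for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
            try {
                successes.put(e.getKey(), e.getValue().get());
            } catch (ExecutionException ex) {
                failures.put(e.getKey(), ex.getCause());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                failures.put(e.getKey(), ex);
            }
        }
        if (!failures.isEmpty()) {
            throw new CompositeRouteFailure(failures, successes);
        }
        // Default aggregation: a plain collection, in the spirit of MuleMessageCollection.
        return new ArrayList<>(successes.values());
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Because `route` only throws after every route has finished, the caller sees the successful responses alongside the failures.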
Scattering in Action!
To keep this example simple and illustrate the performance improvement that comes from using <scatter-gather> over <all>, I’ll reuse an old example I wrote about over a year ago in a previous post. In that example (which basically explores the Google Connectors Suite and DataMapper), we take a Google Spreadsheet with superheroes’ contact information and use it to create Salesforce Accounts, Google Contacts, Google Calendar appointments and Google Tasks. Details of that example are available in the original post, so I’ll just focus on the part that broadcasts the original message (each superhero found in the spreadsheet) to several routes (Salesforce and Google Apps). At the XML level, using the <all> router, it comes down to this:
When executing this on my local PC, it takes 9 seconds to complete the whole integration (this number might vary depending on your geographical location, bandwidth and laptop processing power). Now, let’s see the same example with the new router:
In this case, the time dropped to only 5 seconds, about 44% less than the original 9 seconds.
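The gain comes from the difference described earlier: serial time is roughly the sum of the route times, parallel time roughly the slowest route. The toy sketch below is not the superhero integration; `LatencyDemo` is a hypothetical class whose routes only sleep for made-up durations, so it shows the effect without any network calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy timing model with hypothetical routes that only sleep; it is not the
// integration measured above.
class LatencyDemo {
    static void simulatedRoute(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Serial, like <all>: elapsed time is roughly the sum of the route times.
    static long serialMillis(long[] routeMillis) {
        long start = System.nanoTime();
        for (long ms : routeMillis) {
            simulatedRoute(ms);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Parallel, like scatter-gather: roughly the slowest route, given one thread per route.
    static long parallelMillis(long[] routeMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(routeMillis.length);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (long ms : routeMillis) {
                futures.add(pool.submit(() -> simulatedRoute(ms)));
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // wait for every route, as the caller thread would
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }
}
```

With route durations of 200, 150, 300 and 100 ms, the serial run needs at least 750 ms while the parallel run typically finishes shortly after 300 ms; exact numbers depend on the machine.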
Let’s go back for a second to the cheapest flight example. Suppose that once all routes have responded, you want to filter out the ones that failed (if any) and then choose the cheapest flight. You could do that in Mule using flows, but consider this simpler solution:
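As a plain-Java illustration of that aggregation logic (not Mule’s API), the sketch below drops the failed routes and keeps the cheapest quote. `RouteResult`, `ResultAggregator` and `CheapestFlightAggregator` are all hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical outcome of one broker route: a price, or the error it raised.
class RouteResult {
    final String broker;
    final Double price;    // null when the route failed
    final Throwable error; // null when the route succeeded

    private RouteResult(String broker, Double price, Throwable error) {
        this.broker = broker;
        this.price = price;
        this.error = error;
    }

    static RouteResult quote(String broker, double price) {
        return new RouteResult(broker, price, null);
    }

    static RouteResult failure(String broker, Throwable error) {
        return new RouteResult(broker, null, error);
    }

    boolean failed() {
        return error != null;
    }
}

// Hypothetical aggregation contract: reduce all route results to one answer.
interface ResultAggregator<R> {
    R aggregate(List<RouteResult> results);
}

// Drops failed routes, then keeps the lowest quote (empty if every route failed).
class CheapestFlightAggregator implements ResultAggregator<Optional<RouteResult>> {
    @Override
    public Optional<RouteResult> aggregate(List<RouteResult> results) {
        return results.stream()
                .filter(r -> !r.failed())
                .min(Comparator.comparingDouble((RouteResult r) -> r.price));
    }
}
```

Since the aggregator only sees a list of results, it needs to know nothing about threads or how the routes were invoked.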
That’s cool! I can easily customise how the response events are aggregated without worrying about the aggregation complexity itself! But how do I use that class? Check it out:
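In plain Java, wiring a custom aggregator into a parallel router could look like the sketch below. It is an assumption-laden illustration, not Mule’s configuration model: `PluggableScatterGather` and `Outcome` are hypothetical, and the aggregation step is simply a `Function` passed to the constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;
import java.util.function.Supplier;

// One route's outcome: its value, or the exception it threw (hypothetical type).
class Outcome<T> {
    final String route;
    final T value;         // null when the route failed
    final Throwable error; // null when the route succeeded

    Outcome(String route, T value, Throwable error) {
        this.route = route;
        this.value = value;
        this.error = error;
    }
}

// Hypothetical router whose aggregation step is supplied by the caller, so the
// gathered result can be anything: a list, a single best quote, and so on.
class PluggableScatterGather<T, R> {
    private final Function<List<Outcome<T>>, R> aggregator;

    PluggableScatterGather(Function<List<Outcome<T>>, R> aggregator) {
        this.aggregator = aggregator;
    }

    R route(Map<String, Supplier<T>> routes, ExecutorService pool) {
        // Scatter: start every route on the pool, recording success or failure.
        List<CompletableFuture<Outcome<T>>> futures = new ArrayList<>();
        for (Map.Entry<String, Supplier<T>> entry : routes.entrySet()) {
            String name = entry.getKey();
            futures.add(CompletableFuture.supplyAsync(entry.getValue(), pool)
                    .handle((value, error) -> new Outcome<T>(name, value, unwrap(error))));
        }
        // Gather: wait for all outcomes, then delegate to the pluggable aggregator.
        List<Outcome<T>> outcomes = new ArrayList<>();
        for (CompletableFuture<Outcome<T>> future : futures) {
            outcomes.add(future.join());
        }
        return aggregator.apply(outcomes);
    }

    // supplyAsync wraps route exceptions in CompletionException; expose the cause.
    private static Throwable unwrap(Throwable error) {
        return (error instanceof CompletionException && error.getCause() != null)
                ? error.getCause() : error;
    }
}
```

For the flight search, the aggregator could be a lambda that filters outcomes with a non-null `error` and keeps the route with the lowest `value`.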
- The new scatter-gather router provides multicasting functionality in a much more performant way than the <all> router does.
- It’s also more customisable and provides better error handling.
- You should prefer <scatter-gather> over <all> in most scenarios, the exception being those in which a failure in one route should prevent the following routes from executing. The <all> router has been deprecated since Mule 3.5.0.
- Scatter-Gather is available as of the Mule 3.5 Early Access release.
Thank you for reading, and I look forward to your feedback!