The batch module was first introduced in Mule 3.5 and it aims to simplify integration use cases where basic ETL functionality is needed. If you’re unfamiliar with it, I recommend you look at this post before continuing. If you are already old friends with batch, then keep reading for a breakdown of all the changes!
We recently introduced our HowTo blog series, which is designed to present simple use-case tutorials to help you as you evaluate Anypoint Platform. The goal of this blog post is to give you a short introduction on how to implement a simple ETL (Extract, Transform, and Load) scenario using Mulesoft’s batch processing module.
Sometimes (more often than we think), less concurrency is actually more. Not too long ago, I found myself in a conversation in which we were discussing non-blocking architectures, tuning, and performance. We were discussing that tuning for those models often starts with “2 threads per core” (2TPC). The discussion made me curious about how Mule’s batch module would perform if tested by 2TPC. I knew beforehand that 2TPC wouldn’t be so impressive on batch, mainly because it doesn’t use a non-blocking threading model.
Fact: Batch Jobs are tricky to handle when exceptions raise. The problem is the huge amounts of data that these jobs are designed to take. If you’re processing 1 million records you simply can’t log everything. Logs would become huge and unreadable. Not to mention the performance toll it would take. On the other hand, if you log too little then it’s impossible to know what went wrong, and if 30 thousand records failed, not knowing what’s wrong with them can be a royal pain.
The idea of this blog post is to give you a short introduction on how to do Real time sync with Mule ESB. We’ll use several of the newest features that Mule has to offer – like the improved Poll component with watermarking and the Batch Module. Finally we’ll use one of our Anypoint Templates as an example application to illustrate the concepts.
What is it?
Near Real time sync is the term we’ll use along this blog post to refer to the following scenario:
“When you want to keep data flowing constantly from one system to another”
MuleSoft provides the most widely used integration platform for connecting any application, data source or API, whether in the cloud or on-premises. With Anypoint Platform®, MuleSoft delivers a complete integration experience built on proven open source technology, eliminating the pain and cost of point-to-point integration. Anypoint Platform includes CloudHub™ iPaaS, Mule ESB™, and a unified solution for API management™, design and publishing.