HowTo – Extract, Transform, Load (ETL) and Change Data Capture

We recently introduced our HowTo blog series, which presents simple use-case tutorials to help you as you evaluate Anypoint Platform. The goal of this blog post is to give you a short introduction to implementing a simple ETL (Extract, Transform, and Load) scenario using MuleSoft’s batch processing module.

Anypoint Platform brings together leading application integration technology with powerful data integration capabilities for implementing such a use case.

Custom batch job instance IDs in Mule 3.8

Welcome to the final post in the three-post series about batch improvements in Mule 3.8!

The last new feature is a simple one that comes in quite handy when you need to read through logs. As you know, batch jobs are just programs processed in batch mode, and each time the job is triggered, a new job instance is created and tracked separately. Each of those instances is unique and therefore has a unique ID.

Configurable batch block size in Mule 3.8

Welcome back! Following the series about new batch features in Mule 3.8, the second most common request was being able to configure the batch block size.

What’s the block size?

In a traditional online processing model, each request is usually mapped to a worker thread. Regardless of the processing being synchronous, asynchronous, one-way, request-response, or even if the requests are temporarily buffered before being processed (like in the Disruptor or SEDA models),

Batch improvements in Mule 3.8 – Mutable commit blocks

Hello there! If you’ve been using Mule for a while now, you probably remember that the batch module was introduced back in the 3.5 release. If you’re not familiar with it, you can familiarize yourself by following these links:

We received a lot of love for this feature,

Batch Module: Obtaining the job instance id in Mule 3.7

A popular request among users of the Batch module is the ability to grab the job instance id in any of a Batch job’s phases. Why is that useful? Well, there could be a number of useful scenarios:

Mr. Batch and the Quest for the right Threading Profile

Sometimes (more often than we think), less concurrency is actually more. Not too long ago, I found myself in a conversation in which we were discussing non-blocking architectures, tuning, and performance. We were discussing that tuning for those models often starts with “2 threads per core” (2TPC). The discussion made me curious about how Mule’s batch module would perform if tested by 2TPC. I knew beforehand that 2TPC wouldn’t be so impressive on batch, mainly because it doesn’t use a non-blocking threading model.
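
As a rough illustration of the “2 threads per core” starting point, here is a minimal Python sketch. It is not Mule’s threading profile configuration; the function names `two_threads_per_core` and `process_records` are hypothetical and exist only for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def two_threads_per_core(cores=None):
    """Return the 2TPC starting pool size for the given core count."""
    if cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cores = os.cpu_count() or 1
    return 2 * cores


def process_records(records, handler, cores=None):
    """Apply handler to every record using a pool sized by the 2TPC rule."""
    with ThreadPoolExecutor(max_workers=two_threads_per_core(cores)) as pool:
        # pool.map preserves the input order of the records.
        return list(pool.map(handler, records))
```

The rule only gives a first guess for the pool size. For blocking workloads such as the batch processing discussed here, the best value is likely to differ and should be found by measuring.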

Fast and Slow through the Air

Handling endpoints with disparate speed when the platform is in the cloud

A fairly common integration requirement is to accumulate data coming in real-time or near real-time, hold and consolidate the records, then send the transformed messages to another system on a fixed schedule (e.g. daily) for business reasons, especially if the endpoints are legacy systems. For on-premises integration platforms, this use case is rather straightforward to implement.
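
The accumulate, consolidate, and send pattern described above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not how the post implements it in Mule: the class `RecordAccumulator` and its `consolidate` and `send` callbacks are hypothetical, and a real deployment would need durable storage and an external scheduler to call `flush`.

```python
class RecordAccumulator:
    """Buffer incoming records and hand them off in one consolidated batch."""

    def __init__(self, consolidate, send):
        self._buffer = []
        self._consolidate = consolidate  # turns a list of records into one message
        self._send = send                # delivers the message to the slow endpoint

    def accept(self, record):
        """Called for each record as it arrives from the fast endpoint."""
        self._buffer.append(record)

    def flush(self):
        """Called on the fixed schedule; returns how many records were sent."""
        if not self._buffer:
            return 0
        # Swap the buffer out first so new records are not mixed into this batch.
        batch, self._buffer = self._buffer, []
        self._send(self._consolidate(batch))
        return len(batch)
```

A scheduler (daily, for example) would call `flush()`, while the real-time source calls `accept()` for each incoming record.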

Handle Errors in your Batch Job… Like a Champ!

Fact: Batch jobs are tricky to handle when exceptions are raised. The problem is the huge amount of data that these jobs are designed to take. If you’re processing 1 million records, you simply can’t log everything. Logs would become huge and unreadable, not to mention the performance toll it would take. On the other hand, if you log too little, it’s impossible to know what went wrong, and if 30 thousand records failed, not knowing what’s wrong with them can be a royal pain.

Near Real Time Sync with Batch

The idea of this blog post is to give you a short introduction on how to do near real-time sync with Mule ESB. We’ll use several of the newest features that Mule has to offer – like the improved Poll component with watermarking and the Batch Module. Finally, we’ll use one of our Anypoint Templates as an example application to illustrate the concepts.

What is it?

Near real-time sync is the term we’ll use throughout this blog post to refer to the following scenario:

“When you want to keep data flowing constantly from one system to another”
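
To make the idea of polling with a watermark concrete, here is a minimal Python sketch. It is a generic illustration, not Mule’s Poll component: the function `poll_with_watermark`, the `fetch_since` callback, and the `modified` field are all assumptions made for this example.

```python
def poll_with_watermark(fetch_since, watermark):
    """Fetch records changed after the watermark and advance the watermark.

    fetch_since(watermark) is assumed to return dicts that carry a
    comparable "modified" value (a timestamp or sequence number).
    """
    records = fetch_since(watermark)
    if not records:
        # Nothing new: keep the old watermark so no later change is skipped.
        return [], watermark
    # The newest modification seen becomes the starting point of the next poll.
    new_watermark = max(r["modified"] for r in records)
    return records, new_watermark
```

Each poll picks up only what changed since the previous one, which is what keeps data flowing from the source system to the target without reprocessing everything on every run.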

Batch Module Reloaded

With Mule’s December 2013 release we introduced the new batch module. We received great feedback about it and we even have some CloudHub users happily using it in production! However, we know that the journey of Batch has just begun and for the Early Access release of Mule 3.5 we added a bunch of improvements. Let’s have a look!

Support for non-Serializable Objects

A limitation in the first release of batch was that all records needed to have a Serializable payload.