Improved threading model in Mule 4.3


When first released, Mule 4 introduced a brand new reactive execution engine and threading models. Benefits include high scalability, back pressure capabilities, and auto-tuning.

As part of Mule 4.3, we iterated on these and made further improvements. More specifically, the three thread pools were combined into a single thread pool. Before jumping into the details, let’s quickly summarize how the threading model was set up prior to this release.

In Mule 4.1 and 4.2, all work was executed as tasks scheduled in one of three thread pools:

CPU_LIGHT

For tasks that take up to 10ms to execute.
No blocking IO operations should be executed here.
Non-blocking IO operations are executed here, since they should quickly delegate into an operating system thread which will perform the blocking part.
Default size is 2 * cores.

CPU_INTENSIVE

For tasks that take more than 10ms to execute (duration is not enforced, but misclassifying tasks has bad consequences).
Typically for transformations, encrypt/decrypt, heavy computation, etc.
No blocking IO operations should be executed here.
Default size is 2 * cores.

IO

All blocking IO operations should happen here.
Significantly larger than the other pools, as most threads here are expected to be in a blocked state.
Default size comes from a formula that considers the available memory, the default size of the streaming buffers and other concepts.

The proactor pattern

Proactor is a design pattern for asynchronous execution.

In Mule, applying it means that all tasks are classified into categories that correspond to each of our thread pools, and each task is submitted for execution to the matching pool. For example:

<flow>
(1) <sftp:read path="personsArray.json" />
(2) <http:request path="/persons" method="POST" />
(3) <set-variable variableName="firstEntryName" value="#[payload[0].name]" />
(4) <ee:transform ... />
(5) <logger message="#[vars.firstEntryName]" />
</flow>

The flow above reads a JSON array of Person objects over SFTP, pushes the content through an HTTP request, picks the name of the first entry, and does some processing.

According to the proactor pattern, this is how the tasks will be submitted:

(1) sftp:read is a blocking operation. It executes on the IO pool.
(2) http:request is a non-blocking operation. The request is performed on the current thread; when the response is received, execution switches to the CPU_LIGHT pool.
(3) set-variable operations should be fairly quick. Execution stays in CPU_LIGHT. No thread switch.
(4) ee:transform is potentially a computationally heavy transformation, so execution switches to the CPU_INTENSIVE pool.
(5) logger stays on CPU_INTENSIVE. No thread switch. (*)

(*) Due to optimizations regarding latency, thread switches are omitted when an IO or CPU_INTENSIVE task is followed by a CPU_LIGHT one. The reasoning behind this optimization is that executing the CPU_LIGHT task is most likely cheaper than the thread switch.
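
The walkthrough above can be condensed into a small simulation. This is a simplified sketch in Python, not Mule runtime code: the `plan_pools` function and the step kinds (`BLOCKING`, `NON_BLOCKING`, `CPU_LIGHT`, `CPU_INTENSIVE`) are hypothetical names introduced only to illustrate the switching rules described in this section.

```python
# Simplified model of the thread switches described above.
# All names are illustrative assumptions; this is not Mule runtime code.

def plan_pools(steps):
    """Return (step name, pool) pairs: the pool the flow runs on after each step."""
    current = None  # pool the event is currently running on
    plan = []
    for name, kind in steps:
        if kind == "BLOCKING":
            current = "IO"
        elif kind == "NON_BLOCKING":
            # The request is sent on the current thread; the response
            # is then handled on the CPU_LIGHT pool.
            current = "CPU_LIGHT"
        elif kind == "CPU_INTENSIVE":
            current = "CPU_INTENSIVE"
        elif kind == "CPU_LIGHT":
            # Latency optimization: a CPU_LIGHT task stays on whatever
            # pool the event is already on instead of switching threads.
            if current is None:
                current = "CPU_LIGHT"
        else:
            raise ValueError(f"unknown step kind: {kind}")
        plan.append((name, current))
    return plan


# The five processors of the example flow, in order.
FLOW = [
    ("sftp:read", "BLOCKING"),
    ("http:request", "NON_BLOCKING"),
    ("set-variable", "CPU_LIGHT"),
    ("ee:transform", "CPU_INTENSIVE"),
    ("logger", "CPU_LIGHT"),
]

if __name__ == "__main__":
    for name, pool in plan_pools(FLOW):
        print(f"{name:13} -> {pool}")
```

For the example flow, the model yields IO, CPU_LIGHT, CPU_LIGHT, CPU_INTENSIVE, CPU_INTENSIVE, which matches the five steps listed above.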

What’s changed?

Performance is a major topic of the Mule 4.3 release. As part of that, we improved the above model by unifying these three thread pools into a single thread pool. This helps us improve the Mule runtime’s auto-tuning feature and make better use of available resources.

We refer to this new unified pool as THE UBER POOL.

What about proactor?

The most pertinent question would be: If all threads now come from the same pool, then why keep applying the proactor pattern?

That was our initial thought as well and in the first iteration we stopped applying the proactor pattern.

However, as we made progress on the performance front, we quickly realized that even with the unified pool the proactor pattern was still better for performance, as it allows threads to go back into the main loop and keep accepting events from the event sources.

Therefore, the proactor pattern is still applied in the exact same way it was in Mule 4.1 and 4.2. Put in simple terms, whenever there was a thread switch before, there’s still a thread switch in Mule 4.3. The only thing that changes is the target pool.

Backwards compatibility

The solution is completely backwards compatible, as it has no impact on the application’s behavior.

In the event of unforeseen corner cases, or if you have fine-tuned customizations you wish to preserve, Mule runtime engine can always be configured to go back to the previous threading model.

Configuration

Configuration is still done through the scheduler-pools.conf file.

Let’s walk through the main differences:

# The strategy to be used for managing the thread pools that back the 3 types of schedulers in the Mule Runtime
# (cpu_light, cpu_intensive and I/O).
# Possible values are:
#    - UBER: All three scheduler types will be backed by one uber thread pool (default since 4.3.0)
#    - DEDICATED: Each scheduler type is backed by its own Thread pool (legacy mode to Mule 4.1.x and 4.2.x)
org.mule.runtime.scheduler.SchedulerPoolStrategy=UBER

This new parameter “SchedulerPoolStrategy” allows switching between the UBER (unified) scheduling strategy and the legacy DEDICATED (separated) pools strategy.

When this parameter is set to UBER, the following applies:

# The number of threads to keep in the uber pool.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=UBER
org.mule.runtime.scheduler.uber.threadPool.coreSize=cores

# The maximum number of threads to allow in the uber pool.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=UBER
org.mule.runtime.scheduler.uber.threadPool.maxSize=max(2, cores + ((mem - 245760) / 5120))

# The size of the queue to use for holding tasks in the uber pool before they are executed.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=UBER
org.mule.runtime.scheduler.uber.workQueue.size=0

# When the number of threads in the uber pool is greater than SchedulerService.io.coreThreadPoolSize, this is the maximum
# time (in milliseconds) that excess idle threads will wait for new tasks before terminating.
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=UBER
org.mule.runtime.scheduler.uber.threadPool.threadKeepAlive=30000
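
To get a feel for the default maxSize expression, it can be evaluated by hand. The sketch below is only an illustration in Python: `uber_max_size` is a hypothetical helper, and it assumes that `mem` is expressed in KB (so that 245760 corresponds to 240 MB and 5120 to 5 MB). How the runtime rounds a fractional result is not covered here.

```python
# Evaluates the default uber pool maxSize expression shown above:
#   max(2, cores + ((mem - 245760) / 5120))
# Assumption: mem is the memory available to the runtime, in KB.

def uber_max_size(cores, mem_kb):
    """Return the unrounded value of the maxSize expression."""
    return max(2, cores + (mem_kb - 245760) / 5120)


if __name__ == "__main__":
    # 4 cores with 1 GB (1024 * 1024 KB) of memory
    print(round(uber_max_size(4, 1024 * 1024), 1))  # about 160.8
    # 2 cores with 128 MB: the expression bottoms out at 2
    print(uber_max_size(2, 128 * 1024))
```

Under that assumption, the expression grants roughly one extra thread for every 5 MB of memory above 240 MB, and never lets the maximum drop below 2.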

If the DEDICATED strategy is used instead, the above parameters are ignored and the below should be uncommented:

# The number of threads to keep in the cpu_lite pool, even if they are idle.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.cpuLight.threadPool.size=2*cores

# The size of the queue to use for holding cpu_lite tasks before they are executed.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.cpuLight.workQueue.size=0

# The number of threads to keep in the I/O pool.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.io.threadPool.coreSize=cores

# The maximum number of threads to allow in the I/O pool.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.io.threadPool.maxSize=max(2, cores + ((mem - 245760) / 5120))

# The size of the queue to use for holding I/O tasks before they are executed.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.io.workQueue.size=0

# When the number of threads in the I/O pool is greater than SchedulerService.io.coreThreadPoolSize, this is the maximum
# time (in milliseconds) that excess idle threads will wait for new tasks before terminating.
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.io.threadPool.threadKeepAlive=30000

# The number of threads to keep in the cpu_intensive pool, even if they are idle.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.cpuIntensive.threadPool.size=2*cores

# The size of the queue to use for holding cpu_intensive tasks before they are executed.
# Supports Expressions
# Only applies when org.mule.runtime.scheduler.threadPool.strategy=DEDICATED
#org.mule.runtime.scheduler.cpuIntensive.workQueue.size=2*cores
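
The relationship between the strategy and the two groups of parameters can be sketched as a small filter. This is an illustrative Python sketch, not how the runtime reads the file: `parse_conf` and `effective_pool_settings` are hypothetical helpers, the key prefixes are taken from the listing above, and treating a missing strategy as UBER is an assumption based on the "default since 4.3.0" comment.

```python
# Sketch: which settings in a scheduler-pools.conf text are relevant
# for the configured strategy. Helper names are illustrative assumptions.

STRATEGY_KEY = "org.mule.runtime.scheduler.SchedulerPoolStrategy"
UBER_PREFIXES = ("org.mule.runtime.scheduler.uber.",)
DEDICATED_PREFIXES = (
    "org.mule.runtime.scheduler.cpuLight.",
    "org.mule.runtime.scheduler.io.",
    "org.mule.runtime.scheduler.cpuIntensive.",
)


def parse_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def effective_pool_settings(text):
    """Return (strategy, settings that apply under that strategy)."""
    settings = parse_conf(text)
    # Assumption: a missing strategy is treated as UBER.
    strategy = settings.get(STRATEGY_KEY, "UBER")
    prefixes = UBER_PREFIXES if strategy == "UBER" else DEDICATED_PREFIXES
    active = {k: v for k, v in settings.items() if k.startswith(prefixes)}
    return strategy, active
```

Calling `effective_pool_settings` on a file that sets the strategy to DEDICATED returns only the cpuLight, io and cpuIntensive entries that are uncommented; any uber entries are left out, mirroring the "Only applies when" comments in the file.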

Configuration at the application level

Threading can also be configured at the application level. So for example, the snippet below shows how to configure a single application to use the DEDICATED threading model with custom sizing:

<ee:scheduler-pools gracefulShutdownTimeout="15000">
  <ee:cpu-light
          poolSize="2"
          queueSize="1024"/>
  <ee:io
          corePoolSize="1"
          maxPoolSize="2"
          queueSize="0"
          keepAlive="30000"/>
  <ee:cpu-intensive
          poolSize="4"
          queueSize="2048"/>
</ee:scheduler-pools>

Warning: Using this configuration spawns a completely new set of thread pools for this app. It does not change the settings of the default pools configured in the scheduler-pools.conf file. This is especially important for on-prem scenarios in which many apps can be deployed in the same Mule runtime.

The same can be used to customize the settings of an UBER pool:

<ee:scheduler-pools poolStrategy="UBER" gracefulShutdownTimeout="15000">
  <ee:uber
          corePoolSize="1"
          maxPoolSize="9"
          queueSize="5"
          keepAlive="5"/>
</ee:scheduler-pools>

Again, this spawns a new thread pool specific to this application. It does not override the pool described in the scheduler-pools.conf file.

IMPORTANT NOTE:

The recommendation is to always run Mule using the default UBER strategy. Users upgrading from Mule 4.1.x or Mule 4.2.x still get the DEDICATED strategy, so that any customizations made to this file can still be leveraged. However, we strongly advise trying the new default setting first to determine whether those optimizations are still required, or trying to customize the new UBER strategy before falling back to the legacy mode. The same applies to custom UBER configurations.

In either case, the recommendation is to always consult with support before going into production with customized settings.

Migration guide

Users upgrading to Mule 4.3+ should take the following actions:

If no custom threading settings have been applied (either through scheduler-pools.conf file or directly in the app), then you’re good to go.
If any custom threading configurations have been used, then retest with the default configuration. It’s highly possible that the combination of this feature and other big performance improvements makes custom settings no longer necessary.
If tests show that custom configuration is still necessary, then reach out to our Support team. Our best practice is to only use a custom configuration after our support team has validated the root cause of the problem. Failing to follow this best practice can lead to inefficient use of resources and “masking” underlying issues.
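
The first check in the list above, whether any custom threading settings exist in the file, can be roughly automated. The following is a minimal Python sketch under stated assumptions: `find_customizations` is a hypothetical helper, and `DEFAULTS` contains only the uncommented default values shown in the Configuration section of this post, so it should be verified against the file shipped with your runtime. It does not inspect settings made inside the application itself.

```python
# Sketch: report uncommented settings in a scheduler-pools.conf text that
# differ from the defaults listed in this post. Illustrative only.

DEFAULTS = {
    "org.mule.runtime.scheduler.SchedulerPoolStrategy": "UBER",
    "org.mule.runtime.scheduler.uber.threadPool.coreSize": "cores",
    "org.mule.runtime.scheduler.uber.threadPool.maxSize":
        "max(2, cores + ((mem - 245760) / 5120))",
    "org.mule.runtime.scheduler.uber.workQueue.size": "0",
    "org.mule.runtime.scheduler.uber.threadPool.threadKeepAlive": "30000",
}


def _normalize(value):
    """Drop all whitespace so '2 * cores' and '2*cores' compare equal."""
    return "".join(value.split())


def find_customizations(conf_text, defaults=DEFAULTS):
    """Return {key: (found value, default or None)} for non-default settings."""
    custom = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out settings are ignored
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        default = defaults.get(key)
        if default is None or _normalize(value) != _normalize(default):
            custom[key] = (value, default)
    return custom
```

An empty result suggests the file carries no customizations. Any entry returned, including an uncommented DEDICATED pool setting, is a candidate for retesting against the default configuration.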

FAQ:

Does this mean that the self-tuning feature is being dropped or fundamentally changed?

Not at all. Self-tuning and the threading model are two completely different features. Self-tuning is about Mule automatically configuring itself to get the most out of the available resources, regardless of how those resources are actually used. The self-tuning feature now distinguishes between the two threading strategies and knows how to maximize each.

What impact should we expect as users?

None. This should be fairly transparent. The only perceivable differences should be:

Pool exhaustion errors should not appear anymore.
Thread names in the logs have changed.

How does this affect performance?

Depends on the use case. The goal here is to improve resource usage. Whether that translates directly into a performance improvement depends on the application’s demand for those resources.

In some use cases, it can actually lead to a lower memory footprint and better performance. But the reality is that Mule 4.3 contains so many performance improvements that it wouldn’t be accurate to attribute overall gains to this particular feature.

I’m ingesting logs into Splunk (or similar tool) and doing log analysis. What’s the recommendation?

First, check if this affects that at all. In most cases, the logs will be analyzed looking for significant events rather than the name of the thread in which those happened. If that’s actually a factor, you can:

Adjust your log analyzer.
Change the log4j2.xml configuration to adjust the logging pattern.

Does this mean that SDK users no longer need to pay attention to execution type?

Not at all. Execution type is still very important when building a custom module:

We’re still applying the Proactor pattern.
The user can still configure the Runtime to go back into DEDICATED scheduling mode.
Your modules should ideally have the broadest compatibility range possible, so usage in 4.1 and 4.2 must still be accounted for.

Development with the Mule SDK does not change at all because of this feature.

For more information on the latest release of Anypoint Platform check out our announcement, or start your free trial of Anypoint Platform now.
