Thread management and auto-tuning in Mule 4

My first two posts in this Mule 4 blog series were on scaling your APIs and reactive programming in our newest version of Mule runtime engine. This blog dives into thread management and auto-tuning.

Mule 4 eradicates the need for manual thread pool configuration as this is done automatically by the Mule runtime.

Centralized thread pools

Thread pools are no longer configurable at the level of a Mule application. We now have three centralized pools: CPU_LITE, CPU_INTENSIVE, and BLOCKING_IO.

All three are managed by the Mule runtime and shared across all applications deployed to that runtime. A running Mule application will pull threads from each of those pools as events pass through its processors. The consequence of this is that a single flow may run in multiple threads. Mule 4 optimizes the execution of a flow to avoid unnecessary thread switches.

HTTP thread pools

The Mule 4 HTTP module uses Grizzly under the covers, and Grizzly needs selector thread pools configured. Selector threads are a Java NIO concept: they check the state of NIO channels, then create and dispatch events when they arrive. The HTTP Listener selectors poll for request events only. The HTTP Requester selectors poll for response events only.
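
To make the selector idea concrete, here is a minimal sketch using only the standard java.nio API. It is not Grizzly or Mule code: the class and method names are hypothetical, and an in-process Pipe stands in for a network socket so the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SelectorDemo {
    /** Writes a message into a pipe and reads it back once the selector reports the channel as readable. */
    public static String readViaSelector(String message) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                // A channel must be non-blocking before it can be registered with a selector.
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                // Simulate an incoming event (a request or a response on a real socket).
                sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

                StringBuilder received = new StringBuilder();
                // The selector thread waits here until a registered channel is ready (or 1 s passes).
                selector.select(1000);
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {
                        ByteBuffer buffer = ByteBuffer.allocate(256);
                        ((Pipe.SourceChannel) key.channel()).read(buffer);
                        buffer.flip();
                        received.append(StandardCharsets.UTF_8.decode(buffer));
                    }
                }
                return received.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real server the selector thread would only detect and dispatch the event, leaving the actual processing to threads from other pools.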

There is a special thread pool for the HTTP Listener. This is configured at the Mule runtime level and shared by all applications deployed to that runtime. There is also a special thread pool for the HTTP Requester, which is dedicated to the application that uses it. So two applications on the same runtime that both use an HTTP Requester will each have their own selector pool for it. If they both use an HTTP Listener, they will share the single pool for the HTTP Listener.

Thread pool responsibilities

The source of the flow and each event processor must execute in a thread that is taken from one of the three centralized thread pools (with the exception of the selector threads needed by HTTP Listener and Requester). The task accomplished by an event processor is either 100% nonblocking, partially blocking, or mostly blocking.

The CPU_LITE pool is for tasks that are 100% non-blocking and typically take less than 10ms to complete. The CPU_INTENSIVE pool is for tasks that typically take more than 10ms and are potentially blocking less than 20% of the clock time. The BLOCKING_IO pool is for tasks that are blocked most of the time.
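
The criteria above can be read as a simple decision rule. The sketch below is only an illustration of that rule, not how the Mule runtime assigns pools. The class and method names are hypothetical, and sending anything blocked 20% of the time or more to BLOCKING_IO is an assumption, since the text only says "blocked most of the time".

```java
public class PoolClassifier {
    /** The three centralized pools named in the text. */
    public enum Pool { CPU_LITE, CPU_INTENSIVE, BLOCKING_IO }

    // Thresholds taken from the text: 10 ms duration and 20% blocking time.
    private static final long LITE_LIMIT_MS = 10;
    private static final double PARTIAL_BLOCKING_LIMIT = 0.20;

    /**
     * Hypothetical helper: picks a pool from a task's typical duration in
     * milliseconds and the fraction of clock time it spends blocked (0.0 to 1.0).
     */
    public static Pool classify(long typicalMillis, double blockedFraction) {
        if (blockedFraction >= PARTIAL_BLOCKING_LIMIT) {
            return Pool.BLOCKING_IO;      // assumption: 20% or more counts as blocking work
        }
        if (blockedFraction == 0.0 && typicalMillis < LITE_LIMIT_MS) {
            return Pool.CPU_LITE;         // 100% non-blocking and short
        }
        return Pool.CPU_INTENSIVE;        // longer running, blocked less than 20% of the time
    }
}
```

For example, `PoolClassifier.classify(2, 0.0)` yields CPU_LITE, while `PoolClassifier.classify(50, 0.1)` yields CPU_INTENSIVE.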

Thread pool sizing

The minimum size of the three thread pools is determined when the Mule runtime starts up.

The minimum size of the shared Grizzly pool for the HTTP Listener is determined upon the deployment of the first app to the Mule runtime that uses an HTTP Listener. The size of the dedicated Grizzly pool for the HTTP Requester is determined upon deployment of each app that uses an HTTP Requester.

In all cases, the minimum number of threads equals the number of CPU cores available to the Mule runtime. Growth towards the maximum is realized in increments of one thread as needed.
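
One way to picture a pool that starts at a minimum and grows one thread at a time is a plain java.util.concurrent.ThreadPoolExecutor. This is a sketch under that assumption and says nothing about how Mule implements its pools. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class GrowingPoolDemo {
    /**
     * Runs `tasks` concurrent blocking tasks (tasks must not exceed max) and
     * returns the pool size observed while all of them are still running.
     */
    public static int poolSizeUnderLoad(int min, int max, int tasks) {
        // A SynchronousQueue holds no tasks, so each task that finds all
        // existing threads busy makes the executor add exactly one thread.
        // Note: ThreadPoolExecutor creates even its core threads lazily.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                min, max, 30, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    started.countDown();
                    awaitQuietly(release);   // keep the thread busy
                });
            }
            awaitQuietly(started);           // wait until every task is running
            return pool.getPoolSize();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a minimum of 2 (for example, the value of `Runtime.getRuntime().availableProcessors()` on a 2 core machine) and a maximum of 4, three concurrent tasks leave the pool at three threads and four leave it at four.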

The maximum size of the BLOCKING_IO thread pool is calculated based on the amount of usable memory made available to the Mule runtime. The runtime determines this when it boots by calling Runtime.getRuntime().maxMemory().
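
The author's reply in the comment thread gives the formula as max(2, cores + ((mem - 245760) / 5120)), with mem in kilobytes. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and the use of truncating integer division is an assumption, since the source does not say how the runtime rounds.

```java
public class BlockingIoSizing {
    /**
     * Formula quoted in the comments: max(2, cores + ((mem - 245760) / 5120)),
     * where memKb is the memory available to the JVM in kilobytes.
     * Integer division is an assumption of this sketch.
     */
    public static long maxBlockingIoThreads(int cores, long memKb) {
        return Math.max(2, cores + (memKb - 245760) / 5120);
    }

    public static void main(String[] args) {
        // Read the same inputs the text describes from the running JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        long memKb = Runtime.getRuntime().maxMemory() / 1024;
        System.out.println("cores=" + cores + ", memKb=" + memKb
                + ", max BLOCKING_IO threads=" + maxBlockingIoThreads(cores, memKb));
    }
}
```

For 2 cores and exactly 1,048,576 KB this evaluates to 158. On a real 1 GB machine the value reported by maxMemory() is normally lower than the physical memory, so the actual maximum would be smaller.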

For a Mule runtime sitting on a 2 core / 1 GB machine or container, the following table shows the minimum and maximum values for each thread pool.

[Table: minimum and maximum thread pool values]

Thread pool scheduler assignment criteria

In the Mule runtime, the responsibility of managing thread pools falls on the Scheduler. It pulls threads from the pool, returns them, and adds new threads to the pool. Each of the five pools we described above has its own Scheduler. When a Mule 4 app is deployed, each of its event processors is assigned a Scheduler following the criteria outlined in the following table.

[Table: Scheduler assigned to each type of event processor]

An important consideration is the handoff between each event processor. That is always executed on a CPU_LITE thread.

Over time our engineers will enhance modules to make their operations non-blocking.

Mule runtime example consumption of thread pools

The following diagrams show how threads are assigned in various types of Mule flow. Watch out for the red traffic light, which denotes a blocking operation (BLOCKING_IO). The amber traffic light denotes potential partial blocking (CPU_INTENSIVE). The space or handoff between each event processor is non-blocking and catered to by a CPU_LITE thread. Nevertheless, the optimization in the thread schedulers will avoid unnecessary thread switching so a thread from a given pool can continue to execute across processors as we shall see.

Typical thread switching scenario

In this first scenario:

  • SHARED_GRIZZLY Thread #10 receives the HTTP Listener request.
  • CPU_LITE Thread #8 caters to the handoff between the HTTP Listener and the Database select operation.
  • BLOCKING_IO Thread #5 must make the call to the database server and then wait for the result set to be sent back.
  • A thread from CPU_LITE is needed for the Logger operation but Scheduler optimization allows for CPU_LITE Thread #2 to also be used for the handoff before and after it.
  • CPU_INTENSIVE Thread #16 executes the DataWeave transformation. DataWeave always takes a thread from this pool regardless of whether blocking actually occurs.
  • A similar optimization occurs on the second Logger and handoffs with CPU_LITE Thread #1 also making the outbound HTTP Requester call.
  • DEDICATED_GRIZZLY Thread #2 receives the HTTP Requester response.
  • There is an optimization on the response after flow completion: CPU_LITE Thread #7 does the handoff back to the flow source (HTTP Listener) and also executes the response to the client.
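
The hops in this scenario can be imitated with plain java.util.concurrent. The sketch below is an analogy only and is not Mule's scheduler code. The class name, the step labels, and the single-thread executors standing in for three of the pools are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadHopDemo {
    // Hypothetical stand-in for a Mule pool: one daemon thread named after the pool.
    private static ExecutorService pool(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
    }

    // Records which thread ran the step, then passes the payload on.
    private static String step(List<String> trace, String label, String payload) {
        trace.add(label + ":" + Thread.currentThread().getName());
        return payload;
    }

    /** Runs a select, log, transform chain and returns the thread used by each step. */
    public static List<String> run() {
        ExecutorService blockingIo = pool("BLOCKING_IO");
        ExecutorService cpuLite = pool("CPU_LITE");
        ExecutorService cpuIntensive = pool("CPU_INTENSIVE");
        List<String> trace = new CopyOnWriteArrayList<>();
        try {
            String result = CompletableFuture
                    .supplyAsync(() -> step(trace, "select", "rows"), blockingIo)
                    .thenApplyAsync(p -> step(trace, "log", p), cpuLite)
                    .thenApplyAsync(p -> step(trace, "transform", p.toUpperCase()), cpuIntensive)
                    .join();
            trace.add("result:" + result);
        } finally {
            blockingIo.shutdown();
            cpuLite.shutdown();
            cpuIntensive.shutdown();
        }
        return trace;
    }
}
```

Each stage explicitly names the executor it runs on, so the trace shows one hop per stage. The optimization described above goes further by letting one thread carry on across adjacent processors where a switch is not needed.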

Try scope with Transaction

Here the Transactional Try scope mandates the use of a single thread. This will always be from the BLOCKING_IO pool regardless of what type of operations are contained within the scope.

JMS Transactional

In this scenario the whole flow is transactional and requires a single thread from BLOCKING_IO up to the HTTP request.

JMS transactional with Async scope

In this scenario, the Async scope ends the transaction and normal thread selection applies.

My next blog will dive into input streams in Mule 4. You can try Mule 4 today to see how you can address vertical scalability in an effective way or read our whitepaper, Reactive programming: New foundations for high scalability in Mule 4.

11 Responses to “Thread management and auto-tuning in Mule 4”

  1. An excellent article !!!

  2. Can you please add the scenarios for VM Connectors? and Batch processing?

  3. It looks like the formula that is mentioned needs a correction:
    #cores + mem-24760)/5120
    Can you please look into it? Thanks.

  4. A few of the points are not clear, e.g.
    1) “All three are managed by the Mule runtime and shared across all applications deployed to that runtime.” In a CloudHub deployment every API has its own container and every container has its own Mule Runtime Server. So how is it possible to say that the “thread pool is shared across all applications”?
    2) How is the thread pool size defined based on the “Mule Runtime Server”? Is it a fixed value, or is it decided based on the Mule Runtime Server?

  5. Hi Abhishek,
    You are right in pointing out that Cloudhub deployments limit the number of applications deployed to a Mule runtime to one. But that does not change the way thread pools are configured. In cloudhub the one and only deployment has access to the pools for the runtime. In on-premise deployments where there may be more than one deployment per runtime, they will all have access to the same 3 pools. The point is that pools are assigned to the runtime and not to an individual app.
    Regarding sizing criteria for each pool: for CPU_LITE, CPU_INTENSIVE and the Grizzly pools, the size of each is a function of the number of cores on top of which the runtime is deployed. For BLOCKING_IO the calculation is made based on the amount of memory made available to the JVM using the formula max(2, cores + ((mem - 245760) / 5120)) where “mem” is counted in kbytes.

  6. Thanks Ravi,

    You’re absolutely right. It should be max(2, cores + ((mem - 245760) / 5120)) where “mem” is in kbytes made available to the JVM.

  7. Hi Nial,

    Thanks for the great article. One thing that stuck out to me was that DataWeave is always assigned to CPU_INTENSIVE tracks. I was suddenly like, oh no, I use DW transformers to do all sorts of things, like setting a bunch of variables or sending response messages as payload. Will using DW transformers in this way have a non-trivial impact on the performance of an application, or should I not worry about it?


  8. The blog was very clear on thread management and auto-tuning. The example scenarios give more clarity and clear up all pending thoughts.

  9. A great article, it helped me a lot to understand how threads are managed in the Mule runtime. The number of CPU INTENSIVE threads is quite low, and I noticed that along with DW scripts the execution engine (with Groovy) also uses the CPU INTENSIVE thread pool. Is there any way to tell Mule to run the execution engine, or preferably a particular connector, on a specific thread pool? For example, can I tell Mule that this connector should be executed by one of the threads from the BLOCKING pool and not from the CPU INTENSIVE pool?
    I am asking for a CloudHub-deployed application.

  10. Hi,

    Great article. Can you please explain how it works for the Transform component if we are setting payload, attributes, and var, all three in one?