Total traceability with correlation IDs

December 6 2011


Distributed systems are great: they’re more versatile and resilient than monolithic ones. They also bring challenges of their own, one of them being the difficulty of building a holistic picture of the systems and interactions involved in the processing of a request or the execution of a business activity.

Business process models, and their reification in business process engines, can help a lot in this matter. But these engines are not pervasively used, and there are still blind spots, such as network interactions, that need to be addressed.

In this post, we’ll look at the use of correlation identifiers as a means of keeping track of what’s happening in a distributed topology. And of course, we’ll also look at how Mule can help you keep an eye on your messages!


The idea

One commonly used solution is to consistently carry a unique message identifier, called a correlation ID, alongside (or within) every message that transits between the different systems. By logging this correlation ID at critical points and collecting all the logs in a central location (for example by using syslog), it becomes possible to build a near-realtime picture of what’s happening and where (for example by using Clarity).

Logging should happen in all the systems involved in processing messages. Servers and middleware are obvious places where logging is done, but clients (i.e. service consumers) should also log the interactions they perform to ensure total traceability.
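To make the idea concrete, here is a minimal Java sketch of what "logging the correlation ID at critical points" can look like. The `TraceLog` class, the `cid=... system=... event=...` line format, and the system names are all assumptions made for illustration. None of them come from Mule.

```java
import java.util.UUID;

// Hypothetical helper (not part of Mule): builds a uniform log line so that
// entries written by different systems can be joined on the correlation ID.
final class TraceLog {
    private TraceLog() {}

    // system: name of the component doing the logging (assumed naming scheme)
    // correlationId: the ID carried with the message
    // event: what happened at this critical point
    static String format(String system, String correlationId, String event) {
        return "cid=" + correlationId + " system=" + system + " event=" + event;
    }

    public static void main(String[] args) {
        String cid = UUID.randomUUID().toString();
        // Every system logs the same ID as the message passes through it.
        System.out.println(format("order-client", cid, "request sent"));
        System.out.println(format("order-service", cid, "request received"));
    }
}
```

Keeping the key at a fixed position in the line makes it easy to filter the centralized logs for a single ID.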

Now, how does Mule handle message IDs? If you look in the API documentation, you’ll see that Mule carries two identifiers per message:

  • A correlation ID, which is only available if it has been set by the transport on inbound messages or explicitly by means of configuration.
  • A unique ID, which is always available and often used as a fallback by Mule internal components if no correlation ID is available.

In most circumstances, Mule doesn’t rely on a message having a correlation ID, which is why it can be empty. That said, this ID becomes essential when performing message splitting and aggregation, and this is where the aforementioned fallback to the message unique ID occurs. When a correlation ID has been set, Mule preserves it across transports insofar as it can do so transparently. This means that Mule propagates the correlation ID alongside the message payload with transports that support metadata (like HTTP or JMS). Otherwise, it is up to the user to ensure this ID gets propagated (usually by storing it in the message payload).
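The fallback rule described above can be sketched in a few lines of Java. This only illustrates the behavior as described. It is not Mule's actual code, and the `MessageIds` class and its method name are hypothetical.

```java
// Hypothetical illustration of the fallback described above, not Mule's code.
final class MessageIds {
    private MessageIds() {}

    // Returns the correlation ID when one has been set, otherwise falls back
    // to the message's unique ID, which is assumed to be always present.
    static String effectiveCorrelationId(String correlationId, String uniqueId) {
        if (correlationId == null || correlationId.isEmpty()) {
            return uniqueId;
        }
        return correlationId;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCorrelationId(null, "unique-1"));
        System.out.println(effectiveCorrelationId("cid-42", "unique-1"));
    }
}
```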

Enough talking! Let’s look at some examples that illustrate this discussion.

Server generated correlation IDs

In the following example, messages are received over HTTP. The HTTP transport itself doesn’t set any correlation ID on the inbound message, so we assign the unique ID created by Mule to the message correlation ID.

Notice how we used a logger message processor to log the critical information: who received the message and what the message correlation ID is. This is what we see in Mule’s log:

So how does a caller of this flow receive the correlation ID generated by the server? Easy: Mule returns this value in a header named: X-MULE_CORRELATION_ID.

Server-side correlation ID generation suffers from two problems though:

  • One-way protocols (like JMS) have no direct means of returning the correlation ID to the client. Another channel could be used, but that complicates things.
  • Unexpected issues can prevent the client from receiving the generated ID even though the request has been accepted and processed on the server.

This is why client ID generation should be considered.

Client generated correlation IDs

Clients can use a unique ID standard like UUID to generate a new ID. In some environments, a centralized in-house unique ID generator must be contacted for that purpose. Whatever the generator is, Mule can accommodate user-generated correlation IDs. Let’s see how.

First we will refactor the above HTTP example to accept an ID sent by the client:

The only notable difference in this new flow is that we extract the correlation ID value from an HTTP header named X-CID, which the client is expected to provide. Otherwise, the logger statement remains exactly the same. In fact, we should probably move this logger message processor into a sub-flow in order to share it easily across all flows.
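Outside of a Mule flow, the same header extraction can be sketched in plain Java. The `CidHeader` class is hypothetical, and representing headers as a `Map<String, String>` is an assumption made to keep the example self-contained. Only the X-CID header name comes from the text above. The lookup ignores case because HTTP header names are case-insensitive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: reads the client-supplied correlation ID from a map
// of HTTP headers (an assumed, simplified representation of a request).
final class CidHeader {
    static final String HEADER = "X-CID";

    private CidHeader() {}

    // Returns the trimmed header value, or empty if it is absent or blank.
    static Optional<String> extract(Map<String, String> headers) {
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (HEADER.equalsIgnoreCase(entry.getKey())) {
                String value = entry.getValue();
                if (value != null && !value.trim().isEmpty()) {
                    return Optional.of(value.trim());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("x-cid", "client-generated-id");
        System.out.println(extract(headers).orElse("no correlation ID provided"));
    }
}
```

Returning an `Optional` leaves the caller free to reject the request or fall back to a server-generated ID when the client sends nothing.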

We can do the same over JMS:

Notice how this time the little Groovy script is gone. This is because JMS accommodates a correlation ID header field that Mule recognizes and automatically assigns to the received message. Client side, here is how we generate an ID using Java’s UUID class:
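A minimal sketch of client-side ID generation, assuming a random (version 4) UUID is acceptable as a correlation ID. The `ClientCorrelationId` class is hypothetical. Sending the message itself is left out so the example stays self-contained.

```java
import java.util.UUID;

// Hypothetical client-side helper: generates a fresh correlation ID.
final class ClientCorrelationId {
    private ClientCorrelationId() {}

    // A random UUID rendered in its canonical 36-character string form.
    static String newCorrelationId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String cid = newCorrelationId();
        // With a JMS client, this value would typically be assigned to the
        // outgoing message via javax.jms.Message#setJMSCorrelationID(String)
        // before sending; that part is omitted here.
        System.out.println(cid);
    }
}
```

The client should log this ID along with the interaction before sending, so that the trace exists even if the server never responds.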

So far we’ve only considered transports that support some form of meta information (i.e. headers). When meta information can’t be carried alongside a payload, the only option is to carry the generated correlation ID in the payload itself. The following demonstrates a TCP endpoint that expects to receive a JSON object that contains a CID attribute and uses it as the message correlation ID:

Nothing complex really, thanks to the json-to-object-transformer, which does the work of making the JSON payload easily navigable.

What about outbound messages?

Outbound services that are outside of your control can’t be included directly in this traceability strategy because you simply don’t have access to their log files.

In that case, you should log a message after a successful interaction with them, as demonstrated here:
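In plain Java, the pattern of logging after a successful outbound interaction could look like the sketch below. The `OutboundTrace` class, the logger name, and the log line format are assumptions for illustration. In a Mule flow the equivalent would be a logger placed after the outbound call.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical wrapper: records a successful call to an external service
// together with the correlation ID of the message being processed.
final class OutboundTrace {
    private static final Logger LOG = Logger.getLogger("outbound");

    private OutboundTrace() {}

    static <T> T callAndLog(String service, String correlationId, Supplier<T> call) {
        // If the call throws, the exception propagates and no success entry
        // is written, so the log only contains interactions that completed.
        T result = call.get();
        LOG.info("cid=" + correlationId + " called=" + service + " status=success");
        return result;
    }

    public static void main(String[] args) {
        String reply = callAndLog("partner-service", "cid-42", () -> "accepted");
        System.out.println(reply);
    }
}
```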

Anything for business?

Users of the Enterprise Edition of Mule can turn to the Management Console to get a detailed view of the path followed by a message while it is processed. Indeed, the Management Console sports a Business Events tab, shown below, that offers a customizable and searchable interface for message flows in Mule known as the Business Event Analyzer.

The Business Event Analyzer can be used to provide a non-technical view of message flows to business analysts, who would otherwise find raw log files too off-putting to use effectively. Operation teams can also benefit from this tool because it offers a convenient way of performing root cause analysis by drilling down into recorded event flows.

Backed by your database of choice, the Business Event Analyzer is designed to let you control, from your Mule configuration, the information you deem relevant for each type of message. This means that you can display a correlation ID and, alongside it, other more meaningful information extracted from the message payload (for example: a transaction ID relevant to the business, timestamps, statuses…).

You can learn more about this tool in this recorded webinar.

Parting words…

As you’ve seen in this post, correlation IDs and logging are low-hanging fruit that can help you achieve total traceability in a distributed systems environment. Mule provides standard and advanced tooling to help you with this task.

Do you use any other strategies? Please share them!

