A common integration practice is to persist data in long-term storage such as databases and files. However, there are use cases where long-term storage is not desired. In these cases, retaining the data can be harmful, and the stored data should be discarded or overwritten once it has served its purpose.
Temporary data storage options within the MuleSoft platform
Examples of use cases for temporary data storage include:
- Last run time of a job: Only a single last-run timestamp needs to be kept, so that the next run can derive the records of interest that changed since the previous run
- Asynchronous processing: Temporarily store data to be processed asynchronously by a consuming application or flow
We will describe the various options available on the MuleSoft platform for temporary storage, their usage, and common design patterns and anti-patterns.
1. Object store
Object Store provides a one-stop solution for storing data and state across Mule applications irrespective of their deployment topology. In other words, it can be used to store, retrieve, and update values from CloudHub applications as well as on-prem applications. However, on-prem applications can access the Object Store only via the REST API endpoint.
Usage patterns
- Watermarking: Watermarking is a technique for marking a point in a data set by using attributes such as row ID, timestamp, or sequence ID. The watermark is used to derive the delta for each subsequent run of the application. On successful execution, the application updates the watermark value, which is then used in the next execution. This process repeats every time the application runs. Object Store is used to store, retrieve, and update the watermark value.
- Non-idempotent processing: Non-idempotent processing refers to a scenario where a transaction should NOT be processed multiple times. In this scenario, once an application picks up a message for processing, it stores the transaction ID in the Object Store. By using an idempotent message validator, subsequent attempts to process the same transaction can be blocked, ensuring that each transaction is processed only once.
- Counter tracking: Object store can be used to track a counter that increments with each event like process retry or redelivery. Once a threshold is reached, a different flow can be triggered to perform corrective actions.
- Caching: Use object store to cache data with a defined expiry time. The cache can be restored with the most recent value once the original cached values expire.
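The watermarking and single-processing patterns above can be sketched in a few lines. This is a minimal, platform-neutral illustration, not the Object Store connector API: the plain dictionary standing in for the store, the function names `fetch_delta` and `process_once`, the key `lastRunTime`, and the `updated` field are all assumptions made for the example.

```python
# Hypothetical sketch: a dict stands in for an Object Store partition.

def fetch_delta(records, store, key="lastRunTime"):
    """Return the records newer than the stored watermark, then advance it."""
    watermark = store.get(key, 0)  # 0 means "no previous run"
    delta = [r for r in records if r["updated"] > watermark]
    if delta:
        # Advance the watermark only after the delta was derived successfully.
        store[key] = max(r["updated"] for r in delta)
    return delta


def process_once(txn_id, store, handler):
    """Run handler only if this transaction ID has not been seen before."""
    seen_key = f"txn:{txn_id}"
    if seen_key in store:
        return False  # duplicate attempt, blocked
    store[seen_key] = True  # record the ID when the message is picked up
    handler(txn_id)
    return True
```

Calling `fetch_delta` twice with the same records returns the full set the first time and an empty list the second time, because the watermark has moved forward. In a real deployment the transaction keys would normally be stored with an expiry so the store does not grow without bound.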
Anti-pattern
- Queueing System: Object Store should not be used to store data required for asynchronous communication between two separate processes. It will lead to complexity with error handling and complicated logic to implement reliability and single-delivery processing.
- Data persistence layer: While technically possible, Object Store should not be used to persist long-term storage of key, value pair. It’s not designed to handle aspects like data lifecycle management, data longevity and audit trail.
2. VM queue
Mule applications provide inter-app communication through asynchronous VM queues accessible via the VM Connector. The queues can be transient or persistent: transient queues provide performance at the expense of reliability, while persistent queues provide reliability at the expense of performance. The queue stores data as serialized objects, so it can accept only serializable objects.
Usage pattern
- Reliability pattern: A VM queue can be used where reliability is required even though a message is received from a non-transactional application. The source application is concerned with acceptance of the message and not necessarily its successful processing. In such cases, a receiver (sub)flow persists the message in the VM queue and sends an acceptance to the source application. Another (sub)flow picks up the message for continued processing. In case of error, appropriate actions can be designed to handle the error in the second (sub)flow.
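The two-flow shape of the reliability pattern can be sketched as follows. This is an illustrative model only, assuming an in-memory `deque` in place of a persistent VM queue; the names `receive`, `drain`, `vm_queue`, and `dead_letters` are hypothetical and do not correspond to VM Connector operations.

```python
from collections import deque

vm_queue = deque()   # stand-in for a persistent VM queue
dead_letters = []    # where the second flow parks messages that failed


def receive(message):
    """Receiver flow: persist the message first, then acknowledge acceptance."""
    vm_queue.append(message)
    return {"status": "ACCEPTED", "id": message["id"]}


def drain(process):
    """Processing flow: consume queued messages and isolate failures."""
    processed = []
    while vm_queue:
        message = vm_queue.popleft()
        try:
            processed.append(process(message))
        except Exception as exc:
            # Error handling lives in the second flow; the sender already
            # received its acceptance and is not affected.
            dead_letters.append((message, str(exc)))
    return processed
```

The sender gets its acceptance as soon as the message is queued, so a later processing failure is handled entirely on the consumer side.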
3. CloudHub Persistent queue
CloudHub Persistent queue is an extension of VM Queue that allows messages to be stored external to the Mule application. This allows for key benefits like high-availability and workload distribution across multi-worker deployment.
Usage pattern
- Inter-worker communication: When an application is deployed across multiple workers, use persistent queues for communication between the workers. This allows for optimized processing like slicing a large payload for processing by multiple workers.
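As a rough illustration of slicing a large payload for several workers, the sketch below splits a list of records into fixed-size chunks and hands them out in turn. The helper names `slice_payload` and `distribute` and the round-robin assignment are assumptions for the example; with a real shared queue, each worker would simply take the next available chunk.

```python
def slice_payload(records, chunk_size):
    """Split a large payload into chunks that can be queued individually."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def distribute(chunks, worker_count):
    """Assign chunks to workers in round-robin order (illustrative only)."""
    assignments = {worker: [] for worker in range(worker_count)}
    for index, chunk in enumerate(chunks):
        assignments[index % worker_count].append(chunk)
    return assignments
```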
Anti-pattern
- Single-worker reliability pattern: Using CloudHub Persistent Queue for reliability pattern in a single-worker deployment will result in performance degradation as data will be persisted outside of the application to a network attached storage device. Instead, an enterprise grade message broker like Anypoint MQ, ActiveMQ, or others should be used.
4. Anypoint MQ
Anypoint MQ provides a comprehensive and scalable cloud-based messaging service to enable advanced asynchronous communication between applications. Non-Mule applications can also connect with Anypoint MQ via REST APIs.
Usage pattern(s)
- Publish-Subscribe: Pub-Sub is a common integration pattern to handle asynchronous communication between a combination of producers and consumers. Anypoint MQ can also be used for fan-in (multiple producers and one consumer) or fan-out (one producer and multiple consumers) patterns.
- Queue-based: Use Anypoint MQ to integrate using a queue based pattern. This will enable decoupled communication between applications. With proper configuration, queues can be set up to guarantee message ordering and one-time delivery.
Anti-pattern
- Short-term data persistence: The platform provides the option to retrieve messages and acknowledge them so that they are deleted from the queue. If a message is not acknowledged, it is put back in the queue for other consumers. This feature should not be used to selectively read a message and put it back on the queue. Doing so negatively impacts processing time and creates unnecessary message read/no-ack cycles.
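To see why selective reading is costly, consider a toy model of acknowledge/no-acknowledge semantics. This is a simplified sketch, not the Anypoint MQ API: the `AckQueue` class, its methods, and the `selective_read` helper are hypothetical names introduced for illustration.

```python
from collections import deque


class AckQueue:
    """Toy queue: a received message stays in flight until acked or nacked."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}
        self.deliveries = 0  # counts every read, useful or not

    def publish(self, msg_id, body):
        self._ready.append((msg_id, body))

    def receive(self):
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body
        self.deliveries += 1
        return msg_id, body

    def ack(self, msg_id):
        # Acknowledged messages are deleted.
        self._in_flight.pop(msg_id)

    def nack(self, msg_id):
        # Unacknowledged messages go back on the queue for other consumers.
        self._ready.append((msg_id, self._in_flight.pop(msg_id)))


def selective_read(queue, wanted_id, max_attempts=10):
    """Anti-pattern: read and put back messages until the wanted one appears."""
    for _ in range(max_attempts):
        item = queue.receive()
        if item is None:
            return None
        msg_id, body = item
        if msg_id == wanted_id:
            queue.ack(msg_id)
            return body
        queue.nack(msg_id)
    return None
```

With three messages queued and the wanted one last, `selective_read` triggers three deliveries to consume a single message, and the two rejected messages are reordered behind it. The cost grows with queue depth, which is the read/no-ack churn described above.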
Conclusion
As described above, each of the temporary data storage options (Object Store, VM Queue, CloudHub Persistent Queue, and Anypoint MQ) has specific functionality, and they cannot be used interchangeably. A proper understanding of each component and a clear grasp of the requirements are key to choosing the right tool for each use case.
For more information on these components, learn more about MuleSoft’s Anypoint Platform.