NoSQL and Big Data connectors for Mule

September 19 2013

In the past few months, you may have noticed that we have regularly announced the release of new Mule connectors for data-stores. Two main forces are at play behind the need for these types of data-stores:

  • Big Data – The need to deal in real time or near real time with the vast amounts of data that “web-scale” applications can generate,
  • BASE vs ACID – The need to scale reliably in the unreliable environment that is the cloud, which leads to relaxing the ACID properties of an RDBMS (Atomicity, Consistency, Isolation and Durability) in favor of BASE ones (Basically Available, Soft state, Eventually consistent).

So where does Mule come into play in this equation, you might ask?

Mule can help integrate such NoSQL data-stores with the resources that produce and consume data. This integration goes well beyond simply establishing protocol connectivity: thanks to Mule's queuing, routing and transformation infrastructure, important tasks like data capture and curation can be achieved. Mule can also be used to expose APIs that make either raw or processed data available for use in custom applications.
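To make the "capture and curation" idea concrete, here is a minimal sketch in plain Java. It does not use any Mule API: the `DataStore` interface, the `InMemoryStore` class and the `curate` rules are all hypothetical stand-ins, assumed only for illustration. The point is the shape of the flow: events are queued, cleaned up by a transformation step, and only then handed to the data-store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CapturePipelineSketch {

    /** Hypothetical stand-in for a NoSQL store client (not a Mule connector). */
    interface DataStore {
        void save(String record);
    }

    /** In-memory implementation so the sketch runs without any external store. */
    static class InMemoryStore implements DataStore {
        final List<String> records = new ArrayList<>();

        @Override
        public void save(String record) {
            records.add(record);
        }
    }

    /** Curation step: trim whitespace, normalize case, drop blank events. */
    static Optional<String> curate(String raw) {
        String cleaned = raw.trim().toLowerCase(Locale.ROOT);
        return cleaned.isEmpty() ? Optional.empty() : Optional.of(cleaned);
    }

    /** Drains the queue, curates each event and saves the survivors. Returns how many were saved. */
    static int drain(BlockingQueue<String> queue, DataStore store) {
        int saved = 0;
        String raw;
        while ((raw = queue.poll()) != null) {
            Optional<String> curated = curate(raw);
            if (curated.isPresent()) {
                store.save(curated.get());
                saved++;
            }
        }
        return saved;
    }

    public static void main(String[] args) {
        BlockingQueue<String> captured = new LinkedBlockingQueue<>();
        captured.add("  Sensor-A:42 ");
        captured.add("   ");            // blank event, dropped by curation
        captured.add("SENSOR-B:7");

        InMemoryStore store = new InMemoryStore();
        int saved = drain(captured, store);
        System.out.println(saved + " records saved: " + store.records);
    }
}
```

In a real integration, the queue, the transformation and the store client would be provided by the integration platform and the connector rather than written by hand; the sketch only shows why having those three pieces between producer and store is useful.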

The CAP triangle

In his “Visual Guide to NoSQL Systems”, Nathan Hurst used the “CAP triangle” to visually categorize some of the available data-stores based on their design decisions regarding the eponymous theorem. We’re going to use the same approach to visually present the Mule NoSQL connectors but, since we said “CAP”, we should quickly summarize what this conjecture is about. Introduced in 2000 by Eric Brewer, this theorem states that “it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  •     Consistency (all nodes see the same data at the same time)
  •     Availability (a guarantee that every request receives a response about whether it was successful or failed)
  •     Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)” (Wikipedia)

Note that in early 2012, Brewer (and others) revisited the CAP theorem, nuancing it based on the experience gathered by the industry during the previous 12 years. One important caveat added to the “pick only two” principle was that “partition tolerance” is not an all-or-nothing option and that the consistency/latency trade-off should not be ignored.
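The trade-off is easier to see with a toy example. The sketch below, in plain Java, models a store with two replicas and a simulated network partition. Everything in it (`TwoNodeStore`, `Mode`, the naive `heal` reconciliation) is a hypothetical simplification made up for illustration, not how any of the data-stores mentioned here actually works. In `CP` mode the store refuses writes while partitioned, giving up availability; in `AP` mode it accepts them and lets one replica serve stale data until the partition heals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class CapSketch {

    enum Mode { CP, AP }

    /** Toy store with two replicas: writes arrive at A, reads are served by B. */
    static class TwoNodeStore {
        private final Map<String, String> replicaA = new HashMap<>();
        private final Map<String, String> replicaB = new HashMap<>();
        private final Mode mode;
        private boolean partitioned = false;

        TwoNodeStore(Mode mode) {
            this.mode = mode;
        }

        void setPartitioned(boolean partitioned) {
            this.partitioned = partitioned;
        }

        /** Returns false when the write is refused. */
        boolean write(String key, String value) {
            if (partitioned) {
                if (mode == Mode.CP) {
                    return false;          // stay consistent, give up availability
                }
                replicaA.put(key, value);  // stay available, replica B becomes stale
                return true;
            }
            replicaA.put(key, value);
            replicaB.put(key, value);
            return true;
        }

        Optional<String> readFromB(String key) {
            return Optional.ofNullable(replicaB.get(key));
        }

        /** Ends the partition and copies A's state to B (naive reconciliation). */
        void heal() {
            partitioned = false;
            replicaB.putAll(replicaA);
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            TwoNodeStore store = new TwoNodeStore(mode);
            store.write("x", "1");
            store.setPartitioned(true);
            boolean accepted = store.write("x", "2");
            String during = store.readFromB("x").orElse("none");
            store.heal();
            String after = store.readFromB("x").orElse("none");
            System.out.println(mode + ": write accepted=" + accepted
                    + ", read during partition=" + during
                    + ", read after heal=" + after);
        }
    }
}
```

Running it prints `CP: write accepted=false, read during partition=1, read after heal=1` followed by `AP: write accepted=true, read during partition=1, read after heal=2`. The `AP` line is the "eventually consistent" behavior of BASE in miniature: the stale read is tolerated because the replicas converge once they can talk again.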

So without further ado, here is what Mule has to offer in terms of NoSQL connectivity:

You will notice that Mule’s NoSQL connectors:

  • cover the full spectrum of CAP trade-offs,
  • support data structure (Redis), document (Riak, MongoDB), column (Cassandra), graph (Neo4j) and raw data (HDFS) oriented stores,
  • include Big Data-ready stores (Neo4j, Riak, Cassandra, MongoDB and HDFS).

All of these connectors are available through the Mule Studio update site, and more documentation is available for each of them (Cassandra, HDFS, MongoDB, Neo4j, Redis, Riak).

More to come

Watch this space for an upcoming white paper that will explore some of the use cases these connectors will allow you to accomplish.

In the meantime, your comments are welcome: if you have stories about Mule and NoSQL or Big Data to share, we would love to hear them!

