NoSQL with Apache Cassandra and Mule

May 29 2013

motif

Apache Cassandra is a distributed, column-family database. Until recently the only way to interact with Cassandra from Mule was to reuse one of the existing Java clients, like Hector or Astyanax, in a component. Mule’s Cassandra DB Module now provides message processors to insert, update, query and delete data in Cassandra.

To show off some of the Cassandra module’s features, I’ll implement a simple account management API. This API will let clients perform CRUD operations on accounts, behaving much like an LDAP directory.

Inserting Columns

The Cassandra Module uses Java maps to define how data is inserted into and retrieved from a Cassandra keyspace. For this example we’ll use Mule’s JSON transformers to move data back and forth over HTTP. Let’s take a look at what the account data looks like.

When we persist this JSON to Cassandra, the column family will be “Accounts”, each organizational unit will be a row key (here, “Engineering” and “Operations”), and the account data, such as the username, password and time since the last password change, will be contained in a super column.
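To make that layout concrete, here is a rough sketch of the same structure as nested Java maps: row key (organizational unit), then super column (one account), then column name and value. It assumes each account’s super column is named by an email-style identifier; the class name, identifiers and values are hypothetical placeholders, not the post’s data or the exact map format the module expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the "Accounts" layout using nested maps:
// row key (organizational unit) -> super column (one account) -> column name -> value.
// All identifiers and values below are hypothetical placeholders.
class AccountsModel {

    static final String COLUMN_FAMILY = "Accounts";

    // One account's columns: username, password and password age.
    static Map<String, Object> account(String username, String password, int passwordAge) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("username", username);
        columns.put("password", password);
        columns.put("passwordAge", passwordAge);
        return columns;
    }

    // Rows of the "Accounts" column family, keyed by organizational unit.
    static Map<String, Map<String, Map<String, Object>>> sampleRows() {
        Map<String, Map<String, Object>> engineering = new LinkedHashMap<>();
        engineering.put("alice@example.com", account("alice", "changeme", 12));

        Map<String, Map<String, Object>> operations = new LinkedHashMap<>();
        operations.put("bill@example.com", account("bill", "changeme", 30));

        Map<String, Map<String, Map<String, Object>>> rows = new LinkedHashMap<>();
        rows.put("Engineering", engineering);
        rows.put("Operations", operations);
        return rows;
    }

    // Counts the individual (row, super column, column) cells in the model.
    static int countCells(Map<String, Map<String, Map<String, Object>>> rows) {
        int cells = 0;
        for (Map<String, Map<String, Object>> superColumns : rows.values()) {
            for (Map<String, Object> columns : superColumns.values()) {
                cells += columns.size();
            }
        }
        return cells;
    }
}
```

With two rows holding one account each and three columns per account, the model contains six cells.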

Let’s configure the Mule flow to persist this data via HTTP.

This flow accepts the JSON account data we just saw over HTTP, transforms it to a Map, persists it with the Cassandra connector’s “insert” message processor, and then transforms the payload back to JSON to return to the client.

Column Serialization

One of the benefits, as well as challenges, of Cassandra is that all data is stored as byte arrays. This makes storage extremely flexible, but it also means that type information is lost. The Cassandra module uses Hector’s serializers to let you specify how data is transformed when it is read out of a column.
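A minimal illustration of why the type is lost, using two hand-written serializers rather than Hector’s classes: an int written as four bytes carries no type tag, so reading it back correctly depends entirely on choosing the matching serializer. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch (not Hector's API): a column value is just bytes,
// and only the serializer chosen on read gives it a type.
class ByteSerialization {

    // String <-> UTF-8 bytes.
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // int <-> 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        byte[] stored = encodeInt(30);
        System.out.println(stored.length);                      // 4: nothing in the bytes says "int"
        System.out.println(decodeInt(stored));                  // 30 with the matching serializer
        System.out.println(decodeString(stored).equals("30"));  // false: wrong serializer, wrong value
    }
}
```

The same four bytes are a perfectly valid (if unreadable) UTF-8 string, which is why the reader, not the data, has to decide the type.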

Let’s take a look at how this works by specifying two query operations for the API.  The first will allow us to query for a user based on email address – which you’ll recall maps to the row key.

We’re using the Mule Expression Language to parse the URI; this is how we infer the columnPath and rowKey. In this case the columnPath will be “operators” and the rowKey will be “[email protected]”. We can query for Bill’s account now as follows:

http://localhost:[email protected]
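The flow extracts these values with a MEL expression. Purely to illustrate the parsing step, here is a rough Java equivalent. It assumes a hypothetical URL layout of /account/{columnPath}/{rowKey} and a placeholder email address, since the exact path is not reproduced here; the class and field names are also hypothetical.

```java
import java.net.URI;

// Hypothetical sketch: assumes URLs shaped like /account/{columnPath}/{rowKey}.
// The real flow uses a MEL expression; this only mirrors the idea in Java.
class AccountUriParser {

    // The two values the query needs.
    static final class Lookup {
        final String columnPath;
        final String rowKey;

        Lookup(String columnPath, String rowKey) {
            this.columnPath = columnPath;
            this.rowKey = rowKey;
        }
    }

    static Lookup parse(String url) {
        // e.g. "/account/operators/bill@example.com"
        String path = URI.create(url).getPath();
        // Leading "/" yields an empty first segment: ["", "account", "operators", "bill@example.com"]
        String[] segments = path.split("/");
        if (segments.length != 4 || !segments[1].equals("account")) {
            throw new IllegalArgumentException("Expected /account/{columnPath}/{rowKey}: " + path);
        }
        return new Lookup(segments[2], segments[3]);
    }
}
```

Any URL that does not match the assumed layout is rejected rather than guessed at.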

There’s one problem, though. When the response comes back, it looks like this:

The password age is a string instead of an integer. This is because the Cassandra Module defaults to string serialization unless an explicit column-serializer is defined. Let’s add one to fix the flow.

Now when we refresh the URL, something like this should appear:

Column serialization is available for all data types supported by Hector.
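Conceptually, the default-to-string behavior works like the sketch below: each column is decoded with an explicitly registered deserializer if one exists, and as a UTF-8 string otherwise. This is a hypothetical helper for illustration, not the module’s column-serializer configuration or Hector’s API; it assumes integers were written as four big-endian bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: per-column deserializers with a UTF-8 string fallback.
class ColumnDecoder {

    // Default used for any column without an explicit deserializer.
    private static final Function<byte[], Object> STRING =
            bytes -> new String(bytes, StandardCharsets.UTF_8);

    private final Map<String, Function<byte[], Object>> byColumn = new HashMap<>();

    // Registers an explicit deserializer for one column name.
    ColumnDecoder register(String column, Function<byte[], Object> deserializer) {
        byColumn.put(column, deserializer);
        return this;
    }

    // Decodes raw column bytes; unregistered columns come back as String.
    Map<String, Object> decode(Map<String, byte[]> rawColumns) {
        Map<String, Object> decoded = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> column : rawColumns.entrySet()) {
            Function<byte[], Object> deserializer = byColumn.getOrDefault(column.getKey(), STRING);
            decoded.put(column.getKey(), deserializer.apply(column.getValue()));
        }
        return decoded;
    }

    // Reads a 4-byte big-endian int (assumed encoding for integer columns).
    static Object readInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }
}
```

Without a registration, a passwordAge column comes back as a String; after `register("passwordAge", ColumnDecoder::readInt)` it comes back as an Integer.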

Column Slices

The Cassandra Module additionally allows you to query by column slice.  The following flow will return all accounts for a given organizational unit (row key):

This will return up to 100 columns from the supplied row. For instance, requesting http://localhost:8081/account/list/operations will return the accounts for the operations organizational unit.
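Cassandra keeps the columns of a row sorted by the column family’s comparator, so a slice is a contiguous range of that order. The sketch below models a row as a TreeMap (plain String ordering) and returns up to count column names between optional start and finish bounds, treating an empty bound as unbounded, as Cassandra’s Thrift slice ranges do. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Illustrative sketch: a row whose columns are kept sorted by name.
class ColumnSlice {

    // Returns up to `count` column names in [start, finish]; an empty bound means "unbounded".
    static List<String> slice(NavigableMap<String, ?> row, String start, String finish, int count) {
        NavigableMap<String, ?> range = row;
        if (!start.isEmpty()) {
            range = range.tailMap(start, true);   // inclusive lower bound
        }
        if (!finish.isEmpty()) {
            range = range.headMap(finish, true);  // inclusive upper bound
        }
        List<String> names = new ArrayList<>();
        for (String name : range.keySet()) {
            if (names.size() >= count) {
                break;                            // stop once the count limit is reached
            }
            names.add(name);
        }
        return names;
    }
}
```

Calling `slice(row, "", "", 100)` corresponds to the “up to 100 columns from the supplied row” query above.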

 

Column Deletion

Deleting columns is just as easy.  The following flow demonstrates how to remove a column from a row:

So to delete Bill’s account we’d use a URL as follows:

http://localhost:[email protected]

Summary and What’s Next

Cassandra is a powerful contender in the NoSQL landscape. It’s particularly suited to large data sets that need to span multiple datacenters. Features we’re hoping to add to the module, and cover here, include support for a Cassandra-backed Mule object store and support for CQL as an alternative query mechanism.


One Response to “NoSQL with Apache Cassandra and Mule”

  1. Having multiple issues using the connector via Maven.

    The instructions at the link below seem to be wrong?
    http://mulesoft.github.io/CassandraDB-connector/guide/install

    Installing in Mule Studio under Cloud Connector seems to work with version 1.0, but with problems.

    Unable to load single value from column. Issue posted here:
    https://github.com/mulesoft/CassandraDB-connector/issues/2

    No schema documents seem to exist in the standard Mule location, including the locations referenced by the Mule Studio plugin.
    http://www.mulesoft.org/schema/mule/cassandradb

    Any ideas on these items? Inserts seem to be working correctly. I attempted to pull down the source and compile it with a later version of DevKit, but do not have access.
