Apache Cassandra is a distributed, column-family-oriented NoSQL database. Until recently, the only way to interact with Cassandra from Mule was to wrap one of the existing Java clients, such as Hector or Astyanax, in a component. Mule’s Cassandra DB Module now provides message processors to insert, update, query and delete data in Cassandra.
To demonstrate some of the Cassandra module’s features, I’ll implement a simple account management API. The API lets clients perform CRUD operations on accounts, much like an LDAP directory.
The Cassandra Module uses Java maps to define how data is inserted into and retrieved from a Cassandra keyspace. For this example we’ll use Mule’s JSON transformers to convert between JSON and maps as data moves back and forth over HTTP. Let’s take a look at what the account data looks like.
When we persist this JSON to Cassandra, the column family will be “Accounts”, each organizational unit will be a row key (i.e., “Engineering” and “Operations”), and the account data, such as the username, password and time since the last password change, will be contained in a super column.
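Because the module works in terms of Java maps, it can help to picture this layout as nested maps. The following is a plain-Java illustration only, not the module’s API: the column names (`username`, `password`, `passwordAge`), the use of the username as the super column name, and the sample values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AccountLayout {

    // Columns stored inside one account's super column (hypothetical names).
    static Map<String, Object> accountColumns(String username, String password, int passwordAge) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("username", username);
        columns.put("password", password);
        columns.put("passwordAge", passwordAge); // time since the last password change
        return columns;
    }

    // Column family -> row key (organizational unit) -> super column -> columns.
    static Map<String, Map<String, Map<String, Map<String, Object>>>> buildAccounts() {
        Map<String, Map<String, Map<String, Object>>> rows = new LinkedHashMap<>();
        rows.computeIfAbsent("Engineering", k -> new LinkedHashMap<>())
            .put("alice", accountColumns("alice", "s3cret", 12));
        rows.computeIfAbsent("Operations", k -> new LinkedHashMap<>())
            .put("bill", accountColumns("bill", "hunter2", 90));

        Map<String, Map<String, Map<String, Map<String, Object>>>> columnFamilies = new LinkedHashMap<>();
        columnFamilies.put("Accounts", rows);
        return columnFamilies;
    }

    public static void main(String[] args) {
        System.out.println(buildAccounts());
    }
}
```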
Let’s configure the Mule flow to persist this data via HTTP.
This flow accepts the JSON account data we just saw over HTTP, transforms it to a Map, persists it with the Cassandra connector’s “insert” message processor, and then transforms the payload back to JSON to return it to the client.
One of the benefits, as well as one of the challenges, of Cassandra is that all data is stored as byte arrays. This makes storage extremely flexible, but it also means that type information is lost. The Cassandra module uses Hector’s serializers to let you specify how a column’s bytes are converted back into a typed value when you read it.
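To see why type information is lost, consider the same logical value, 90, written two different ways. This stdlib-only sketch is not Hector code; the 4-byte big-endian integer encoding is simply an illustrative choice.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BytesOnly {

    // Encodes an int as 4 big-endian bytes.
    static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Encodes text as UTF-8 bytes.
    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asInt = encodeInt(90);       // 00 00 00 5A
        byte[] asText = encodeString("90"); // 39 30
        System.out.println(decodeInt(asInt));     // 90
        System.out.println(decodeString(asText)); // 90
        // Reading the integer bytes as text does not give "90" back.
        System.out.println(decodeString(asInt).equals("90")); // false
    }
}
```

Neither byte array carries a type tag, so whoever reads the column has to know which decoding to apply.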
Let’s take a look at how this works by specifying two query operations for the API. The first will allow us to query for a user based on email address – which you’ll recall maps to the row key.
We’re using the Mule Expression Language to parse the URI, which is how we infer the columnPath and rowKey. In this case the columnPath will be “operators” and the rowKey will be Bill’s email address. We can now query for Bill’s account as follows:
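The MEL expression itself isn’t reproduced here, but the same extraction can be sketched in plain Java. The path layout `/account/{columnPath}/{rowKey}` and the address `bill@example.com` are assumptions for illustration, not the flow’s actual configuration.

```java
import java.net.URI;

public class AccountPath {

    // The two values the flow needs for a lookup.
    record Lookup(String columnPath, String rowKey) {}

    // Splits a path of the hypothetical form /account/{columnPath}/{rowKey}.
    static Lookup parse(String url) {
        String path = URI.create(url).getPath();
        String[] segments = path.split("/"); // the leading "/" yields an empty first segment
        if (segments.length != 4 || !segments[1].equals("account")) {
            throw new IllegalArgumentException("Unexpected path: " + path);
        }
        return new Lookup(segments[2], segments[3]);
    }

    public static void main(String[] args) {
        Lookup lookup = parse("http://localhost:8081/account/operators/bill@example.com");
        System.out.println(lookup.columnPath() + " / " + lookup.rowKey());
    }
}
```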
There’s one problem though. When the response comes back it looks like this:
The password age is a string instead of an integer. This is because the Cassandra Module defaults to string serialization unless an explicit column-serializer is defined. Let’s add one to fix the flow.
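Conceptually, the lookup behaves like the sketch below: each column name may map to an explicit decoder, and any column without one falls back to string decoding. The `STRING` and `INTEGER` decoders are hypothetical stand-ins for Hector’s serializers, and the integer byte layout is assumed; in the flow itself the fix is the column-serializer configuration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ColumnSerializers {

    // Hypothetical decoders standing in for Hector-style serializers.
    static final Function<byte[], Object> STRING = bytes -> new String(bytes, StandardCharsets.UTF_8);
    static final Function<byte[], Object> INTEGER = bytes -> ByteBuffer.wrap(bytes).getInt();

    // Decodes each column, using STRING unless an explicit decoder is configured for it.
    static Map<String, Object> decode(Map<String, byte[]> raw,
                                      Map<String, Function<byte[], Object>> explicit) {
        Map<String, Object> decoded = new LinkedHashMap<>();
        raw.forEach((name, bytes) -> decoded.put(name, explicit.getOrDefault(name, STRING).apply(bytes)));
        return decoded;
    }

    public static void main(String[] args) {
        Map<String, byte[]> raw = new LinkedHashMap<>();
        raw.put("username", "bill".getBytes(StandardCharsets.UTF_8));
        raw.put("passwordAge", ByteBuffer.allocate(4).putInt(90).array());

        Map<String, Object> typed = decode(raw, Map.of("passwordAge", INTEGER));
        System.out.println(typed.get("passwordAge") instanceof Integer); // true
    }
}
```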
Now when we refresh the URL something like this should appear:
Column serialization is available for all data types supported by Hector.
The Cassandra Module additionally allows you to query by column slice. The following flow will return all accounts for a given organizational unit (row key):
This returns up to 100 columns from the supplied row. For instance, requesting http://localhost:8081/account/list/operations returns something like the following:
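Cassandra keeps a row’s columns sorted according to the column family’s comparator, so a slice amounts to “start at this column and take at most N.” This in-memory sketch mimics that with a `TreeMap` using natural string order; the method name `slice` and the generated column names are hypothetical, and only the limit of 100 comes from the flow described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ColumnSlice {

    // Returns at most `count` column names, in sorted order, starting at `start`
    // (inclusive). An empty start means "from the first column".
    static List<String> slice(SortedMap<String, ?> row, String start, int count) {
        SortedMap<String, ?> view = start.isEmpty() ? row : row.tailMap(start);
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() >= count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical row holding 150 account super columns.
        SortedMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < 150; i++) {
            row.put(String.format("user%03d", i), "account data");
        }
        List<String> page = slice(row, "", 100);
        System.out.println(page.size() + " columns, last = " + page.get(page.size() - 1));
    }
}
```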
Deleting columns is just as easy. The following flow demonstrates how to remove a column from a row:
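In terms of the nested-map picture, deleting an account amounts to removing its super column from the row, or a single sub-column from that super column. The sketch below is an in-memory illustration with hypothetical names (`delete`, `bill`, `passwordAge`); it says nothing about how Cassandra records deletions internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DeleteColumn {

    // Removes a whole super column when `column` is null, otherwise a single sub-column.
    // `row` maps super column name -> column name -> value. Returns true if anything was removed.
    static boolean delete(Map<String, Map<String, String>> row, String superColumn, String column) {
        Map<String, String> columns = row.get(superColumn);
        if (columns == null) {
            return false;
        }
        if (column == null) {
            row.remove(superColumn);
            return true;
        }
        return columns.remove(column) != null;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> operations = new LinkedHashMap<>();
        Map<String, String> bill = new LinkedHashMap<>();
        bill.put("username", "bill");
        bill.put("passwordAge", "90");
        operations.put("bill", bill);

        System.out.println(delete(operations, "bill", "passwordAge")); // true: one column removed
        System.out.println(delete(operations, "bill", null));          // true: whole account removed
        System.out.println(operations.isEmpty());                      // true
    }
}
```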
So to delete Bill’s account we’d use a URL as follows:
Summary and What’s Next
Cassandra is a powerful contender in the NoSQL landscape, and it’s particularly well suited to large data sets that need to span multiple datacenters. Features we’re hoping to add to the module, and to cover in future posts, include a Cassandra-backed Mule object store and support for CQL as an alternative query mechanism.