Feed my inbox; reading RSS feeds with Mule ESB – Part 2

February 24 2011

In my last blog post I showed a simple flow to retrieve an RSS feed periodically, split it and send each RSS entry via eMail. The solution has one major drawback, though: once the Mule application is restarted, Mule has forgotten which feed entries have already been sent. The RSS feed is retrieved again and another bunch of eMails is sent.

Adding idempotency

The standard EAI pattern for receiving messages only once is the idempotent receiver. Mule’s implementation of this pattern is called idempotent-message-filter. In our flow it goes right after the RSS splitter so that any duplicate RSS entries are filtered out.
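Conceptually, an idempotent receiver just remembers the IDs it has already seen and rejects repeats. The following is a minimal Java sketch of that idea, not Mule’s actual implementation; the class and method names are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the idempotent receiver pattern.
// This is NOT the code behind Mule's idempotent-message-filter.
public class IdempotentFilterSketch {

    // IDs of all messages accepted so far, held in memory only
    private final Set<String> seenIds = new HashSet<>();

    /**
     * Returns true the first time an ID is offered and false for
     * every later message carrying the same ID.
     */
    public boolean accept(String messageId) {
        // Set.add returns false if the element was already present
        return seenIds.add(messageId);
    }

    public static void main(String[] args) {
        IdempotentFilterSketch filter = new IdempotentFilterSketch();
        System.out.println(filter.accept("entry-1")); // true
        System.out.println(filter.accept("entry-1")); // false
        System.out.println(filter.accept("entry-2")); // true
    }
}
```

Because the set lives only in memory, a freshly started instance knows nothing about earlier messages.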

Unfortunately, adding the idempotent message filter alone won’t help as it still keeps the identifiers of the messages it has seen in memory. To be really useful we need to add persistence to the filter:

<idempotent-message-filter idExpression="#[groovy:feed.uri]">
    <simple-text-file-store name="rss2mail-store" directory="${java.io.tmpdir}/rss2mail" maxEntries="1000"/>
</idempotent-message-filter>

The file that persists the state of the idempotent message filter will be called rss2mail-store.dat, and it resides in a subdirectory called rss2mail of your temp directory.

When you restart this flow a couple of times you’ll notice that duplicate emails are still delivered to your inbox. To find out why, we need to dig a bit deeper into the internals of the simple text file store that’s used to persist the state of the idempotent message filter.

The simple-text-file-store element is mapped to the TextFileObjectStore class. If you look at the way this class implements persisting its state you’ll notice that it writes out Strings in a format that is supposed to be loaded back into memory by using a Java Properties object.

We used the URI of each feed entry to uniquely identify it in the idempotent message filter. Feed entry URLs look like this:

Do you notice the ‘=’ character in the URL? When TextFileObjectStore persists the list of known IDs, the rss2mail-store.dat is full of lines that look like this:

When the file is read back into memory using a Properties object, the ‘=’ character is used as the separator between key and value. Because the URL itself already contains a ‘=’, we end up with a Properties object with a single key. So when the feed is retrieved the next time, only one entry is filtered out and the others are sent as email again.
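The collapse into a single key is easy to reproduce with plain Java. The sketch below is only an illustration: the example.org URLs and the `=stored` suffix are made up, and the exact line format that TextFileObjectStore writes is an assumption. Note that java.util.Properties ends a key at the first unescaped ‘=’, ‘:’ or whitespace character, so for a URL the key is cut off even earlier, at the ‘:’ following the scheme. Either way, all entries end up under the same key.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo: how Properties.load treats lines whose key part is a URL.
public class PropertiesKeyCollision {

    // Parses the given text the same way a .properties file would be parsed
    public static Properties load(String fileContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Assumed store content: one "<id>=<value>" line per feed entry
        String content =
                "http://example.org/?p=101=stored\n"
              + "http://example.org/?p=102=stored\n";

        Properties props = load(content);

        // Both lines share the key "http", so the second overwrites the first
        System.out.println(props.size());                // 1
        System.out.println(props.stringPropertyNames()); // [http]
    }
}
```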

This problem allows me to demonstrate how to create a custom expression evaluator that generates a hash value of the entry’s URI that is suitable to be stored in the simple text file store.

Hashing the URI

The hash of the entry’s URI could be computed in a custom transformer. But that solution would be rather specific to the problem at hand and not very reusable. Let’s implement a custom expression evaluator that can be used beyond the scope of this little project.

The core of the expression evaluator looks like this:

This expression evaluator is a bit recursive: it runs the expression it gets through Mule’s expression evaluator again and creates an MD5 hash of the result.
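To make the idea concrete without depending on Mule’s API, here is a rough, self-contained sketch. The `Function<String, Object>` delegate is a hypothetical stand-in for Mule’s expression evaluation, and the class name is made up; the real evaluator in the project implements Mule’s own interface and may differ in detail.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

// Hypothetical sketch: evaluate an inner expression, then MD5-hash the result.
public class Md5ExpressionSketch {

    // Stand-in for Mule's expression evaluation (assumption, not Mule's API)
    private final Function<String, Object> delegate;

    public Md5ExpressionSketch(Function<String, Object> delegate) {
        this.delegate = delegate;
    }

    // Resolves the inner expression first, then hashes its string form
    public String evaluate(String innerExpression) {
        Object result = delegate.apply(innerExpression);
        return md5Hex(String.valueOf(result));
    }

    // Returns the MD5 digest of the input as 32 lowercase hex characters
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // %032x pads with leading zeros so the length is always 32
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The delegate here simply pretends every expression yields this URL
        Md5ExpressionSketch md5 =
                new Md5ExpressionSketch(expr -> "http://example.org/?p=101");
        System.out.println(md5.evaluate("groovy:feed.uri"));
    }
}
```

The resulting hash consists only of the characters 0-9 and a-f, so it contains neither ‘=’ nor ‘:’ and survives the round trip through a Properties object.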

Now we can use our MD5 expression evaluator to add a feed.guid message property:

and this message property can now be used in the idempotent message filter:

Hooray again!

Now we can restart the service as many times as we like: as long as no new feed entries are available, no mails will be sent.

I have added the project as version 2 to the GitHub repository so you can download it and try it out locally.



