Using in-memory database to help with flat file integration

Even today, when integrating with critical backend systems (e.g. order fulfillment system and shipping management system), it is quite common to encounter data being transferred in flat file format. Often the records to be transferred will need to be sorted and de-duped before they can be inserted into another system. For example, there could be multiple update entries for the same item and the sequence of the update calls is important. Here we would like to show you an example of how to leverage Mule’s built-in support for in-memory databases to make it really easy to perform such tasks when working with flat files.

Out of the box, Mule supports HSQL, H2, and Apache Derby. The example here will use Apache Derby but the configuration is essentially the same for all three.

First thing you will need to do, is to configure your JDBC connection. Note that the use of Apache Derby JDBC driver, also, where you put the name of the database (e.g. “blogdemo” in this case.):

<spring:bean id="datasource" name="datasource" class="org.enhydra.jdbc.standard.StandardDataSource" destroy-method="shutdown">
     <spring:property name="driverName" value="org.apache.derby.jdbc.EmbeddedDriver"/>
     <spring:property name="url" value="jdbc:derby:memory:blogdemo"/>

In addition, you will need to initialize the database by implementing the Spring InitializingBean interface. When Mule starts, the initialization process will create the tables you plan to use. For our example, we will just create couple tables to illustrate the process:

package org.mule.blogdemo.init;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.springframework.beans.factory.InitializingBean;

public class DBInitialization implements InitializingBean {
	public void afterPropertiesSet() throws Exception {
		String dbURL = "jdbc:derby:memory:blogdemo;create=true";
		Connection conn = null;
		try {
			// Get a connection
			conn = DriverManager.getConnection(dbURL);
			Statement stmt = conn.createStatement();
			stmt.executeUpdate("CREATE TABLE table1 (Name VARCHAR(255), External_ID__c VARCHAR(255), Value__c INTEGER, Last_Modified TIMESTAMP)");
			stmt.executeUpdate("CREATE TABLE table2 (somecolumn VARCHAR(20), somecode VARCHAR(80))");
		catch (java.sql.SQLException sqle) {
			throw sqle;

With the corresponding element in your flow:

<spring:bean id="dbinitialization" name="dbinit" class="org.mule.blogdemo.init.DBInitialization"/>

These should be global elements of your application:

From here on, you can just use your in-memory database like any regular relational database in your application:

<jdbc-ee:connector name="apache-derby" dataSource-ref="datasource" validateConnections="false" queryTimeout="-1" pollingFrequency="0" doc:name="Database">
     <jdbc-ee:query key="insertRecord" value="INSERT INTO table1 (Name, External_ID__c, Value__c, Last_Modified) VALUES (#[map-payload:name], #[map-payload:externalid], #[map-payload:value], #[map-payload:lastmodified])"/>
     <jdbc-ee:query key="selectRecord" value="SELECT Name, External_ID__c, Value__c FROM table1 ORDER BY Last_Modified"/>

For example, if your requirement calls for upserting records from a csv input file into in sequence based on last modified time:

order1,1,10,2012-09-22 13:40:00.00
order1,1,1,2012-09-22 10:00:00.00
order1,1,5,2012-09-22 11:00:00.00

With in-memory database, the flow can be simplified down to just a handful of steps:

<flow name="mainFlow" doc:name="mainFlow" processingStrategy="synchronous">
     <file:inbound-endpoint path="src/test/resources" moveToDirectory="src/test/archives" responseTimeout="10000" doc:name="Input File"/>
     <jdbc-ee:csv-to-maps-transformer ignoreFirstRecord="true" doc:name="CSV to Maps" mappingFile="src/main/resources/mules-csv-format.xml"/>
     <foreach doc:name="Foreach">
      	<logger message="#[payload]" level="INFO" doc:name="csv record" />
        <jdbc-ee:outbound-endpoint exchange-pattern="one-way" queryKey="insertRecord" queryTimeout="-1" connector-ref="apache-derby" doc:name="insert-record"/>
     <jdbc-ee:outbound-endpoint exchange-pattern="request-response" queryKey="selectRecord" queryTimeout="-1" connector-ref="apache-derby" doc:name="select records"/>
     <sfdc:upsert config-ref="salesforce" externalIdFieldName="External_ID__c" type="Demo_Object__c" doc:name="">
         <sfdc:objects ref="#[payload]"/>

With the records now stored in a relational database table, all the SQL query functionality can be utilized to process the data to meet your requirements.

Finally, please don’t forget to include Apache Derby library in your project. Support for Derby is out of the box but the driver still needs to be added to the project manually:

We'd love to hear your opinion on this post

11 Responses to “Using in-memory database to help with flat file integration”

  1. When I try this, I get Cannot convert value of type [org.mule.blogdemo.init.DBInitialization] to required type [javax.sql.DataSource] for property ‘dataSource’: no matching editors or conversion strategy found. Can you help out? Thanks!

  2. What version of Mule you’re using? The project for the blog post was done a while back.

  3. Also in my demo project I used “datasource” all lower case. The property name is case sensitive.

  4. Thanks for the replies, Adrian. I’m using Mule 3.4.0. This was a good old-fashioned error between keyboard and chair: I was setting the data source to the dbinitializationbean instead of the datasource bean. Thank you for the quick reply, though!

  5. I’ve got a second question for you, Adrian. If I have to read multiple files, the flow continues to the “select records” endpoint once for each file. Is there a way to get Mule to only reach that point once, even if I have multiple files? Thanks again!

  6. Short answer is yes, there are multiple ways to do it. You should consider breaking the app into 2 separate flows. Assuming you can time when your files are dropped, you can have one flow that insert files into the in-memory db, then another one with a scheduler that kicks off and select the data.

  7. I don’t control when the files will be dropped. Unfortunately, I haven’t figured out how to have a file act as a request-response endpoint so I can tell it to run at a certain time instead of at a certain frequency. I don’t want to process the files with auto-delete because I could see someone seeing the files disappear and putting them in there over and over again, resulting in the data being represented multiple times.

    If I could schedule the insert, I could run a flow to clear the database, another flow to insert the data, and third flow to select and do something with the data.

    • A bit more explanation of the process will be helpful. So how many files will be dropped each time? From my understanding, basically you are expecting a set of file(s), and after that, you want to sort them all together then process them in certain order? The key question is what sort of mechanism you have to let you know, “Hey, the file drops are done for this round, now let’s process”?

  8. Can you share your code in Github or as zip file

  9. Kasim, it’s from a while ago and I’ve lost the entire project. Plus it was built on an older version of Mule. The code snippets are all public & you can see them above though. Thanks.

  10. […] we found /using-in-memory-database-to-help-with-flat-file-integration/, that I'll be controlling for a classification and combining. My wholly remaining doubt is how […]