In today’s Training Talks, we’ll learn how to extract, transform, and load data (ETL) using Anypoint Design Center’s flow designer.
In this demo, Mark will start by creating a new application, he will take data from a database that he will transform using DataWeave. After he transforms the data, he will load the data into another database using flow designer. Let’s get started!
To start, let’s log in to anypoint.mulesoft.com. Then let’s click on “Design Center.” For the first step, we’re going to create a new application and then following the below steps:
- We’re going to name this application “ETL App.”
- Our first step is to actually just rename this flow. We’re going to rename this flow “ETL.”
- In the trigger, we’re going to add a HTTP listener. The path will be ETL.
- In advanced, we’re going to only allow the GET method. We’re going to limit this endpoint to only respond to HTTP GET methods.
The next step is to add a database component because our first step is to actually extract the data.
- We’re going to click the “select” operation. Then we need to enter in our credentials.
- We need to set up the database. We have a couple of options here in flow designer in this current version. Our database is a MySQL server, so we will click “MySQL connection”. We also need to provide a JDBC driver, or a Java driver for this database.
- Since I already uploaded this driver, I’m going to click and choose my Java connector, JDBC driver. Then I’m going to enter in my credentials to pull data from this database.
- I’m going to test this database. The connection is successful. Then I’m going to enter in my SQL query.
My next step is to actually transform the data because since I’ve already extracted the data, now I need to transform it. I’m going to add in a transform component. As you can see here in flow designer, we can already see some metadata that flow designer has been able to provide for us. That’s a good thing for us to make sure that we’re getting the right fields for us to transform our data.
Our next step is to set the data type for our output.
- We’re going to create a new data type called “Accounts Payload.”
- Then we’re going to choose “object,” because it’s going to be a Java object
- Then I’m going to add in my RAML specifications to outline the data type I want to transform the data into.
- I click “apply changes,” and then I’m going to go ahead and drag and drop from the left-hand side to the right-hand side that corresponds to the fields that I want to transform.
My next step is to set some sample data so I can actually preview my live transformation.
- I’m going to paste some sample data that I have already. Notice that here I need to capitalize state and country, or uppercase in this case. I need to uppercase everything.
- I’m going to double-click on the state field and then enter in the upper function.
- Then I’m going to double-click the country and enter and use the upper function on country as well.
- As you can see here, I’ve been able to uppercase state and country and I’ve been able to preview it live. I’m done with my transform component.
My next step is to insert this transformed data into another database.
- I’m going to choose “bulk insert.”
- Now I need to go ahead and change my database configuration because I want to insert into a completely different database. I’m going to click on “add”, name this database configuration to “Database 2 insert.”
- It’s another MySQL database. I will enter in the credentials for the other database here.
- I’m going to test. Everything looks okay. Then I’m going to add in a SQL insert statement using parameter values. As you can see here, insert into the following fields with the values that I’m taking in from my current payload, or my current transformation.
Once I’m done, that is it! I’ve been able to extract, transform, and load. Now I will need to deploy my application to a server that’s running in the cloud so I can conduct live testing. Changes have been deployed. Now I need to access the address for this application. How do I do that?
- I click on this three-button icon and I select “copy link.”
- I add in a new tab here.
- Then I will enter my endpoint here.
Everything looks fine! Notice here that I’ve downloaded an ETL file. The reason why I’ve downloaded a file is that the bulk insert operation actually returns a payload itself. It’s a payload of an array consisting of zeros and ones. One represents a row that has been inserted successfully, and zero represents a row that has not been inserted successfully.
That’s what I get at the end because, in my application, I’m not actually handling that message, and therefore, it just returns that Java object, that array, to my web browser. Then, since it’s a byte object, it’s not a text file, the browser automatically just downloads this file for me.
Once the operation has been successful, let’s go ahead and switch to my terminal.
- I’m already logged into MySQL server, the one that I’m inserting my data into.
- Then I’m going to just go ahead and enter in my select query to check whether my data has been inserted into this table.
Yes, I’ve confirmed that the application does work. It’s been able to extract, transform that data, and load that data into my new database here.
In summary, doing extract, transform, and load is as simple as using three components in flow designer. First, you use a database to select or retrieve the data. Second, you transform that data using a transform component. Third, you insert that transformed data into another database using the insert operation.