So far in this 3-part series, we have looked at variables (Part 1) and functions (Part 2) in order to leverage them to our advantage. In this third and final part of the real-world DataWeave series, we will look at another common problem area, that of performing nested loops in data structures.
This case seems to come up regularly when dealing with HR systems that return a list of employees. Each employee item in the list has some attributes and often has a list of job records attached to the employee if they have held multiple positions at the organization.
Typically I see HR systems returning data in an XML format, but anytime we deal with data transformation in DataWeave we want to make sure we are thinking in terms of the object structure and not to concern ourselves with the format specifics.
Now, when we take a list of items as a part of a payload input we can use the syntax:
%dw 1.0 %output application/json --- payload map { data: $.value }
In these instances, the “$” is a pointer to the current item in our list and we can reference any fields attached to it. When we start working with nested lists, things get complicated if we just rely on this label to access values
A more human-friendly way is to label the inputs as follows:
%dw 1.0 %output application/json --- payload map ((input, inputindex) -> { data: input.value })
Our current item in the list is identified by the label “input”, this will make things much easier when we start working with nested lists.
Now onto our scenario, on this occasion we are going to be receiving an XML input with some nested lists that we will want to loop through and use our nicely labeled pointers to the items in the list as we transform them. Our input looks like:
<employees> <employee> <name>Doe, John</name> <dob>06/22/1968</dob> <id>1234</id> <job> <title>VP of Engineering</title> <start_date>08/20/2008</start_date> </job> <job> <title>Engineering Manager</title> <start_date>06/13/2004</start_date> <end_date>08/20/2009</end_date> </job> <job> <title>Engineer</title> <start_date>04/05/1997</start_date> <end_date>06/12/2004</end_date> </job> </employee> </employees>
Our first task is to work on the list of “employee” items, we can use the “map”operator as in our earlier example and we will label our pointer so we know how to use it deeper in the transform. I have labeled the item pointer “e” and the list index as “empindex”:
%dw 1.0 %output application/json --- payload.employees map ((e,empindex) -> { })
Another way to think of this is that we are doing the equivalent of “for each employee in employees”.
Next, we will want to access each job object, this time our DataWeave goes inside the initial map operator:
%dw 1.0 %output application/json --- payload.employees map ((e,empindex) -> { job:(e.*job map ((j,jobindex) -> { })) })
By using “e.*job” as our starting point for our map in this DataWeave, we are doing the equivalent of “for each job in employee” and, of course, we are labeling our item pointer as “j” and our list index “jobindex”.
At this point, I recommend mapping a singular field value and using the DataWeave preview to ensure we have the basic construct correct. In this example, I recommend using the job title as an output field as it is unique––so we can easily see if we have the correct output structure. Our DataWeave script should look like below, note that we are referencing the job title using “j.title” as we had called our item pointer “j”:
%dw 1.0 %output application/json --- payload.employees map ((e,empindex) -> { job:(e.*job map ((j,jobindex) -> { title:j.title })) })
In our data sample, we had three job records for this employee––we should see those same three in the preview, as follows:
[ { "job": [ { "title": "VP of Engineering" }, { "title": "Engineering Manager" }, { "title": "Engineer" } ] } ]
Once we have confirmed we have the correct structure, we can continue mapping the rest of the fields. As we labeled our list item pointers uniquely we can easily reference data fields from anywhere in the input payload using the right pointer. In my output I needed to output a mix of values from the job and employee into a list of job records:
%dw 1.0 %output application/json --- payload.employees map ((e,empindex) -> { job:(e.*job map ((j,jobindex) -> { id : e.id, name: e.name, dob: e.name, (end_date:j.end_date) when (j.end_date?), title:j.title, start_date:j.start_date })) })
In the interest of being tidy with my output JSON, I also used the “when” clause to prevent my output from containing a null “end_date” value for the latest job that the employee currently holds. Once complete this DataWeave now outputs the structure I was after:
[ { "job": [ { "id": "1234", "name": "Doe, John", "dob": "Doe, John", "title": "VP of Engineering", "start_date": "08/20/2008" }, { "id": "1234", "name": "Doe, John", "dob": "Doe, John", "end_date": "08/20/2009", "title": "Engineering Manager", "start_date": "06/13/2004" }, { "id": "1234", "name": "Doe, John", "dob": "Doe, John", "end_date": "06/12/2004", "title": "Engineer", "start_date": "04/05/1997" } ] } ]
In summary, in order to avoid getting caught up in the pointer confusion you have to label the pointers to list items as part of your map definition when using DataWeave. This technique can be applied to any number of nested lists and will make life much simpler when trying to output fields from different inner and outer loop(s).