Getting Started with DataWeave: Part 2

September 10 2015


In Getting Started with DataWeave: Part 1, we introduced you to DataWeave and its canonical format, the result of every expression you execute in the language. We now continue to explore our new engine, aiming to give you enough grounding to tackle real-world use cases.

As we did in Part 1, we will continue to show the results of each expression in the DataWeave canonical format.

This series is now complete.

Expressions

Your entire transformation is encapsulated in a single expression. In Part 1, we discussed writing semi-literal expressions to define the outer structure of your transformation as either an object or an array. Inside this expression, you can write other expressions of different types. There are 7 basic expression types in DataWeave:

  • Literal
  • Variable reference
  • Semi-literal
  • Selector
  • Function call
  • Flow Invocation
  • Compound

We covered literal, semi-literal and variable reference expressions in Part 1. In this post, we concentrate on selector expressions and compound expressions, and refer you to our documentation for complete coverage of these and all other expression types.

Selector Expressions

Selector expressions are necessary for just about every transformation. They allow us to navigate to any part of the incoming data, whether it be in the payload, variables or properties. You should bear in mind two things when using selector expressions: their context and their result. Selectors can be appended to each other to form a chain. The result of each selector in the chain sets the context (object or array) against which the next is evaluated. The first context will typically be the result of a variable reference expression, payload for example. However, any expression can set the context. Selectors only make sense when applied to objects or arrays, so you simply need to ensure that the context you set is an object or an array. Invoking a selector expression on a simple type always results in null. (Strings are the exception: they are treated as arrays.)

Array Element Selector Expressions

Arrays, as you would expect, are indexable with indices running from 0 to n-1, written x[0] through x[n-1]. We also allow selection of ranges within the array. For your convenience, indices can be negative, where -1 indicates the last element in the array. The range beginning with the second element and ending with the last element of array x is written x[1..-1]. To retrieve the elements from the third-last back to the first, in reverse order, write x[-3..0].
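
As a minimal sketch of these index and range selectors, the script below applies them to a made-up five-element array. The variable x and its contents are assumptions chosen purely for illustration, and the comments restate the rules described above rather than captured output.

```dataweave
%dw 1.0
%output application/dw
%var x = ["a", "b", "c", "d", "e"]
---
{
  // x is a hypothetical array used only for this illustration
  first: x[0],       // index 0 selects the first element
  last: x[-1],       // -1 selects the last element
  tail: x[1..-1],    // range from the second element to the last
  reversed: x[-3..0] // from the third-last element back to the first
}
```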

Object Selector Expressions

Single Key Selector

We have seen that DataWeave objects are sequences of key:value pairs in which the keys can be repeated. Many use cases will require you to retrieve particular values from deep within the object. To do this you will need to use the key selector, written in the form .<key-name>. The result is the value corresponding to the first instance of the key you specified.

Let’s take a look at our weather data XML document again and extract just the contents of the forecast, excluding the location. Bear in mind what we learned about variable references. The payload expression normalizes the input XML document into a DataWeave object whose key:value pairs correspond to the elements and their respective contents.

[Screenshot: selecting the forecast key from the weather data document and the resulting value]

Note how we can chain the selectors together to navigate to the key of interest, in this case, forecast. See how the result is the value corresponding to that key. Hence, the forecast key does not appear in the result of the expression. Consider the following table of the results of navigating further into the object with payload.weatherdata.forecast as the initial context.

[Screenshot: table of single-key selector expressions and their results, with payload.weatherdata.forecast as the initial context]
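
To make the chaining concrete, here is a small sketch that uses an inline object in place of the XML payload. The variable doc, its values, and the wind and temperature keys are hypothetical stand-ins, not the actual weather document used in this series.

```dataweave
%dw 1.0
%output application/dw
%var doc = {
  weatherdata: {
    location: { name: "Somewhere" },
    forecast: {
      time: { wind: "Light breeze", temperature: "21" },
      time: { wind: "Calm", temperature: "18" }
    }
  }
}
---
{
  // the result of each selector becomes the context for the next one;
  // .time returns the value of the first time key only
  firstTime: doc.weatherdata.forecast.time,
  // chaining one step further into that value
  firstTemperature: doc.weatherdata.forecast.time.temperature
}
```

Note that neither result contains the time key itself, only the value that corresponds to it.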

Multi-key Selector

Note how the .time selector results in the value of the first instance of time in the initial context object. However, what if we wanted both values? To retrieve the value for each repeating key, you must use the multi-key .*<key-name> expression. This will always result in an array of the values, even if there were only one instance of the key. Hence, the expression payload.weatherdata.forecast.*time will result in an array containing the values for each time key instance in the order in which they appear in the context object.

An important point to make here: when the context of your key selector expression is an array of objects, both the single-key and multi-key selectors will iterate and apply against each object in the array and the result is always an array. Consider the following table of expressions where we again use payload.weatherdata.forecast as the initial context:

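[Screenshot: table of single-key and multi-key selector expressions and their results, with payload.weatherdata.forecast as the initial context]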
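
The difference between the two selectors can be sketched as follows. The forecast variable is a hypothetical inline object with a repeated time key, standing in for the initial context; its values are invented for this example.

```dataweave
%dw 1.0
%output application/dw
%var forecast = {
  time: { temperature: "21" },
  time: { temperature: "18" }
}
---
{
  // single-key selector: the value of the first time key only
  first: forecast.time,
  // multi-key selector: an array holding the value of every time key, in order
  all: forecast.*time,
  // the context is now an array of objects, so .temperature is applied
  // to each element and the result is again an array
  temperatures: forecast.*time.temperature
}
```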

Attribute Selector

You may have noticed that the attributes present on the time keys did not appear in the results of any of the above expressions. The key selector expressions only return the value corresponding to the key. When you need to retrieve a particular attribute, you should use the .@<attribute-name> attribute selector. Hence, the value of the from attribute on the first time instance above is retrieved with payload.weatherdata.forecast.time.@from. DataWeave provides a handy shortcut to get all the attributes as an object of key:value pairs. payload.weatherdata.forecast.time.@ will return both the from and to attributes wrapped in an object:

[Screenshot: result of payload.weatherdata.forecast.time.@, an object containing the from and to attributes]
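
A minimal sketch of both attribute selectors follows. It assumes an XML payload shaped like the fragment in the comment; the attribute values are invented for illustration and are not taken from the real weather document.

```dataweave
%dw 1.0
%output application/dw
---
/* Assumed input, hypothetical in its values:
 * <weatherdata>
 *   <forecast>
 *     <time from="2015-09-05T12:00:00Z" to="2015-09-05T18:00:00Z">...</time>
 *     <time from="2015-09-05T18:00:00Z" to="2015-09-06T00:00:00Z">...</time>
 *   </forecast>
 * </weatherdata>
 */
{
  // a single attribute of the first time key
  from: payload.weatherdata.forecast.time.@from,
  // every attribute of the first time key, wrapped in an object
  attributes: payload.weatherdata.forecast.time.@
}
```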

Compound Expressions

Thus far you have seen some of the basic building blocks you will use in a DataWeave transformation. You will enjoy the real power of the language when you combine these expressions together using operators. We explore some of the most important of these next.

Iteration

Let’s continue to work on our example weather forecast XML document and transform it so that we get some basic information from it. We are interested in the hour for each of the forecasts and a human readable summary of the wind and the temperature. Of course, there are many forecasts for the day, and we need to iterate through each one. We use the map operator for iteration. It takes as operands an expression which must return an array or an object on the left-hand side and any expression on the right. The result of applying map is an array where each element is the result of the right-hand operand. If the left-hand operand is an array, map will iterate through each element and add the result of the right-hand operand to the output array. If the left-hand operand is an object, map will iterate on the sequence of key:value pairs.
Let’s say we wish to build a forecasts array containing objects with three fields: hour for the time of the forecast, wind for a description of the wind conditions and temp for a description of the temperature.

[Image: DataWeave transformation that builds the forecasts array]

A couple of things to note here:

  1. using (w=payload.weatherdata) is a local variable declaration prepended to the object expression which defines our entire transformation. This variable, w, is local to the object expression to which the declaration is prepended. Hence, it is only valid to reference w within the scope of this object.
  2. map iterates over each element in the array returned by .*time and adds the object defined as its right-hand operand to the resulting array.
  3. $ is an alias for the element found at each iteration on the array.
  4. $.@from is the selector expression used to access the value of the from attribute.
  5. as :datetime is a type-cast expression. The .hour expression can thus be used to extract the hour from the date and time.
  6. For simple string concatenation, we use the ++ operator.
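
The points above can be sketched in one short transformation. This is not the original listing; it assumes a simplified payload in which each time element carries from and to attributes and has wind and temperature child elements, all of which are hypothetical names and values.

```dataweave
%dw 1.0
%output application/json
---
/* Assumed input, hypothetical in both shape and values:
 * <weatherdata>
 *   <forecast>
 *     <time from="2015-09-05T12:00:00Z" to="2015-09-05T18:00:00Z">
 *       <wind>Light breeze</wind>
 *       <temperature>21</temperature>
 *     </time>
 *     ... more time elements ...
 *   </forecast>
 * </weatherdata>
 */
using (w = payload.weatherdata) {
  // w is only visible inside this object expression
  forecasts: w.forecast.*time map {
    // $ is the current time value; cast its from attribute, then take the hour
    hour: ($.@from as :datetime).hour,
    // ++ concatenates strings
    wind: "Wind: " ++ $.wind,
    temp: "Temperature: " ++ $.temperature ++ " degrees"
  }
}
```

The from values in the assumed input include a time zone (the trailing Z), since a string without one may not coerce to :datetime.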

Filtering on Iteration

We are free to chain expressions together as compound expressions with any number of operators. Often we need to filter the data we work against before or after the operator of choice. The filter operator iterates through elements in an array or keys in an object and produces an array which contains only those elements which match the criteria specified by its right-hand boolean operand.
Let’s say we want to filter the array produced by map above so that we only get those forecasts after six o’clock pm:

[Screenshot: the map expression with a filter applied to its result]

Note how the criterion expressed in the right-hand operand refers to $.hour. This key was not present in the original input. It is important to be mindful of the results of each expression in the chain of expressions that form a compound expression. The first expression, which uses the map operator, produces an array of objects with hour, wind and temp keys. This array becomes the left-hand operand of the filter operator, which iterates through the array and produces an array containing only the objects that satisfy the criterion.
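
Here is a self-contained sketch of the same chain, using an inline array instead of the XML payload. The times variable and its from and temperature fields are hypothetical; the parentheses around the map expression and the filter condition are added only to make the grouping explicit.

```dataweave
%dw 1.0
%output application/json
%var times = [
  { from: "2015-09-05T12:00:00Z", temperature: "21" },
  { from: "2015-09-05T21:00:00Z", temperature: "16" }
]
---
// times is made-up data standing in for the forecast entries.
// Step 1: map produces an array of objects that have an hour key.
// Step 2: filter keeps only the objects whose hour is later than 18.
(times map {
  hour: ($.from as :datetime).hour,
  temp: $.temperature ++ " degrees"
}) filter ($.hour > 18)
```

Inside the filter, $ refers to an element of the array produced by map, which is why $.hour is available even though the input has no hour key.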

Conditional Logic

Often our transformation logic needs to output data only when we meet certain criteria. Let’s output every forecast but only include the wind description if the speed is greater than five miles per hour.
[Screenshot: the transformation with the wind key:value pair made conditional using when]

Note how we surround the entire wind key:value pair in parentheses. This is the left-hand operand of the when operator. The right-hand operand is a boolean expression. The wind key:value pair is included in the output only when this expression evaluates to true.
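
A minimal sketch of a conditional key:value pair, again on an inline array rather than the real payload. The times variable and the windName and windSpeed fields are assumptions made for this example, with windSpeed taken to be in miles per hour.

```dataweave
%dw 1.0
%output application/json
%var times = [
  { from: "2015-09-05T12:00:00Z", windName: "Light breeze", windSpeed: "7.5" },
  { from: "2015-09-05T18:00:00Z", windName: "Calm", windSpeed: "3.0" }
]
---
// times is made-up data; windSpeed is assumed to be in miles per hour
times map {
  hour: ($.from as :datetime).hour,
  // the parenthesized pair is emitted only when the condition is true
  (wind: $.windName ++ " at " ++ $.windSpeed ++ " mph") when (($.windSpeed as :number) > 5)
}
```

Every forecast still appears in the output; only the wind key is left out for entries that fail the condition.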

Next Steps

That’s it! You’ve just mastered the essentials to utilize DataWeave in every transformation requirement from simple to complex. In our next post, Getting Started with DataWeave: Part 3, we’ll guide you through a real-world scenario of transforming between Java Database result sets, XML and JSON payloads as you expose data through System and Experience APIs.

Also, you can now view our on-demand webinar that introduces DataWeave.


11 Responses to “Getting Started with DataWeave: Part 2”

  1. Hi Nial,
    coming to you after lotta mind boggling. here is the case:
    I have an incoming string lets call it payload.dateTimestamp and I want to convert it into a datetime type so that I can extract day, month and year separately. I tried
    payload.dateTimestamp as :datetime but getting warning “cannot coerce a :string to a :datetime”. Any help will really save days for me.

    thanks,
    Deepak

  2. I like this very much for small transformation payloads in CloudHub deployments. However, I see this becoming a little cumbersome for on-premise integrations, more specifically SOAP-to-SOAP transformations. A typical example: retrieving a Customer Account for subscription-based products and services, where the SOAP response is transformed to a canonical-format SOAP response in the ESB. In this scenario the response would have 500 fields. I guess I would need to map it source to target manually in the DataWeave editor, and that could be more of an effort compared to DataMapper. I would highly encourage having a graphical tool to map the fields in conjunction with DataWeave; that would increase productivity, and of course DataWeave is high performing.

    • Hi Vamshi,

      We will have a graphical release next week, but graphics aside, I am sure DataWeave is compelling even when there are many fields because we have the facility to process many of them automatically depending on the use case. Could you post an example for the scenario you described?
      thanks,
      Nial

  3. How to sort list ascending or descending order using Dataweave orderBy operator. I can’t see any option in orderBy operator to specify ascending or descending order. Without this option orderBy operator is of no use.

    • ([{"name": "Mickey Mouse", "age": 87}, {"name": "Donald Duck", "age": 80}] orderBy $.name)[-1..0]

  4. Hi Nial,
    These blogs are extremely helpful. Now I am working on complex (for me anyhow) mappings with calculated outputs or derived (as in the $.hour example above).

    However, thinking of the following mapping (from CSV).
    o CSV does not have a header (and will not in real life) but I added one for experimentation
    o Why can’t I use [n] in the ‘delete’: element, I am forced to use a header name.
    o How can I have a function that is called from the lower inner filtered map (dependents) that has parameters that are taken from that inner filter as well as the outer map (e) – see the call to getPtc.
    o Why isn’t the conditional output (dependents dob) acceptable syntax?

    %dw 1.0
    %output application/json
    %function getDate(aString) aString[0..3] ++ '-' ++ aString[4..5] ++ '-' ++ aString[6..7]
    %function getSalut(gender, ptc) 'MR' when gender == 'M' otherwise 'MS' when gender == 'F' otherwise 'CHD'

    using (k = payload filter $.action == 'K', e=payload filter $[0] != 'K' groupBy $[1]) {
      delete: (k.staff),
      update:
        (e map {
          ptc: "ZEA",
          staff: $[0][2],
          id: $[0][1],
          last: upper $[0][3],
          first: upper $[0][4],
          gender: upper $[0][5],
          hireDate: getDate($[0][6]),
          seniorityDate: getDate($[0][7]),
          terminationDate: getDate($[0][8]),
          currency: $[0][9],
          country: upper $[0][10],
          dob: getDate($[0][11]),
          status: $[0][12],
          contact: {
            email: lower $[0][13],
            phone: $[0][14]
          },
          dependents:
            ($ filter $[0] == 'D' map {
              relatedId: $[1],
              relatedStaff: $[2],
              dId: $[3],
              rel: upper $[4],
              gender: upper $[5],
              myPtc: getPtc(rel,gender,parent.gender),
              lastName: upper $[6],
              firstName: upper $[7],
              (dob: getDate($[8])) when $.rel == 'CHILD',
              salut: getSalut(upper $[5],$myPtc)
            })
        })
    }

    Thank you very much for taking a look.

  5. Hi Nial,
    I did post a comment here and it disappeared, so here is a repeat.

    Your Dataweave blog has been very informative and helpful. I am working on more complex mapping ideas that you might shed some light on that I cannot fathom – just yet.

    I have a CSV input (no header, but I added one to help out with some issues in the mapping) that is essentially a collection of 3 row/record types. One, I call ‘K’ to map to a simple array list, the others are a parent/child arrangement, I call ‘E’ (the parent) and ‘D’ (the dependent).

    I can have many K’s and many collections of an ‘E’ with zero or more ’D’ rows following.

    The mapping I have works ok such that it is, though with a few issues and questions.

    1. In the k filter and delete: element, why does it only work using a header identifier and not a column indicator ([2], for example)?

    2. Why isn’t the conditional output acceptable syntax to dw on the dependent dob element.

    3. How can the lower nested filter map (dependents) access an attribute from the upper parent map? (I need to call a dw function from the nested filter map that combines an attribute from the nested map AND an attribute from the upper map (e). For pseudo e.g. see the call to getPtc function.

    Here is a sample input (with header).

    action|reference|staff|last|first|gender|startDate|seniorityDate|terminationDate|currency|country|dob|status|email|phone
    E|39911|20803058|Bloggs|Joe|M|20090602|20090602|99991231|USD|USA|[email protected]|1 555 777 1234
    D|39911|20803058|30016|Child|F|Wonderland|Alison|1993101300000000||||||
    D|39911|20803058|30021|Spouse|F|Carrot|Henrietta|1967092000000000||||||
    D|39911|20803058|30014|Child|F|Delune|Clare|1990080300000000||||||
    D|39911|20803058|30015|Child|M|Lips|Phil|1988092600000000||||||
    D|39911|20803058|30017|Child|F|Green|Theresa|1997080100000000||||||
    K|2015113000000000|00914274||||||||||||
    K|2015123100000000|00874276||||||||||||
    K|2015123100000000|00912183||||||||||||

    Here is a sample mapping.

    %dw 1.0
    %output application/json
    %function getDate(aString) aString[0..3] ++ '-' ++ aString[4..5] ++ '-' ++ aString[6..7]
    %function getSalut(gender, ptc) 'MR' when gender == 'M' otherwise 'MS' when gender == 'F' otherwise 'CHD'

    using (k = payload filter $[0] == 'K', e=payload filter $[0] != 'K' groupBy $[1]) {
      delete: (k.staff),
      update:
        (e map {
          ptc: "ZEA",
          staff: $[0][2],
          id: $[0][1],
          last: upper $[0][3],
          first: upper $[0][4],
          gender: upper $[0][5],
          hireDate: getDate($[0][6]),
          seniorityDate: getDate($[0][7]),
          terminationDate: getDate($[0][8]),
          currency: $[0][9],
          country: upper $[0][10],
          dob: getDate($[0][11]),
          status: $[0][12],
          contact: {
            email: lower $[0][13],
            phone: $[0][14]
          },
          dependents:
            ($ filter $[0] == 'D' map {
              relatedId: $[1],
              relatedStaff: $[2],
              dId: $[3],
              rel: upper $[4],
              gender: upper $[5],
              lastName: upper $[6],
              firstName: upper $[7],
              (dob: getDate($[8])) where $.rel == 'CHILD',
              myPtc: getPtc($.gender,$.gender_of_the_mapped_item_in_e-map),
              salut: getSalut($.gender,$.myPtc)
            })
        })
    }

    Thank you very much for taking a look.
    Stuart.

    • Hi Stuart,
      Let me take a look at this when I get a moment. I’ll get back to you.
      Nial

      • Actually, now I think about the aspect of accessing an attribute of a parent record from within the lowest nested map, it may be possible when mapping parent attributes, rather than to reference record attributes directly, but simply replace that with a mapping that calls a global function instead (simply setting the result on a one for one basis) that also has the side effect of setting a flowVar at the same time, that can be referenced in that lowest nested map statement.

        • Hi Stuart,
          There is a configuration in Studio now for declaring that you have no header on your CSV and that the separator is ‘|’

          Here is my transform:

          %dw 1.0
          %output application/json
          %function toDate(aString) aString[0..3] ++ '-' ++ aString[4..5] ++ '-' ++ aString[6..7]
          %var deletes=payload filter $[0] == 'K'
          %var updates=payload filter $[0] != 'K'
          ---
          {
            delete: deletes map $[2],
            update: updates groupBy $[1] map using(parent=$[0]) {
              ptc: 'ZEA',
              staff: parent[1],
              id: parent[2],
              last: upper parent[3],
              first: upper parent[4],
              gender: upper parent[5],
              hireDate: toDate(parent[6]),
              seniorityDate: toDate(parent[7]),
              terminationDate: toDate(parent[8]),
              currency: parent[9],
              country: parent[10],
              dob: toDate(parent[11]),
              status: parent[12],
              contact: {
                email: parent[13],
                phone: parent[14]
              },
              dependents: $[1..-1] map {
                relatedId: $[1],
                relatedStaff: $[2],
                dId: $[3],
                rel: $[4],
                gender: upper $[5],
                lastName: upper $[6],
                firstName: upper $[7],
                (dob: toDate($[8])) when $[4] == 'Child',
                myPtc: '',
                salut: $[5] match {
                  'M' -> 'MR',
                  'F' -> 'MS',
                  default -> 'CHD'
                }
              }
            }
          }

          I think the ‘no header’ config answers your first question.
          For conditional output you need to use the ‘when’ operator or the new ‘match’ operator (see my salutation expression).
          Your third question is answered by the ‘using’ operator which assigns variables before any expression. Note how I do this when I assign the value to the ‘parent’ variable.

          Nial
