Understanding “big-O complexity” in DataWeave

Sometimes developers face optimization challenges in their code — regardless of the programming language they use. One of the most common optimization challenges is the complexity (or performance) of an algorithm that increases and grows infinitely when the number of arguments it has to process increases. This is called big-O complexity or big-O notation.

In this blog post, you’ll learn how to perform DataWeave code optimization to improve performance by following the big-O notation principles.

Common DataWeave enrichment

Let’s assume we have to combine two data sets to generate the final result which we want to be returned by the message transformer event processor.

The first data set has been received from the backend with a collection of possible flight destinations.

 Fig 1. Collection of possible destinations

Next to this data, we have a static collection of details about all airports based on their IATA (International Air Transport Association) code which uniquely defines all the airport’s details: full airport name, location, coordinates, etc.

openFlightsAirportId492499502
airportNameLondon Luton AirportJersey AirportLondon Gatwick Airport
cityLondonJerseyLondon
countryUnited KingdomJerseyUnited Kingdom
IATALTNJERLGW
ICAOEGGWEGJJEGKK
longitude51.8746986449.20790151.14810181
latitudeAltitude-0.368333012-2.195509911-0.190277994
timeZone526277202
DST000
timeZoneEEE
typeEurope/LondonEurope/JerseyEurope/London
sourceairportairportairport
Fig 2. IATA details

Our task is to add IATA details to every flight received from the backend and generate a final JSON response.

Classic DataWeave approach

The classical approach would be to store the entire IATA collection in a variable and insert the corresponding airport information field into the resulting collection using standard DataWeave. This means using the map operator to iterate over the collection of records and add an extra key:value pair to every flight object where the value is the corresponding IATA details. The find operator is used to locate the proper entry in the IATA collection.

Fig 3: Classic approach to mapping data

This fragment of code reads the IATA information into the variable airportInfo,then using the map operator it iteratively traverses the payload, which is a collection of flights (see fig 1), and adds a new key:value pair to the end of every flight. The object is taken from the airportInfo variable where filter is used to locate an IATA field matching the destination field from the flight object.  

The output of this code we will be integrated flight and airport data:

Fig 4. Merged flights and airport data

The issue with the classic approach

This transformation looks straight forward, but let’s think about the operational effort which is needed to process it:

  • The map operator iterates through the collection and a new key:value pair added to every object where the find operator has to be executed.
  • Then the find operator performs a lookup in the airportInfo collection and iterates through every field of every object to locate an IATA key matching the destination key.

So, every time we are looking through the airportInfo variable we have to perform a full scan of every key:value pair to detect a matching object. In this case, the number of iterations will be (n * m) where n is a number of flight objects and m is an amount of key:value pairs in the airportInfo variable. The Big-O notation comes into play and shows us how complexity will increase with the number of objects. The complexity increase will be quadratic O(n2).

Optimized DataWeave approach

DataWeave has the capability to improve the performance of this iterative process significantly by using a functional programming approach where one of the concepts gives the variable similar functionality as a function. Within variable creation, a functional definition can be used.

Fig 5: Group the airport information be IATA code

Output from this code would be as shown below:

Fig 6. Airport info grouped by IATA

Now we have all objects grouped by the IATA code, thus the IATA code acts as a unique identifier.

After modifications, our final code will be the following:

Fig 7. Optimized approach

Certainly, there will not be any changes to the final output results however, complexity will be heavily reduced as we are not performing any global lookups and can select top-level IATA keys from the airportInfo variable which becomes hashmaps after groping. It makes our complexity O(n) where n is the number of flight objects.

Conclusion

In this blog post, we’ve seen just one of many optimization examples which can be coded with DataWeave.

The graph below shows different dependencies between big-O complexity and execution time.

Fig 8. Big-O complexity performance graph

We managed to reduce complexity from O(n2) down to O(n) which brings a significant performance improvement. 

There are many more best practices demonstrated in the MuleSoft DataWeave class available online or in the classroom. Please check our training website for availability and information about our developer and architecture classes. 



We'd love to hear your opinion on this post