Reading Time: 18 minutes

In this post, I will discuss how I format my DataWeave code to improve its readability.

In my previous post about my Mule programming style, I discussed a couple things: first, why the readability of your code is so important, and second, how having a single flow that describes the overall intent of the code through descriptive doc:name attributes can really improve the readability of your code.

latest report
Learn why we are the Leaders in API management and iPaaS

In this post, I will discuss how I format my DataWeave code to achieve the same effects.

My code

I’d like to start with an example of what my DataWeave code typically looks like. Here’s a sample. Don’t worry about understanding what it does — we’re focusing on the organization of it.

%dw 2.0
output application/json

import applyToValues from modules::Utils

var invoices = payload

fun formatDate(s: String) =
  s as Date   {format: "MM/dd/yyyy"}
    as String {format: "yyyyMMdd"}

fun replaceNullEverywhere(e, replacement="") =
  applyToValues(e, ((v) -> if (v == null) replacement else v))
---
replaceNullEverywhere(
  invoices map ((invoice) -> {
    invoiceId:     invoice.invoice_id,
    invoiceNumber: invoice.invoice_number,
    amount:        invoice.total as Number,
    invoiceDate:   formatDate(invoice.date),
    allocations:   vars.allocations[invoice.invoice_id] map ((allocation) -> {
      percentage: allocation.percentage,
      amount:     allocation.amount,
      datePaid:   formatDate(allocation.paid_date)
    })
  })
)
```

Now let’s talk about what I’m doing and why.

Reassign Payload and FlowVars when appropriate

When you’re looking at a script for the first time that someone else wrote, “payload” doesn’t carry much meaning other than just being the payload of the message. There’s nothing there that cues the reader to what the payload represents, unless metadata has been set up. Reassigning “payload” to be something more meaningful can go a long way toward helping someone understand the intention of your transformation. In the example above, it’s obvious we’re mapping invoice data. Perhaps that’s obvious from the fields as well, but that won’t always be the case.

Name your Lambda/Callback arguments instead of using the default values

You won’t find $, $$, or $$$ too often in my DataWeave code anymore unless I’m writing reusable library functions that are independent of the business domain. In my opinion, the saved keystrokes are just not worth the sacrifice in readability. If I do something like:

invoices map {
  id:        $$,
  invoiceId: $.invoice_id
  ...
}

I will surely know at the time of writing the code what $ and $$ represent. But will I remember in six months when I come back to this code? Or will I need to look it up in the documentation again? What about the junior dev who’s going to maintain this? What if I did this instead:

invoices map (invoice, index) -> {
  id:        index,
  invoiceId: invoice.invoice_id
  ...
}

Which one is easier to understand?

Separate functionality from mapping

I keep all of my custom functionality in the header of the script (everything above —). Functions can get pretty complex, and having them inlined into the body of a DataWeave script can obfuscate what the output should be. The only thing I keep in the body area are function calls or code that helps show the shape of the output (more on that in the next section). This allows me to separate how I want to format the output, from how I want to format my functions, and it helps keep both simple and clean.

Transformation code should match the shape of the output, when applicable

If you look at my example code, you’ll see that I indent once for the first “map” and do an additional indent for the second “map.” I do this because it mirrors how the output of the script will look:

[
  {
    "invoiceId": 1,
    "invoiceNumber: "J503KL",
    "amount": 3.5
    "invoiceDate": "20181212",
    "allocations: [
      {
        percentage: 100,
        amount: 100,
        datePaid: "20181222"
      }
    ]
  }
]

I like to indent with two spaces, but it doesn’t matter. Just be consistent across your project. Ideally, you’re consistent over all of the Mule projects throughout your company or client.

Justify Your Code

I’m assuming this will be the most polarizing style preference because:

1. I rarely see anyone voluntarily do this.

2. They assume it’s going to take a long time to format code this way.

It’s going to take a bigger example to justify why I take the time to do this. Would you rather read this:

invoice map (invoice, index) -> {
  "insertionCriteria": createInsertionCriteria(invoice),
  "Invoice Line Number": index + 1,
  "Business Unit": buName,
  "Source": upper source,
  "Invoice Number": invoice.INVOICE_NUMBER,
  "Invoice Amount": invoice.INVOICE_AMOUNT,
  "Invoice Date": formatDate(invoice.INVOICE_DATE),
  "Supplier Name": configs.supplierName,
  "Supplier Number": configs.supplierNumber,
  "Supplier Site": flowVars.configs.SUPPLIER_SITE_CODE,
  "Invoice Description": trim invoice.INV_DESCRIPTION,
  "Invoice Type": invoiceType(invoice.INVOICE_AMOUNT),
  "Payment Terms": configs.PAYMENT_TERM,
  "Line Type": configs.LINE_TYPE,
  "Amount": invoice.LINE_AMOUNT,
  "Line Description": invoice.LINE_DESC,
  "Distribution Combination": buildDistributionCombination(invoice),
  "Terms Date": null,
  "Goods Received Date": null,
  "Invoice Received Date": null,
  "Accounting Date": null,
  "Payment Method": null,
  "Pay Group": null,
  "Pay Alone": null,
  "Discountable Amount": null,
  "Prepayment Number": null,
  "Prepayment Line Number": null,
  "Prepayment Application Amount": null,
  "Prepayment Accounting Date": null,
  "Invoice Includes Prepayment": null,
  "Conversion Rate Type": null,
  "Conversion Date": null,
  "Conversion Rate": null,
  "Liability Combination": null,
  "Document Category Code": null
}

Or this:

invoice map (invoice, index) -> {
  "Invoice Line Number":           index + 1,
  "Business Unit":                 buName,
  "Source":                        upper source,
  "Invoice Number":                invoice.INVOICE_NUMBER,
  "Invoice Amount":                invoice.INVOICE_AMOUNT,
  "Invoice Date":                  formatDate(invoice.INVOICE_DATE),
  "Supplier Name":                 configs.SUPPLIER_NAME,
  "Supplier Number":               configs.SUPPLIER_NUMBER,
  "Supplier Site":                 configs.SUPPLIER_SITE_CODE,
  "Invoice Description":           trim invoice.INV_DESCRIPTION,
  "Invoice Type":                  invoiceType(invoice.INVOICE_AMOUNT),
  "Payment Terms":                 configs.PAYMENT_TERM,
  "Line Type":                     configs.LINE_TYPE,
  "Amount":                        invoice.LINE_AMOUNT,
  "Line Description":              invoice.LINE_DESC,
  "Distribution Combination":      buildDistributionCombination(invoice),
  "Terms Date":                    null,
  "Goods Received Date":           null,
  "Invoice Received Date":         null,
  "Accounting Date":               null,
  "Payment Method":                null,
  "Pay Group":                     null,
  "Pay Alone":                     null,
  "Discountable Amount":           null,
  "Prepayment Number":             null,
  "Prepayment Line Number":        null,
  "Prepayment Application Amount": null,
  "Prepayment Accounting Date":    null,
  "Invoice Includes Prepayment":   null,
  "Conversion Rate Type":          null,
  "Conversion Date":               null,
  "Conversion Rate":               null,
  "Liability Combination":         null,
  "Document Category Code":        null
}

This is a no-brainer for me. One looks like a giant blob of unstructured code, the other looks like a clear set of key-value mappings. You may be thinking, “Wow, who has time to structure their code like this?” Frankly, I did this for years before taking an afternoon to write a script that does it for me: that’s how important I think it is, and how dumb I was for not writing the script earlier. It’s not perfect, but it’s good enough that it saves me a ton of time. You can find it on my GitHub here.

Front-end developers and UX experts are always talking about making a user interface intuitive and clean so that users can easily identify how to accomplish a particular task. Your code is a user interface for another developer. It should be intuitive and clean so that they can understand your intentions. Use care and consideration when deciding how to lay out your code.

Wrap generic functions in a simple interface, even if you don’t have to

This one is a bit more complicated. You may have noticed the following in the header of my first example:

import applyToValues from modules::Utils

...

fun replaceNullEverywhere(e, replacement="") =
  applyToValues(e, ((v) -> if (v == null) replacement else v))
---
replaceNullEverywhere(...)

I could’ve done this instead:

...
---
applyToValues(
  invoices map ((invoice) -> {
    ...
  },
  ((v) -> if (v == null) "" else v)
)

But I didn’t. Why?

“applyToValues” is a very generic function. As its first parameter, it can take a single value. It could be a string, an object, an array, or any combination of those, and more. Its second parameter is a function with a single argument. This allows the user to pass in any kind of functionality they want to apply to the first argument. So we now know two things about “applyToValues”: its first parameter can be just about any value, and its second parameter can be just about any single-argument function.

Wow. That’s a lot of flexibility. The downside to all this flexibility is that the reader needs to understand a lot of context to understand how the function is being used. They need to know what the value passes as the first argument is, and they need to understand how the function passed as the second argument is going to interact with that value to change it. That’s asking a lot of someone reading your code for the first time!

Is there an easy way we can alleviate that mental burden? Yes, and it takes just a few seconds: Give it a more context-specific interface; wrap it with a function that’s easier to use. By giving “applyToValues” a wrapper called “replaceNullEverywhere,” it’s immediately obvious to the reader how “applyToValues” is being used — it’s being used to replace all null values with another value. We also eliminate the need to expose the lambda to the caller, eliminating all the flexibility. At the expense of flexibility, we gain a ton of readability. You already went through the mental burden of figuring out that you’re going to use “applyToValues” to replace all the nulls with an empty String, save developers reading your code from having to figure that out all over again.

In software development, there’s typically a tradeoff between readability and flexibility. The more flexible a tool is, the more time you need to spend learning how to use it properly. This is true in a lot of other areas of life, including photography. Think about how easy it is to use your phone camera versus a DSLR. The interface of a camera phone is simple. A DSLR can take fantastic photos in a much wider variety of conditions, but the interface is much more complicated. You can’t just flip a switch on a DSLR and make it easier to use, but it is almost that easy with programming. Take advantage and do a favor for those who will need to read your code in the future!

Conclusion

That’s how I handle formatting and writing my DataWeave code, in a nutshell. Little steps like assigning your payload to a descriptive variable name, naming your callback arguments, separating your functions from your mapping, justifying your code, and giving generic functions a more specific interface can go a long way in helping others understand your code. Don’t feel like you need to apply all these rules at once, or apply all of them at all. Obviously, these are my opinions, but if you can walk away with something that will help your clients and other developers, that’s great! Please let me know what you all think in the comments section below.

This blog originally appeared on Jerney.io.