At some point while developing a Mule application, it’s almost inevitable that you’ll need to manipulate XML.
In this article, I will show you how to read XML using DataWeave scripts and how to work with its encoding, namespaces, fields, and attributes, with the goal of generating a JSON document as output.
Getting started
To get started, we need some sample XML data as input to the DataWeave transformation we will write.
Sample XML data
Throughout this article we will use the following XML data snippet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<simpsons:family xmlns:simpsons="http://training.mulesoft.com/simpsons">
    <simpsons:member can_vote="true">
        <simpsons:last-name>Simpson</simpsons:last-name>
        <simpsons:first-name>Homer</simpsons:first-name>
        <simpsons:age>39</simpsons:age>
    </simpsons:member>
    <simpsons:member can_vote="false">
        <simpsons:last-name>Simpson</simpsons:last-name>
        <simpsons:first-name>Lisa</simpsons:first-name>
        <simpsons:age>8</simpsons:age>
    </simpsons:member>
    <simpsons:member can_vote="false">
        <simpsons:last-name>Simpson</simpsons:last-name>
        <simpsons:first-name>Bart</simpsons:first-name>
        <simpsons:age>10</simpsons:age>
    </simpsons:member>
    <simpsons:member can_vote="true">
        <simpsons:last-name>Simpson</simpsons:last-name>
        <simpsons:first-name>Marge</simpsons:first-name>
        <simpsons:age>36</simpsons:age>
    </simpsons:member>
    <simpsons:member can_vote="false">
        <simpsons:last-name>Simpson</simpsons:last-name>
        <simpsons:first-name>Maggie</simpsons:first-name>
        <simpsons:age>1</simpsons:age>
    </simpsons:member>
</simpsons:family>
Extracting XML metadata
Let’s start by extracting metadata from the XML sample: the encoding type and the namespace.
Extract the encoding type
Unfortunately, there is no elegant way to extract the encoding type from the XML. The workaround is a DataWeave expression that reads the raw content of the payload with the ^raw selector, converts it to a String, and then searches it with a regular expression.
DataWeave expression:
%dw 2.0
output application/json
---
{
    encoding: ((payload.^raw as String) scan /encoding="([A-Za-z0-9._-]+)"/)[0][1]
}
DataWeave output:
{ "encoding": "ISO-8859-1" }
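The expression above assumes that the XML declaration is present and that it uses double quotes. As a minimal sketch of a more defensive variant, the script below also accepts single quotes and falls back to a default value when no encoding is declared. The variable name matches and the "UTF-8" fallback are assumptions made for illustration; adapt them to your own requirements.

```dataweave
%dw 2.0
output application/json
// Hypothetical helper variable: every match of the encoding pseudo-attribute.
// Each match is an array: [full match, first capture group].
var matches = (payload.^raw as String) scan /encoding=["']([A-Za-z0-9._-]+)["']/
---
{
    // Assumption: fall back to "UTF-8" when the declaration has no encoding
    encoding: if (isEmpty(matches)) "UTF-8" else matches[0][1]
}
```

Checking for an empty result first avoids indexing into a match that does not exist.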
Extract the namespace
DataWeave fully supports XML namespaces. An XML namespace is a W3C-recommended mechanism that avoids name conflicts by differentiating XML elements or attributes that may have identical names but different definitions.
To extract the namespace from an XML element, use the hash symbol “#”.
DataWeave expression:
%dw 2.0
output application/json
---
{
    encoding: ((payload.^raw as String) scan /encoding="([A-Za-z0-9._-]+)"/)[0][1],
    namingspace: payload.family.#
}
DataWeave output:
{ "encoding": "ISO-8859-1", "namingspace": "http://training.mulesoft.com/simpsons" }
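Namespaces can also be used when selecting elements, which helps when two elements share a local name but belong to different namespaces. The following is a sketch, assuming the ns directive of DataWeave 2.0 and a prefix that we choose to call simpsons; the key memberCount is a hypothetical name used only for this illustration.

```dataweave
%dw 2.0
output application/json
// The URI must match the xmlns:simpsons value of the input document exactly
ns simpsons http://training.mulesoft.com/simpsons
---
{
    // Same namespace selector as before, on a namespace-qualified element
    namingspace: payload.simpsons#family.#,
    // Count only the member elements that belong to the declared namespace
    memberCount: sizeOf(payload.simpsons#family.*simpsons#member)
}
```

The prefix declared in the script does not need to be the same as the prefix used in the XML; only the URI is compared.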
Transforming XML fields
When transforming XML to JSON you may run into a couple of difficulties:
- An XML document must have exactly one root element, while the top-level value of a JSON document can be an array holding several items.
- XML has no array concept (only repeated elements), whereas the JSON data structure is built on arrays and objects.
To overcome these issues, DataWeave provides the following solutions:
- To transform repeated XML elements into a JSON array, use the asterisk “*” DataWeave selector, which returns every matching element as an array.
- Because a JSON document can have an array as its top-level value, you don’t have to do anything specific to generate the output.
DataWeave expression:
%dw 2.0
output application/json
---
payload.family.*member
DataWeave output:
[
  { "last-name": "Simpson", "first-name": "Homer", "age": "39" },
  { "last-name": "Simpson", "first-name": "Lisa", "age": "8" },
  { "last-name": "Simpson", "first-name": "Bart", "age": "10" },
  { "last-name": "Simpson", "first-name": "Marge", "age": "36" },
  { "last-name": "Simpson", "first-name": "Maggie", "age": "1" }
]
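To see why the asterisk matters, it can help to compare it with the plain dot selector, which returns only the first matching element. The sketch below places both results side by side; the keys first and all are hypothetical names chosen for this comparison.

```dataweave
%dw 2.0
output application/json
---
{
    // Single-value selector: only the first member element is returned
    first: payload.family.member,
    // Multi-value selector: every member element, collected in an array
    all: payload.family.*member
}
```

With the sample data, first should hold a single object (the first member), while all should hold the five-item array shown above.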
To control the shape of the output, you need to iterate over the items in the array. That can be achieved with the map function.
Be careful: some field names contain the dash symbol “-”, which means you cannot access them with the dot notation. Use the array notation $["field-name"] instead.
DataWeave expression:
%dw 2.0
output application/json
---
payload.family.*member map {
    lastName: $["last-name"],
    firstName: $["first-name"],
    age: $.age
}
DataWeave output:
[
  { "lastName": "Simpson", "firstName": "Homer", "age": "39" },
  { "lastName": "Simpson", "firstName": "Lisa", "age": "8" },
  { "lastName": "Simpson", "firstName": "Bart", "age": "10" },
  { "lastName": "Simpson", "firstName": "Marge", "age": "36" },
  { "lastName": "Simpson", "firstName": "Maggie", "age": "1" }
]
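The map function also accepts an explicit lambda, which can be easier to read than the anonymous $ parameter once the mapping grows. The following sketch assumes the parameter names member and index and adds a hypothetical position field; it also uses the quoted-key form of the dot selector as an alternative way to reach the fields that contain a dash.

```dataweave
%dw 2.0
output application/json
---
payload.family.*member map (member, index) -> {
    // index is zero-based, so add 1 for a human-friendly position
    position: index + 1,
    // Quoting the key lets the dot selector handle names with a dash
    lastName: member."last-name",
    firstName: member."first-name",
    age: member.age
}
```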
Transforming XML attributes
The XML specification allows attributes to be defined for elements, but the JSON specification does not have a direct equivalent. DataWeave provides a mechanism for reading an attribute from an XML element: the @ symbol followed by the attribute name.
The DataWeave script below extracts attribute information from the XML sample and converts it into JSON.
DataWeave expression:
%dw 2.0
output application/json
---
payload.family.*member map {
    lastName: $["last-name"],
    firstName: $["first-name"],
    age: $.age,
    canVote: $.@can_vote
}
DataWeave output:
[
  { "lastName": "Simpson", "firstName": "Homer", "age": "39", "canVote": "true" },
  { "lastName": "Simpson", "firstName": "Lisa", "age": "8", "canVote": "false" },
  { "lastName": "Simpson", "firstName": "Bart", "age": "10", "canVote": "false" },
  { "lastName": "Simpson", "firstName": "Marge", "age": "36", "canVote": "true" },
  { "lastName": "Simpson", "firstName": "Maggie", "age": "1", "canVote": "false" }
]
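Notice that every value in the output is a JSON string, because XML text and attribute values are read as strings. If the consumer of the JSON expects real numbers and booleans, one possible approach is to coerce the values with the as operator. This is a sketch that assumes every age is numeric and every can_vote attribute is either "true" or "false"; other values would make the coercion fail.

```dataweave
%dw 2.0
output application/json
---
payload.family.*member map {
    lastName: $."last-name",
    firstName: $."first-name",
    // Coerce the element text to a number, so JSON gets 39 instead of "39"
    age: $.age as Number,
    // Coerce the attribute text to a boolean, so JSON gets true instead of "true"
    canVote: $.@can_vote as Boolean
}
```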
Conclusion
In the previous article, you saw how to generate XML from JSON; now you are also able to generate JSON from XML.
If you want to learn more about DataWeave 2.0, you can register for the DataWeave course.