Integrating enterprise-grade systems like Workday is no small feat, especially when handling massive datasets. Whether you’re syncing thousands of employee records or building data pipelines across HR, finance, and planning systems, performance bottlenecks can slow even the most well-architected integration to a crawl.

One of the most common challenges when working with Workday APIs is pagination: the mechanism by which large datasets are divided into smaller, manageable pages. Each page requires its own round trip, so when your integration is fetching 100,000 or more records from Workday, the per-page delay quickly becomes a serious performance bottleneck.

But what if you could retrieve multiple pages in parallel, cutting total processing time from hours to minutes and significantly increasing throughput?

Parallel pagination with MuleSoft is a practical approach that lets you fetch multiple pages from Workday at the same time, significantly improving throughput and cutting processing time. In this comprehensive guide, we’ll show you how to architect, build, and optimize your Workday integrations at scale using MuleSoft’s advanced capabilities.

Understanding Workday integration patterns

Before diving into solutions, it’s crucial to know the primary integration patterns Workday offers:

  • SOAP APIs: Most mature; ideal for large-scale data syncs (e.g. Get_Workers). Supports robust page-based pagination.
  • WQL: Analytics-focused with dynamic filtering. Good for ad-hoc reporting; less ideal for large or nested datasets.
  • RaaS: Exposes custom reports as REST/SOAP endpoints. Easier to consume, but with row limits and less filtering.
  • REST APIs: Used in Workday Extend and mobile. JSON payloads, offset/limit pagination; best for small, transactional operations.

Choosing the right approach for scale

While each Workday integration mechanism serves its purpose, SOAP APIs remain the backbone for enterprise-scale, system-to-system integrations, especially when the use case involves large datasets, detailed employee hierarchies, or deep data relationships.

Organizations looking to sync tens of thousands of worker records, update job profiles, or replicate full Workday datasets into downstream systems like Snowflake, SAP, or Salesforce typically rely on SOAP APIs for their robustness and reliability.

However, this reliability comes at a cost, and it’s where many developers hit a wall: as data volumes grow and business processes demand faster sync times, the limitations of SOAP-based pagination become increasingly apparent.

The hidden challenge of Workday SOAP APIs

Workday’s SOAP API paginates results for performance and reliability. Each call returns a chunk of records, and you’re expected to call subsequent pages to get the rest. However, this approach has its drawbacks:

  • Large enterprises have hundreds of thousands of employees, which translates into hundreds or thousands of pages per sync.
  • Each page can take two to five seconds depending on payload size and latency.
  • Fetching pages one by one compounds those delays and increases the risk of timeouts.
  • If a failure occurs mid-pull, you often need to rerun the full batch.

This sequential approach can quickly lead to slow performance and timeouts. To solve these issues, parallel pagination becomes a game changer.

Solving the performance problem: Parallel pagination with MuleSoft

MuleSoft provides a solution for this issue by allowing you to orchestrate multiple page requests in parallel, which drastically reduces processing time. For illustration, let’s consider the common Get_Workers call to the Workday SOAP API, which is frequently used to retrieve employee data at scale. Here’s how you can make the most of parallel pagination with MuleSoft:

Step 1: Prepare and send the initial Workday SOAP request

Set up and send the initial Get_Workers request to Workday. This step also includes retry logic to handle transient failures.

<!-- Step 1a: Prepare the SOAP request payload using DataWeave -->
<ee:transform doc:name="Prepare Request">
  <ee:message>
    <!-- Loads the DataWeave script for the request body -->
    <ee:set-payload resource="dwl/get-workers-request.dwl"/>
  </ee:message>
</ee:transform>


<!-- Step 1b: Call Workday with retry logic -->
<try>
  <until-successful maxRetries="3" millisBetweenRetries="1000">
    <workday:human-resources operation="Get_Workers" doc:name="get-updated-employees" config-ref="Workday_Config">
      <workday:headers><![CDATA[#[vars.workdayHeaders]]]></workday:headers>
    </workday:human-resources>
  </until-successful>
  <error-handler>
    <on-error-propagate logException="true">
      <logger level="ERROR" message="Get_Workers call failed. Headers: #[attributes.transportHeaders]"/>
    </on-error-propagate>
  </error-handler>
</try>

The first snippet prepares the request body using a DataWeave script. The second snippet sends the request to Workday, retrying up to three times if it fails, and logs any errors.
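
The contents of dwl/get-workers-request.dwl aren’t shown here. As a point of reference, a minimal sketch might look like the following; the page size of 200 and the element names are assumptions based on the Human_Resources WSDL, so verify them against the Workday API version your tenant uses:

%dw 2.0
output application/xml
ns wd urn:com.workday/bsvc
---
// Hypothetical request body; confirm element names against your WSDL version
{
  wd#Get_Workers_Request: {
    wd#Response_Filter: {
      // Page to fetch; defaults to 1 on the initial call
      wd#Page: vars.paginationContext.pageCounter default 1,
      // Records per page, matching the batchSize used in Step 2
      wd#Count: 200
    }
  }
}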

Step 2: Extract pagination information

After receiving the initial response, extract key pagination details (like total pages and total results) to determine how many pages need to be fetched.

<!-- Step 2: Extract pagination metadata from the Workday response -->
<ee:transform doc:name="Extract Pagination Info">
  <ee:variables>
    <ee:set-variable variableName="paginationContext"><![CDATA[
      %dw 2.0
      import mergeWith from dw::core::Objects
      output application/java
      ns wd urn:com.workday/bsvc
      var batchSize = 200 // Number of records per page
      ---
      (vars.paginationContext default {}) mergeWith {
        // Total number of pages in the response
        totalPages: (payload.wd#Get_Workers_Response.wd#Response_Results.wd#Total_Pages as Number) default 1,
        // Total number of results
        totalResults: (payload.wd#Get_Workers_Response.wd#Response_Results.wd#Total_Results as Number) default 0,
        // How many batches (pages) based on batch size
        totalBatches: ceil(((payload.wd#Get_Workers_Response.wd#Response_Results.wd#Total_Results as Number) default 0) / batchSize),
        // Number of records fetched in this payload
        fetchedRecordCount: sizeOf(payload) default 0,
        // Is this the last page?
        isLastPage: ((payload.wd#Get_Workers_Response.wd#Response_Results.wd#Total_Pages as Number) default 1) <= 1,
        pageCounter: 1
      }
    ]]></ee:set-variable>
  </ee:variables>
</ee:transform>

This DataWeave script extracts pagination details from the initial response so you know how many pages to fetch in total.
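
To make the arithmetic concrete: with 100,000 total workers and a batchSize of 200, the resulting context would look roughly like this (illustrative values; fetchedRecordCount reflects whatever sizeOf returns for the first page):

{
  totalPages: 500,         // as reported by Workday for a Count of 200
  totalResults: 100000,
  totalBatches: 500,       // ceil(100000 / 200)
  fetchedRecordCount: 200,
  isLastPage: false,
  pageCounter: 1
}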

Step 3: Generate page list and trigger parallel fetches

If there are multiple pages, generate a list of page numbers and publish each page fetch request to Anypoint MQ for parallel processing.

<!-- Step 3: Publish each additional page to MQ for parallel processing -->
<choice doc:name="Check for Additional Pages">
  <when expression="#[(vars.paginationContext.totalPages default 1) > 1]">
    <foreach doc:name="For Each Page"
             collection="#[((vars.paginationContext.pageCounter default 1) + 1) to vars.paginationContext.totalPages]">
      <logger level="INFO" message="Publishing page #[payload] to MQ for parallel processing"/>
      <anypoint-mq:publish config-ref="Anypoint_MQ_Config"
                           destination="workday-worker-page-queue"
                           doc:name="Publish Page to MQ">
        <anypoint-mq:body><![CDATA[
          %dw 2.0
          import * from dw::core::Objects
          output application/json
          ---
          {
            transactionProperties: vars.paginationContext mergeWith {
              pageCounter: payload,
              isLastPage: (vars.paginationContext.totalPages) == payload
            }
          }
        ]]></anypoint-mq:body>
      </anypoint-mq:publish>
    </foreach>
  </when>
  <otherwise>
    <logger level="INFO" message="Only one page to process; skipping parallelization."/>
  </otherwise>
</choice>

  • For each page after the first, a message is published to MQ for parallel processing
  • Each message contains the page number and a flag indicating if it’s the last page
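
The consuming side of this queue isn’t shown above. A minimal sketch of a consumer flow might look like this, assuming the fetch-workday-data flow referenced in Step 5 reads vars.transactionDetails and requests the corresponding page (the flow name and attributes here are illustrative):

<!-- Hypothetical consumer: each replica pulls page messages and fetches that page -->
<flow name="workday-page-consumer">
  <anypoint-mq:subscriber config-ref="Anypoint_MQ_Config"
                          destination="workday-worker-page-queue"
                          doc:name="Subscribe to Page Queue"/>
  <ee:transform doc:name="Restore Transaction Context">
    <ee:variables>
      <ee:set-variable variableName="transactionDetails"><![CDATA[
        %dw 2.0
        output application/java
        ---
        // MQ delivers the JSON body published in Step 3
        read(payload, "application/json").transactionProperties
      ]]></ee:set-variable>
    </ee:variables>
  </ee:transform>
  <flow-ref name="fetch-workday-data" doc:name="Fetch This Page"/>
</flow>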

Step 4: Store failed pages for retry

If a page still fails after the retry logic, capture its metadata in an Object Store so it can be retried later instead of rerunning the entire batch.

<!-- Step 4: On error, store failed page details in Object Store -->
<on-error-propagate enableNotifications="true" logException="true" doc:name="On Error Propagate">
  <set-variable variableName="endpointName" value="worker-page-queue" doc:name="Set Endpoint Name"/>
  <logger level="ERROR" message="Failed to process page #[vars.transactionDetails.pageCounter] for transaction #[vars.transactionDetails.'x-transaction-id']. Error: #[error.description]"/>
  <os:store doc:name="Store Failed Page Metadata"
            objectStore="error_page_object_store"
            key="#[vars.transactionDetails.'x-transaction-id' ++ '_' ++ vars.transactionDetails.pageCounter]">
    <os:value><![CDATA[
      %dw 2.0
      output application/json
      ---
      {
        'x-transaction-id': vars.transactionDetails.'x-transaction-id',
        pageCounter: vars.transactionDetails.pageCounter,
        startTime: vars.transactionDetails.lastRunTime,
        endTime: vars.transactionDetails.currentServerTime,
        errorMessage: error.description,
        errorType: error.errorType
      }
    ]]></os:value>
  </os:store>
</on-error-propagate>

This ensures failed pages are tracked for later retry rather than being lost. Note that vars.transactionDetails is the per-page context carried by each MQ message (restored on the consumer side, as in the sketch in Step 3), not the paginationContext from Step 2.

Step 5: Retry failed pages in parallel

Periodically, retrieve all failed pages from the Object Store and retry them in parallel. On success, remove them from the Object Store.

<!-- Step 5a: Retrieve all failed pages from Object Store -->
<os:retrieve-all doc:name="Retrieve All Failed Pages" objectStore="error_page_object_store"/>
<!-- Step 5b: Retry failed pages in parallel -->
<choice doc:name="Retry Failed Pages">
  <when expression="#[payload != null and sizeOf(payload) > 0]">
    <ee:transform doc:name="Prepare Failed Page List">
      <ee:message>
        <ee:set-payload><![CDATA[
          %dw 2.0
          output application/json
          ---
          (valuesOf(payload) map (read($,"application/json")))
            distinctBy $.pageCounter
        ]]></ee:set-payload>
      </ee:message>
    </ee:transform>
    <parallel-foreach maxConcurrency="3" doc:name="Retry Failed Pages in Parallel">
      <!-- Each parallel route receives one failed-page record as its payload -->
      <ee:transform doc:name="Set Transaction Details">
        <ee:variables>
          <ee:set-variable variableName="transactionDetails"><![CDATA[
            %dw 2.0
            import mergeWith from dw::core::Objects
            output application/java
            ---
            (vars.transactionDetails default {}) mergeWith {
              'x-transaction-id': payload.'x-transaction-id',
              pageCounter: payload.pageCounter,
              lastRunTime: payload.startTime,
              currentServerTime: payload.endTime,
              pageSize: payload.pageSize
            }
          ]]></ee:set-variable>
        </ee:variables>
      </ee:transform>
      <flow-ref name="fetch-workday-data"/>
      <logger level="INFO" message="Successfully retried page #[vars.transactionDetails.pageCounter] for transaction #[vars.transactionDetails.'x-transaction-id']"/>
      <os:remove doc:name="Remove Successfully Retried Page"
                 objectStore="error_page_object_store"
                 key="#[vars.transactionDetails.'x-transaction-id' ++ '_' ++ vars.transactionDetails.pageCounter]"/>
    </parallel-foreach>
  </when>
  <otherwise>
    <logger level="INFO" message="No failed pages found for retry."/>
  </otherwise>
</choice>

This finds all failed pages, deduplicates them, and retries each in parallel. On success, the page is removed from the Object Store.
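
The “periodically” part is left open above. One way to drive it is a simple scheduler flow; the flow names and frequency below are illustrative, with retry-failed-pages standing in for whatever flow wraps the Step 5 logic:

<!-- Hypothetical trigger for the retry logic in Step 5 -->
<flow name="retry-failed-pages-schedule">
  <scheduler doc:name="Every 15 Minutes">
    <scheduling-strategy>
      <fixed-frequency frequency="15" timeUnit="MINUTES"/>
    </scheduling-strategy>
  </scheduler>
  <flow-ref name="retry-failed-pages" doc:name="Run Retry Logic"/>
</flow>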

Scaling with confidence: CloudHub 2.0 considerations

As you scale, it’s important to design your integration with vertical and horizontal scaling in mind to handle large volumes of data without sacrificing performance.

  • Start with 1 or 2 vCores for integrations involving parallel SOAP calls and data transformations; add vCores if payloads grow larger.
  • Enable DataWeave streaming to optimize memory usage when handling large XML payloads (see the sketch after this list).
  • Use Anypoint MQ to decouple the producer (pagination flow) from consumers, allowing independent scaling.
  • Deploy multiple replicas of the consumer application to process messages in parallel.
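
As an illustration of the streaming bullet, DataWeave can parse a large payload incrementally when the reader’s streaming property is enabled (supported for XML in Mule 4.3+, with the caveat that the data must be accessed strictly sequentially). A sketch, with element names assumed from the Get_Workers response:

%dw 2.0
ns wd urn:com.workday/bsvc
// Reader property; the streamed input must be accessed in document order
input payload application/xml streaming=true
output application/json
---
// Walk the Worker elements once, keeping only what downstream systems need
payload.wd#Get_Workers_Response.wd#Response_Data.*wd#Worker
  map $.wd#Worker_Data.wd#Worker_ID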

Workday SOAP response times: What to expect

Workday APIs can vary in response times depending on several factors such as data complexity, the size of the request, and the system load. Here’s what you can expect:

  • 1–5 seconds: Normal response time for small to medium-sized requests
  • 5–10 seconds: Occurs under heavier system load or with more complex queries (e.g. nested data)

Factors affecting response times:

  • Larger datasets take longer to process
  • Deeply nested or custom fields slow down response times
  • Response times may increase during high-traffic periods
  • Complex filters or calculations increase processing time

Best practices for optimizing Workday API calls

To enhance the performance of your Workday API integration, follow these best practices:

  • Keep page sizes between 200–500 records: Avoid large page sizes that can slow down response times. A range of 200–500 records ensures efficiency and minimizes the risk of timeouts.
  • Use filters like lastUpdateFrom and lastUpdateTo to limit the data returned: This reduces the dataset size and improves performance (see the sketch after this list).
  • Avoid unnecessary calculated fields: Calculated fields can slow down response times. If possible, perform calculations after retrieving the data rather than in the Workday request.
  • Regularly check Web Services Monitoring logs to identify performance bottlenecks in your queries and address them proactively.
  • Ensure your Workday Integration System User (ISU) has only the necessary permissions: Over-permissioned ISUs can cause delays due to excessive security checks.
  • Keep your MuleSoft Workday Connector version in sync with the Workday API version your tenant uses: Avoid hardcoding API versions and use variables for easier updates.
  • Use Workday’s Response_Filter and Exclude_Data to limit the data returned: This reduces response size and improves efficiency.
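
To illustrate the delta-window filters mentioned above: in the SOAP Get_Workers call, the rough equivalent of lastUpdateFrom/lastUpdateTo is a Transaction_Log_Criteria_Data date range. The element names below follow the Human_Resources WSDL and vars.lastRunTime is a hypothetical watermark variable, so treat this as a sketch:

%dw 2.0
output application/xml
ns wd urn:com.workday/bsvc
---
{
  wd#Get_Workers_Request: {
    // Return only workers changed inside the window
    wd#Request_Criteria: {
      wd#Transaction_Log_Criteria_Data: {
        wd#Transaction_Date_Range_Data: {
          wd#Updated_From: vars.lastRunTime,   // hypothetical watermark from the last run
          wd#Updated_Through: now()
        }
      }
    },
    wd#Response_Filter: {
      wd#Page: 1,
      wd#Count: 200
    }
  }
}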

Achieving scalable, reliable Workday integrations

Leveraging parallel pagination in MuleSoft allows you to significantly reduce data retrieval times, ensuring that your Workday integrations are not only faster but also more resilient. By using message queuing, fault tolerance, and scalable architecture, you can handle large datasets efficiently.

MuleSoft’s flexibility enables you to design and optimize Workday integrations that scale seamlessly, improving throughput, reliability, and overall system performance. With these techniques in place, your Workday integration will be ready to handle the challenges of large-scale data processing while maintaining the integrity and speed required by modern businesses.

Editor’s note: This article was collaboratively written by Shoban Kandala and Anil Kumar Aruru.