Mr. Batch and the Quest for the Right Threading Profile

Sometimes (more often than we think), less concurrency is actually more. Not too long ago, I found myself in a conversation in which we were discussing non-blocking architectures, tuning, and performance. We were discussing that tuning for those models often starts with “2 threads per core” (2TPC). The discussion made me curious about how Mule’s batch module would perform if tested by 2TPC. I knew beforehand that 2TPC wouldn’t be so impressive on batch, mainly because it doesn’t use a non-blocking threading model. However, I found myself thinking that the 16 thread default threading profile might be a little excessive (again, because sometimes less is more) and wanted to see what would happen if we tried it. You know, just out of curiosity.

Designing the Test

I explained my hypothesis to the mighty Luciano Gandini from our performance team. He pointed out that there were two different cases that needed to be tested:

  • Jobs that are mainly IO bound, which spend most of their execution time performing IO operations (to disk, databases, external APIs, etc.)
  • Jobs that are mainly CPU bound. They may well do a fair amount of IO, but most of the processing time is spent on CPU operations (transformations, validations, etc.).

GOTCHA: Because batch relies on persistent queues to support large datasets while guaranteeing reliability and resilience, no batch job is truly light on IO. However, because the impact of that fact is similar no matter how many threads you use, for the purpose of this test (and for that purpose only) we can pretend that factor doesn’t exist.

Luciano then designed two batch jobs, one IO intensive and one CPU intensive. Each job was executed several times against a one-million-record dataset. The tests ran on a 24-core machine (the performance guys have pretty amazing toys!), and a different threading profile was used on each run. These were the results:

IO bound Jobs

[Chart: IO bound job run times with 8, 16, 24, and 48 threads]

The first two runs used 8 and 16 threads, the latter being much faster. That’s easy to explain – since the job is hard on IO, many threads will find themselves blocked waiting for IO operations to finish. By adding more threads, you can have more work going on. The third run used 24 threads (1 per core). Again, this was faster, but not by much, and again, this didn’t come as a surprise: although more work is being done, the new threads also end up blocked on IO at roughly the same points, while adding an ever-growing context-switching and thread-administration penalty. The last run used 48 threads (true 2TPC) and, while still faster, the improvement gained was not significant compared to the extra CPU and memory cost.

Conclusion: More threads do increase performance, but only up to a point, which you’ll hit pretty quickly. The 16-thread default was validated.

GOTCHA: If your job’s IO includes consuming an external API, adding more threads might turn out to be far more harmful than shown here. That’s because some APIs limit how many calls you can perform per day, or even how many you can perform concurrently. Exceeding those thresholds might result in your requests being rejected or throttled.

CPU bound Jobs

[Chart: CPU bound job run times with 8, 16, 24, and 48 threads]

These results did come as a surprise. The behavior was remarkably similar to that of the IO bound jobs, so the hypothesis that the two cases would behave differently was the first thing the test disproved.

For the first two runs with 8 and 16 threads, the results followed the same pattern: 16 threads did the job in less than half the time. The big difference, however, was that the 24- and 48-thread runs gave almost the same running time. Because the threads in this job spent virtually no time blocked on IO, the overhead of the added threads pretty much consumed all of the gained processing power. Gotta admit, I didn’t see that one coming.

Conclusion: The behavior is pretty similar no matter the nature of the job, although for CPU intensive ones the decay is more noticeable once the 16-thread barrier is passed. The 16-thread default was validated again.
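If you want to get a rough feel for this effect outside of Mule, the toy benchmark below is a minimal, JDK-only sketch of the same kind of experiment: push a fixed number of simulated records through pools of different sizes and compare wall-clock times. The record counts, sleep times, and loop sizes are invented for illustration; they are not the jobs Luciano ran.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy benchmark: run the same number of simulated "records" through thread
// pools of different sizes and compare wall-clock times. Record counts and
// per-record costs are invented for illustration only.
public class ThreadingProfileBenchmark {

    // Stand-in for a record whose processing blocks on IO.
    static void ioBoundRecord() {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stand-in for a record whose processing is pure computation
    // (transformations, validations, etc.).
    static void cpuBoundRecord() {
        double acc = 0;
        for (int i = 1; i < 20_000; i++) {
            acc += Math.sqrt(i);
        }
        if (acc < 0) System.out.println(acc); // keep the JIT from dropping the loop
    }

    static long runMillis(int threads, int records, Runnable record) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> work = new ArrayList<>();
        for (int i = 0; i < records; i++) {
            work.add(() -> { record.run(); return null; });
        }
        long start = System.nanoTime();
        pool.invokeAll(work); // blocks until every record has been processed
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int records = 20_000;
        for (int threads : new int[] {8, 16, 24, 48}) {
            System.out.printf("IO  bound, %2d threads: %d ms%n",
                    threads, runMillis(threads, records, ThreadingProfileBenchmark::ioBoundRecord));
            System.out.printf("CPU bound, %2d threads: %d ms%n",
                    threads, runMillis(threads, records, ThreadingProfileBenchmark::cpuBoundRecord));
        }
    }
}
```

On a many-core machine you should see the same flattening the charts above show once the thread count passes a certain point, with the CPU bound variant flattening earlier.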

Final thoughts

The good news is that it looks like we don’t need to change any defaults in batch. But more importantly, the results provided some lessons on tuning and performance which we weren’t expecting. Keep in mind, however, that no two jobs are born alike; you may well have cases which don’t follow these trends. The purpose of this post is not to tell you what to do, but to give you ideas on how to test what you’re doing. Hope it helps in that regard.

We’d really like you to share what kind of batch jobs you have and how they react to variations in the threading profile. Looking forward to validating these trends or getting more surprises!

Cheers.

Content created by developers, for developers

CONNECT 2015 Technical Sessions

With over 30 sessions, customer case studies, hands-on labs, product demonstrations, complimentary certification, and a packed solution expo of MuleSoft certified partners, CONNECT provides a great opportunity for current users of Anypoint Platform to uplevel their skills. And if you’re deciding whether MuleSoft’s solutions are a good fit for you and your organization, we’ve designed a three-day experience that will equip you with the resources to make the right decision.

Here are seven CONNECT 2015 sessions created by developers, for developers:

Introduction to Anypoint Platform

MuleSoft experts provide an overview of the broad set of capabilities available out of the box with Anypoint Platform that drive developer productivity. Infused throughout the demo: best practices for configuring flows, applications, and servers to quickly deliver business agility across a broad set of API-led connectivity use cases.

Anypoint Platform for APIs Deep Dive

API Design, Development and Analytics continue to be key areas where MuleSoft is innovating. Join MuleSoft’s API Product Management leadership as they demo recent product features and share our forward-looking roadmap.

How to Build Your Own Connector in 30 Minutes

Learn how to use MuleSoft’s SDK, DevKit, to build your own connector. This demo will take you through all the steps of connector development and showcase new and updated features you can leverage today to quickly and easily build your own reusable connector.

Integrating Salesforce to Unlock Systems of Record

See how you can use Anypoint Platform to connect Salesforce to other applications like SAP. This demo-driven session will show you different ways to integrate Salesforce, including loading SAP customer master data into Salesforce, synchronizing real time updates, creating shared services and developing an agility layer with APIs. Come see Anypoint Platform and Anypoint Data Gateway in action.

APIs in a .NET World

Using RAML tools for .NET? Learn how to build, publish and consume APIs using Visual Studio and the Anypoint Platform.

Delivering Performant, Reliable, and Scalable Apps with Anypoint Platform

Supporting everything from mobile apps with thousands of concurrent users to global deployments processing millions of requests daily, Anypoint Platform has been put to the test. In this session, MuleSoft experts will talk through case studies from our most demanding deployments and provide a best-practice approach to designing and delivering applications that meet the non-functional requirements (e.g. SLAs) of your application.

What’s new with Anypoint Platform?

Anypoint Platform continues to evolve in ways that better equip developers, architects, and operations managers to deliver API-led connectivity. In this session, we’ll showcase new and upcoming functionality through a set of illustrative demos.

Don’t miss the opportunity to engage in an interactive discussion with your peers around connectivity and learn why an API-led connectivity approach can help your teams move faster while maintaining visibility and control.

Don’t forget – blog readers can get 20% off their CONNECT ticket with the promo code BLOG20.

Salesforce Integration Patterns

When it comes to getting data in and out of Salesforce using Anypoint Platform, there are a number of different options. Typically, using just one of them won’t give you everything you need, and you might have to combine several of them to achieve a complete solution. In this post, we’ll summarize four options – Real Time, Custom, Bulk, and Data Virtualization – when to use which, and things to keep in mind for each.

[Diagram: Salesforce integration options – Real Time, Custom, Bulk, and Data Virtualization]

Real Time

In terms of real-time communication with Salesforce, it’s important to keep in mind that there are huge differences between how inbound and outbound communication are handled. Inbound communication is covered completely by the core API, which means it’s also completely covered by MuleSoft’s Anypoint Connector for Salesforce. However, Salesforce limits the number of API calls you can execute per day according to this table, so polling the core API is not a great option for getting data out of Salesforce. Salesforce addresses this by offering outbound messaging and the Streaming API.

Inbound

Salesforce Core API

The core API is the normal Salesforce API that allows you to perform CRUD type operations. The Anypoint connector fully supports the Salesforce Core API.

  • This is useful when…
    you are interacting with Salesforce as part of an orchestration or automation process and you need to perform CRUD type operations against Salesforce. For example, if you have a case in Salesforce that has been resolved in another system such as Jira, and the case needs to be closed in Salesforce.
  • This might not be the best option when…
    you need to update a large number of objects in one go. The core API only supports updates to 200 objects at a time (a short chunking sketch follows this list). Also, if it doesn’t make sense to maintain duplicate data in Salesforce at all, a better option is to consume external services using Apex or using Lightning Connect.
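As a concrete illustration of working within that 200-object ceiling, here is a minimal, hypothetical sketch that splits a large update into core-API-sized chunks. `updateViaConnector` is a stand-in for whatever update operation your flow or code actually invokes; the helper itself is not part of the connector.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: split a large list of records into chunks of at most
// 200, the maximum the Salesforce core API accepts per update call.
public final class CoreApiChunker {

    private static final int MAX_RECORDS_PER_CALL = 200;

    public static <T> void updateInChunks(List<T> records, Consumer<List<T>> updateViaConnector) {
        for (int from = 0; from < records.size(); from += MAX_RECORDS_PER_CALL) {
            int to = Math.min(from + MAX_RECORDS_PER_CALL, records.size());
            // Each sublist is small enough for a single core API update call.
            updateViaConnector.accept(records.subList(from, to));
        }
    }
}
```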

Outbound

Outbound messaging

Salesforce outbound messaging is configured as part of a workflow defined in Salesforce; it’s a callout to an external SOAP service whose contract Salesforce defines. When you configure it in Salesforce, a WSDL is generated and made available for download. It’s then up to you to implement and expose a SOAP service that Salesforce can make the callout to. From a MuleSoft perspective, the way to handle this is to implement that service using the CXF module or as XML over HTTP.
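To make the shape of that callout concrete, here is a minimal, plain-JDK sketch of a listener that accepts the outbound message and acknowledges it. In a real Mule project this would be a CXF- or HTTP-based flow instead; the endpoint path is made up, and the exact acknowledgement envelope should be taken from the WSDL Salesforce generates for your configuration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Minimal listener that acknowledges Salesforce outbound messages.
// The path and acknowledgement envelope are assumptions for this sketch;
// take the authoritative contract from the WSDL Salesforce generates.
public class OutboundMessageListener {

    private static final String ACK =
        "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">"
      + "<soapenv:Body>"
      + "<notificationsResponse xmlns=\"http://soap.sforce.com/2005/09/outbound\">"
      + "<Ack>true</Ack>"
      + "</notificationsResponse>"
      + "</soapenv:Body></soapenv:Envelope>";

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/sfdc/outbound", OutboundMessageListener::handle);
        server.start();
    }

    private static void handle(HttpExchange exchange) throws IOException {
        // Read the SOAP notification Salesforce POSTs to us.
        String notification = new String(
                exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
        System.out.println("Received outbound message: " + notification);

        // Reply with Ack=true so Salesforce does not schedule a retry.
        byte[] body = ACK.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().add("Content-Type", "text/xml");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }
}
```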

  • This is useful when….
    there is a transactional event in Salesforce that needs to be processed, such as an opportunity being updated to “closed won”. The main benefit of outbound messages over the other options is that retries are built in: if the receiving application is unavailable for some reason, Salesforce will retry and report an error once the defined retries have been exhausted.
  • This might not be the best option when…
    you don’t have the ability to implement SOAP services. MuleSoft offers complete support for exposing SOAP services; however, if the event triggering the callout is more informational and does not require an acknowledgement or retries, then the Streaming API might be easier to implement.

Salesforce Streaming API

From a Salesforce admin perspective, you use the Streaming API by creating an instance of the PushTopic entity. In that PushTopic, you define a condition for when an event should be published to the topic. For example, when an opportunity moves to the “closed won” stage, an event is published to a specific PushTopic; you can then have zero to many subscriptions to that PushTopic that are notified as soon as it happens.

  • This is useful when….
    you want to be notified asynchronously when a condition in Salesforce is fulfilled and there is no need for retries. From an Anypoint Studio development perspective, this is also a bit easier to set up compared to outbound messaging.
  • This might not be the best option when….
    you need to make sure the message is not lost even if the recipient is unavailable. Since this is a publish-subscribe exchange pattern, the message will be lost if your client/subscriber is down when the event fires.

Custom

Salesforce offers a great set of features when it comes to customization – some of which are very useful for integration. This option does however require you to be familiar with Apex and how to do custom extensions of Salesforce.

Inbound

Apex REST Service

You can expose REST services from Salesforce using custom Apex development, and by default these services are secured using the normal session token mechanism used by the core API. You can utilize the parts of the Anypoint connector that deal with login and session token management, and then just use normal Mule functionality to call the service using JSON over HTTP. Currently, the login method is not exposed in the connector, but you can get around this by programmatically calling the method (example here). The next release of the Salesforce connector will have full support for this so that no custom code will be needed.

  • This is useful when….
    you have already developed Apex services in Salesforce and want to utilize the Anypoint connector to consume them, or if you have very complex operations that need to be executed inside the Salesforce platform for performance reasons.
  • This might not be the best option when….
    you are building services from scratch. A better option is to build and expose your APIs on a platform built for just that, like Anypoint Platform for APIs. This way, you get an abstraction layer for your API that is decoupled from the endpoint, while still being able to use the Salesforce connector to implement it. Salesforce does not offer any capabilities around API management, API portals, or API design.

Outbound

Consume REST Service using Apex

In the same way that you can expose a REST service using custom Apex code, you can consume one as well. You can tie that functionality to a custom button or any other Salesforce component. One example of this would be adding an available-to-promise check on the opportunity product object. In these cases, the Anypoint connector will not help you; however, Anypoint Platform for APIs would be central.

  • This is useful when….
    you want to synchronously display data that is not in Salesforce and doesn’t make sense to maintain there.
  • This might not be the best option when….
    the data needs to be synchronized/duplicated in Salesforce rather than fetched with a real-time request.

Batch/Bulk

As the name suggests, this is about moving large amounts of data in and out of Salesforce. This is typically done during the initial stages of a project, when an existing dataset needs to be loaded into Salesforce.

Inbound

Bulk API

Using the Bulk API, which is fully supported by the MuleSoft Salesforce connector, you can automate the upload of large numbers of records from a file, database, or any other endpoint.
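The connector drives the Bulk API’s job/batch lifecycle for you, but it can help to see roughly what that lifecycle looks like. The sketch below hand-rolls the classic Bulk API (v1) insert flow as I understand it: create a job, add a CSV batch, close the job. The instance URL, session ID, API version, and the naive XML parsing are placeholders for illustration; in practice you would let the connector (or dataloader.io) handle all of this.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Rough sketch of the Bulk API lifecycle the connector automates:
// create a job, add a CSV batch, then close the job so it starts processing.
// Instance URL, session id and API version are placeholders.
public class BulkApiUploadSketch {

    static final String INSTANCE = "https://na1.salesforce.com";  // placeholder
    static final String SESSION_ID = "<session id from login>";   // placeholder
    static final String API_VERSION = "33.0";                     // placeholder

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // 1. Create a bulk insert job for Account records in CSV format.
        String jobXml =
            "<jobInfo xmlns=\"http://www.force.com/2009/06/asyncapi/dataload\">"
          + "<operation>insert</operation><object>Account</object>"
          + "<contentType>CSV</contentType></jobInfo>";
        String jobResponse = post(http, "/services/async/" + API_VERSION + "/job",
                "application/xml", jobXml);
        String jobId = extract(jobResponse, "id"); // naive parsing, fine for a sketch

        // 2. Add one batch of CSV rows (in real life, read from a file, DB, etc.).
        String csv = "Name,Industry\nAcme Corp,Manufacturing\nGlobex,Energy\n";
        post(http, "/services/async/" + API_VERSION + "/job/" + jobId + "/batch",
                "text/csv", csv);

        // 3. Close the job so Salesforce processes the queued batches.
        String closeXml =
            "<jobInfo xmlns=\"http://www.force.com/2009/06/asyncapi/dataload\">"
          + "<state>Closed</state></jobInfo>";
        post(http, "/services/async/" + API_VERSION + "/job/" + jobId,
                "application/xml", closeXml);
    }

    static String post(HttpClient http, String path, String contentType, String body)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(INSTANCE + path))
                .header("X-SFDC-Session", SESSION_ID)
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    static String extract(String xml, String tag) {
        int start = xml.indexOf("<" + tag + ">") + tag.length() + 2;
        return xml.substring(start, xml.indexOf("</" + tag + ">", start));
    }
}
```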

  • This is useful when….
    you have an export from a third party system and this data needs to be enriched or some more complex transformations need to be executed before it’s uploaded to Salesforce.
  • This might not be the best option when….
    the data transformations you need to do are very straightforward and there is no need for automation.

dataloader.io

dataloader.io is an online tool built for loading data into Salesforce. It’s a web application, but at its core it’s powered by the MuleSoft runtime and connector. Currently it’s the most popular application on the Salesforce AppExchange.

  • This is useful when….
    you need to import CSV (Excel) formatted data into Salesforce. dataloader.io provides you with a graphical UI that allows you to do simpler transformations.
  • This might not be the best option when….
    you need to combine or enrich data from multiple sources before you upload it, or if you need to do complex transformations or data aggregation.

Outbound

Salesforce Bulk API

Using the Bulk API, which is fully supported by the MuleSoft Salesforce connector, you can automate the export of large numbers of records from Salesforce as CSV or XML.

  • This is useful when….
    there is a need for an automated export from Salesforce where the data needs to be inserted/updated in one or many endpoints on a regular basis. For example, if you want to export the accounts from Salesforce and insert them into an ERP and a data warehouse on a scheduled basis.
  • This might not be the best option when….
    you need to do a one-time data load from Salesforce to some other system that can accept a CSV or XML file, as you get that functionality out of the box from dataloader.io.

dataloader.io

In addition to loading data into Salesforce, dataloader.io can also be used to export data from Salesforce.

  • This is useful when….
    you need to do an export from Salesforce that does not require any complex data transformations and you can manually handle an export file.
  • This might not be the best option when….
    you want a fully automated process that includes multiple endpoints and complex transformations.

Data Virtualization

Inbound

Salesforce1 Lightning Connect

Using Anypoint Data Gateway, you can read data from external systems such as SAP and databases from within Salesforce, in the same way you would with any native entity. Currently Anypoint Data Gateway only supports read, but create and update are on the roadmap.

  • This is useful when….
    you only need to display an external entity and you don’t want to do any configuration or development at all.
  • This might not be the best option when….
    you need to be able to update the data; however, this is being addressed in an upcoming release of Anypoint Data Gateway.

Thanks for taking the time to read. Leave a comment below and let us know what method you use the most!

Check out Salesforce integration solutions from MuleSoft »

API-led Connectivity and CQRS: The Challenge

Part 1: The Challenge

Let’s imagine you’ve been working as an architect in a large company for several years and are very proud of the now mature Supplier Relationship Management (SRM) application you specified, shaped, and delivered to the business, as it continues to provide value.

Functionally, the SRM is a web application that allows for managing relationships of suppliers and materials, maintaining hierarchical structures, uncovering unwanted dependencies and helping your clients significantly reduce supplier costs. Your department was chosen for this quarter’s highly visible and mission-critical innovation project: a new mobile application.

Luckily, one of your largest customers builds hydraulic excavators and is very interested in working with you to deliver this mobile application. Your customer wants to gain real-time visibility into the materials hierarchy for the field sales team. This will drive efficiency improvements, as paper-based catalogues can be eliminated and material price currency conversion and material data information can be made available in real time. On top of this, the field sales team will be able to better map supply chain dependencies and discover from their mobile devices the specific materials shipped by various suppliers, delivering a significant productivity boost against an otherwise manual and time-consuming process.

Thinking about the components that represent your solution, you might come up with something like this:

[Diagram: solution components]

As an experienced architect, you’ve created a plan to implement these features within your application. Checking back with your company’s release management, you find that the next release is scheduled to take place in 8 months. The business, however, is demanding that the application be delivered in 2 months, and scaling the mobile application back is not an option.

In addition to these timeline pains, you also face the following challenges in delivering mobile functionality on your mature SRM:

  • The SRM was not designed nor built for mobile connectivity and scalability requirements
  • There are no general purpose feature hooks for data enrichment (e.g. Cloud or SAP) – besides probably a basic D&B or Equifax integration
  • There is no easy way to craft new APIs, have them mocked and parallelize mobile and SRM development
  • You are facing month long release cycles and the product team is already overcommitted to deliver promised features in the next version

The primary driver of these pains: the SRM wasn’t designed with these requirements in mind. Instead, it was built as a web application with very tailored business workflows. Looking at your current REST APIs, the mobile app could partly be implemented, but it potentially wouldn’t scale, as the application is backed by a relational database that would need to perform heavy join operations to produce the needed results.

Project requirements

So what are your requirements?

  • Mobile enablement of the existing solution
  • Fast running queries to retrieve supplier and material relationships without affecting the existing web app while it’s running
  • Add a real-time currency converter capability extending the limited DB schema of the app
  • The existing app can’t be changed for the mobile application
  • Maximum two month timeframe to ensure you and your manager reach the target (and get commission)

What do you need to deliver on these requirements?

First of all, to enable mobile developers you will need an API. Even more, to let them start immediately, the API should be available as a mock-up and easy to adjust, so you can react quickly to feedback and changing requirements.

Secondly, there needs to be a way to easily retrieve the hierarchically structured information, and that solution has to be extensible.

Thirdly, you need a manageable and simple way to develop and deploy business functionality such as queries, commands, and synchronisation tasks as individual runtimes.

CQRS

Let’s have a look how you can use the Command-Query Responsibility Segregation (CQRS) pattern and an agile approach to implement your solution.

Never heard of CQRS? In short, CQRS suggests that you (a minimal sketch of the pattern follows this list):

  • Solve read-heavy requirements by separating reads from writes
  • Scale reads and writes independently
  • Expose a wire-friendly view model in contrast to your complex domain model
  • Don’t assume queried data is 100% up to date
  • Have clients send Commands to the server, not state-changing facts
  • Queue and process Commands asynchronously
  • Use CQRS only on specific portions of your system
  • Think in domains (also see Bounded Context)
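To make the pattern a bit more tangible, here is a minimal, hypothetical sketch of the split in plain Java: the command side accepts state-changing intents against the system of record (the SRM), while the query side answers reads from a separately maintained read model (Neo4j in this post). All of the type names are invented for illustration.

```java
import java.util.List;

// Hypothetical types illustrating the CQRS split; none of this is from the SRM.

// --- Command side: clients send intents, not state changes ----------------
record AddMaterialToSupplier(String supplierId, String materialId) {}

interface SupplierCommandHandler {
    // Validates the command and applies it to the system of record (the SRM).
    // Commands can be queued and processed asynchronously.
    void handle(AddMaterialToSupplier command);
}

// --- Query side: reads come from a separate, wire-friendly read model -----
record MaterialView(String materialId, String name, String price) {}

interface SupplierQueryService {
    // Served from the read model (the graph database), which may lag slightly
    // behind the system of record.
    List<MaterialView> materialsForSupplier(String supplierId);
}
```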

Consider checking out additional material on CQRS for more context and insight.

Let’s look at how we can use this approach to solve your requirements.

Query like this…

Supplier and material relationships fit cleanly into a graph database, as all relationships are easily manageable. The graph data model you keep in the graph database makes it very easy to design new business features incorporating graph relationships and map them to your mobile app’s view model.

So, let’s make Neo4j, the world’s leading graph database, responsible for delivering high-performance results when digging through your hierarchical data, and thereby divide the query and write responsibilities.
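Purely as an illustration of why the graph fits these reads so well, here is a hypothetical query that walks an arbitrarily deep supplier/material hierarchy in a single Cypher statement, sent from plain Java to what I believe is Neo4j’s transactional Cypher HTTP endpoint. The node labels, relationship type, and endpoint URL are assumptions made for this sketch; the equivalent relational query would need recursive joins or repeated round trips, which is exactly the load we want to keep off the SRM’s database.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical read-side query: fetch every material reachable from a
// supplier, however deep the hierarchy goes. Labels, relationship types
// and the endpoint URL are assumptions made for this sketch.
public class SupplierMaterialQuery {

    public static void main(String[] args) throws Exception {
        String cypher =
            "MATCH (s:Supplier {id: 'ACME'})-[:SUPPLIES*1..]->(m:Material) "
          + "RETURN m.id, m.name";

        // Neo4j 2.x-style transactional Cypher endpoint (adjust for your version).
        String payload = "{\"statements\":[{\"statement\":\""
                + cypher.replace("\"", "\\\"") + "\"}]}";

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:7474/db/data/transaction/commit"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON rows with material ids and names
    }
}
```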

…and Command like that.

Writes go via the existing Web API of the SRM app, or directly to the database. For the sake of speed, and given the missing Web API functionality, let’s focus on the DB only first.

Sounds good? Yes! But we also have a challenge: how do you get Neo4j and SRM synced so we can actually query current data from Neo4j?

The answer: MuleSoft’s Anypoint Platform

MuleSoft’s Anypoint Platform comes equipped with ETL-like capabilities that will keep SRM and Neo4j in sync by running a background synchronization. For this synchronization task, you could opt for Availability and Partition Tolerance (AP), as per the CAP theorem, instead of Consistency and Partition Tolerance (CP) between Neo4j and SRM.

This would mean there might be points in time where the data queried from Neo4j is not consistent with the data in the SRM app: an ongoing write operation in SRM might not yet be visible to a Neo4j query. For most use cases, and as a working assumption of CQRS, this is an acceptable trade-off, as the out-of-sync window is usually negligible and the data still fulfills the business requirements.

Achieving 100% Consistency and Partition Tolerance (CP) should be feasible, but the significantly higher cost of doing so has to be justified by a realistic view of the requirements.

Logical Solution Design

[Diagram: logical solution design]

The figure highlights three areas to attack:

  • data synchronisation
  • data query
  • data commands

All of these can be implemented by following the Separation of Concerns (SoC) principle, meaning you use a separate and isolated runtime for every clearly defined business/integration function: namely Data Synchronisation, Data Query and Data Commands.

NOTE: To make this happen, touching the mature SRM application is not required, enabling us to move forward with new projects in the future.

Now that you have a plan, the next step is to build the project, leveraging the full power of Anypoint Platform. Our upcoming posts in this series will cover these implementation details. Specifically, the next post will cover how to quickly enable mobile developers to begin building a prototype application against your Neo4j NoSQL database with API-led connectivity.

You’d best get started with our brand new whitepaper on API-led Connectivity!