Sravan Kumar Vazrapu, a Lead Member of Technical Staff at Salesforce, shared how to use Slack and PagerDuty to monitor CloudHub apps in real time during an Online Group-English Meetup. This blog reviews the use case discussed during the event and shows how to implement it.
Operations teams spend almost 80% of their time monitoring APIs manually using email alerts. Many CloudHub implementations use email alerts as the main mechanism to notify support teams when an application or API has failed. Ideally, these application failures are identified quickly so end users are not impacted. However, when email alerts are the primary solution for failure notifications, monitoring becomes reactive rather than proactive.
Another issue with email alerts is that there is no way to know whether an issue has been fixed or whether another team member has already started working on it. Teams end up using additional manual tracking tools, such as spreadsheets, to organize and manage each of these alerts. This workaround makes it difficult to integrate with CloudHub’s incident management system, which handles the issues that are generated.
This leaves a gap between alert generation and resolution, which can be closed by connecting CloudHub to PagerDuty and Slack for real-time monitoring.
Setting up CloudHub operations for alerts
Given that most CloudHub applications can send email alerts, the best way to close the monitoring gap is to integrate the incident management system with PagerDuty. PagerDuty creates an incident automatically and can notify the operations team via Slack, enabling a proactive response to critical incidents and making it easy to identify which tickets need to be worked on.
Additionally, each integration can have its own dependencies. For example, an integration may rely on systems like Salesforce or Workday. A common case on our team was password expiration: if a password expires, the operations team cannot address the issue quickly because they depend on system administrators, such as DBAs or Salesforce admins, to generate a new password.
In our case, Slack was configured so that people from a variety of teams could communicate and collaborate to resolve the incident. With this integration, any issue that arose could be resolved much more quickly.
This can be achieved in three simple steps, without the complexity of creating separate APIs or apps. Anytime we have to shift into other tools, all of the predefined configurations are already baked into these systems. Here’s how we did it:
Step 1: Configuring business services in PagerDuty
PagerDuty is an incident management system with many out-of-the-box capabilities, such as creating incidents automatically and creating services with just a few clicks rather than code. In the services tab in PagerDuty, create a new service by providing an integration name (for example, Salesforce to DB Account Integration) and selecting the option to integrate via email. Once email is selected as the integration type, PagerDuty generates a unique email ID under the integration tab, which serves as our integration key. This generated PagerDuty email will be added to CloudHub’s alert configuration.
Step 2: Configuring alerts in CloudHub
To demonstrate the use case, I used the Salesforce to DB templates in Anypoint Exchange and the CloudHub Connector. The CloudHub Connector allows any application to send notifications out of the box. In the error handling flows, the CloudHub Connector can be used, or an email connector if there are too many alerts and rate limiting becomes a factor. Similarly, a system API that contains an email connector can be reused across integrations. The right choice depends on the use case, the application, and the organization’s approach.
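The choice between the two notification paths can be sketched in Python. This is only an illustration of the idea, not Mule configuration: the AlertRouter class, the budget of two notifications per minute, and the channel names are all assumptions made for the example, and CloudHub's actual limits are not stated here.

```python
import time
from collections import deque


class AlertRouter:
    """Pick a notification channel, falling back once a budget is used up.

    Hypothetical helper: the budget and channel names are illustrative only.
    """

    def __init__(self, max_per_window=5, window_seconds=60.0, clock=time.monotonic):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of alerts sent on the primary channel

    def choose_channel(self):
        now = self.clock()
        # Drop timestamps that have aged out of the sliding window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_per_window:
            self._sent.append(now)
            return "cloudhub-notification"
        # Budget exhausted: route through the email connector instead.
        return "email-connector"


if __name__ == "__main__":
    router = AlertRouter(max_per_window=2, window_seconds=60.0)
    # Three alerts in quick succession: the third one falls back to email.
    print([router.choose_channel() for _ in range(3)])
```

In a real Mule application this decision would live in the error handling flow; the sketch only shows why a second channel helps when a burst of failures arrives at once.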
Use the unique email ID from PagerDuty generated in step 1 to configure CloudHub alerts. All CloudHub alerts will be sent to the PagerDuty email and auto-create incidents in PagerDuty. Now CloudHub and PagerDuty are connected without writing any code.
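Before relying on real failures, it can help to confirm the connection by sending a test message to the PagerDuty address. The sketch below is one way to do that from Python, outside of CloudHub. The address, SMTP host, sender, and application name are placeholders I made up for illustration; replace them with the email ID generated in step 1 and a mail relay your organization allows.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values (assumptions): substitute the address PagerDuty generated
# in step 1 and an SMTP relay you are permitted to use.
PAGERDUTY_EMAIL = "your-integration@your-subdomain.pagerduty.com"
SMTP_HOST = "smtp.example.com"


def build_test_alert(app_name, detail, sender="cloudhub-alerts@example.com"):
    """Build an email that resembles an alert for a failed application."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = PAGERDUTY_EMAIL
    msg["Subject"] = f"[TEST] CloudHub alert: {app_name} failed"
    msg.set_content(detail)
    return msg


def send_test_alert(msg, host=SMTP_HOST, port=25):
    """Send the message; an incident should then appear on the PagerDuty service."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    alert = build_test_alert("sfdc-to-db-account", "Simulated failure for verification.")
    print(alert["Subject"])
    # send_test_alert(alert)  # uncomment once the placeholders are replaced
```

If the test incident shows up in PagerDuty, the same address should work for the alerts configured in CloudHub.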
Step 3: Configuring Slack with PagerDuty
To configure Slack as an extension in PagerDuty, add an extension/add-on and then select Slack. This shows the setting options for Slack, such as whether we want to be notified of incidents, acknowledge them, or change their assignment and urgency right from there. Once authorized, the Slack configuration is done. Based on the configuration we choose for Slack in PagerDuty, it will notify the operations team via the configured Slack channels. We can also add other extensions, such as notifying customers from other Slack channels or logging Jira tickets.
Once these steps are done, alerts are converted to incidents and the notifications are available in Slack. Each error or alert will notify our teams through PagerDuty and Slack in real time. In Slack, we can acknowledge an alert, view the details, resolve it, or take additional actions from the Slack interface. Since much of this can be done through Slack’s mobile app, we can achieve timely communication and collaboration across teams and resolve incidents from anywhere.
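Because every alert now has a tracked status, it is also possible to check from a script which incidents are still open. The sketch below builds a request against PagerDuty's REST API for incidents that are triggered or acknowledged. It reflects my understanding of the v2 incidents endpoint, so verify the parameters against the current PagerDuty documentation; the token and service ID values are placeholders.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.pagerduty.com/incidents"


def build_open_incidents_request(api_token, service_ids=()):
    """Build a GET request for incidents that are triggered or acknowledged."""
    params = [("statuses[]", "triggered"), ("statuses[]", "acknowledged")]
    params += [("service_ids[]", sid) for sid in service_ids]
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    headers = {
        "Authorization": f"Token token={api_token}",
        "Accept": "application/vnd.pagerduty+json;version=2",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def summarize(payload):
    """Return (status, title) pairs from a decoded incidents response."""
    return [(i["status"], i["title"]) for i in payload.get("incidents", [])]


def fetch_open_incidents(api_token, service_ids=()):
    """Call the API and summarize the unresolved incidents (needs a real token)."""
    req = build_open_incidents_request(api_token, service_ids)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return summarize(json.load(resp))
```

This is the kind of visibility email alerts alone could not provide: whether an incident is still waiting or already being worked on by someone.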
This solution enables the team to perform operations tasks from anywhere, including on mobile phones. Overall, it allows us to innovate, collaborate efficiently and effectively, and provide a better customer experience when solving issues that may impact our customers.
Watch my entire presentation or get more information on configuring business services in PagerDuty. Join the Online Group English Meetup group for more presentations or see the full list of upcoming Meetup events.