This post was written by Ramsankar Ananthan and Nicholas Keune.
The need for operational notifications
A key rule of IT operations management is knowing when (and why) something breaks and how to resolve the incident. The best IT operations managers control the entire incident management lifecycle, from real-time incident detection to future incident preparation. IT operations managers act as the essential connection between technical systems and the people who use them.
Detecting and understanding significant signal shifts is a fundamental need that cuts across all use cases for organizations that build, maintain, and evolve digital systems. In application performance monitoring, for example, you want to detect increases in your application’s latency and error rates, because both can directly affect your service and user experience. You also want visibility at the infrastructure level, with metric alerts for CPU or memory usage spikes or even service and network downtime. Each of these could result in application performance degradation if not acted on in a timely manner. Beyond one-time events, there is also a need to detect recurring patterns within logs in order to understand and proactively avoid similar situations in the future.
Operational alerts are simple notifications that help companies optimize sales, marketing, and internal processes. Operational alerts can keep sales teams informed of new promotions, notify an anxious customer that their package has arrived, or let a shopper know that a product is now available. Many companies overlook or underestimate the value of operational alerts even though they are vital throughout the organization. Maintaining compliance and high customer satisfaction requires a competent support team that can address intermittent issues.
There are many benefits to notifications from behaviors within an application. Some of the most common ones that we see with our customers include:
- Data changes: These alerts are not simply about status (e.g., the server is down), but are about streams of data. Usually, the goal is to figure out what is “normal” and what is “surprising.” So taken all together, “notify users about meaningful data changes” makes our goal clear: helping users make data-driven decisions.
- Service availability: These notifications help you monitor the availability status of all your infrastructure components so that you can detect and resolve problems before they affect users.
- Miscellaneous errors: Any alert which requires human intervention to remediate.
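To make the "normal" versus "surprising" idea behind data-change alerts concrete, here is a minimal sketch in Python. It assumes a simple z-score rule (flag a value that sits more than a chosen number of standard deviations from the recent mean). The function name `is_surprising` and the default threshold of 3.0 are illustrative assumptions, not something prescribed by this post or by any particular monitoring product.

```python
from statistics import mean, stdev


def is_surprising(history, latest, threshold=3.0):
    """Return True when `latest` is far from what `history` suggests is normal.

    Assumption: "normal" is modelled as the mean of recent values, and
    "surprising" as more than `threshold` sample standard deviations away.
    """
    if len(history) < 2:
        # Too little data to define "normal", so do not alert.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a surprise.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Example: response times in milliseconds (made-up values).
recent_latency_ms = [100, 102, 98, 101, 99]
print(is_surprising(recent_latency_ms, 101))  # small wobble
print(is_surprising(recent_latency_ms, 250))  # large jump
```

A rule like this only covers the simplest case. Seasonal or trending signals would need a different baseline, but the shape of the decision (compare the newest value to a learned notion of normal) stays the same.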
Slack for alerting
The current state of IT systems management across the enterprise is a combination of automated and manual processes. Companies are doing their best with this arrangement, but they recognize that they could be doing better. When events occur that can impact the business, many companies rely on manual procedures and mass notifications to identify and notify employees who can resolve the issues. Oftentimes, they may have to rely on multiple channels such as monitoring dashboards, emails, texts, and other types of notifications.
One MuleSoft customer chose to do this via Slack, alongside their SEIM (software engineering incident management). The goal was for Slack to be the quick-response tool for remediating these issues. It was ideal because:
- Operations teams view Slack throughout the day: Slack makes day-to-day communications more efficient, particularly when many people are working from home. The platform brings the team and its tools together in one place.
- Workflows can be triggered based on a Slack message: Slack brings together the right people, at the right time, in the right place to quickly move forward with the next course of action.
- It allowed for wide distribution: A Slack notification lets you inform all of your teams simultaneously without any distractions.
- It looped in leadership without creating interruptions: Slack channels keep leadership and stakeholders informed during an incident without disrupting workflow. While stakeholders have visibility into the activity in the channel, they can be involved as much or as little as necessary.
One huge advantage of Slack compared to email is the lean and responsive environment for collaboration. Furthermore, Slack is hosted as a cloud service so there is no infrastructure necessary, which is why many startups rely on Slack for operational alerts.
Slack’s platform streamlines your tools and workflows into one place, lifting the mental load of jumping from app to app. When you’re able to quickly find, share, and act on information across your tools, you can quit juggling tabs and stay focused on more meaningful work.
Building app-level notifications for reuse
Channel and alert fatigue is a reality. Sometimes we simply have too much noise to focus on the signal, important as it may be. This fatigue gets exacerbated when we have fragmentation of the channels for alerting. If each application uses the developer’s favorite means for alerting, or different domains of applications come with a host of their own distinct channels, it makes this even worse. We can only search through so many channels and so much noise before we start missing the important stuff.
Critical to fighting this fatigue is reducing the number of distinct swivel-chair applications each person needs to focus on, and having a repeatable approach that helps highlight what needs attention. One historical approach to address this is to build the alerting framework in every application or API. But this can lead to so much building! Because every app has its own way of defining an error, and every developer has their own way of reporting errors, this produces the noisy mess we often find in operations. A much better approach is to define this pipeline and framework and apply it across the applications.
We’ve found that this works even when we’re dealing with multiple domains. Consider monitoring application transactions (performance health), tracking events like brute force login attempts (security), and raising errors in the integrations for remediation (design and implementation). These might be treated as three (or more) entirely separate channels. But regardless of the domain, the goal is to provide a common experience for meaningful alerts without placing a heavier burden on the developer or the recipient. As we’ve implemented this for our customers, we’ve found that the exact domain of the alerts often doesn’t really matter. Rather, we can mix domains within a single framework and notification pipeline and gain the benefit of reducing the cognitive load on the recipient.
For a developer, this also reduces the custom development around errors. Developers can send the general application-specific structure to a reusable MuleSoft notification app. Within that Mule application, there is logic to triage and structure the alert in a meaningful way. This can mean reformatting information or even making subsequent calls to external systems to look up adjacent information. For instance, when there is a data-level error within a transaction, it might be helpful to look at some of the upstream fields and enrich that data. This logic and processing is written one time, and then later invoked in any error flow.
A single pipeline/framework for notifications entails:
- Error enrichment logic is written in one place.
- Logic for triage and distribution to other systems is written in one place.
- Auth tokens for the receiving channels are managed in a single place.
The benefit of this approach is that it not only eliminates ambiguity in how to handle errors by creating a streamlined process for teams, but also makes the pipeline easier to upgrade, scale, manage, and operate.
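The single-pipeline idea can be sketched in a few lines. The example below is written in Python purely for illustration; it is not the Mule application itself, and every name in it (`Alert`, `enrich`, `triage`, `notify`, the channel names in `ROUTES`) is a hypothetical stand-in. The only details borrowed from Slack are the `channel` and `text` field names used for a message.

```python
from dataclasses import dataclass, field

# Hypothetical routing table: alert severity -> Slack channel name.
ROUTES = {"critical": "#ops-critical", "error": "#ops-errors"}
DEFAULT_CHANNEL = "#ops-info"


@dataclass
class Alert:
    app: str          # application that raised the alert
    domain: str       # e.g. "performance", "security", "integration"
    severity: str
    message: str
    context: dict = field(default_factory=dict)


def enrich(alert, lookup):
    """Enrichment lives in one place; `lookup` stands in for an external call."""
    extra = lookup(alert)
    return Alert(alert.app, alert.domain, alert.severity, alert.message,
                 {**alert.context, **extra})


def triage(alert):
    """Triage lives in one place: pick a channel from the severity."""
    return ROUTES.get(alert.severity, DEFAULT_CHANNEL)


def to_slack_message(alert):
    """Give every alert the same shape, whatever domain it came from."""
    lines = [f"[{alert.severity.upper()}] {alert.app} ({alert.domain}): {alert.message}"]
    lines += [f"- {key}: {value}" for key, value in sorted(alert.context.items())]
    return {"channel": triage(alert), "text": "\n".join(lines)}


def notify(alert, lookup, send):
    """The one entry point that any application's error flow would call."""
    message = to_slack_message(enrich(alert, lookup))
    send(message)  # `send` holds the credentials, so tokens live in one place
    return message


# Usage with stand-ins for the external lookup and the Slack sender.
outbox = []
alert = Alert("orders-api", "integration", "error", "Order sync failed",
              {"order_id": "A-1"})
notify(alert, lambda a: {"customer": "acme"}, outbox.append)
print(outbox[0]["channel"])
print(outbox[0]["text"])
```

Because callers only hand over the application-specific structure, changing the routing rules or the message layout touches one module instead of every application.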
Use MuleSoft to send Slack notifications
Some artists say that everything has already been created. Some agree, and some don’t. However, if developers are artists, then one thing is certain: In the current age, much of an app’s core functionality has probably already been developed. If your core business is a bespoke content platform, why would you invest your time building integration software? A better strategy is to use software that has already been battle-tested and is just a few clicks away from working.
The new Slack connector makes this even easier!
MuleSoft has a connector for Slack. The Slack connector provides two-way integration between the Slack messaging system and Mule apps. The Slack connector can be used as an inbound and outbound component, which means that a Mule flow can listen for Slack events and also perform several operations towards Slack.
Anypoint Connector for Slack (Slack Connector) gives you access to the Slack platform. This connector exposes all of the operations provided by the Slack API. The Slack Connector enables organizations to connect directly with the Slack API, permitting users access to the Slack functionality with seamless integration.
Using this connector, businesses can create instant connectivity to popular collaboration, mobile, and social applications to streamline connectivity and integrate business processes.
Slack Connector is an easy and fast way to integrate with your organization’s team chats, create notifications, automate responses, and much more.
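The connector's own configuration is not shown in this post, so the sketch below does not attempt to reproduce it. Instead, as a rough stand-in for what an outbound "send a message" operation does, it uses Slack's incoming webhooks, which accept an HTTP POST with a JSON body containing a `text` field. It is written in Python with only the standard library; the function names are illustrative, and the webhook URL is a placeholder you would replace with one issued by your own Slack workspace.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, text):
    """Build (but do not send) the POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url, text, timeout=5):
    """Send the message; returns True when Slack answers with HTTP 200."""
    request = build_webhook_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


# Placeholder URL (an assumption for illustration, not a working endpoint).
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"
request = build_webhook_request(WEBHOOK_URL, "Order sync failed in orders-api")
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the message format testable without network access, and leaves a single place to hold the webhook URL or token.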
Future thoughts
The digital world is open 24/7. So it follows that digital consumers expect IT and customer service to keep the same hours. These exceedingly high expectations mean that no issue is too small or common to frustrate customers, from broken code to site-wide outages.
Slack streamlines operational alerting and incident management right out of the box, acting as a single command center for detection, alerting, containment, and post-incident analysis. Instead of working in a stressful, reactive, and siloed atmosphere, employees are equipped to take a proactive approach and collaborate in real time with an evolving, intelligent tool.
Get started now with the Slack Connector to address integration challenges in your ecosystem in a much simpler way.