How IBM combines AIops and observability for proactive incident management

As the COVID-19 pandemic accelerated digital transformation and proved the need for enhanced IT infrastructure, businesses rapidly migrated to the cloud and implemented more robust cloud compute strategies. The result? Multicloud services continue to proliferate across the enterprise, as organizations increasingly see the challenges in leaving their business data entirely to vendors. Gartner predicted that nearly 75% of midsize and large organizations would use a multicloud and/or hybrid strategy by 2021, and it happened.

In fact, back in 2018, a survey of 1,106 business and technology executives by the IBM Institute for Business Value revealed that 85% of companies were already using a multicloud system to manage their information. Many infrastructure and operations (I&O) organizations are now “adapting their strategies to leverage cloud capabilities in preparation for a future of integrated solutions, resulting in AI, IoT and edge computing,” according to Gartner. So, multicloud is here to stay.

However, multicloud services also come with certain challenges. Flexera’s 2020 survey on the challenges of multicloud showed “managing multicloud was top of the list,” alongside issues with security, managing cloud spend and lack of resources/expertise. 

Dinesh Nirmal, general manager of IBM Automation, told VentureBeat how AIops and observability work together, what business benefits the combination brings and what IBM has recently updated in its IBM Cloud Pak for Watson AIops software.

Analysis paralysis from too much data

IBM’s response to the need to better manage multicloud environments is to enable the interplay between AIops and observability. While this seems straightforward, there’s a big problem: the overabundance of data in today’s enterprise ecosystem. There are so many data sources that enterprise leaders are swimming in massive data lakes. More critical is the reality that it’s often difficult to convert this data into actionable insights.

That’s where observability, actionable observability, or application resource management comes in, said Nirmal.

“A [major] pillar in IT is all about incident avoidance and incident resolution — that’s where AI plays a huge role, as all this data comes through observability, helping you to correlate it using AI,” Nirmal said. “This can help to look for anomalies within alerts, events and logs to say ‘we’re seeing some anomalies and based on the past behavior, it looks like it could lead to this problem.’” 
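The correlation step Nirmal describes can be illustrated with a minimal Python sketch. It is not IBM's implementation: the `Event` fields, the `correlate` function and the 60-second window are hypothetical choices made for illustration. It assumes that alerts, events and log lines touching the same service within a short time of each other likely belong to the same incident.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since an arbitrary reference point
    source: str       # "alerts", "events" or "logs" (hypothetical labels)
    service: str      # hypothetical name of the affected service
    message: str


def correlate(events, window_seconds=60.0):
    """Group events on the same service that arrive close together in time.

    An event joins the most recent group for its service when it arrives
    within window_seconds of that group's latest event; otherwise it
    starts a new group. The window size is an assumption, not a standard.
    """
    groups = []
    latest_group = {}  # service name -> most recent group for that service
    for event in sorted(events, key=lambda e: e.timestamp):
        group = latest_group.get(event.service)
        if group is not None and event.timestamp - group[-1].timestamp <= window_seconds:
            group.append(event)
        else:
            group = [event]
            groups.append(group)
            latest_group[event.service] = group
    return groups


sample = [
    Event(0.0, "alerts", "checkout", "latency high"),
    Event(30.0, "logs", "checkout", "upstream timeout"),
    Event(40.0, "events", "search", "new deployment"),
    Event(500.0, "alerts", "checkout", "latency high"),
]
incidents = correlate(sample)
```

Here the first two `checkout` signals collapse into one candidate incident, while the later alert starts a new one. A production system would correlate on far richer features than service name and time.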

Nirmal said organizations need to be able to observe and know their entire IT infrastructure — whether it’s hybrid cloud, multicloud, or behind the firewall — to ensure application performance management (APM). IBM uses actionable observability to bring in data from across all the APM vendors to ensure applications are running successfully at all times, he added.

Combining AIops and observability

AIops, a term first coined by Gartner, is the application of big data and machine learning (ML) to automate IT processes and operations, correlating data at the speed businesses require today. When this is combined with observability, a system can be thoroughly analyzed and its data pipeline viewed as a whole. A Forrester study (commissioned by IBM) found that combining AIops and observability can reduce customer-facing outages by up to 50% and mean time to recovery (MTTR) by up to 95% for enterprises.

Managing multicloud is difficult, according to experts. Steve Hershkowitz, chief revenue officer at Virtana, notes in an article that the attractive features of the cloud are the same ones that “make it exceedingly complex to manage on an ongoing basis.”

One of the key things multicloud computing rests on is operational control — the ability of organizations to monitor their entire IT systems — but this isn’t often easy to do. With the sophistication of multicloud environments, there is an increasing need to improve observability in IT systems for better analytics and optimal performance. More than ever, organizations need effective AIops to uncomplicate their cloud environments so that they can effectively design, build and manage applications in the cloud.

However, another issue often arises: While data is the fuel for AIops, several challenges with the AIops data pipeline can lead to ineffective AIops. That’s where observability comes in, helping to solve issues like AI bias along the AIops data pipeline. Unifying AIops and observability enables enterprises to understand why problems happen, see other similarly related problems, discover the best ways to fix the problems, and provide insights on how to stop the problems from happening in the first place.

AI for incident detection and management  

The IBM approach to combining AIops and observability for providing actionable insights is embedded in a new version of its IBM Cloud Pak for Watson AIops software, which the company recently announced to help enterprises proactively resolve incidents by providing a new “stories and alerts” dashboard. 

The solution is an end-to-end approach that requires cross-field integration. To get the full scope, the version was developed with Instana (which IBM acquired in 2020 for observability data) and at present can onboard data from Turbonomic (which IBM acquired in 2021 for applications’ resource utilization). The full-stack application allows IT managers and site reliability engineers (SREs) to obtain a comprehensive view of how their IT environments are performing.

It combines the monitoring and event data from different sources, including Instana and Turbonomic, to learn the normal behavior and the baseline characteristics of applications. The software uses AI to quickly detect what the abnormal behavior in production applications is, then uses automation to take corrective action, resolve detected issues and reduce manual processes.
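The learn-baseline, detect and remediate loop described above can be sketched in a few lines of Python. This is a simple z-score check chosen for illustration, not the model IBM Cloud Pak for Watson AIops uses. The function names, the threshold of three standard deviations and the `remediate` callback are all assumptions.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Summarize normal behavior as the mean and standard deviation of past samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


def check_and_remediate(value, baseline, remediate):
    """Call the supplied remediation hook only when the value is anomalous."""
    if is_anomalous(value, baseline):
        return remediate(value)
    return None


# Hypothetical response times in milliseconds observed during normal operation.
history = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = learn_baseline(history)

# The hook stands in for an automated action such as restarting a service.
action = check_and_remediate(250, baseline, lambda v: "restart")
no_action = check_and_remediate(101, baseline, lambda v: "restart")
```

Only the 250 ms reading triggers the hook; the 101 ms reading falls well within the learned baseline. Real deployments would learn baselines per metric and account for seasonality, which this sketch ignores.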

The Forrester study showed that organizations that deployed IBM Cloud Pak for Watson AIops eliminated 80% of the time spent remediating false-positive incidents. The deployment also increased visibility into application performance, reducing the time to resolve issues by 75%.

While Nirmal agreed there are other players like ServiceNow in the space, he said IBM has a huge advantage because of its longtime customers and because the company has the skills and knowledge to build the right AI models using data it’s been working with for decades.

All-around automation using AI

Nirmal also weighed in on training the AI models for decision-making: “Predictability is driven by the data you feed into the AI,” he said. “The more trusted, clean data that you can give, the better accuracy you get.” In addition, organizations need to make sure they have good data to train their models, he explained. “Not only that, even after you train it, you have to continuously retrain it because the data is changing every day, every minute, every hour.”

IBM notes the combination of observability and AIops can have a major impact on an organization’s bottom line. The company claims this impact is why organizations like T-Mobile, Electrolux, Carhartt and Taiwan’s National Center for High-performance Computing are rapidly adopting its solutions.

Gartner notes one of the most compelling technology trends for 2022 is hyper-automation, which is often a result of AIops and observability. Nirmal pointed out that outages bring productivity, optimization and other problems, so one of the most prevalent themes in the IT industry is all-around automation using AI.

Unifying AIops and observability helps get value from multicloud 

Virtana’s State of Multicloud Report 2022, which surveyed 360 CIOs and IT leaders in the US and UK, notes multicloud challenges will continue to grow as adoption increases. Rather than take a reactive approach, enterprise IT leaders should be more proactive. As more enterprises migrate to a multicloud approach, it’s key to prioritize managing those multicloud environments effectively.

Nirmal believes enterprise IT decision-makers want to get benefits from multicloud in three critical areas: optimization, productivity and product costs. However, automation is what gives them the benefit across those three pillars. Unifying AIops and observability, he said, is an effective way to ensure enterprises are meaningfully automating processes, quickly detecting incidents along production pipelines and getting the best value from multicloud solutions.


Author: Kolawole Samuel Adebayo
Source: Venturebeat
