In this special feature, Stephen Amstutz, Head of Strategy and Innovation at Xalient, discusses the role of AI in the shift from network monitoring to observability, highlighting how AI observability can limit downtime, protect brand reputation and ultimately save money. Stephen is a results-oriented and hard-working professional, able to understand complex issues outside of his direct area of expertise. He enjoys the challenge of finding the right technical solution to meet a client’s business challenges. Stephen’s background in electronics has provided him with a strong analytical foundation, which he has maintained through more than 20 years of experience in the design, implementation and support of various IT infrastructures.
In today’s world, data volume and network bandwidth requirements are constantly increasing. So much is happening in real time as businesses adapt and move to become more digital, which means the state of the network is constantly changing. Meanwhile, users have high expectations for apps – fast loading times, visually advanced look and feel, with feature-rich content, video streaming and multimedia capabilities – all of which eat up network bandwidth. With millions of users accessing mobile apps and apps from multiple devices, most businesses today generate seemingly unmanageable volumes of data and traffic on their networks.
Networks deal with unmanageable volumes of data
In this always-on environment, networks are heavily loaded, yet organizations must still provide users with optimal performance without service degradation. Traffic volumes keep increasing, causing networks to burst at peak times, much like the LA 405: no matter how many lanes are added to the highway, there will always be congestion during the busiest periods.
For example, we are seeing a growing need for train operators’ networks to manage video footage from body-worn cameras, to reduce anti-social behavior on trains and in stations. However, this has a direct impact on the network, with daily downloads of hundreds of video files consuming bandwidth at a phenomenal rate, yet operators still have to go about their daily business while countless hours of video footage are downloaded and processed.
This is a good example of where AI and ML can and do help organizations take a proactive stance on capacity and analyze whether networks have crossed certain thresholds. These technologies allow organizations to “learn” seasonality and anticipate peak times by implementing dynamic thresholds based on time of day, day of week and so on. As a result, AI helps spot anomalous activity on the network, and this traditional use of AI/ML is now starting to shift from “monitoring” to “observability.”
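The dynamic-threshold idea described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s implementation: it learns a per-(weekday, hour) baseline from hypothetical historical samples and flags values that exceed that baseline by a chosen number of standard deviations.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_seasonal_thresholds(samples, k=3.0):
    """Learn a dynamic threshold per (weekday, hour) bucket.

    samples: iterable of (weekday, hour, value) tuples from history.
    Returns {(weekday, hour): threshold}, where threshold = mean + k * stdev.
    """
    buckets = defaultdict(list)
    for weekday, hour, value in samples:
        buckets[(weekday, hour)].append(value)
    return {
        key: mean(vals) + k * (stdev(vals) if len(vals) > 1 else 0.0)
        for key, vals in buckets.items()
    }

def is_anomalous(thresholds, weekday, hour, value):
    """Flag traffic that exceeds the learned threshold for this time slot."""
    return value > thresholds.get((weekday, hour), float("inf"))

# Hypothetical history: Mondays at 09:00 normally carry ~100 Mbps.
history = [(0, 9, v) for v in (95, 100, 105, 98, 102)]
thresholds = build_seasonal_thresholds(history)
print(is_anomalous(thresholds, 0, 9, 250))   # a 250 Mbps spike is flagged
print(is_anomalous(thresholds, 0, 9, 104))   # normal variation is not
```

Because each time slot has its own baseline, a Monday-morning rush that would trip a static threshold is treated as normal, while the same traffic level at 3 a.m. would be flagged.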
So what is the difference between the two?
Monitoring is more linear in its approach: it informs organizations when thresholds or capacities are reached, allowing them to determine whether networks need to be upgraded. Observability, by contrast, is about correlating multiple signals, gathering context and analyzing behavior.
For example, an organization might monitor 20 different aspects of an application to make it run more efficiently; observability takes those 20 signals, analyzes the data and presents diagnostics across various scenarios. It leverages rich network telemetry and generates contextualized visualizations, automatically initiating predefined playbooks to minimize user disruption and ensure rapid service restoration. This means the engineer is not waiting for a call from a customer reporting that an application is running slowly. Likewise, the engineer doesn’t need to log in and run a slew of tests, or painstakingly sift through hundreds of reports, but can instead quickly triage the problem. It also means that network engineers can proactively explore different dimensions of these anomalies rather than getting bogged down in mundane, repetitive tasks.
This provides clear business benefits by reducing the time teams spend manually browsing and analyzing data and alerts. That leads to faster debugging, increased availability, better-performing services, more time for innovation, and ultimately happier network engineers, end users and customers. Observability’s correlation of multiple activities allows applications to operate more efficiently and identifies when a site’s operations are sub-optimal, with that context provided to the right engineer at the right time. This means a high volume of alerts is transformed into a small volume of actionable information.
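One way to picture how a high volume of alerts becomes a small volume of actionable information is simple time-window correlation. The sketch below is a hypothetical illustration of the principle, not a real AIOps engine: alerts from the same site that fire close together are grouped into a single contextualized incident.

```python
from collections import defaultdict

def correlate_alerts(alerts, window_s=300):
    """Collapse a flood of related alerts into a handful of incidents.

    alerts: list of dicts with 'ts' (epoch seconds), 'site' and 'signal'.
    Alerts from the same site arriving within window_s seconds of each
    other are merged into one incident carrying every signal as context.
    """
    incidents = []
    by_site = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        by_site[alert["site"]].append(alert)
    for site, group in by_site.items():
        current = [group[0]]
        for alert in group[1:]:
            if alert["ts"] - current[-1]["ts"] <= window_s:
                current.append(alert)        # same burst: add context
            else:
                incidents.append({"site": site,
                                  "signals": [a["signal"] for a in current]})
                current = [alert]            # start a new incident
        incidents.append({"site": site,
                          "signals": [a["signal"] for a in current]})
    return incidents

# Hypothetical burst: three symptoms of one underlying fault at one site.
raw = [
    {"ts": 0,  "site": "nyc-01", "signal": "latency-high"},
    {"ts": 30, "site": "nyc-01", "signal": "packet-loss"},
    {"ts": 60, "site": "nyc-01", "signal": "app-slow"},
]
print(len(correlate_alerts(raw)))  # three alerts, one actionable incident
```

The engineer receives one incident with all three correlated signals attached, rather than three separate pages for what is really a single fault.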
Machines over humans
Automating this process and using a machine rather than a human is much more accurate because machines don’t care how many data sets they need to correlate. Machines build hierarchies, and when something in that hierarchy impacts something else, the machine spots certain behaviors and finds those flaws. The more datasets added, the more the picture begins to build for engineers who can then determine if further action is needed.
Let’s take another concrete example. We are currently in talks with a large management company that owns and manages gas station forecourts. They have 40,000 gas stations and each forecourt has about 10 pumps, which equals 400,000 gas pumps across the United States. Their current problem is a lack of visibility into gas pumps and grid-connected EV chargers. As a result, when a pump or charger isn’t working, they may not notice it until after a customer complaint, which is less than ideal.
The network telemetry we collect, combined with this behavioral analysis, means we develop business information, not just network information. You can see when a gas pump stops generating traffic, which triggers a maintenance request to go and fix the pump. This is not a network problem, but network traffic can be leveraged to find the business problem. This use case covers gas pumps and electric vehicle chargers, but imagine how many other grid-connected devices in factories or production facilities around the world could be monitored in the same way.
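The pump example can be sketched as a simple silence detector. Everything here (device names, the silence threshold) is hypothetical; the point is that a device that stops generating traffic surfaces as a business problem, not a network problem.

```python
import time

def find_silent_devices(last_seen, now=None, max_silence_s=900):
    """Return IDs of devices that have stopped producing network traffic.

    last_seen: {device_id: epoch seconds of the device's last traffic}.
    A pump or EV charger silent for more than max_silence_s is assumed
    to need a maintenance visit rather than a network fix.
    """
    now = time.time() if now is None else now
    return sorted(dev for dev, ts in last_seen.items()
                  if now - ts > max_silence_s)

# Hypothetical telemetry: pump-7 went quiet 30 minutes ago; pump-8 is fine.
last_seen = {"pump-7": 1_000_000 - 1800, "pump-8": 1_000_000 - 60}
print(find_silent_devices(last_seen, now=1_000_000))  # ['pump-7']
```

In practice the output would feed a ticketing or field-service system, dispatching a technician before the first customer complaint arrives.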
Obtain actionable insights quickly
This is where our AIOps solution, Martina, predicts and fixes network outages and security breaches before they happen. Additionally, it helps automate repetitive and mundane tasks while proactively surfacing an issue to an organization in a contextualized and meaningful way, rather than simply handing it to the customer to resolve. Martina uncovers issues along with recommendations to fix them, ensuring organizations always have resilient, high-performing networks. Essentially, it makes the network invisible to users by providing clients with secure, reliable, and high-performance connectivity that works. It provides a single view of multiple data sources and easily configurable reports so organizations can get insights quickly.
Executives and boards want their network teams to be proactive. They do not tolerate poor network performance and want any degradation in service, however small, to be quickly resolved. This means that teams must act on anomalies rather than thresholds, understanding expected behaviors and acting before problems escalate. They need fast mean time to detect (MTTD) and mean time to repair (MTTR), because poorly performing networks and downtime damage brand reputation and ultimately cost money. This is where the proactive observability of AI/ML comes into its own.
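For readers less familiar with the acronyms, MTTD and MTTR are simple averages over incidents. A toy calculation, using made-up timestamps, looks like this:

```python
def mean_minutes(deltas_s):
    """Average a list of durations in seconds and express it in minutes."""
    return sum(deltas_s) / len(deltas_s) / 60

# Hypothetical incident log: (occurred, detected, resolved) epoch seconds.
incidents = [(0, 120, 1920), (5000, 5060, 6260)]

mttd = mean_minutes([det - occ for occ, det, res in incidents])  # time to detect
mttr = mean_minutes([res - det for occ, det, res in incidents])  # time to repair
print(mttd, mttr)  # 1.5 25.0
```

Shrinking either number directly shortens the window in which users feel the degradation, which is exactly what proactive observability is meant to buy.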