Top down, bottom up or a bit of both? Process and deployment considerations for AIOps
IT production environments are an essential part of any modern business organization. Today, it’s virtually impossible for an enterprise to function effectively without a defined set of IT solutions. The amount of data managed and needed to run a business is growing exponentially, in step with the amount of data needed to guarantee that these IT environments are always available. These two facts alone create a strong case for the Intelligent Automation (IA) of IT, because data really is the lifeblood of modern business. However, simply generating and managing reams of data is not enough. To derive tangible value from any data, organizations must ensure that the data generated is comprehensive, verifiable, and accurate. Failure here can render data meaningless and lead to poor decision making.
The quality and depth of data can be a game-changer for businesses, and while the human brain is an amazing organ, it can only do so much at once and maintain consistent performance levels. AIOps, the integration of AI in IT operations, on the other hand, leverages the power of machines to enable organizations to accurately comprehend and control the growing complexities of data-driven business ecosystems. As more organizations embrace complete operational digital transformation, it’s critical that the data generated is intelligently gathered, organized, analyzed, and optimized. This is where AIOps delivers exponential value through the ability to take data and add context, intelligence and value, driving actionable insights and better-informed decision making. AIOps underpins the drive towards maximized ROI, minimal loss, and complete customer satisfaction.
AIOps -- a must have rather than a nice to have
There’s no longer a valid business case for the old arguments of 'doing more with less.' The stakes are too high given the global 24/7 marketplace, the speed at which business moves, the need for fully optimized IT operations, security, and the awareness of potential failure points, disruptions and any adverse conditions. Put simply, it's no longer a question of if an organization needs AIOps, but why haven’t you already got it in place?
The compelling evidence is out there. For example, according to a recent Forrester Total Economic Impact study, Digitate’s AIOps technology optimized the new headcount of IT operations teams by 60 percent -- a result of the teams’ increased productivity and ability to scale. The study concluded that a company with a small, 10-employee IT ops team saved $1.4 million in labor cost (contract or permanent) over a three-year period. For a large enterprise, that figure could be 25 to 50 times greater. Looking at real-world customer applications, retail giant Walgreens has 9,000 stores and 4,500 call center agents at four locations. During the COVID pandemic, they would experience sporadic spikes in demand for vaccinations as the number of COVID cases rose and fell.
Supported by Digitate’s AIOps technology, they were able to determine when those spikes were most likely to happen and adjust store hours and staffing accordingly. In addition, AIOps enabled Walgreens to optimize its Salesforce usage and automate the resolution of IT tickets. As a direct result, AIOps was responsible for resolving approximately 31 percent of Walgreens’ total IT tickets, along with successfully monitoring and managing 95 percent of all events since deployment.
Making the commitment to implement AIOps requires a strategic plan of action, of course. So it’s important to establish the rationale and the context in which AIOps will be deployed. Often there has been a realization that current optimizations have reached an inflection point and that a technology breakthrough is needed. What are the problems to be addressed? Is there a focus on specific areas or will there be a more holistic strategy?
Once an organization has decided to invest in AIOps, there is a logical pathway to follow to deployment, given that everything is data driven. Establishing ITOps, the processes and tasks designed to support IT production environments, is usually the first implementation of AIOps. As AIOps is designed to optimize ITOps, leveraging automation, machine learning and artificial intelligence, it’s important that organizations have a mature understanding of ITOps. This is the foundation for successfully deploying AIOps, because ITOps defines the requirements necessary to enable AIOps. If these requirements are not clear and well defined right from the start, any IT solution will fail, and AIOps is no exception.
Typically, the first steps required in order to implement AIOps are shown below:
- People: Deployment has to start with people. It’s important to assemble a project team to agree the scope of work, set the criteria for potential vendors and map out the entire engagement and deployment project. Identify a platform owner and executive sponsors, supported by a strong IA architect and IA delivery leads. Key deliverables at this stage include:
- Assessing the maturity of current ITOps and IT production environment.
- Identifying the most frequently recurring issues.
- Building a business case and defining a clear path to ROI.
- Process: Understanding the nature of the beast. This is often the most difficult step for organizations because IT support frequently resides in the ‘tribal knowledge’ of the IT support team. These team members may belong to other organizations, for example, a System Integrator, which could mean the knowledge of the IT support function is not documented. Acquiring that knowledge, dependent on the external organization, could be complex as they need to:
- Acquire the knowledge of how ITOps works.
- Document each Standard Operating Procedure (SOP) that describes how IT support is provided. This is critical because machines need to be ‘educated’ on how to perform support tasks.
- Define and describe the organization’s most critical data flows. For example, what is normal and what is not for each observable element such as IT service, traffic volume, or component state?
- Technology: Assessing the IA platforms available. Selecting the right solution from the right technology partner is a hugely significant decision, given the importance of the task at hand, the significant investment in resources, time and money, and the assumed longevity of the relationship with the vendor. Typical considerations here include:
- Drawing up a list of challenges and tangible deliverables, based on the processes outlined above.
- Considering short-term and long-term needs, cost-benefit analysis/ROI, scalability, and platform flexibility.
- Ease of use.
- A one stop shop? A single solution that is able to handle both vertical and horizontal data flows. Adopting such a solution will facilitate integration points and the adoption of ML algorithms.
- Budget: The team should not only plan for the licensing, hardware, delivery, installation and training costs associated with the platform of choice, but also consider wider organizational implications, such as change management, as there could be a need to retrain people whose tasks are now managed by IA so they can be redeployed elsewhere.
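The Process step above asks what is normal and what is not for each observable element. As a purely illustrative sketch (not Digitate's method, and not how any particular AIOps platform works), one simple way to express such a baseline is to record historical observations for an element such as traffic volume and treat values far from the historical mean as abnormal. The function names, the sample data and the three-standard-deviation tolerance are all assumptions made for this example.

```python
from statistics import mean, stdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of historical observations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def is_normal(value: float, baseline: tuple[float, float], tolerance: float = 3.0) -> bool:
    """True if value lies within `tolerance` standard deviations of the mean.

    The default tolerance of 3.0 is an assumed rule of thumb, not a figure
    taken from the article.
    """
    center, spread = baseline
    if spread == 0:
        return value == center
    return abs(value - center) <= tolerance * spread


# Hypothetical hourly traffic volumes for one observable element.
history = [100, 104, 98, 101, 97, 102, 99, 103]
baseline = build_baseline(history)

print(is_normal(105, baseline))  # close to the usual range
print(is_normal(150, baseline))  # far outside the usual range
```

In practice the documented SOPs would then describe what should happen when an observation falls outside its baseline; a real deployment would likely need seasonality-aware baselines rather than a single mean.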
Top down or bottom up deployment?
When it comes to how an organization actually deploys AIOps, there are two general reference models, which we refer to as Bottom Up or Top Down deployment. To better understand how these models are applied, Figure 1 below provides a representation of possible data flow for an enterprise with a typical technology stack, including ERP and other business applications, with a standard IT maintenance team.
The vertical dimension represents the technical layers needed to sustain a specific solution. The bottom and most fundamental layer is the hardware layer or infrastructure. Above that is the operating system that manages the communication and relationships of applications and hardware. Above that lies the application layer, representing the actual business applications an organization might use -- for example, an ERP suite, CRM system, email, website software, and databases, plus all the middleware or integration tools that connect them. The top layer illustrates the horizontal flow of data from one solution (column) to another. During each transition this data can trigger actions or decisions -- or become enriched for future steps. All these layers, both horizontal and vertical, are constantly communicating among themselves, to keep the whole data flow running smoothly.
When an organization considers the method of deployment, whether that is Bottom Up or Top Down, there are a number of factors that can affect that decision. For example:
- What is the operational maturity of an organization? Are they completely ready? Have they successfully captured and prioritized their entire ITOps processes? Are all their SOPs documented?
- What are the immediate versus longer-term organizational needs? Are there specific areas that they need to address right away? Or are the needs more holistic?
- How fast is an enterprise looking to transform? Depending on the size, nature and structure of an organization, it might not be realistic to achieve complete transformation at the same time, globally.
- What is the overall production environment architecture? What are the most problematic IT solutions and is any major change happening in production?
- What is the architecture for IT support tools, for example, monitoring, messaging, ticket management?
- Who owns production support knowledge? How available is this team (as mentioned previously this is the most difficult task)?
- What is the driver of this transformation? While the transformation is not optional, it must also be wanted if it is to succeed.
Based on the answers to these questions, alongside other considerations and rationale, the appropriate deployment model can be selected. Each method has its own benefits and challenges and is best suited to specific scenarios.
Bottom-up deployment model
Deploying AIOps via the 'Bottom Up' model means it is applied at the very foundational levels of the organization's IT infrastructure layer and across all SOPs within that framework. This type of deployment has a longer lead time. However, once all the SOPs have been learned, AIOps can handle any number of typical situations that may arise operationally on a daily basis. Once the SOP learning is in place, AIOps can look at data flows and at how an organization manages master data, and start applying organizational use cases to the situations it identifies as actionable.
This methodology requires a bigger investment in the beginning, and it has a slower ROI, but it creates a very solid base which provides broader business improvements over time. So, how long does this take, typically? Let’s proceed step by step:
Achieving effective autonomous IT operation support requires the automation of around 80 percent of all ITOps SOPs, which means achieving the following Intelligent Automation (I.A.) index target percentages:
- 50 percent total tickets resolved by I.A.
- 95 percent total alerts managed by I.A.
- 80 percent of non-ticket support activities resolved by I.A.
In our experience, this requires a minimum of 500 I.A. use cases to be deployed. So, if 50 are deployed each month, deployment will take 10 months, plus two months to set up a program, for a total of 12 months. This is very fast when compared to the average of two to three years.
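The timeline arithmetic and the three index targets above can be written down as a small sketch. The target percentages, the 500 use cases, the 50-per-month rate and the two-month setup come from the text; the function names, the dictionary keys and the sample measurements in the usage lines are hypothetical and only illustrate the calculation.

```python
import math

# I.A. index targets quoted in the article, in percent.
IA_TARGETS = {"tickets": 50.0, "alerts": 95.0, "non_ticket_activities": 80.0}


def deployment_months(total_use_cases: int, per_month: int, setup_months: int = 2) -> int:
    """Months needed to deploy all use cases at a steady rate, plus program setup."""
    if per_month <= 0:
        raise ValueError("per_month must be positive")
    return setup_months + math.ceil(total_use_cases / per_month)


def ia_index(handled_by_ia: int, total: int) -> float:
    """Share of work items handled by I.A., as a percentage."""
    return 100.0 * handled_by_ia / total if total else 0.0


def targets_met(indexes: dict[str, float]) -> dict[str, bool]:
    """Compare measured indexes with the targets; missing indexes count as 0."""
    return {name: indexes.get(name, 0.0) >= target for name, target in IA_TARGETS.items()}


# The article's example: 500 use cases at 50 per month, plus two months of setup.
print(deployment_months(500, 50))  # 12

# Hypothetical measurements part-way through a deployment.
measured = {
    "tickets": ia_index(310, 1000),
    "alerts": ia_index(960, 1000),
    "non_ticket_activities": ia_index(400, 1000),
}
print(targets_met(measured))
```

Rounding the deployment phase up to whole months is a design choice made for this sketch; a slower rate such as 40 use cases per month would give 13 months of deployment plus setup.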
Top-down deployment model
In the 'Top Down' model, AIOps is applied to the most critical business data flows first, and then automates others one by one. This approach, while providing a faster ROI, is usually a response to a specific problem that an organization has identified, and it might create the illusion that the rest of the IA journey is no longer needed. To avoid such a problem, a top down model requires a carefully planned architecture to fit all data flow requirements into one single I.A. solution and an equally well planned deployment strategy, so that each deployment improves the overall Intelligent Automation indexes. Organizations must consider all data flows, not just one, and have an excellent understanding of just how the different end-to-end data flows connect with each other. While this can create short-term business value, benefits and ROI, it might also be more expensive in the longer term.
The best of both worlds?
While the two deployment models outlined are very much 'horses for courses,' dependent on the reasoning and needs of an organization, they are not necessarily mutually exclusive. As Boston Consulting Group (BCG) stated in its October 2020 report, AI is a Powerful Weapon in the Fight Against IT Problems, "by prioritizing use cases, you can start reaping the benefits of AI quickly -- in as little as three months if you know how you want to use AI and can access the relevant data. Contrast that with an all-encompassing 'big-bang' approach, where you may wait two years for a grand unveiling." BCG go on to assert that "by prioritizing high-value use cases, you visibly demonstrate the benefits of AI" in the short-term by tackling immediate challenges, which "helps build support and funding for a continuing effort and for the necessary changes to processes and organization. This kind of progressive approach also lets you deploy your target operating model in a gradual, value-driven way. Use cases and operating models develop in parallel and in sync."
This 'hybrid' approach, where organizations realize value by triaging immediate key problem areas through top down quick fixes, whilst simultaneously committing to a bottom up approach to AIOps deployment, can, if carefully planned, present a very good route. CIOs are under constant pressure to provide good news to other CXOs in many organizations, and IT is all too often the favorite target. In such environments, a quick win to solve an immediate issue can spur a commitment to more major changes. This can be a perfect compromise if it is properly planned, explained and executed.
AIOps delivers proven benefits. Customer satisfaction increases as mean time to recovery (MTTR) and incident management improve. Better utilization of operational resources reduces overall operating costs, and intelligent observation instantaneously flags, and in some cases can even preempt, potential operational problems. Employee satisfaction can also improve thanks to the automation of lower-value and often tedious tasks, allied to greater control of operations and empowerment to focus on higher-value work. Key to unlocking all of this value (and more) is ensuring that the deployment of AIOps is optimized right from day one, and that an objective view of organizational needs is created that can provide all of the elements needed to prioritize focus areas and choose the correct path to intelligent automation.
The AIOps journey is a necessary path and organizations must plan how to make it a wanted one, too. Implementing IA at scale is akin to hiking a mountain: the challenge can be great, but the rewards and satisfaction are well worth the time and effort.
Ugo Orsi is Chief Customer Officer, Digitate.