The top challenge when implementing AI for business: Lack of high-quality data

AI growth and adoption in the UK are surging, with the market valued at more than £16.8 billion and expected to reach £801.6 billion in the next decade. Approximately 15 percent of UK businesses are already using AI technologies such as data management and analysis, natural language processing, machine learning, and computer vision. And across the pond in the US, AI is expected to contribute a significant 21 percent net increase to US GDP by 2030, showcasing its substantial impact on the economy.

Growth in any new technology is never without its challenges. For AI, these include ensuring data privacy, addressing ethical concerns, and navigating the complexity of integrating with existing IT infrastructure. Data quality is central to resolving these challenges. To be useful, the data used for AI must be high-quality, well-structured, and from trusted sources. These properties are the foundation for all AI models and determine their effectiveness and reliability.

A recent ESG whitepaper on IT-related AI model training revealed that 31 percent of firms consider limitations in data quality to be a major barrier to AI integration. Our discussion here focuses on strategies to address this issue, emphasizing the importance of collecting comprehensive, accurately labelled data at scale, and the critical role of human oversight (aka “human in the loop”) to ensure data integrity.

Whether an AI deploys large language models like ChatGPT or machine learning (ML) techniques, its effectiveness crucially depends on the quality of the model’s underlying data. Poor-quality data can lead to inaccurate outputs, eroding trust in the reliability and utility of AI systems; models depend on meaningful patterns in that data to deliver relevant, explainable insights.

Unfortunately, not all data is created equal. Traditional ML models, which use contextual data from verifiable sources, are considered more reliable for decision-making. In contrast, generative AI models draw from a broader pool of unverified sources, often leading to inaccuracies.

For example, when applying AI in an IT infrastructure context, an ML model trained on endpoint data can reveal hidden issues through anomaly detection. The AI can spot incipient issues by their deviation from baseline patterns, enabling IT teams to operate more proactively. In many cases, proactive interventions are possible thanks to predictive analytics, a key benefit of machine learning: combined with sensors, it alerts on IT issues before they escalate into a major impact on the end user or the business at large, enabling smarter fixes and saving money.
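The baseline-deviation idea above can be sketched with a simple z-score check. This is an illustrative toy, not SysTrack's actual model; the metric names, sample values, and the 3-sigma threshold are all assumptions.

```python
# Hypothetical sketch: flag an endpoint reading that deviates strongly
# from a learned baseline. Threshold and metrics are illustrative only.
from statistics import mean, stdev

def detect_anomaly(baseline: list[float], latest: float,
                   z_threshold: float = 3.0) -> bool:
    """Return True if the latest reading is a z-score outlier vs. baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return latest != mu  # flat baseline: any change is anomalous
    return abs(latest - mu) / sigma > z_threshold

# Example: a week of CPU-usage samples for one endpoint, then new readings.
cpu_baseline = [22.0, 25.1, 23.4, 24.8, 21.9, 23.7, 24.2]
print(detect_anomaly(cpu_baseline, 24.5))  # within normal range -> False
print(detect_anomaly(cpu_baseline, 78.0))  # far above baseline  -> True
```

In practice a production model would use richer features and seasonality-aware baselines, but the principle is the same: the richer and cleaner the baseline data, the fewer false alerts.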

Consolidating diverse data sources can be challenging. For IT teams, regularly collecting data from multiple endpoints builds a better understanding of the tech estate, from hardware and software implementations to network performance. The precision of AI-driven recommendations improves with the volume of parameters used to fine-tune the model. A useful analogy is to think of data points as pixels in an image: the greater the number of pixels, the clearer the image becomes. The same holds for AI, where amassing extensive data yields sharper clarity and accuracy.

Platforms such as Lakeside SysTrack overcome this obstacle by collecting and analyzing a vast set of endpoint data: 10,000 data points gathered every 15 seconds, with 1,200+ sensors assessing each endpoint across an enterprise.

The depth, breadth, history, and quality of the data collected contrast with other industry players, which offer fewer data points less frequently, and provide complete visibility across the IT estate. With this holistic view, AI models are improved, while IT support technicians and analysts can better determine which users may be facing device performance problems and, in turn, a poor digital experience. Data-driven visibility can also uncover ways to remediate IT issues, areas where the environment is underperforming, impacts of the latest IT rollout on users, and much more. Armed with insights from the AI model, IT can pivot from reactive to proactive.

Let’s take an example of app performance. Simply taking a snapshot of an app's CPU and memory usage falls short of providing the comprehensive data needed to train an AI model effectively. To thoroughly assess an app, additional metrics such as the effects on network performance, GPU usage, etc. must be gathered. Contextual information is equally vital, with historical data that reveals the app’s normal performance parameters, its functionality alongside other apps, and its interactions with the system's hardware and drivers. Given the dynamic nature of these variables, understanding them can be complex. However, using ML and a robust data pool enables IT teams to develop a detailed understanding of an app's performance intricacies.
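The gap between a bare snapshot and a contextual assessment can be made concrete. The sketch below is hypothetical: the metric names, historical fields, and sample values are assumptions for illustration, not any vendor's actual schema.

```python
# Illustrative contrast: a bare CPU/memory snapshot vs. a contextual record
# that includes network, GPU, and historical "normal" ranges for the app.
from dataclasses import dataclass

@dataclass
class AppSnapshot:
    cpu_pct: float           # instantaneous CPU usage
    mem_mb: float            # instantaneous memory usage

@dataclass
class AppContext(AppSnapshot):
    net_latency_ms: float    # effect on network performance
    gpu_pct: float           # GPU usage
    hist_cpu_mean: float     # historical normal for this app
    hist_cpu_max: float      # historical peak under normal operation
    concurrent_apps: int     # what else was running at the time

    def is_within_normal(self) -> bool:
        # A lone snapshot cannot answer this; historical context can.
        return self.cpu_pct <= self.hist_cpu_max

ctx = AppContext(cpu_pct=41.0, mem_mb=812.0, net_latency_ms=35.2,
                 gpu_pct=12.0, hist_cpu_mean=38.5, hist_cpu_max=55.0,
                 concurrent_apps=7)
print(ctx.is_within_normal())  # 41% CPU is under the app's 55% norm -> True
```

The same 41 percent CPU reading could be healthy for one app and alarming for another; only the historical and environmental context, captured as training features, lets a model tell the difference.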

No matter the outputs generated by AI, human oversight remains crucial, as models, especially those using generative AI, can struggle to differentiate good data from bad. Ensuring data is trustworthy and well-curated is essential, as it serves as the fuel that powers AI models, but humans still need to provide crucial validation and direction.

Human oversight becomes even more critical in the context of proactive IT management. While ML, supported by extensive datasets, significantly boosts anomaly detection and predictive capabilities, it is the human input that ensures any insights, whether based on unusual patterns or trends, are actionable and relevant.

For example, using natural language processing, teams can efficiently manage large-scale queries across systems, such as analyzing average usage of Microsoft Outlook or identifying employees who haven’t used certain software licenses, which adds unnecessary costs. While this AI integration streamlines operations, it still relies on humans to ensure that any intervention is tailored correctly to each situation. Ultimately, the AI can become a trusted “copilot” for the IT support agent or the Level 3 systems engineer.
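The license-reclamation query described above can be expressed as a simple filter over usage telemetry. This is a minimal sketch: the data structures, user names, and 90-day inactivity window are all illustrative assumptions.

```python
# Hypothetical sketch: list employees holding a software license they
# haven't used within a given window. Cutoff and data are illustrative.
from datetime import date, timedelta

license_holders = {"alice", "bob", "carol", "dave"}
last_used = {                      # most recent launch per user
    "alice": date(2024, 5, 1),
    "bob":   date(2024, 1, 10),
    "carol": date(2024, 4, 20),
    # dave holds a license but has never launched the app
}

def unused_licenses(today: date, window_days: int = 90) -> set[str]:
    """Return license holders with no recorded use inside the window."""
    cutoff = today - timedelta(days=window_days)
    return {user for user in license_holders
            if user not in last_used or last_used[user] < cutoff}

print(sorted(unused_licenses(date(2024, 5, 15))))  # -> ['bob', 'dave']
```

A human still decides what to do with the list: reclaiming a license from a user on extended leave, for example, is the kind of situational judgment the AI copilot leaves to the IT team.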

As organizations evolve towards proactive, predictive, and one day fully autonomous IT, prioritizing high-quality data becomes paramount. The better the data, the better the AI. Clearly, high-quality data and trust in AI go hand in hand. The underlying data not only determines the reliability, explainability, and relevance of AI outputs but also ensures that users can trust these outputs.

A robust data strategy must extend beyond mere collection. To safeguard AI applications from inaccuracies and biases and to ensure smooth integration, organizations must commit to rigorous data acquisition, meticulous data management practices, and consistent human oversight. These steps are essential for future-proofing business operations and establishing AI as a pivotal asset in the shift to autonomous IT.

Chanel Chambers is VP, Product Marketing, Lakeside Software.
