The top barriers to AI success and how to overcome them [Q&A]
With the intense interest in AI and its rapid pace of adoption, organizations are under pressure to swiftly evaluate their data architectures. AI architects urgently need solutions that harness AI to boost revenue and improve operational efficiency, all while navigating the potential barriers to success.
David Flynn, co-founder and CEO of Hammerspace, a specialist in the use and preservation of unstructured data, recently shed light on the growing complexity IT teams face in managing data pipelines. This complexity is further compounded as organizations integrate LLMs into AI applications, underscoring the significant obstacles to successful AI adoption.
BN: Given the accelerating pace of AI adoption, what are some of the biggest challenges AI architects face when incorporating distributed unstructured data?
DF: The problem for AI architects is twofold. First, because distributed data gets stored in various silos -- on numerous users' machines and across many different clouds -- you cannot know what exists in a distributed data environment. That is a significant issue AI researchers must address.
Another challenge is the vast number of files stored across disparate systems and the need to move them to an AI engine in the cloud. You can write orchestration code that identifies the data, aggregates it, and places it in the cloud computing environment where the models run. However, at the scale of millions of files, manually accessing each storage system and running a copy or replication process is an almost impossible task. You need an approach, such as software automation, that achieves this quickly and efficiently.
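To make the scale problem concrete, here is a minimal sketch of the kind of discover-and-stage automation Flynn describes. It is not any vendor's implementation; the mount points, staging target, and degree of parallelism are illustrative assumptions.

```python
# Hypothetical sketch: automated discovery and staging of files scattered
# across silos. All paths and the worker count are assumptions.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import shutil

SILOS = [Path("/mnt/netapp"), Path("/mnt/isilon")]  # assumed silo mount points
STAGING = Path("/mnt/cloud-staging")                # assumed cloud-bound area

def discover(silos):
    """Walk every silo and build a manifest of the files that exist."""
    return [p for silo in silos for p in silo.rglob("*") if p.is_file()]

def stage(src: Path) -> None:
    """Copy one file into the staging area, preserving its relative path."""
    dest = STAGING / src.relative_to(src.anchor)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)

manifest = discover(SILOS)
# Millions of sequential copies would take days; a thread pool at least
# parallelizes them.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(stage, manifest))
```

Even parallelized, a naive copy loop like this has no retries, deduplication, or audit trail, which is why purpose-built orchestration software matters at the scale of millions of files.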
BN: What innovative approaches are being introduced to help organizations address the unique enterprise requirements driven by AI?
DF: Traditionally, enterprise systems operated on a one-to-one basis, with a single user accessing a single data set. However, the advent of AI has transformed operations, evolving into a many-to-one model where multiple models, researchers, and enterprise users can now access the same data set to meet diverse data requirements.
Enterprises often attempt to implement AI using existing IT infrastructure, but organizations rely on that data and those systems for their original purposes, and disrupting them for the sake of AI is typically not a viable option.
One such innovative solution is the global namespace, a unified naming system for resources accessed from multiple locations, paired with a global metadata layer that sits above existing storage systems. This allows data to remain in its original location while AI researchers and models access and leverage it, with no need for relocation. All users see the same file metadata, regardless of where the files are stored, without having to manage file copies between silos or locations.
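As a rough illustration of the idea (not Hammerspace's actual design), a global namespace can be modeled as a metadata catalog that maps one logical path to wherever the bytes physically live. Every name, backend, and URI below is invented.

```python
# Toy model of a global namespace: one logical path per file, with metadata
# recording where the bytes actually live. All entries are invented.
from dataclasses import dataclass

@dataclass
class FileRecord:
    backend: str        # e.g. "netapp", "isilon", "azure-blob"
    physical_uri: str   # where the data really sits; it never has to move
    size_bytes: int

namespace = {
    "/projects/llm/corpus.jsonl": FileRecord(
        "azure-blob",
        "https://acct.blob.core.windows.net/corpus/corpus.jsonl",
        7_340_032,
    ),
    "/projects/llm/notes.txt": FileRecord(
        "netapp", "nfs://filer1/vol0/notes.txt", 2_048
    ),
}

def resolve(logical_path: str) -> FileRecord:
    """Every client resolves the same metadata, regardless of silo."""
    return namespace[logical_path]

print(resolve("/projects/llm/corpus.jsonl").physical_uri)
```

Because clients share the catalog rather than the storage mounts, two researchers and a training job can all open "/projects/llm/corpus.jsonl" without anyone copying the file between silos.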
BN: AI model training and inference costs are significant barriers to successful adoption in any industry today. How can the industry overcome these barriers?
DF: AI workloads involve significant data processing, requiring high-performance computing in which multiple processors cluster together to process large data sets. Optimally, you locate the data alongside available GPUs, or find ways to use your data with GPUs rented from a cloud or GPU-as-a-service provider.
Data orchestration allows you to easily identify what data exists and the data sets you want to use with available GPUs. You may only need to run the GPUs for a few days; renting avoids the costs incurred by owning, powering, and building the necessary infrastructure. Even when AI activities mature substantially, there may still be valid reasons why owning GPUs is not beneficial for an organization.
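A back-of-the-envelope comparison shows why renting can win for short runs. Every figure below is an illustrative assumption, not a quote from any provider.

```python
# Rent-vs-own arithmetic for a short training run. All prices are assumed.
gpus = 8
rental_per_gpu_hour = 3.00   # assumed $/GPU-hour for a high-end cloud GPU
training_days = 5

rent_cost = gpus * rental_per_gpu_hour * 24 * training_days
print(f"Rented for one {training_days}-day run: ${rent_cost:,.0f}")  # $2,880

own_capex = gpus * 30_000    # assumed hardware cost per GPU, chassis included
power_and_cooling = 15_000   # assumed first-year power and cooling
own_first_year = own_capex + power_and_cooling
print(f"Owned, first year, before staffing: ${own_first_year:,.0f}")  # $255,000
```

The break-even point shifts with utilization: an organization running GPUs around the clock all year faces a very different calculation than one that needs them for a few days a quarter.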
BN: What are the key IT infrastructure considerations in the decision-making process, including storage and compute resources, and how are cloud-based and on-premises systems integrated?
DF: When considering the optimal infrastructure for AI, the critical factor is using GPUs efficiently. Ideally, there is no need to purchase additional servers or build client networks solely for GPU computing. Creating a separate network specifically for AI, or deploying specialized clients with restricted access to data, is not ideal. Using existing components, such as Ethernet networks and the Windows and Linux clients already connected to your enterprise data, allows a seamless connection to your AI environment while still supporting highly specialized tasks.
The infrastructure must also meet enterprise standards, allowing the operating systems and virus scanners already in operation to keep working. Using the networks you've already deployed avoids introducing unnecessary risk. Cloud and on-premises systems must operate under a unified global namespace and shared metadata to prevent isolation and to allow data to be identified and transferred efficiently between cloud-based GPUs and on-premises storage.
BN: Data is often scattered across disparate locations, and data governance and accessibility issues are also concerns. How can organizations address these pain points?
DF: As the landscape has evolved into a many-to-one model, multiple users accessing the same data raises concerns about scattered data, governance, and accessibility. It's crucial to recognize that people may use data for different purposes, potentially exposing sensitive corporate information. Instead of maintaining separate policies for each storage location, such as NetApp, Isilon, or Azure Blob, organizations must implement a unified data management policy that is independent of the storage system.
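As a sketch of what one storage-independent policy might look like (the field names and rules here are hypothetical), a governance check can be written against file metadata rather than against any particular backend's access controls.

```python
# Hypothetical storage-independent governance policy: one rule set evaluated
# against file metadata, whichever backend holds the file.
from dataclasses import dataclass

@dataclass
class Policy:
    name: str
    allowed_groups: set[str]
    blocked_tags: set[str]   # e.g. files tagged "pii" stay out of AI pipelines

POLICY = Policy("ai-training", {"ml-research"}, {"pii", "legal-hold"})

def may_access(user_groups: set[str], file_tags: set[str]) -> bool:
    """Same check runs whether the file lives on NetApp, Isilon, or Azure Blob."""
    in_allowed = bool(user_groups & POLICY.allowed_groups)
    is_blocked = bool(file_tags & POLICY.blocked_tags)
    return in_allowed and not is_blocked

print(may_access({"ml-research"}, {"public"}))  # True
print(may_access({"ml-research"}, {"pii"}))     # False
```

The point is that the policy travels with the metadata, so adding a new silo does not mean re-implementing the same rules in another vendor's ACL syntax.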
BN: How can you deal with the placement of GPUs in relation to data to achieve optimal performance?
DF: Because supercomputing and AI architectures look almost identical, many enterprises try to mirror what has been done in high-performance supercomputing. They run into problems because that approach won't meet enterprise standards. Hammerspace leverages supercomputing performance, ensuring GPUs run at maximum capacity in large computing environments, and, based on our standards work with the Linux community, our solutions meet enterprise requirements. Additionally, our global namespace and metadata management automate the identification and movement of data sets to GPUs, enabling rapid streaming for optimal efficiency.
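To illustrate that last step as a hypothetical sketch (the catalog entries and tier names are invented, and this is not Hammerspace's interface), an orchestrator can consult file metadata to decide which data sets must be staged onto GPU-local storage before a job starts.

```python
# Hypothetical metadata-driven placement: before a training job runs, look up
# where each needed file lives and queue moves to GPU-local storage.
CATALOG = {
    "/projects/llm/corpus.jsonl": ("azure-blob", "https://acct.blob.core.windows.net/c/corpus.jsonl"),
    "/projects/llm/eval.jsonl": ("nvme-local", "file:///scratch/eval.jsonl"),
}

def place_for_training(paths, gpu_tier="nvme-local"):
    """Return move orders for files not already on the GPU-local tier."""
    orders = []
    for path in paths:
        backend, uri = CATALOG[path]   # metadata lookup; no data touched yet
        if backend != gpu_tier:
            orders.append(f"move {uri} -> {gpu_tier}")
    return orders                      # an orchestrator would execute these

print(place_for_training(["/projects/llm/corpus.jsonl", "/projects/llm/eval.jsonl"]))
```

Because the decision is made from metadata alone, only the files a job actually needs get moved, which is what keeps the GPUs streaming instead of waiting.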
With the emergence of AI and the rise of hybrid cloud environments, a many-to-one model has fundamentally shifted the relationship between data and its use cases. Repurposing legacy systems designed for one-to-one data access creates a host of issues and inefficiencies. Orchestrating data for many-to-one usage models and adopting a global namespace are the kinds of forward-thinking solutions that will be crucial in the ever-evolving AI landscape.
Image credit: akarapongphoto/depositphotos.com