Keeping AI data and workloads secure and accessible
AI is already revolutionizing whole industries and professions. New applications and projects appear regularly across every sector, limited only, it seems, by the bounds of our own inspiration. AI workloads will therefore be critical to organizations across the board. The question is: how can we ensure AI applications are stable, secure and accessible?
Many companies depend on trusted backups to guard against data loss and outages. From a data protection perspective this makes sense. However, backups aren’t best suited to business continuity and disaster recovery (DR), particularly for the most important data and workloads, such as AI.
Backups fall short because they only secure individual servers rather than entire applications. That means applications must be manually rebuilt from their components after each data restoration, which takes time, in some cases weeks. To ensure the best possible availability of vital AI applications, organizations therefore need more up-to-date solutions that can deliver fast recovery. That perhaps explains why an increasing number of businesses are turning to DR systems for quicker restoration of crucial data and workloads.
Continuous Data Protection (CDP) is currently the most effective way to recover data and applications. With CDP, each change to the data is recorded in a journal as it happens. This makes it possible to restore, almost instantly and without notable data loss, the state that existed just seconds before an outage or malicious attack.
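The journal idea can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not how any particular CDP product is implemented: the `Journal` class, its `record` and `restore` methods, and the file names and timestamps are all hypothetical, and entries are assumed to arrive in time order.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Toy change journal (hypothetical): every write is appended with its timestamp."""

    # Each entry is (timestamp_seconds, key, value), assumed appended in time order.
    entries: list = field(default_factory=list)

    def record(self, timestamp, key, value):
        """Log one change as it happens."""
        self.entries.append((timestamp, key, value))

    def restore(self, point_in_time):
        """Rebuild the state by replaying every write up to and including point_in_time."""
        timestamps = [t for t, _, _ in self.entries]
        cutoff = bisect_right(timestamps, point_in_time)
        state = {}
        for _, key, value in self.entries[:cutoff]:
            state[key] = value
        return state


journal = Journal()
journal.record(100.0, "model.ckpt", "v1")
journal.record(105.0, "labels.csv", "clean")
journal.record(107.5, "model.ckpt", "v2")
journal.record(110.0, "labels.csv", "encrypted-by-ransomware")

# Roll back to the state that existed just before the bad write at t=110.0.
recovered = journal.restore(109.9)
print(recovered)
```

Because every change is kept rather than only periodic copies, the restore point can be chosen to the second instead of being tied to the time of the last backup.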
Minimizing RPOs and RTOs for AI applications
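Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are two of the most important parameters of a disaster recovery or data protection plan. RPO is the maximum amount of data, measured in time, that an organization can afford to lose, while RTO is the maximum time it can take to bring a workload back online. To minimize both for AI applications, near-synchronous replication provides protection close to that of synchronous replication without the elevated network or infrastructure demands.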
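To make the two parameters concrete, the short Python sketch below computes the data-loss window and downtime for one outage and checks them against an objective. All figures (a nightly copy at 02:00, an outage at 14:00, a replica lagging five seconds, a 60-second objective) are made-up illustrations, and the function names are hypothetical.

```python
def achieved_rpo_seconds(last_copy_at, outage_at):
    """Data-loss window: time between the last protected change and the outage."""
    if last_copy_at > outage_at:
        raise ValueError("last copy cannot be later than the outage")
    return outage_at - last_copy_at


def achieved_rto_seconds(outage_at, restored_at):
    """Downtime: time between the outage and the service being back online."""
    if restored_at < outage_at:
        raise ValueError("service cannot be restored before the outage")
    return restored_at - outage_at


def within_objective(achieved_seconds, objective_seconds):
    """True when the achieved value is no worse than the stated objective."""
    return achieved_seconds <= objective_seconds


# Times are seconds since midnight; all values are illustrative assumptions.
OUTAGE_AT = 14 * 3600            # outage at 14:00
NIGHTLY_BACKUP_AT = 2 * 3600     # last backup finished at 02:00
LAST_REPLICATED_AT = OUTAGE_AT - 5  # replica was five seconds behind

backup_rpo = achieved_rpo_seconds(NIGHTLY_BACKUP_AT, OUTAGE_AT)
replica_rpo = achieved_rpo_seconds(LAST_REPLICATED_AT, OUTAGE_AT)

RPO_OBJECTIVE = 60  # hypothetical target: lose at most one minute of data
print(backup_rpo, within_objective(backup_rpo, RPO_OBJECTIVE))
print(replica_rpo, within_objective(replica_rpo, RPO_OBJECTIVE))
```

In this invented scenario the nightly backup loses twelve hours of changes, while the continuously replicated copy loses five seconds.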
Near-synchronous replication, while technically asynchronous, resembles synchronous replication in that data is written to multiple locations almost simultaneously. It does, however, allow for a slight delay between the primary and secondary locations. This always-on replication method continuously transfers only the changed data to the recovery site, typically within seconds.
Due to its persistent nature, near-synchronous replication operates without the need for scheduling, avoids the use of snapshots, writes directly to the source storage, and does not wait for acknowledgment from the target storage. A key benefit is that it delivers a high level of data availability and protection while still maintaining faster write speeds than synchronous replication. This makes it an ideal choice for workloads with demanding write loads and substantial data volumes, such as critical AI applications.
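The behaviour described above can be sketched in a few lines of Python. This is a simplified, single-process illustration, not a real replication engine: the `NearSyncReplicator` class and its methods are hypothetical, dictionaries stand in for source and target storage, and the explicit `ship()` call stands in for a background transfer that would normally run continuously.

```python
from collections import deque


class NearSyncReplicator:
    """Toy model (hypothetical): writes are acknowledged by the source alone,
    and changes are shipped to the target shortly afterwards."""

    def __init__(self):
        self.source = {}        # stands in for primary storage
        self.target = {}        # stands in for storage at the recovery site
        self.pending = deque()  # changes not yet applied to the target

    def write(self, block_id, data):
        """Write to the source and queue the change; never waits for the target."""
        self.source[block_id] = data
        self.pending.append((block_id, data))
        return True  # acknowledged as soon as the source write is done

    def ship(self, max_changes=None):
        """Apply queued changes to the target in order; returns how many were shipped."""
        shipped = 0
        while self.pending and (max_changes is None or shipped < max_changes):
            block_id, data = self.pending.popleft()
            self.target[block_id] = data
            shipped += 1
        return shipped

    def lag(self):
        """Number of changes the target is behind the source."""
        return len(self.pending)


replicator = NearSyncReplicator()
replicator.write("block-1", b"weights-a")
replicator.write("block-2", b"weights-b")
replicator.write("block-1", b"weights-c")

lag_before = replicator.lag()          # target is briefly behind
shipped = replicator.ship()            # only the queued changes are transferred
in_sync = replicator.target == replicator.source
print(lag_before, shipped, in_sync)
```

A synchronous design would instead block inside `write()` until the target confirmed each change, which is where the extra write latency comes from.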
AI data mobility challenges IT infrastructure
AI is built on data at a scale far larger than anything traditional IT infrastructure has handled before. Even simple applications can consume enormous volumes of raw data for model training and the associated inference. Data sets are frequently located at the network edge and must be transferred to a central repository for processing. Furthermore, at the end of the AI data lifecycle, the data should be archived for potential retraining down the line.
This all leads to entirely new challenges for IT infrastructure and management teams because these enormous volumes of data need to be shifted continuously. Existing network technology and data management solutions based on synchronous replication simply can’t move such large amounts of data. When processing power and bandwidth are limited, asynchronous replication is the better option for shifting AI data, because it replicates continuously at the block level, uses little bandwidth, and avoids spikes in data transfer volumes.
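To see why block-level replication needs so little bandwidth, consider the rough Python sketch below, which sends only the blocks that differ between two versions of a data set. It is a deliberate simplification: the function names and the tiny 4-byte block size are illustrative assumptions, and comparing hashes after the fact is just one way to find changed blocks, whereas a replication product would more likely track changes as they are written.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(block):
    """Fingerprint of one block."""
    return hashlib.sha256(block).digest()


def changed_blocks(old, new, size=BLOCK_SIZE):
    """Return {block_index: new_block} for every block that is new or different."""
    old_blocks = split_blocks(old, size)
    changes = {}
    for index, block in enumerate(split_blocks(new, size)):
        if index >= len(old_blocks) or digest(old_blocks[index]) != digest(block):
            changes[index] = block
    return changes


def apply_changes(old, changes, new_length, size=BLOCK_SIZE):
    """Rebuild the new version on the target from the old copy plus the changes."""
    blocks = split_blocks(old, size)
    for index in sorted(changes):
        if index < len(blocks):
            blocks[index] = changes[index]
        else:
            blocks.append(changes[index])
    return b"".join(blocks)[:new_length]


old_version = b"AAAABBBBCCCCDDDD"
new_version = b"AAAAXXXXCCCCDDDD"

delta = changed_blocks(old_version, new_version)
rebuilt = apply_changes(old_version, delta, len(new_version))
bytes_sent = sum(len(block) for block in delta.values())
print(delta, bytes_sent, rebuilt == new_version)
```

Here only one of four blocks crosses the network, so the transfer volume follows the rate of change rather than the total size of the data set.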
AI needs CDP and near-synchronous replication to thrive
The potential for AI is vast, as we have seen over the past year with the launch of ChatGPT and various artwork creators. We can generate content, paint portraits in the style of Rembrandt and even make music. More broadly, AI will help diagnose illnesses, find cancer cells, drive our vehicles, identify crop disease, and monitor our environment, among countless other possibilities.
These will be critical applications and must be protected with the best possible DR solutions, such as CDP. The sheer scale of AI data is posing enormous challenges for existing IT infrastructures in terms of saving, managing and transferring the vast amounts of data involved. Only CDP and near-synchronous replication can provide the necessary mobility for these huge data sets. To manage your AI data effectively, it’s time to explore CDP-enabled data mobility solutions.
Christopher Rogers is Senior Technology Evangelist at Zerto, a Hewlett Packard Enterprise company.