Achieving SQL server high availability in a multi-platform environment
Today's organizations often manage SQL Server databases across various platforms, including physical, virtual, cloud, on-premises, hybrid cloud, and cloud-to-cloud environments. Managing individual instances of SQL Server in these diverse environments is challenging enough. How can you ensure that your key SQL Server databases are highly available -- meaning, accessible 99.99 percent of the time -- when so many platforms are involved?
High availability (HA) in a multi-platform environment depends on multiple factors. You’re going to need at least two systems running SQL Server, but where your primary and secondary systems reside -- on-prem, in the cloud, in multiple clouds, or some combination of these -- really doesn’t matter. What matters is that the data is reliably replicated among the primary and secondary environments. That requires a robust data replication system, seamless communications between the primary and secondary environments, and systems in place to monitor the health of both environments and to manage failover in the event that your primary SQL Server system becomes unavailable.
Regardless of where your primary SQL Server infrastructure resides, the second one must be in a geographically separate location. That could involve two physical data centers on opposite sides of town, a physical data center and a cloud-based Availability Zone (AZ), or, in a pure cloud environment, two separate AZs in the same region. We’ll consider how your HA environment replicates data from primary to secondary infrastructure in a moment. First, let’s consider the underlying connection between the two environments.
To ensure the highest fidelity between instances of your SQL Server database, you’re going to want to use a HA solution that provides synchronous data replication. That ensures that a transaction being written to your primary SQL database is not fully committed until the same transaction has been written to the secondary instance of the database. For such synchronous replication to occur without a performance hit on your production systems, you need a fast and reliable communications infrastructure.
Similarly, you’ll need monitoring and management tools to gain visibility into the health of your distributed assets. You may be focused on ensuring that your primary production environment is operating properly, but if your secondary infrastructure suddenly goes offline you’ll have a problem if your primary infrastructure fails. So, you need real-time visibility into both infrastructures.
Finally, you need a mechanism that automatically moves your production applications and processes from primary to secondary infrastructure in the event that your primary instance of SQL Server becomes unavailable. For this, you’ll configure your compute and storage resources as a Windows Failover Cluster Instance (FCI). The functionality needed to do this -- Windows Failover Cluster Manager -- is built into Windows Server, and it enables you to create a cluster out of multiple physical or virtual server resources. One of those servers or VMs would act as the primary production node; one or more would act as secondary cluster nodes. If the primary SQL Server node were to fail, the FCI would automatically elevate one of the secondary nodes to primary position and its instance of SQL Server would take over production responsibilities.
On each node in the cluster runs the data replication services ensure data integrity across the primary and secondary infrastructures. The Availability Groups (AG) feature of SQL Server will replicate user-named SQL databases between two or more nodes in an FCI. It supports synchronous replication, so you’ll encounter virtually no data loss if you need to resume operations suddenly on a secondary instance of SQL Server.
Alternatively, third-party SANless Clustering software can also provide synchronous replication services. One major difference between the AG and SANless Clustering approaches is that SANless clustering is not intimately tied to SQL Server. SANless Clustering uses block-level replication to copy anything in storage on the primary node to the secondary nodes. If your storage system also contains other databases -- say, from MariaDB or PostgreSQL -- the SANless clustering solution will replicate that data as well (whereas the AG functionality in SQL Server will not).
As for the pros and cons of AGs and SANless Clustering? The AG functionality is built into SQL Server, so there’s no extra cost to gain access to its data replication functions. However, if you have multiple SQL Server databases (or if you are replicating one SQL Server database to multiple secondaries), you’ll need to use the Always On AG functionality in SQL Server Enterprise Edition. SQL Server Standard Edition includes only the Basic AG functionality, which limits you to replicating a single SQL database to a single replica). If you otherwise rely on SQL Server Standard Edition for your production system, you’ll find that to be an expensive upgrade. The AG functionality itself has also not been validated for use with all applications, so make sure your applications work with AG before committing to it.
And SANless Clustering? It’s a third party product, so you’ll pay additional license fees. However, you can use SANless Clustering to replicate any number of databases to any number of secondaries without having to upgrade your instance of SQL Server to the Enterprise Edition.
Image credit: dizain / Shutterstock
Dave Bermingham is Director of Customer Success at SIOS Technology. He is recognized within the technology community as a high-availability expert and has been honored to be elected a Microsoft MVP for the past 12 years: 6 years as a Cluster MVP and 6 years as a Cloud and Datacenter Management MVP. Dave holds numerous technical certifications and has more than thirty years of IT experience, including in finance, healthcare, and education.