Optimizing cost and availability when running SQL Server on AWS EC2
If you’re thinking about moving your SQL Server instances to the cloud, there are a lot of good reasons to choose AWS EC2. AWS offers a wide range of purpose-built systems, so you can easily find one that will support your organization’s particular needs. It has a global reach, with 69 availability zones in 22 geographically distinct regions around the world.
Building out a SQL Server infrastructure designed for high availability (HA) or disaster recovery (DR), though, can be costly. Are there ways to take advantage of AWS EC2 to reduce those costs? The short answer is yes. The longer answer is that the amount you can save with AWS EC2 depends on the choices you make when configuring for HA and DR.
Clustering for HA or DR
To configure for HA and DR in the cloud, you need to configure a cluster of virtual machines (VMs) running SQL Server. If the active instance of SQL Server goes offline, another instance of SQL Server needs to be able to take over. For HA, you’ll want an SLA that ensures that at least one of the VMs in the cluster will be available 99.99% of the time, and ensuring that level of availability will require your cluster to span multiple AWS data centers. If you’re configuring for DR, your cluster should include VMs spread across different regions as well.
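To put the 99.99% figure in concrete terms, the following minimal sketch converts an availability percentage into a yearly downtime budget. The function name and the choice of a 365-day year are assumptions made for illustration; the 99.9% value is included only as a point of comparison.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, that an SLA allows."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare a three-nines SLA with the four-nines target used for HA.
    for sla in (99.9, 99.99):
        budget = downtime_budget_minutes(sla)
        print(f"{sla}% availability -> {budget:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% leaves about 52.6 minutes of downtime per year, while 99.9% leaves about 525.6 minutes, or roughly 8.8 hours.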
Containing Costs with AWS EC2
Given the amount of infrastructure involved, it’s no wonder that configuring for HA and DR is costly. Certain features of AWS can help contain those costs.
Optimize CPUs
AWS EC2 offers a wide range of purpose-built servers. Some are compute-optimized; others are memory-optimized; others are optimized for storage I/O or storage density. You can size your target servers in terms of number of cores, RAM, and so on.
But what if your ideal configuration, in terms of memory and network performance, is overkill in terms of CPU power? A configuration with 96 vCPUs, for example, may be far more than you need (or want, as you’d need a SQL Server license for each core).
Not a problem: AWS EC2 offers the Optimize CPUs option, which enables you to purchase only the number of vCPUs you need. If 24 of the 96 vCPUs suffice for you, Optimize CPUs would effectively turn off the other 72.
Note that you cannot decide later that you want to turn some of those cores back on. Once you’ve selected the number of cores you want using Optimize CPUs, those are the cores you have for the duration.
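As a rough illustration of how the 24-of-96 example might be expressed at launch time, the sketch below builds the parameters for an EC2 RunInstances request using its CpuOptions setting (CoreCount and ThreadsPerCore). The helper name, the r5.24xlarge instance type, and the placeholder AMI ID are assumptions chosen for illustration, and whether a given core count is valid for a particular instance type should be checked against the AWS documentation.

```python
def build_launch_params(instance_type: str, image_id: str,
                        active_vcpus: int, threads_per_core: int = 2) -> dict:
    """Build keyword arguments for an EC2 RunInstances call with CpuOptions.

    active_vcpus is the number of vCPUs to leave enabled; EC2 expresses this
    as a core count multiplied by the number of threads per core.
    """
    if active_vcpus <= 0 or active_vcpus % threads_per_core != 0:
        raise ValueError("active_vcpus must be a positive multiple of threads_per_core")
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "CpuOptions": {
            "CoreCount": active_vcpus // threads_per_core,
            "ThreadsPerCore": threads_per_core,
        },
    }


# Hypothetical values: a 96-vCPU instance type limited to 24 active vCPUs.
params = build_launch_params("r5.24xlarge", "ami-EXAMPLE", active_vcpus=24)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("ec2").run_instances(**params)
```

Keeping the parameter construction separate from the API call makes it easy to review the core count, which is what drives SQL Server licensing, before anything is launched.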
Reserved Instances
The AWS EC2 Reserved Instances option can also help you save money. With Reserved Instances you commit to your cluster for a longer period of time (up to three years), and you pay some or all of your fees up front. While that may mean a large initial payment, your total costs can end up being a fraction of what you would have paid on a month-to-month plan. You need to determine your long-term infrastructure needs before you commit, of course, but once you know that, the Reserved Instances program may offer a way to save significant money.
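The trade-off can be checked with simple arithmetic before committing. The sketch below compares pay-as-you-go and reserved pricing over a fixed term. Every rate in it is a made-up placeholder, not an AWS price, and the function names are hypothetical; substitute the figures from your own quote.

```python
HOURS_PER_YEAR = 365 * 24


def pay_as_you_go_cost(hourly_rate: float, years: int) -> float:
    """Total cost of running one instance continuously with no commitment."""
    return hourly_rate * HOURS_PER_YEAR * years


def reserved_cost(upfront: float, hourly_rate: float, years: int) -> float:
    """Total cost of a reservation: upfront payment plus any recurring hourly fee."""
    return upfront + hourly_rate * HOURS_PER_YEAR * years


def savings_percent(baseline: float, alternative: float) -> float:
    """Percentage saved by choosing the alternative over the baseline."""
    return 100.0 * (baseline - alternative) / baseline


if __name__ == "__main__":
    # Placeholder figures for illustration only (not real AWS pricing).
    baseline = pay_as_you_go_cost(hourly_rate=2.00, years=3)
    reserved = reserved_cost(upfront=21024.00, hourly_rate=0.0, years=3)
    print(f"Pay as you go: ${baseline:,.2f}")
    print(f"Reserved:      ${reserved:,.2f}")
    print(f"Savings:       {savings_percent(baseline, reserved):.1f}%")
```

The same functions also cover partial-upfront plans: pass a smaller upfront amount together with a nonzero hourly rate.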
Dynamic Sizing
Another way to contain costs is to under-configure the backup servers in your cluster. Since the majority of your SQL Server activity will take place on your primary server, you may want to consider backup servers built around configurations that are smaller, less powerful, and less expensive. If your primary VM fails over to a backup VM, you can stop the secondary VM, resize it to match the larger primary configuration, and start it again. You’ll be offline for a few minutes while the resized VM comes up, and it will now incur a higher cost, but in minutes you’ll have a system that’s as powerful as the one you had been using. Until that moment, though, you will not have had to pay for a powerful VM that was doing very little.
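The resize step can be scripted. The following is a minimal sketch, assuming an EBS-backed instance and a boto3 EC2 client passed in by the caller; the function name, instance ID, and target instance type are hypothetical. EC2 changes the instance type only while the instance is stopped, so the sequence is stop, modify, start.

```python
def resize_instance(ec2, instance_id: str, new_instance_type: str) -> None:
    """Stop an EBS-backed instance, change its instance type, and start it again.

    `ec2` is expected to behave like a boto3 EC2 client.
    """
    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Switch to the larger configuration that matches the former primary.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_instance_type},
    )

    # Bring the resized instance back online and wait until it is running.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Example usage (requires boto3 and AWS credentials; the values are placeholders):
#   import boto3
#   resize_instance(boto3.client("ec2"), "i-EXAMPLE", "r5.24xlarge")
```

Taking the client as a parameter keeps the function easy to exercise against a stub before pointing it at a real account.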
Accessing Your SQL Server Data from Your Cluster
While AWS gives you many ways to save money by tuning and sizing the VMs in your cluster, you still need to think about how the instances of SQL Server on these VMs are going to access the data whose availability you are trying to ensure. Native cloud storage cannot be shared among VMs in a cluster the way it can be shared among VMs in an on-prem Failover Cluster Instance (FCI), so further configuration decisions are required.
You have three choices when looking at AWS EC2. The first, working with shared SMB3 file storage using the Amazon FSx option, isn’t really a viable choice because it only offers a 99.9 percent SLA, significantly below the 99.99 percent availability threshold associated with HA.
The second involves reliance on third-party products that overcome the constraints inherent in cloud storage and enable all the VMs in a cloud-based cluster to work with the same data. These solutions provide synchronous or asynchronous replication of block-level data from the primary SQL database to storage associated with the secondary VMs in your cluster. In a failover scenario, all the data is locally available to the secondary VM, and the secondary instance of SQL Server will be able to access the database as soon as that failover VM takes over as the primary server.
A third approach involves configuring your cluster as an Always On Availability Group (AG), which draws on services in SQL Server Enterprise Edition to replicate the user-defined SQL databases from the primary server to the secondary servers. While this approach eliminates the need to license a third-party product, AG replicates only the user-defined databases from the primary SQL Server instance. Those secondary instances will not have copies of the databases containing passwords, agent jobs, and other information that, long term, is crucial to have.
Finally, as you may have noticed, I said that AG requires SQL Server Enterprise Edition. If your use case doesn’t already require Enterprise Edition, you may find that the cost of licensing it undermines all the cost-cutting you thought you had just achieved. In that case, the expense of those third-party options, which work with the standard editions of Windows and SQL Server, may be worth a second look.
Dave Bermingham is the Senior Technical Evangelist at SIOS Technology. He is recognized within the technology community as a high-availability expert and has been honored to be elected a Microsoft MVP for the past 10 years: 6 years as a Cluster MVP and 4 years as a Cloud and Datacenter Management MVP. Dave holds numerous technical certifications and has more than thirty years of IT experience, including in finance, healthcare, and education.