Why run your database in Kubernetes? [Q&A]
Kubernetes is one of the most widely used platforms for running containerized applications. Many businesses, though, still run their databases in a more traditional environment.
Of course, there's no reason why you can't run a database in Kubernetes, and there are many advantages to doing so. We spoke to Karthik Ranganathan, founder and CTO of cloud-native database specialist Yugabyte, to discuss the pros and cons.
BN: When is Kubernetes best utilized for running a database? What is the ideal use case?
KR: The best use case for Kubernetes is when you're running a large number of databases in a multi-tenant environment. To keep up with the demand for services, many companies are adopting microservice architectures in which applications are structured as loosely coupled, independently deployable services communicating via APIs. This creates a lot of relatively small databases piling up within a finite set of nodes, which makes managing those databases difficult.
In a multi-tenant cloud environment, where servers are shared and software instances serve multiple, distinct groups, the challenge becomes more complex. That's where Kubernetes can make the biggest difference, simplifying management by enabling the efficient placement of services within nodes.
BN: What benefit does running a database in Kubernetes have from a resource utilization standpoint?
KR: Mass deployments of microservices can leave companies with sub-optimal allocations of databases on a limited set of nodes, which puts a strain on resources.
Kubernetes applies an infrastructure-as-code approach to the challenges of managing those deployments. It allows the underlying system to determine the best places to put the databases while optimizing the resources used across the nodes. Its orchestration capabilities can resize pods dynamically, enabling elastic scaling as needed to meet workload requirements.
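This kind of elastic scaling can be expressed declaratively. As a minimal sketch (the target name `db` and the thresholds here are illustrative, not from the interview), a HorizontalPodAutoscaler can grow or shrink a database StatefulSet based on observed CPU load:

```yaml
# Hypothetical example: autoscale a StatefulSet named "db"
# between 3 and 9 pods based on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: db-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicas: 3
  maxReplicas: 9
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

In practice, automatic horizontal scaling of a stateful workload also depends on the database itself being able to rebalance data as members join and leave.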
Its out-of-the-box infrastructure orchestration allows you to set stateful policies to prevent data loss, for example, in the event of a hardware failure. Companies using Kubernetes require fewer nodes to run the same databases, and they save on costs as a result.
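The usual way to express such a stateful policy is a StatefulSet with per-pod persistent volume claims, so that storage outlives any individual pod. A hedged sketch (the names and the `postgres` image are illustrative placeholders):

```yaml
# Each replica gets its own PersistentVolumeClaim from the
# volumeClaimTemplates; the claim, and the data on it, survive
# pod restarts and rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:16   # illustrative database image
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```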
BN: How does Kubernetes address the problems related to legacy code?
KR: Most organizations are saddled with large amounts of legacy code, which can complicate their ability to build, deploy and manage workloads at multiple locations. That code can also hinder an organization's ability to move workloads from cloud to cloud.
With Kubernetes, you can deploy your infrastructure as code consistently across the enterprise, providing portability between clouds and at the edge, as well as on-premises. It allows you to write a bit of code -- say, describing the resource requirements deployed to the Kubernetes engine -- and let the platform take care of distributing it. Kubernetes effectively gives you the same kind of control with cloud deployments that you would have on bare metal servers in the data center.
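The "bit of code describing the resource requirements" typically looks like a pod spec with explicit requests and limits, which the scheduler uses to place the workload. A minimal, hypothetical sketch (names and figures are illustrative):

```yaml
# Declare what the database container needs; Kubernetes decides
# where to run it based on these requests, and the limits cap
# what it may consume.
apiVersion: v1
kind: Pod
metadata:
  name: db-pod
spec:
  containers:
  - name: db
    image: postgres:16   # illustrative database image
    resources:
      requests:
        cpu: "500m"      # guaranteed half a CPU core
        memory: 1Gi
      limits:
        cpu: "2"
        memory: 4Gi
```

The same manifest applies unchanged to any conformant Kubernetes cluster, which is what makes the deployment portable across clouds, the edge, and on-premises.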
BN: Are some types of database a better fit for Kubernetes than others?
KR: There are many benefits to running a database in Kubernetes. For example, Kubernetes allows automated backups and database software upgrades, which is particularly advantageous when working with a database cluster. Kubernetes makes it simple to patch a security vulnerability across a cluster. But some databases -- particularly older designs -- can have trouble running on a new platform.
A resilient, distributed database can mitigate the challenges involved with running in Kubernetes while ensuring that data is always available. A distributed SQL database, for instance, is deployed on a cluster of servers as a single logical relational database. It automatically replicates and distributes data across multiple servers, providing consistency across availability zones and geographic regions. This keeps things up and running even if there's a pod, node, or underlying infrastructure failure. The cluster can detect a failure, handle it, and recover without any loss of data or access. It also operates behind the scenes, without the need for operator intervention.
In addition to geographic distribution, distributed SQL databases provide resiliency against failures. They provide continuous availability and horizontal scalability, allowing operations teams to add or remove nodes as needed, without downtime. They also support familiar SQL and RDBMS features, and are hybrid and multi-cloud ready, able to run data infrastructure in any environment.
BN: Does Kubernetes offer automation benefits as compared to a traditional RDBMS?
KR: Day-two operations, which include maintenance such as configuration adjustments and upgrades, can be complicated for a traditional relational database management system (RDBMS). For example, if you lose a pod while using a traditional RDBMS, the user is responsible for migrating and re-syncing data between pods. Manual migration also involves checking the cluster to see if it's under a heavy load. If it is, you have to wait for the load to clear before moving the data. Even migrating data automatically requires you to build in those checks.
The automation provided by Kubernetes simplifies processes across clusters such as backups and upgrades while automating the distribution of data across nodes in a cluster, and replicating data in a consistent manner.
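A backup, for instance, can be automated with a CronJob rather than scripted by hand. A hedged sketch (the schedule, names, and the `pg_dump` command are illustrative assumptions, not the interviewee's setup):

```yaml
# Hypothetical nightly backup: run pg_dump against a service
# named "db" and write the dump to a persistent volume.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 2 * * *"   # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: postgres:16   # illustrative image providing pg_dump
            command: ["sh", "-c",
              "pg_dump -h db -U app mydb > /backup/mydb.sql"]
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: backup-pvc
```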
BN: What are some of the disadvantages/trade-offs associated with running your database in Kubernetes?
KR: While there are substantial advantages to Kubernetes, no system is entirely perfect and you will have to accept a few potential trade-offs.
For example, there is the possibility of a pod crash despite Kubernetes' orchestration capabilities. Crashes can result from out-of-memory (OOM) errors, and problems can also arise when pods are moved. When a pod is rescheduled to a new node, any locally attached storage does not move with it, which can result in data loss.
Locally attached storage provides the fastest performance, which is ideal if low latency is a priority. But it doesn't travel with a pod as it's being moved. External persistent storage, in some form of network-attached storage, solves that problem, but there is a trade-off in performance.
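The trade-off shows up directly in the storage classes a cluster offers. As an illustrative sketch (the class names and the CSI driver are assumptions, not from the interview):

```yaml
# Local storage: fastest, but pinned to one node -- the data
# does not follow a rescheduled pod.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner   # local volumes are pre-provisioned
volumeBindingMode: WaitForFirstConsumer
---
# Network-attached storage: survives pod rescheduling, at some
# latency cost per I/O.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: network-ssd
provisioner: ebs.csi.aws.com   # illustrative CSI driver
volumeBindingMode: WaitForFirstConsumer
```

A database that replicates its own data, as described below, can safely use the faster local class, because a lost volume can be rebuilt from the surviving replicas.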
Some Kubernetes deployments may require a load balancer to work around networking restrictions. There might also be networking complexities involved when replicating data across different data centers -- for instance, when trying to ensure geographic redundancy in the case of a natural disaster. This can call for complex solutions such as DNS chaining, or implementing a service mesh, which can degrade performance since traffic is proxied at the application layer rather than passing directly over TCP.
There are some intricacies to running Kubernetes, but many of the challenges and trade-offs can be mitigated by a distributed SQL database, with its automated replication and distribution of data. This approach can help organizations take advantage of the great benefits of Kubernetes, while reducing many of the risks involved.
Image Credit: wavebreakmedia / Shutterstock