How to Harness Kubernetes Operators for Efficient YugabyteDB Deployment

May 30, 2024

Cloud-native technologies have ushered in a new era of database scalability and resilience requirements. Distributed SQL databases like YugabyteDB, which delivers high availability, global distribution, and PostgreSQL compatibility, are designed to meet these new demands.

Kubernetes is proven at managing stateless workloads; however, there is an increasing push to run a wide variety of application workloads in Kubernetes. Managing stateful applications, such as distributed databases, in Kubernetes is non-trivial and introduces challenges in provisioning, lifecycle management, performance management, and monitoring[1].

Kubernetes Operators play a pivotal role in simplifying these management tasks. Operators are designed to automate complex operations and ensure efficient Day 0, Day 1, and Day 2 management of workloads in a Kubernetes environment.

Understanding Kubernetes Operators


Figure 1: Kubernetes Operator Pattern (Source: Programming Kubernetes)

Kubernetes Operators extend the Kubernetes API, automating the provisioning and lifecycle management of complex applications. They encapsulate operational knowledge and automate tasks such as deployment, management, scaling, backups, and recovery. In the absence of an operator, these tasks require manual processes that are risky and prone to error.

Kubernetes Operators are not just useful but essential for complex applications such as distributed databases, whose advanced operations depend on high reliability and intelligent automation.

Operators work by introducing new CRDs (custom resource definitions), which extend the Kubernetes API to describe new kinds of resources, and by providing a controller. The controller is a specialized piece of software that runs in the cluster and is responsible for managing these resources declaratively using a reconciliation loop.

The custom resources allow users to describe the desired state of the database. The reconciliation loop continuously compares that desired state with the actual state and makes changes until the two match.
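To make the reconciliation idea concrete, here is a minimal sketch in Python. It is not the operator's actual code: the `ClusterState` type, its fields, and the `apply` helper are hypothetical names chosen for illustration, and a real controller would watch the Kubernetes API rather than loop over in-memory objects.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical, simplified view of a database cluster."""
    replicas: int
    version: str


def reconcile(desired: ClusterState, actual: ClusterState) -> list[str]:
    """Return the actions needed to move `actual` toward `desired`."""
    actions = []
    if actual.version != desired.version:
        actions.append(f"upgrade {actual.version} -> {desired.version}")
    if actual.replicas < desired.replicas:
        actions.append(f"add {desired.replicas - actual.replicas} pod(s)")
    elif actual.replicas > desired.replicas:
        actions.append(f"remove {actual.replicas - desired.replicas} pod(s)")
    return actions


def apply(actual: ClusterState, desired: ClusterState) -> ClusterState:
    """Toy stand-in for real work: take one step toward the desired state."""
    if actual == desired:
        return actual
    if actual.version != desired.version:
        return ClusterState(actual.replicas, desired.version)
    step = 1 if actual.replicas < desired.replicas else -1
    return ClusterState(actual.replicas + step, actual.version)


def run_loop(desired: ClusterState, actual: ClusterState,
             max_iterations: int = 20) -> ClusterState:
    """Reconcile repeatedly until no actions remain or the budget runs out."""
    for _ in range(max_iterations):
        if not reconcile(desired, actual):
            break
        actual = apply(actual, desired)
    return actual
```

The key property is that the user only edits the desired state; the loop decides which steps are needed, and running it again once the states match does nothing.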

The newly released YugabyteDB Kubernetes Operator implements the operator pattern to manage YugabyteDB and related resources on Kubernetes. The next section examines why we chose to completely re-architect our previous Operator.

Why a YugabyteDB Kubernetes Operator is Needed

Complexity Simplified

The new YugabyteDB Kubernetes Operator simplifies the complexities associated with deploying and managing a multi-region distributed database. By handling intricate configurations, automatic scaling, and updates, the Operator makes it possible for developers and operations teams to deploy YugabyteDB on Kubernetes with minimal overhead.

One of the major features we have introduced (in response to requests) is seamless scaling of YugabyteDB pods across zones. This was a pain point when managing deployments with Helm charts. Because we manage YugabyteDB using StatefulSets, a multi-AZ deployment means managing three releases of the same Helm chart (one per AZ). This made upgrades and scaling operations difficult, as each operation needed to be handled separately for each AZ.

For example, let’s consider reducing the scale of the database cluster. This is useful when we have overprovisioned our cluster. To save on resources, we may want to reduce the number of pods in an AZ.

Reduce the number of pods in an AZ

To complete this operation correctly, we have to safely adjust the quorum of the database and orchestrate a data move that redistributes the data held by the departing pod to the other pods.

Before the data move, we need to run pre-checks to ensure the cluster has enough resources to store the existing dataset in the desired configuration. After the data move is completed, we must verify that the load on the remaining pods is balanced and that no single pod ends up with a disproportionately large number of tablets.
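As a rough illustration of these two checks, the Python sketch below models a capacity pre-check and a post-move balance check. The function names, the replication factor of 3, the 80% fill limit, and the 20% skew tolerance are all assumptions made for this example; they are not the operator's actual rules or thresholds.

```python
def has_capacity(dataset_gib: float, pod_capacity_gib: float, pods_after: int,
                 replication_factor: int = 3, max_fill: float = 0.8) -> bool:
    """Pre-check: can the remaining pods hold every replica of the dataset?

    Assumes data is spread evenly and each pod may be filled up to `max_fill`.
    """
    if pods_after < replication_factor:
        return False  # fewer pods than replicas: no valid placement
    needed = dataset_gib * replication_factor
    usable = pods_after * pod_capacity_gib * max_fill
    return needed <= usable


def is_balanced(tablets_per_pod: dict[str, int], tolerance: float = 0.2) -> bool:
    """Post-check: no pod holds more than (1 + tolerance) times the mean."""
    if not tablets_per_pod:
        return True
    mean = sum(tablets_per_pod.values()) / len(tablets_per_pod)
    return max(tablets_per_pod.values()) <= mean * (1 + tolerance)
```

For instance, `has_capacity(300, 500, 3)` passes because 900 GiB of replicated data fits in 1200 GiB of usable space, while a distribution such as `{"a": 30, "b": 5, "c": 5}` fails the balance check.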

All of this orchestration needs to be done at the control plane layer, since parts of this operation are not idempotent. We need to maintain state outside the database to safely retry these operations in case of any transient failures.
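One common way to retry a multi-step operation safely when some steps are not idempotent is to record each completed step in durable storage and skip it on the next attempt. The sketch below shows that idea in Python. The step names are taken loosely from the description above, and the JSON state file is only a stand-in for whatever store a real control plane would use; none of this reflects the operator's actual implementation.

```python
import json
from pathlib import Path

# Hypothetical step names for a scale-down, in execution order.
STEPS = ["precheck", "adjust_quorum", "move_data", "verify_balance", "remove_pod"]


def run_workflow(state_file: Path, handlers: dict) -> list[str]:
    """Run STEPS in order, skipping any that an earlier attempt completed.

    Progress is persisted after every step, so a failure partway through
    never causes a completed (possibly non-idempotent) step to run twice.
    Returns the steps executed during this call.
    """
    done = json.loads(state_file.read_text()) if state_file.exists() else []
    executed = []
    for step in STEPS:
        if step in done:
            continue
        handlers[step]()  # may raise; earlier steps stay recorded as done
        done.append(step)
        state_file.write_text(json.dumps(done))
        executed.append(step)
    return executed
```

If `move_data` fails with a transient error, calling `run_workflow` again with the same state file resumes at `move_data` instead of repeating the quorum change.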

The YugabyteDB Kubernetes Operator simplifies this experience for the user.

Lifecycle Management

Managing stateful workloads efficiently is crucial in Kubernetes. The YugabyteDB Kubernetes Operator ensures that data remains consistent and available across failures or scaling events, which is critical for any database system. The Operator manages the database from creation, through upgrades and updates, to deletion (if desired). In addition to high-level workflows like creation or software upgrades, the YugabyteDB Kubernetes Operator allows fine-tuning of the database by exposing configuration flags, CPU and memory limits, and storage class customization.

Backup and Restore

Backing up large distributed databases is complex. In a distributed database, each pod holds a subset of the data. Taking a consistent snapshot therefore requires synchronization across multiple pods. Backups are also driven by policy and have their own metadata describing the storage configuration used, the date of the backup, and the type of backup. Complex control plane interactions are required to do this correctly. The YugabyteDB Kubernetes Operator simplifies this by introducing Custom Resources for database backup and restore. This allows you to take keyspace-wide backups and restore these backups to a universe.

Automatically Retrying Failed Operations

Similar to backups, control plane and scaling operations on distributed databases need to be synchronized across the many pods that belong to the database. While the database itself is resilient to transient faults, operations such as scaling or upgrades are vulnerable to them. Retrying these operations is a complex process that needs to account for the current state of the infrastructure layer and maintain database availability for clients. The YugabyteDB Kubernetes Operator makes this simple by automatically retrying failed operations with exponential backoff.
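For readers unfamiliar with the technique, the following Python sketch shows a generic retry loop with exponential backoff and a little random jitter. The function name, parameters, and default values are illustrative assumptions and say nothing about the delays or retry limits the operator actually uses.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, jitter=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the wait after each failure.

    The wait before retry n (0-based) is min(max_delay, base_delay * 2**n),
    plus up to `jitter` * delay of random extra time so that many retrying
    clients do not all fire at once. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # real code should catch only known transient errors
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * jitter))
```

With `base_delay=1.0` and jitter disabled, an operation that fails three times and then succeeds waits 1, 2, and 4 seconds between attempts. The `sleep` parameter is injectable so the behaviour can be exercised without real delays.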

Support Bundles

Getting logs from Kubernetes-based YugabyteDB universes is easy using kubectl or by configuring a Kubernetes logging solution such as fluent-bit. However, the YugabyteDB Kubernetes Operator also makes it possible to create point-in-time snapshots of logs, which are essential for debugging complex issues.

This is done through a custom resource called supportbundle, which allows us to create a log bundle across operator and database pods. The custom resource also captures events and essential metrics from the underlying Kubernetes cluster. The log bundle can be easily copied out of the operator pod using a kubectl cp command.

Conclusion

The enhanced YugabyteDB Kubernetes Operator significantly simplifies the deployment, management, and scaling of YugabyteDB in cloud-native environments. By automating complex operations, it enhances reliability and also allows developers and operations teams to focus more on development and less on infrastructure management.

The benefits of adopting the YugabyteDB Kubernetes Operator include:

simplified operations
improved reliability
the ability to leverage YugabyteDB’s full potential in a Kubernetes ecosystem

Use and Contribute

Community members interested in trying out the YugabyteDB Kubernetes Operator can check out the newly released project at: https://github.com/yugabyte/yugabyte-k8s-operator. It is currently in the alpha stage and is suited for experimentation and non-production use cases.

For contributions and feedback, please reach out to us via the Yugabyte Community Slack:
https://inviter.co/yugabytedb
