Being Server Rack-Aware for On-Premise Deployment of YugabyteDB

Premkumar Thangamani

Data centers store large amounts of digital information securely and reliably. They also serve as hubs for network connectivity, providing high-speed, reliable networking infrastructure to transmit data between servers, users, and other devices. To accomplish this, data centers organize and manage all of their machines, routers, and switches in server racks from vendors such as Dell, RackSolutions, and NavePoint.

On-premises data center

What is a Server Rack?

A server rack provides a standardized structure for organizing equipment like servers, networking devices, storage arrays, and other hardware components; a typical rack holds about 25-50 devices. Server racks are designed with sturdy frames and mounting rails to hold and protect expensive devices securely. For additional protection, modern server racks often include power surge protectors (e.g., Rackbar), network surge protectors (e.g., Tycon), seismic anchors and rails to protect against earthquakes (e.g., Great Lakes), and sometimes redundant power, network, and cooling.

Server Rack Failures

Despite these safeguards, data centers can experience rack-level events due to a misconfiguration, the failure of a critical component, or something unforeseen (e.g., a sprinkler head bursting directly over a server rack). It’s also not unusual for a rack to be taken offline for maintenance. To enhance resilience, it’s smart to spread your data across multiple racks so that your applications remain operational even if one or more racks go offline. The distributed nature of YugabyteDB makes this straightforward.

Achieving Fault Tolerance and High Availability For On-Premise YugabyteDB Deployments

YugabyteDB has been designed from the ground up for high availability and fault tolerance, enabling it to survive failures across fault domains. Fault domains can be nodes, zones, regions, or, in this case, racks. To survive an outage of one fault domain/rack, a YugabyteDB cluster needs to replicate its data across three fault domains – this is known as having a replication factor (RF) of 3. In general, surviving f simultaneous fault domain failures requires an RF of 2f + 1, so that a majority of replicas remains available for consensus; to survive the failure of two fault domains, a cluster therefore needs an RF of 5.

Cloud Availability Zones as Fault Domains

Public cloud providers have availability zones that are full-fledged data centers with independent power and cooling infrastructures. It is also typical for enterprises to choose these availability zones as fault domains.

Some enterprises, however, have their own private on-prem data centers for enhanced privacy and cost-effectiveness. In such cases, racks can be considered as fault domains.

Node Placement Definition

In YugabyteDB, you define the placement of a node using a three-part naming convention in the form cloud.region.zone. The “cloud” represents the actual cloud provider, like AWS, GCP, or Azure. The region represents the geographic location, like US-East or EU-Central. The zone refers to the availability zone where the node is present, like us-east-1a or eu-central-2b. For on-prem data centers, you would set cloud to the data center name, region to the city, and zone to a rack or a group of racks.
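To make the mapping concrete, here is a minimal sketch of how the three parts translate into the placement flags of the per-node YB-TServer process when you start it directly (the addresses, ports, and data directory below are placeholders, and a master quorum is assumed to already be running):

# cloud -> data center name, region -> city, zone -> rack
# Sketch only: addresses and the data directory are hypothetical.
yb-tserver \
  --tserver_master_addrs=192.168.0.1:7100 \
  --rpc_bind_addresses=192.168.0.1:9100 \
  --fs_data_dirs=/mnt/d0 \
  --placement_cloud=dc1 \
  --placement_region=newyork \
  --placement_zone=rack-a

The yugabyted tool shown later sets these same placement values for you from a single --cloud_location argument.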

Racks as Virtual Zones

As just mentioned, YugabyteDB treats zones as the fault domain by default. For on-prem data centers, racks can be treated as virtual zones: you map each rack to a zone, and YugabyteDB automatically handles rack failures. For an RF3 cluster, you need 3 fault domains, which means you will need at least 3 racks. If you have more racks (e.g., 6 racks), you can create virtual zones spanning 2 racks each.

Consider a scenario where you have a data center (dc1) in New York with 9 machines — prod-node-[01-09] — hosted in 3 racks: Rack-A, Rack-B, and Rack-C. Three machines are hosted per rack, as listed in the following table.

Rack   | Machine      | IP
-------+--------------+------------
Rack-A | prod-node-01 | 192.168.0.1
Rack-A | prod-node-02 | 192.168.0.2
Rack-A | prod-node-03 | 192.168.0.3
Rack-B | prod-node-04 | 192.168.0.4
Rack-B | prod-node-05 | 192.168.0.5
Rack-B | prod-node-06 | 192.168.0.6
Rack-C | prod-node-07 | 192.168.0.7
Rack-C | prod-node-08 | 192.168.0.8
Rack-C | prod-node-09 | 192.168.0.9

Deploy YugabyteDB Cluster Manually Using yugabyted

If you are using the open source deployment of YugabyteDB, you can use the OSS control tool yugabyted to deploy your cluster manually. Start each node with the --cloud_location argument (cloud.region.zone) to define the placement of the machine. For this data center, use dc1 as the cloud, newyork as the region, and the rack name as the zone.

For example, to start machine prod-node-01 (located in Rack-A), you can run the following command:

yugabyted start --advertise_address=192.168.0.1 --cloud_location=dc1.newyork.rack-a
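Each subsequent node joins the cluster through an already-running node via --join. For example, prod-node-04 (located in Rack-B) could be started like this:

yugabyted start --advertise_address=192.168.0.4 --join=192.168.0.1 --cloud_location=dc1.newyork.rack-b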

To simulate a local cluster with 3 racks, you can write a basic bash script:

server=0
JOIN=""
# Start 9 nodes on loopback addresses 127.0.0.1-127.0.0.9,
# placing 3 nodes in each of the 3 virtual zones (racks a, b, c).
for rack in a b c ; do
    for num in 1 2 3 ; do
        ((server++))
        # Every node except the first joins the cluster via the first node.
        if [[ ${server} != 1 ]]; then
            JOIN="--join=127.0.0.1"
        fi
        yugabyted start ${JOIN} \
            --advertise_address=127.0.0.${server} \
            --cloud_location=dc1.newyork.rack-${rack} \
            --base_dir=/tmp/data${server}
    done
done

Next, connect to your cluster using ysqlsh and fetch the cluster information.

SELECT host, cloud, region, zone FROM yb_servers() ORDER BY host;
    host    | cloud | region  |  zone
------------+-------+---------+--------
 127.0.0.1  | dc1   | newyork | rack-a
 127.0.0.2  | dc1   | newyork | rack-a
 127.0.0.3  | dc1   | newyork | rack-a
 127.0.0.4  | dc1   | newyork | rack-b
 127.0.0.5  | dc1   | newyork | rack-b
 127.0.0.6  | dc1   | newyork | rack-b
 127.0.0.7  | dc1   | newyork | rack-c
 127.0.0.8  | dc1   | newyork | rack-c
 127.0.0.9  | dc1   | newyork | rack-c

By default, the replication factor equals 3, and the fault tolerance is configured to zone-level resilience. This ensures that YugabyteDB automatically replicates data and places it across three racks. Because an RF3 system can handle the failure of one fault domain, this setup can handle the outage (planned or unplanned) of any one rack.
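If you want to set or verify this placement explicitly, yugabyted exposes a configure command. The sketch below assumes the local simulation above and points at the first node's base directory:

# Explicitly set fault tolerance of the data placement to the zone
# (i.e., rack) level; --base_dir matches the first node started above.
yugabyted configure data_placement --fault_tolerance=zone --base_dir=/tmp/data1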

Remember: If your cluster is deployed across more than the required number of fault domains, consider combining two or more racks into virtual zones. For example, with six racks named rack-[A, B, C, D, E, F], you can group them into the required number of fault domains. For an RF 3 setup, divide the racks into three virtual zones (e.g., rack-group-1, rack-group-2, and rack-group-3), placing nodes from rack-A and rack-B into virtual zone rack-group-1, and similarly configuring the placement for the others.
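In that setup, nodes in different physical racks simply advertise the same virtual zone. A sketch, reusing the yugabyted syntax from above (the IPs are illustrative):

# prod-node-01, physically in rack-A, maps to virtual zone rack-group-1
yugabyted start --advertise_address=192.168.0.1 --cloud_location=dc1.newyork.rack-group-1

# prod-node-04, physically in rack-B, also maps to rack-group-1
yugabyted start --advertise_address=192.168.0.4 --join=192.168.0.1 --cloud_location=dc1.newyork.rack-group-1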

Establish Racks As Fault Domains and Deploy Clusters Using YugabyteDB Anywhere

You can also use our self-managed DBaaS software, YugabyteDB Anywhere (YBA), to set up racks as the fault domain. To do this, create an on-prem provider configuration and add the racks as zones.

When you add nodes to the cluster, attach them to the appropriate racks (virtual zones), and you’re good to go!

Conclusion

YugabyteDB achieves high availability through a combination of its distributed architecture, data replication, consensus algorithms, automatic rebalancing, and failure detection mechanisms. These features keep the database available, consistent, and resilient against fault domain failures. You can set up nodes, zones, racks, or regions as fault domains. For your on-prem deployments, in the absence of explicit zones, you can map your racks to zones and ensure the cluster is always server rack-aware.
