Three Effective Distributed Database Deployment Topologies for Dual Data Center Challenges

Srinivasa Vasu

YugabyteDB is engineered for high availability, scalability, and fault tolerance by distributing data and request processing across multiple fault domains. However, even with this distributed architecture, two significant challenges emerge when operating in a two-datacenter or two-region model:

  1. The risk of a complete regional outage (in the region holding the arbitrary replica majority) leading to the loss of the distributed quorum
  2. The potential for data loss due to asynchronous data replication between data centers or regions

To overcome these challenges, YugabyteDB supports various deployment topologies designed to meet specific high availability and disaster recovery needs. In this blog post, we will explore these deployment options, examining both happy path scenarios and failure scenarios, and discuss the trade-offs between recovery time objective (RTO), recovery point objective (RPO), latency, and other factors.

The Odd Quorum Conundrum in a Two Data Center Model

In YugabyteDB, a quorum is the minimum number of replicas that must participate in a write transaction to ensure data consistency and availability. Typically, an odd quorum is required to avoid “split-brain” scenarios and ensure that there is a clear majority for decision-making. However, maintaining this odd quorum in a two-data-center model is challenging because it requires placing an arbitrary majority of replicas in one data center. If the data center holding that majority goes down, service availability is affected.
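To make the quorum arithmetic concrete, here is a minimal Python sketch (illustrative only, not YugabyteDB code) of a replication factor of 3 split across two data centers: whichever site holds two of the three replicas takes the write quorum down with it when it fails.

```python
# Minimal sketch of Raft-style majority quorum across two data centers (illustrative only).

def quorum_size(replication_factor: int) -> int:
    """A write must be acknowledged by a strict majority of replicas."""
    return replication_factor // 2 + 1

# RF = 3 split across two data centers: one side necessarily holds the majority.
placement = {"Mumbai": 2, "Delhi": 1}
rf = sum(placement.values())

for failed_dc, lost_replicas in placement.items():
    surviving = rf - lost_replicas
    status = "quorum intact" if surviving >= quorum_size(rf) else "quorum lost, writes unavailable"
    print(f"{failed_dc} outage: {surviving}/{rf} replicas survive -> {status}")

# Losing Delhi (1 replica) leaves 2/3 and the quorum holds; losing Mumbai (2 replicas)
# leaves 1/3 and the cluster can no longer accept consistent writes.
```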

The Challenge With Adding a Third Region

The ideal solution to this problem is to add a third region. However, many enterprises still operate with two data centers, which must be kept at parity and regularly updated as part of their business continuity planning (BCP). Executing BCP demands considerable planning, time, and effort. In addition, regulations often require enterprises to alternate the primary and secondary roles of their data centers every six months to ensure disaster readiness. Let’s look at how different deployment topologies can help solve this issue.

Possible Deployment Topologies

In the following section, we’ll assign names to the regions so they are easy to identify. In the context of self-managed infrastructure, the term “region” refers to a data center.

Regions / Data Centers: Mumbai, Delhi
Classification:
  • Region1 / DC1: Mumbai
  • Region2 / DC2: Delhi

    1. xCluster

A database is distributed across multiple availability zones in a region/data center, with each region having its own isolated setup.

Regions / Data Centers: Mumbai, Delhi
Classification:
  • Primary: Mumbai
  • Secondary: Delhi
Fault Domains:
  • Primary: H1a, H1b, and H1c (isolated domains within a region)
  • Secondary: H2a, H2b, and H2c (isolated domains within a region)
Topology: xCluster

xCluster overview - YugabyteDB

Key Points: 

      • Each region has its own isolated YugabyteDB cluster.
      • Each cluster ensures fault tolerance at the availability zone level within a region.
      • Asynchronous cross-cluster (xCluster) data replication between “primary” and “secondary” regions.
      • Data replication can be either unidirectional or bidirectional.
      • Each cluster can be scaled independently.
      • Tunable reads can be enabled for each cluster.
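To illustrate what two isolated universes mean for an application, here is a minimal sketch under some assumptions: the hostnames and the orders table are hypothetical, and since YSQL is PostgreSQL-compatible, a standard PostgreSQL driver (psycopg2) is used. In a unidirectional setup, writes go only to the primary universe, while reads from the secondary may trail by the replication lag.

```python
# Minimal sketch: two independent xCluster universes addressed by the application.
# Hostnames, database objects, and credentials are hypothetical.
import psycopg2

PRIMARY_DSN   = "host=yb-mumbai.example.internal port=5433 dbname=yugabyte user=yugabyte"
SECONDARY_DSN = "host=yb-delhi.example.internal port=5433 dbname=yugabyte user=yugabyte"

def write_order(order_id: int, amount: float) -> None:
    """Unidirectional replication: all writes are routed to the primary (Mumbai) universe."""
    with psycopg2.connect(PRIMARY_DSN) as conn, conn.cursor() as cur:
        cur.execute("INSERT INTO orders (id, amount) VALUES (%s, %s)", (order_id, amount))

def read_order(order_id: int):
    """Reads may be served from the secondary (Delhi), subject to async replication lag."""
    with psycopg2.connect(SECONDARY_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, amount FROM orders WHERE id = %s", (order_id,))
        return cur.fetchone()
```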

Now let’s go over each of the failure scenarios.

Primary Region Failure:

Architecture of a primary region failure

Impact:

      • Depending on the direction of replication, the “secondary” data center can be quickly utilized to continue serving the traffic.
      • If the replication topology is a unidirectional transactional xCluster from “primary” to “secondary,” the “secondary” can be promoted to primary once the “safe time” is determined.
      • If the replication topology is bidirectional, app traffic must be routed to the “secondary” based on acceptable data loss.
      • RTO in a unidirectional xCluster is determined by the amount of time required to transition from the “primary” to the “secondary” cluster. At this time, we do not provide auto-failover features at the driver level. Therefore, the application or application-ops team must handle it. The normal DR activation procedure is usually followed in this case.
      • RPO is determined by the replication lag between the two regions/data centers. Because of the async data replication between them, this topology cannot guarantee a zero RPO.
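Because auto-failover is not provided at the driver level, the switch from primary to secondary is an application or ops decision. The sketch below (hypothetical endpoints and configuration mechanism) shows one way to externalize that switch; the RTO is essentially the time it takes the DR runbook to promote the secondary and flip this setting.

```python
# Minimal sketch of application-side endpoint switching during DR (hypothetical endpoints).
# Promotion of the secondary universe itself is performed by the ops/DR runbook, not here.
import psycopg2

ENDPOINTS = {
    "primary":   "host=yb-mumbai.example.internal port=5433 dbname=yugabyte user=yugabyte",
    "secondary": "host=yb-delhi.example.internal port=5433 dbname=yugabyte user=yugabyte",
}

def get_connection(active_universe: str = "primary"):
    """Connect to whichever universe is currently designated active.

    `active_universe` would be flipped to "secondary" (for example, via a config
    service or feature flag) only after the DR runbook has promoted the Delhi
    universe; the elapsed time of that procedure is the effective RTO.
    """
    return psycopg2.connect(ENDPOINTS[active_universe], connect_timeout=3)
```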

Secondary Region Failure:

Architecture of a secondary region failure
Impact:

      • Depending on the replication direction, the other cluster can be quickly utilized to continue serving the traffic. The other points of impact are the same as the previous failure scenario.

    2. Globally Distributed

A database distributed across multiple fault domains spanning two regions/data centers.

Regions / Data Centers: Mumbai, Delhi
Fault Domains:
  • Mumbai: H1
  • Mumbai: H2 (isolated domain within the same region)
  • Delhi: H3
  • Delhi: H4 (isolated domain within the same region)
Replication Factor (Primary): Total: 3
  • H1: 1 replica (data copy)
  • H2: 1 replica
  • H3: 1 replica
Replication Factor (Secondary, failover cluster during a disaster): Total: 3
  • H4 (H4a, H4b, H4c): 3 replicas
Topology: Globally distributed

Globally distributed architecture
Key Points:

  • The primary cluster is distributed across three fault domains (H1, H2, and H3).
  • The topology can withstand a single fault domain failure.
  • Write latency is governed by the two nearby fault domains (H1 and H2), since a write quorum can be formed between them (see the latency sketch after this list).
  • Data and request processing are distributed equally across the three fault domains.
  • The cluster is horizontally scalable, and tunable reads can be enabled.
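The latency point above can be made concrete with a back-of-the-envelope sketch. The round-trip times are assumptions for illustration, not measurements: with RF = 3 the leader needs only one follower acknowledgement beyond its own write, so commit latency tracks the nearest surviving fault domain.

```python
# Back-of-the-envelope sketch of quorum commit latency (assumed RTTs, illustrative only).

RTT_FROM_H1_MS = {          # round trips from a tablet leader in H1 (Mumbai)
    "H2 (Mumbai)": 1.5,     # nearby fault domain in the same region
    "H3 (Delhi)": 30.0,     # remote fault domain in the other region
}

def commit_latency_ms(follower_rtts: dict, acks_needed: int = 1) -> float:
    """With RF = 3 the leader's own write counts as one vote, so one follower ack suffices."""
    return sorted(follower_rtts.values())[acks_needed - 1]

print("All fault domains healthy:", commit_latency_ms(RTT_FROM_H1_MS), "ms")   # ~1.5 ms, paced by H2

h2_down = {name: rtt for name, rtt in RTT_FROM_H1_MS.items() if "H2" not in name}
print("H2 down, quorum needs H3:", commit_latency_ms(h2_down), "ms")            # ~30 ms, paced by H3
```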

Fault Domain Failure:

Fault domain failure - globally distributed cluster
Fault domain failure II - globally distributed

Impact:

  • Topology can withstand a single fault domain failure.
  • The cluster remains fully operational, as two-thirds of the fault domains are healthy.
  • RTO is handled internally by the database, because the failure triggers leader elections in the surviving fault domains.
  • RTO ranges from about 3 seconds to a few seconds, depending on the number of tablets.
  • Writes may incur slightly higher latency if H1 or H2 fails, since the quorum must then include H3, which is located farther away.

Region Failure:

Region Failure - architecture

Impact:

  • The topology can withstand the failure of the region that hosts the minority of replicas.
  • The cluster is fully operational, as two-thirds of the fault domains are healthy.
  • RTO is handled internally by the database, because the failure triggers leader elections in the surviving fault domains.
  • RTO ranges from about 3 seconds to a few seconds, depending on the number of tablets.

Region failure scenario 2
Impact:

  • At least two of the three fault domains must be healthy for the cluster to stay healthy. The failed region hosts two-thirds of the replicas, so consistent reads and writes will be impacted.
  • The cluster state would be unhealthy and inoperable.
  • Need to fail over to the “secondary” cluster (depicted below), which will asynchronously receive data from the primary cluster. At this time, we do not provide auto-failover features at the driver level. Therefore, the application or application-ops team must handle it.
  • RTO is determined by the time required to transition from “primary” to “secondary”.
  • RPO is determined by the replication lag between the two cluster instances. Because of the async data replication between the two instances, this topology cannot guarantee a zero RPO.

Region Failure scenario II

    3. Globally Distributed With Pinned Leaders

A database distributed across multiple fault domains spanning two regions/data centers, with leaders and replicas pinned to minimize the RPO during a disaster.

Regions / Data Centers: Mumbai, Delhi
Fault Domains:
  • Mumbai: H1
  • Mumbai: H2 (isolated domain within the same region)
  • Delhi: H3 (H3a, H3b)
Replication Factor (Primary): Total: 3
  • H1: 1 replica
  • H3: 2 replicas
Replication Factor (Secondary): Total: 3
  • H2 (H2a, H2b, H2c): 3 replicas
Topology: Globally distributed with pinned leaders

Globally distributed with pinned leaders

Key Points:

  • A single cluster is distributed across the two regions (H1 in Mumbai and H3 in Delhi).
  • Tablet leaders are pinned to the H1 fault domain nodes.
  • H1 keeps one-third of the replicas, while H3 keeps the other two-thirds.
  • H3 is logically divided into two fault domains, H3a and H3b, in order to retain two-thirds of the replicas.
  • A secondary cluster is maintained in H2 and is located in the same region or data center as the primary cluster’s pinned leaders.
  • As tablet leaders are pinned to H1, the secondary cluster’s async replication is near real-time, since it is co-located in the same region.
  • Writes may experience additional latency as the consensus must reach one of the fault domains in H3.
  • As tablet leaders are pinned to H1, which solely serves consistent reads and writes, we need to factor in additional resources in H1.
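The trade-off behind pinned leaders can be summarized with rough numbers. The latencies below are assumptions for illustration only: every commit waits on a cross-region acknowledgement from H3, while the co-located H2 secondary is fed at intra-region latency, which is what keeps the failover RPO near zero.

```python
# Illustrative sketch of the pinned-leaders trade-off (assumed latencies, not measurements).

INTRA_REGION_MS = 1.5    # Mumbai <-> Mumbai (H1 leaders to the co-located H2 secondary)
CROSS_REGION_MS = 30.0   # Mumbai <-> Delhi (H1 leaders to the H3a/H3b replicas)

# RF = 3 with one replica in H1 and two in H3: the leader in H1 must collect at least
# one acknowledgement from H3, so every committed write pays the cross-region round trip.
write_commit_latency_ms = CROSS_REGION_MS

# The H2 secondary universe receives async changes from leaders in the same region,
# so its lag (and hence the RPO on failover) is roughly the intra-region latency.
approximate_rpo_ms = INTRA_REGION_MS

print(f"Write latency  ~{write_commit_latency_ms} ms (cross-region consensus)")
print(f"Failover RPO   ~{approximate_rpo_ms} ms (co-located async replication)")
```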

Region Failure:

Failure globally distributed with pinned leader
Impact:

  • The topology can withstand the failure of the region hosting H1, which contains the minority of replicas.
  • RTO is handled internally by the database, because the failure triggers leader elections in the surviving H3 fault domains.
  • RTO ranges from about 3 seconds to a few seconds, depending on the number of tablets.

Failure globally distributed pinned leader
Impact:

    • At least two of the three fault domains must be healthy for the cluster to stay healthy. The failed region hosts two-thirds of the replicas, so consistent reads and writes will be impacted.
    • The primary cluster will be unhealthy and inoperable.
    • Need to fail over to the “secondary” cluster, which will asynchronously receive data from the primary cluster. At this time, we do not provide auto-failover features at the driver level. Therefore, the application or application-ops team must handle it.
    • RTO is determined by the time required to transition from “primary” to “secondary”.
    • RPO is almost negligible as the async replication is near real-time and co-located in the same region.

Summary:

Operating a distributed database system in a two-data-center model presents two significant challenges: the loss of the distributed quorum if the region holding the replica majority suffers a complete outage, and the potential for data loss due to asynchronous replication between the two sites. These challenges emphasize the importance of carefully considering the trade-offs and implementing appropriate failover mechanisms to ensure data consistency, availability, and fault tolerance in distributed database environments.

RTO

Topology | Mumbai disruption | Delhi disruption
xCluster – Unidirectional | Time required to swing the traffic to the secondary | No impact
xCluster – Bidirectional | No impact | No impact
Globally distributed | Time required to swing the traffic to the secondary | Near zero *
Globally distributed with pinned leaders | Near zero * | Time required to swing the traffic to the secondary

* Approximately 3 seconds to a few seconds for the tablet leaders to get rebalanced in the surviving fault domains

RPO

Topology | Mumbai disruption | Delhi disruption
xCluster – Unidirectional | Replica lag * | No impact
xCluster – Bidirectional | Replica lag * | Replica lag *
Globally distributed | Replica lag * | No impact
Globally distributed with pinned leaders | No impact | Near zero **

* Determined by the replication lag between the two regions.
** Async replication is near real-time and co-located in the same region.
Topology comparison:

xCluster – Unidirectional
  • Fault tolerance: Infra level
  • Operational efficiency: Operational involvement during a disaster
  • Performance: Balanced
  • Transactional atomicity guarantees: Guaranteed while failing over to the secondary

xCluster – Bidirectional
  • Fault tolerance: Region and infra level
  • Operational efficiency: Operational involvement during a disaster; the application team has to manage record-level isolation to avoid same-row conflicts
  • Performance: Balanced
  • Transactional atomicity guarantees: Not guaranteed

Globally distributed
  • Fault tolerance: Region and infra level *
  • Operational efficiency: Operational involvement during a disaster
  • Performance: Balanced; incurs additional latency if any of the near-located fault domains fail
  • Transactional atomicity guarantees: Guaranteed while failing over to the secondary

Globally distributed with pinned leaders
  • Fault tolerance: Region and infra level *
  • Operational efficiency: Operational involvement required during a disaster
  • Performance: Balanced; incurs additional latency as the quorum consensus is always cross-region
  • Transactional atomicity guarantees: Guaranteed while failing over to the secondary

* Withstands failure of the minority replica region

Additional Resources:

Build global applications – Learn how to design globally distributed applications using simple patterns (YugabyteDB documentation) 

Synchronous replication using the Raft consensus protocol (YugabyteDB documentation) 

xCluster replication – Asynchronous replication between independent YugabyteDB universes (YugabyteDB documentation) 

Geo-Distribution in YugabyteDB: Engineering Around the Physics of Latency

Ready to experience the power and simplicity of YugabyteDB for yourself?

Sign up for our free cloud DBaaS offering today at cloud.yugabyte.com. No credit card required, just pure database innovation waiting to be unleashed!


Let’s get started!
