
What Happens to Tablets (Shards) When a Node Is Lost and Then Brought Back Into the Cluster?

Distributed SQL Tips and Tricks Series

September 22, 2023

When a running cluster loses a node, for example to a network partition, YugabyteDB has a process in place to handle it. In terms of the CAP theorem, YugabyteDB is a CP database: in the event of a network partition, it prioritizes consistency over availability. That does not mean it is not highly available. With a replication factor of 3, your cluster can tolerate the loss of a single node and still serve all application traffic.
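The replication-factor claim above follows from majority arithmetic in Raft-style replication. Here is a minimal sketch of that arithmetic. The function names are illustrative and not part of any YugabyteDB API.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority (quorum)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can be lost while a majority is still reachable."""
    return replication_factor - majority(replication_factor)


for rf in (1, 3, 5, 7):
    print(f"RF={rf}: majority={majority(rf)}, tolerates {tolerated_failures(rf)} failure(s)")
```

With a replication factor of 3, the majority is 2, so one replica of each tablet can be unavailable without blocking reads or writes.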

Continuous Availability in YugabyteDB

When the node goes down, every leader on that node, whether the master-leader or a tablet-leader, goes through a 3-second re-election process that promotes one of its followers to the leader role. During this window, any tablet group going through re-election will see higher latencies. The same applies to YB-Master level operations if the master-leader happened to be on that node.
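From the application's point of view, the re-election window shows up as a few seconds of slow or failed requests. One common way to ride it out is to retry with exponential backoff. The sketch below is a generic pattern, not a YugabyteDB driver feature. `TransientError` and `call_with_retry` are hypothetical names, and in a real application you would catch whatever retryable exception your driver raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a driver's retryable error."""


def call_with_retry(
    operation: Callable[[], T],
    max_elapsed_s: float = 10.0,   # assumed budget, comfortably above a ~3 s election
    initial_delay_s: float = 0.1,
    max_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `operation` with exponential backoff until it succeeds or the budget runs out."""
    deadline = clock() + max_elapsed_s
    delay = initial_delay_s
    while True:
        try:
            return operation()
        except TransientError:
            if clock() + delay > deadline:
                raise  # budget exhausted: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay_s)


attempts = {"count": 0}


def flaky_query() -> str:
    # Fails three times, as if the tablet leader were still being elected.
    attempts["count"] += 1
    if attempts["count"] <= 3:
        raise TransientError("leader not ready")
    return "ok"


# Sleeping is stubbed out so the demonstration finishes instantly.
print(call_with_retry(flaky_query, sleep=lambda seconds: None))
```

The `sleep` and `clock` parameters are injectable only so the helper can be exercised without real waiting. In production code you would leave them at their defaults.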

* Continuous availability is one of YugabyteDB’s core design principles. This means a repaired node, once back online, will be caught up by the remaining nodes. Then the leaders will be redistributed equally across all the nodes.

* If you want to see how this compares with other database systems, check out this comparison against the 60s-120s failover window of Amazon Aurora.
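To make the "leaders redistributed equally" step concrete, here is a toy model of the end state before and after the repaired node rejoins. It is only an illustration. It assumes every node holds a replica of every tablet (as in a 3-node cluster with a replication factor of 3), and it recomputes the assignment from scratch rather than moving leaders incrementally, so it does not reflect how the actual load balancer works. `spread_leaders` and the node and tablet names are made up.

```python
from collections import Counter


def spread_leaders(tablets: list[str], nodes: list[str]) -> dict[str, str]:
    """Assign one leader per tablet, round-robin over the available nodes."""
    if not nodes:
        raise ValueError("at least one node is required")
    return {tablet: nodes[i % len(nodes)] for i, tablet in enumerate(tablets)}


tablets = [f"tablet-{i}" for i in range(8)]

# While node-c is down, leaders can only live on the two surviving nodes.
during_outage = spread_leaders(tablets, ["node-a", "node-b"])

# Once node-c has caught up, leaders are spread over all three nodes again.
after_recovery = spread_leaders(tablets, ["node-a", "node-b", "node-c"])

print("during outage: ", dict(Counter(during_outage.values())))
print("after recovery:", dict(Counter(after_recovery.values())))
```

In both states the per-node leader counts differ by at most one, which is what "equally" means in this model.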

When a Node Is Down for a Longer Period of Time

By default, if a node is down for longer than 900 seconds (15 minutes), you will have to replace it, since the system will remove the data from the downed node. This threshold, the time after which a follower is considered failed because the leader has not received a heartbeat from it, is configurable (in seconds). If you expect a node to be down for a long period of time, we recommend adding a new node to the quorum and removing the downed node. Data replication to the newly introduced node happens behind the scenes, with no manual steps required from the user.
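The rule above reduces to a single threshold comparison, sketched below. The 900-second default comes from the text. `node_action` and its return values are hypothetical names used only for illustration. As far as I know, the threshold corresponds to the yb-tserver flag `follower_unavailable_considered_failed_sec`, but treat that name and its default as assumptions and check the configuration reference for your version.

```python
# Default from the text: 900 seconds (15 minutes). The text says it is configurable.
DEFAULT_FAILED_AFTER_S = 900


def node_action(seconds_down: float, failed_after_s: float = DEFAULT_FAILED_AFTER_S) -> str:
    """Return what to expect for a node that has been unreachable for `seconds_down`."""
    if seconds_down < 0:
        raise ValueError("seconds_down must be non-negative")
    if seconds_down <= failed_after_s:
        # Within the window: the node can rejoin and be caught up by its peers.
        return "catch_up"
    # Past the window: the node is treated as failed and should be replaced.
    return "replace"


for downtime in (30, 900, 901, 3600):
    print(f"down for {downtime:>4} s -> {node_action(downtime)}")
```

If you plan maintenance that will take longer than the default window, raise the threshold beforehand or plan for the node to be replaced.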

Learn more about how the Raft Consensus-Based Replication Protocol Works in YugabyteDB >>>

You have some great options to get started. Run the database locally on your laptop (Quick Start), deploy it to your favorite cloud provider (Multi-node Cluster Deployment), sign up for a free YugabyteDB Managed cluster, or request a full-featured trial. It’s easy! Get started today!
