What Happens to Tablets (Shards) When a Node Is Lost and Then Brought Back Into the Cluster?

Distributed SQL Tips and Tricks Series
Marko Rajcevic

When a node in a running cluster is lost, for example because of a network partition, YugabyteDB has a process in place to handle it. Remember that, in terms of the CAP theorem, YugabyteDB is a CP database: it prioritizes consistency over availability in the event of a network partition. That does not mean it is not highly available. With a replication factor of 3, your cluster can tolerate the loss of a single node and still serve all application traffic, because two of every tablet's three replicas, a majority, remain reachable.
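The fault-tolerance claim above follows from majority arithmetic in Raft-style replication. Here is a minimal sketch of that arithmetic; the function names are hypothetical and are not part of any YugabyteDB API.

```python
def raft_majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority (quorum)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerable_failures(replication_factor: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replication_factor - raft_majority(replication_factor)


for rf in (1, 3, 5):
    print(f"RF={rf}: majority={raft_majority(rf)}, "
          f"tolerates {tolerable_failures(rf)} failure(s)")
```

With a replication factor of 3, the majority is 2, so one replica of each tablet can be lost without blocking reads or writes.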

Continuous Availability in YugabyteDB

When the node goes down, every leader on that node, whether a master leader or a tablet leader, goes through a 3-second re-election process. This process elects one of the followers to the leader role. During this time, latencies will be higher for any tablet group going through re-election. The same goes for any YB-Master level operations if the master leader happened to be on that node.

* Continuous availability is one of YugabyteDB’s core design principles. This means a repaired node, once back online, will be caught up by the remaining nodes. Then the leaders will be redistributed equally across all the nodes.

* To see how this stacks up against other database systems, check out this comparison with Amazon Aurora's 60-120 second failover window.
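The sequence described above (leaders fail over to followers, then leaders are spread out again once the node returns) can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the data structure and function names are hypothetical, the "election" simply picks the first surviving replica instead of running a Raft vote, and the rebalancing loop is not YugabyteDB's actual load-balancing algorithm.

```python
def leader_counts(tablets, nodes):
    """Count how many tablet leaders each node currently hosts."""
    counts = {n: 0 for n in nodes}
    for t in tablets.values():
        counts[t["leader"]] += 1
    return counts


def fail_node(tablets, down):
    """Promote a surviving follower for every tablet led by the lost node."""
    for t in tablets.values():
        if t["leader"] == down:
            survivors = [n for n in t["replicas"] if n != down]
            t["leader"] = survivors[0]  # stand-in for a Raft election result


def rebalance_leaders(tablets, nodes):
    """Move leaders from the busiest node to the idlest until counts differ by at most 1."""
    while True:
        counts = leader_counts(tablets, nodes)
        busiest = max(nodes, key=lambda n: counts[n])
        idlest = min(nodes, key=lambda n: counts[n])
        if counts[busiest] - counts[idlest] <= 1:
            return
        for t in tablets.values():
            if t["leader"] == busiest and idlest in t["replicas"]:
                t["leader"] = idlest
                break
        else:
            return  # no movable leader; stop rather than loop forever


# Three nodes, six tablets, replication factor 3 (a replica on every node).
nodes = ["n1", "n2", "n3"]
tablets = {
    f"t{i}": {"leader": nodes[i % 3], "replicas": list(nodes)}
    for i in range(6)
}

fail_node(tablets, "n3")                # n3 is lost; its leaders move elsewhere
after_failure = leader_counts(tablets, nodes)

rebalance_leaders(tablets, nodes)       # n3 is back and caught up
after_rejoin = leader_counts(tablets, nodes)

print(after_failure)
print(after_rejoin)
```

In this model the returning node starts with no leaders and ends up with an equal share, which is the end state the bullet above describes.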

When a Node Is Down for a Longer Period of Time

By default, if a node is down for longer than 900 seconds (15 minutes), you will have to replace the node, since the system will remove the data from the downed node. This threshold, the time after which a follower is considered failed because the leader has not received a heartbeat from it, is configurable (in seconds). If you expect the node to be down for a long period of time, we recommend adding a new node to the quorum and removing the downed node. Data replication to the newly introduced node happens behind the scenes, with no manual steps required from the user.
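The rule above boils down to a single threshold comparison. The sketch below restates it in code. The constant and function names are hypothetical. The setting they mirror is, as far as we know, the YB-TServer flag `follower_unavailable_considered_failed_sec`, but treat that name and its default as assumptions and confirm them in the documentation for your version.

```python
# Default from the text above (15 minutes); assumed to be configurable per cluster.
FOLLOWER_FAILED_AFTER_SEC = 900


def recovery_action(downtime_sec: float,
                    threshold_sec: float = FOLLOWER_FAILED_AFTER_SEC) -> str:
    """Return what to expect for a node that was down for `downtime_sec` seconds.

    "catch-up": the node can rejoin and be caught up by the remaining nodes.
    "replace":  the node was down longer than the threshold, so plan to
                add a new node and remove the downed one.
    """
    if downtime_sec < 0:
        raise ValueError("downtime cannot be negative")
    return "catch-up" if downtime_sec <= threshold_sec else "replace"


print(recovery_action(120))    # short outage
print(recovery_action(3600))   # one hour, past the default threshold
```

A runbook or monitoring check could use the same comparison to decide when to start provisioning a replacement node.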

Learn more about how the Raft Consensus-Based Replication Protocol Works in YugabyteDB.
