What Happens to Tablets (Shards) When a Node Is Lost and Then Brought Back Into the Cluster?
When a node in a running cluster is lost, for example due to a network partition, there is a process in place to handle it. Remember that, in terms of the CAP theorem, YugabyteDB is a CP database: it prioritizes consistency over availability in the event of a network partition. This does not mean it is not highly available. With a replication factor of 3, each tablet has three replicas and needs a majority of them (two) to keep making progress, so your cluster can tolerate losing a single node and still serve all application traffic.
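The fault-tolerance claim above follows from simple majority arithmetic. The snippet below is a minimal sketch of that arithmetic only; the function names are ours for illustration and are not part of any YugabyteDB API.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replication_factor - majority(replication_factor)


if __name__ == "__main__":
    for rf in (1, 3, 5, 7):
        print(f"RF={rf}: quorum={majority(rf)}, "
              f"tolerated failures={tolerated_failures(rf)}")
```

For a replication factor of 3 this gives a quorum of 2 and one tolerated failure, which matches the single-node loss described above.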
When the node goes down, every leader sitting on that node (whether the master-leader or a tablet-leader) goes through a 3-second re-election process, which promotes one of the followers to the leader role. During this time, any tablet-group going through re-election will see higher latencies. The same goes for any YB-Master level operations if the master-leader happened to be on that node.
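From the application's point of view, the re-election window usually shows up as a brief spell of slow or failed requests. One common way to ride it out is to retry with backoff. The helper below is a generic sketch, not YugabyteDB driver code: `TransientError` is a hypothetical stand-in for whatever retryable error your driver raises, and the 6-second budget is an assumption chosen to be comfortably longer than the 3-second election.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable driver error."""


def retry_through_election(
    op: Callable[[], T],
    max_wait_s: float = 6.0,       # assumed budget, longer than the election
    initial_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on TransientError with capped exponential backoff."""
    deadline = time.monotonic() + max_wait_s
    delay = initial_delay_s
    while True:
        try:
            return op()
        except TransientError:
            # Give up once the next wait would exceed the overall budget.
            if time.monotonic() + delay > deadline:
                raise
            sleep(delay)
            delay = min(delay * 2, 1.0)  # cap each wait at one second
```

In practice you would wrap only idempotent operations this way, and map `TransientError` to the specific error classes your client library documents as retryable.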
* Continuous availability is one of YugabyteDB’s core design principles. This means a repaired node, once back online, will be caught up by the remaining nodes. Then the leaders will be redistributed equally across all the nodes.
* If you want to see how this compares with other database systems, check out this comparison against the 60s-120s failover window of Amazon Aurora.
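To make the idea of leaders being "redistributed equally" concrete, here is a toy model of the rebalancing step. It is an illustration only and not the algorithm YugabyteDB's load balancer uses. It assumes every node holds a replica of every tablet (as in a 3-node, RF 3 cluster), so any node is an eligible leader, and the tablet and node names are made up.

```python
from collections import Counter


def rebalance_leaders(leaders: dict[str, str], nodes: list[str]) -> dict[str, str]:
    """Move leaders from the busiest node to the idlest one until the
    per-node leader counts differ by at most one."""
    result = dict(leaders)
    counts = Counter({node: 0 for node in nodes})
    counts.update(result.values())
    while True:
        busiest = max(nodes, key=lambda n: counts[n])
        idlest = min(nodes, key=lambda n: counts[n])
        if counts[busiest] - counts[idlest] <= 1:
            return result
        # Pick one tablet led by the busiest node (sorted for determinism).
        tablet = next(t for t, n in sorted(result.items()) if n == busiest)
        result[tablet] = idlest
        counts[busiest] -= 1
        counts[idlest] += 1


if __name__ == "__main__":
    # node-3 has just rejoined, so it currently leads nothing.
    before = {"t1": "node-1", "t2": "node-1", "t3": "node-1",
              "t4": "node-2", "t5": "node-2", "t6": "node-2"}
    after = rebalance_leaders(before, ["node-1", "node-2", "node-3"])
    print(Counter(after.values()))
```

In this example each of the three nodes ends up leading two of the six tablets once the returning node has caught up.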
By default, if a node is down for longer than 900 seconds (15 minutes), you will have to replace the node, since the system will remove the data from the downed node. This duration, after which a follower is considered failed because the leader has not received a heartbeat from it, is configurable (in seconds). If you expect the node to be down for a long period of time, we recommend adding a new node to the quorum and removing the downed node. Data replication to the newly introduced node happens behind the scenes, with no manual steps required from the user.
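The decision described above boils down to comparing the outage duration with the configured threshold. The sketch below encodes just that comparison; the function and return values are ours for illustration. To our understanding, the threshold corresponds to the yb-tserver flag `follower_unavailable_considered_failed_sec`, but treat that name as an assumption and confirm it against the configuration reference for your version.

```python
# Default from the text above: 900 seconds (15 minutes).
# Assumed to map to follower_unavailable_considered_failed_sec; verify in the docs.
DEFAULT_FAILED_AFTER_S = 900


def recovery_action(seconds_down: float,
                    failed_after_s: float = DEFAULT_FAILED_AFTER_S) -> str:
    """Return what to expect for a node that was down for seconds_down."""
    if seconds_down < 0:
        raise ValueError("seconds_down must be non-negative")
    if seconds_down <= failed_after_s:
        # Node can rejoin; the remaining nodes catch it up.
        return "rejoin_and_catch_up"
    # Past the threshold the node's data is removed, so plan to replace it.
    return "replace_node"


if __name__ == "__main__":
    for outage in (30, 900, 901, 3600):
        print(f"{outage:>5}s down -> {recovery_action(outage)}")
```

If you raise the threshold for planned maintenance, pass the same value as `failed_after_s` so the check reflects your cluster's actual setting.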