Distributed SQL Tips and Tricks – February 15, 2022

Marko Rajcevic

Welcome back to our tips and tricks blog! I have the pleasure of recapping distributed SQL questions from around the Internet.

This blog series would not be possible without Dorian Hoxha, Franck Pachot, and Frits Hoogland. We also thank our incredible user community for not being afraid to ask questions.

Do you have questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. And lastly, for previous Tips & Tricks posts, check out our archives. Now let’s dive in!

Why are there multiple leaders in a YugabyteDB cluster?

In YugabyteDB, there are two main services: the YB-TServer service and the YB-Master service. The former makes up the data layer and is responsible for all database-level operations. The latter makes up the control layer and is responsible for propagating cluster-level operations, as well as storing database metadata (such as the PostgreSQL catalog) throughout the cluster. The tables and indexes in both services are split into tablets, the YugabyteDB term for shards. Each tablet is replicated based on the cluster-level value set for the replication factor (RF). A tablet together with its replicas makes up what is referred to as a RAFT group. Each RAFT group therefore has a single leader and a number of followers determined by the RF value. You can read more about the RAFT consensus protocol here.

Examining the YB-Master service

As an example, let's look at the YB-Master service. One of the YB-Master servers is elected as the leader, and thus referred to as the master-leader, following the RAFT protocol. This service is responsible for maintaining YugabyteDB cluster consistency. The remaining YB-Master servers, the master-followers, receive, apply, and acknowledge the master-leader's changes. As replicas, they also provide high availability if anything were to happen to the master-leader. Leaders exist at the YB-TServer level as well: each YB-TServer will typically host more than one tablet-leader.
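You can see which YB-Master currently holds the leader role with the yb-admin CLI. A minimal sketch, assuming a local RF 3 cluster with masters on the default RPC port 7100 (the addresses are placeholders for your own master endpoints):

```shell
# List all YB-Master servers and their current RAFT roles
# (one LEADER, the rest FOLLOWERs).
yb-admin \
  --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
  list_all_masters
```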

Although a cluster will only have a number of YB-Master servers equal to its RF, every node runs a YB-TServer, since the YB-TServers manage the actual data. As mentioned above, all tables are split into a certain number of tablets, defined by the configuration flag ysql_num_shards_per_tserver for YSQL or yb_num_shards_per_tserver for YCQL. These flags can be overridden and the tablet count set explicitly at CREATE TABLE time with the SPLIT INTO clause. Each of these tablets is a RAFT group with a number of copies, and once split, each RAFT group is distributed equally across the available nodes in the cluster. In sum, we have a single master-leader for the YB-Master service, and a tablet-leader for every tablet at the YB-TServer level.
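As a sketch of the SPLIT INTO override, here is what pre-splitting a table at creation time can look like in YSQL. The table name and columns are hypothetical, and this assumes ysqlsh is on your PATH with a cluster listening locally:

```shell
# Create a hypothetical "orders" table pre-split into 12 tablets,
# overriding the ysql_num_shards_per_tserver default for this table only.
ysqlsh -h 127.0.0.1 -c "
  CREATE TABLE orders (
    id    bigint PRIMARY KEY,
    total numeric
  ) SPLIT INTO 12 TABLETS;"
```

With RF 3, those 12 tablets become 12 RAFT groups (36 tablet replicas in total) spread across the nodes of the cluster.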

What happens to tablets (shards) when I lose a node and it is brought back into the cluster?

In scenarios where you have a running cluster and you lose a node (for example, due to a network partition), the following process takes place. Remember that in terms of the CAP theorem, YugabyteDB is a CP database: it prioritizes consistency over availability in the event of a network partition. However, this does not mean it is not highly available. With a replication factor of 3, your cluster can tolerate losing a single node and still serve all application traffic.

Continuous availability in YugabyteDB

When the node goes down, all leaders sitting on that node, whether the master-leader or tablet-leaders, go through a roughly 3-second re-election process. This process elects one of the followers to the leader role. During this time, latencies will be higher for the RAFT groups going through re-election. The same goes for any YB-Master-level operations if the master-leader happened to be on that node. To see how this stands against other database systems, check out this comparison with Amazon Aurora's 60-120 second failover window. Continuous availability is one of YugabyteDB's core design principles. A repaired node, once back online, is caught up by the remaining nodes, and the leaders are then redistributed equally across all three nodes.
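One way to watch re-election happen is to list a table's tablets and their current leaders before and after taking a node down. A sketch assuming the hypothetical orders table in the ysql.yugabyte database, with placeholder master addresses:

```shell
# Show each tablet of the "orders" table along with its current leader.
# Run it again after stopping a node to see leaders move to surviving nodes.
yb-admin \
  --master_addresses 127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
  list_tablets ysql.yugabyte orders
```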

By default, if a node is down for longer than 900 seconds (15 minutes), the system removes the data from the downed node and you will have to replace it. This duration (in seconds) after which a follower is considered failed, because the leader has not received a heartbeat from it, is configurable. If you expect a node to be down for a long period of time, we recommend adding a new node to the quorum and removing the downed node. Data replication to the newly-introduced node happens behind the scenes, with no manual steps required from the user. You can read more about how the RAFT consensus-based replication protocol works in YugabyteDB here.
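The 900-second default comes from the yb-tserver flag follower_unavailable_considered_failed_sec. A sketch of raising it to 30 minutes at server start (the data directory and master addresses are placeholders; adjust them for your deployment):

```shell
# Start a YB-TServer that tolerates a follower being unreachable for
# 30 minutes before its data is considered failed and re-replicated.
yb-tserver \
  --fs_data_dirs=/mnt/d0 \
  --tserver_master_addrs=127.0.0.1:7100,127.0.0.2:7100,127.0.0.3:7100 \
  --follower_unavailable_considered_failed_sec=1800
```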

Will my application lose connection to the cluster during a rolling restart, when I am upgrading or altering configuration flags?

If the connection is to the node that is going down, then yes, the connection will be affected. If you see any errors, it means the request failed and you should retry the transaction. One reason for performing a rolling restart across the cluster is to apply configuration flag settings that cannot be changed online. In short, we perform rolling restarts often, and they are common practice for many of our clients.
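A minimal shell sketch of the retry pattern. Note that run_with_retry is a hypothetical helper, not part of YugabyteDB, and the commented ysqlsh call is just an example of a statement you might retry during a rolling restart:

```shell
# Retry a command up to N times, backing off between attempts.
run_with_retry() {
  local attempts=$1
  shift
  local i
  for i in $(seq 1 "$attempts"); do
    "$@" && return 0   # success: stop retrying
    sleep "$i"         # linear backoff before the next attempt
  done
  return 1             # all attempts failed
}

# Example: retry a query while nodes restart (assumes ysqlsh on PATH).
# run_with_retry 5 ysqlsh -h 127.0.0.1 -c "SELECT 1;"
```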

New Documentation, Blogs, and Tutorials

In addition to the Yugabyte blog, you can also check out our Yugabyte DEV Community Blogs here.


Next Steps

Finally, ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our Quick Start page to get started.
