Distributed SQL Tips and Tricks – February 15, 2022
Welcome back to our tips and tricks blog series! I have the pleasure of recapping distributed SQL questions from around the Internet.
This blog series would not be possible without Dorian Hoxha, Franck Pachot, and Frits Hoogland. We also thank our incredible user community for not being afraid to ask questions.
Do you have questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. And lastly, for previous Tips & Tricks posts, check out our archives. Now let’s dive in!
Why are there multiple leaders in a YugabyteDB cluster?
In YugabyteDB, there are two main services: the YB-TServer service and the YB-Master service. The former makes up the data layer and is responsible for all database-level operations. The latter makes up the control layer and is responsible for propagating cluster-level operations, as well as storing database metadata, such as the PostgreSQL catalog, throughout the cluster. Tables and indexes in both services are split into tablets, the YugabyteDB term for shards. Each tablet is replicated based on the cluster-level value set for the replication factor (RF). A tablet, together with its replicas, forms what is referred to as a Raft group. Each Raft group therefore has a single leader and a number of followers determined by the RF value. You can read more about the Raft consensus protocol here.
Examining the YB-Master service
For example, let’s look at the YB-Master service. One of the YB-Master servers is elected leader, as determined by the Raft protocol, and is therefore referred to as the master-leader. This service is responsible for maintaining YugabyteDB cluster consistency. The remaining YB-Master servers, the master-followers, receive, apply, and acknowledge the master-leader's changes. As replicas, they also provide high availability should anything happen to the master-leader. All user tables live at the YB-TServer level, and each YB-TServer will typically host more than one tablet-leader.
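If you want to see this in practice, here is a minimal sketch, assuming a local cluster whose masters listen on 127.0.0.1:7100 and the yb-admin CLI on your PATH, that lists the YB-Master servers and their Raft roles; exactly one of them should report itself as LEADER:

```python
# Sketch: list the YB-Master servers of a local cluster and their Raft roles.
# Assumes yb-admin is on the PATH and the masters listen on 127.0.0.1:7100;
# adjust -master_addresses for your own cluster.
import subprocess

result = subprocess.run(
    ["yb-admin", "-master_addresses", "127.0.0.1:7100", "list_all_masters"],
    capture_output=True, text=True, check=True,
)

# One line per YB-Master; the Role column should show exactly one LEADER.
print(result.stdout)
```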
Although a cluster will only have as many YB-Master servers as its replication factor, every node in the cluster runs a YB-TServer, since that is the service that holds the actual data. As mentioned above, every table is split into a number of tablets, which by default is determined by the configuration flag ysql_num_shards_per_tserver for YSQL or yb_num_shards_per_tserver for YCQL. These defaults can be overridden and explicitly set at CREATE TABLE time using the SPLIT INTO clause. Each of these tablets forms a Raft group with a number of copies equal to the RF, and once split, the tablets are distributed evenly across the available nodes in the cluster. In sum, we have a single master-leader for the YB-Master service, and a tablet-leader for every tablet in the YB-TServer service.
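To make this concrete, here is a minimal sketch, assuming a local YSQL endpoint on the default port 5433 and the psycopg2 driver (the table name and columns are purely illustrative), that creates a hash-sharded table with an explicit tablet count via the SPLIT INTO clause:

```python
# Sketch: create a YSQL table pre-split into an explicit number of tablets.
# Assumes a local YugabyteDB node on the default YSQL port 5433 and the
# psycopg2 driver; the table definition itself is only illustrative.
import psycopg2

conn = psycopg2.connect(host="127.0.0.1", port=5433,
                        dbname="yugabyte", user="yugabyte")
conn.autocommit = True

with conn.cursor() as cur:
    # SPLIT INTO overrides the per-TServer default from
    # ysql_num_shards_per_tserver and pre-splits the table into 6 tablets,
    # each of which becomes its own Raft group.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS orders (
            id   bigint,
            note text,
            PRIMARY KEY (id HASH)
        ) SPLIT INTO 6 TABLETS
    """)

conn.close()
```

With a replication factor of 3, the six tablets above amount to 18 tablet replicas spread across the nodes, six of which are tablet-leaders.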
What happens to tablets (shards) when I lose a node and it is brought back into the cluster?
When you have a running cluster and you lose a node, for example due to a network partition, the following process takes place. But remember, in terms of the CAP theorem, YugabyteDB is a CP database: it prioritizes consistency over availability in the event of a network partition. However, this does not mean it is not highly available. With a replication factor of 3, your cluster can tolerate losing a single node and still serve all application traffic.
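As a quick sanity check of that claim, a Raft group needs a majority of its replicas to stay available, so the number of node failures a given replication factor can tolerate works out to floor((RF - 1) / 2). A minimal sketch of the arithmetic:

```python
# Sketch: node failures tolerated by a Raft quorum at a given replication factor.
def failures_tolerated(rf: int) -> int:
    # A majority of replicas must survive, so we can lose floor((RF - 1) / 2).
    return (rf - 1) // 2

for rf in (3, 5, 7):
    print(f"RF={rf}: tolerates {failures_tolerated(rf)} node failure(s)")
# RF=3: tolerates 1 node failure(s)
# RF=5: tolerates 2 node failure(s)
# RF=7: tolerates 3 node failure(s)
```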
Continuous availability in YugabyteDB
When the node goes down, all leaders sitting on that node, whether the master-leader or tablet-leaders, go through a roughly three-second re-election process that promotes one of the followers to the leader role. During this window there will be higher latencies for the tablet groups going through re-election, and the same goes for any YB-Master-level operations if the master-leader happened to be on that node. If you want to see how this compares with other database systems, check out this comparison against the 60s-120s failover window of Amazon Aurora. Continuous availability is one of YugabyteDB's core design principles. A repaired node, once back online, is caught up by the remaining nodes, and the leaders are then redistributed evenly across all of the nodes.
By default, if a node is down for longer than 900 seconds (15 minutes), the system removes its tablet data and you will have to replace the node. This duration, in seconds, after which a follower is considered failed because the leader has not received a heartbeat response from it, is configurable. If you expect a node to be down for a long period of time, we recommend adding a new node to the quorum and removing the downed node. The data replication to the newly-introduced node happens behind the scenes, with no manual steps required from the user. You can read more about how the Raft consensus-based replication protocol works in YugabyteDB here.
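For reference, both of the windows mentioned above come from configuration flags. To the best of my understanding, the roughly three-second failover window is the product of the Raft heartbeat interval and the number of missed heartbeats tolerated, while the 900-second window is a flag of its own; the flag names and defaults in the sketch below are assumptions to verify against the documentation for your YugabyteDB version:

```python
# Sketch: how the failover windows discussed above are derived from the
# (assumed) YB-TServer flag defaults; verify the names and values against
# the docs for your YugabyteDB version.
raft_heartbeat_interval_ms = 500                  # assumed default
leader_failure_max_missed_heartbeat_periods = 6   # assumed default
follower_unavailable_considered_failed_sec = 900  # assumed default

leader_failover_sec = (raft_heartbeat_interval_ms / 1000
                       * leader_failure_max_missed_heartbeat_periods)
print(f"Leader re-election triggers after ~{leader_failover_sec:.0f}s")        # ~3s
print(f"Follower data is removed after {follower_unavailable_considered_failed_sec}s")  # 15 minutes
```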
Will my application lose connection to the cluster during a rolling restart, when I am upgrading or altering configuration flags?
If the connection is to the node that is going down, then yes, the connection will be affected. More specifically, if you see any errors, it means the request failed and you should retry the transaction. One reason for performing a rolling restart across the cluster is to apply configuration flag settings that cannot be changed online. In short, rolling restarts are performed often and are a common practice for many of our clients.
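As an illustration of that retry guidance, here is a minimal sketch, assuming the psycopg2 driver, a local YSQL endpoint on port 5433, and an idempotent statement, that reconnects and retries when the node it was talking to is restarted:

```python
# Sketch: client-side retry around a transaction, assuming psycopg2 and an
# idempotent statement. If the node we are connected to restarts, the driver
# raises OperationalError; we reconnect and retry a bounded number of times.
import time
import psycopg2

CONN_ARGS = dict(host="127.0.0.1", port=5433, dbname="yugabyte", user="yugabyte")

def run_with_retry(sql, params=None, attempts=5, backoff_sec=1.0):
    last_err = None
    for attempt in range(1, attempts + 1):
        conn = None
        try:
            conn = psycopg2.connect(**CONN_ARGS)
            with conn.cursor() as cur:
                cur.execute(sql, params)
            conn.commit()
            return
        except psycopg2.OperationalError as err:
            last_err = err
            time.sleep(backoff_sec * attempt)  # simple linear backoff
        finally:
            if conn is not None:
                conn.close()
    raise last_err

run_with_retry("UPDATE orders SET note = %s WHERE id = %s", ("retried", 1))
```

In a real application you would typically let a connection pool or one of the YugabyteDB smart drivers handle reconnection and load balancing across nodes rather than hand-rolling a loop like this.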
New Documentation, Blogs, and Tutorials
In addition to the Yugabyte blogs listed below, you can also check out our Yugabyte DEV Community Blogs here.
- TPC-C Benchmark: Scaling YugabyteDB to 100,000 Warehouses
- YugabyteDB and Apache Superset: Explore and Visualize Open Source Data
- Linux Performance Tuning: Dealing with Memory and Disk IO
- Tutorial: How to Deploy Multi-Region YugabyteDB on GKE Using Multi-Cluster Services
- YugabyteDB Migration: What About Those 19 Oracle Features I Thought I Would Miss?
- My Yugabyte Journey into Distributed SQL Databases
- YugabyteDB Savepoints: Checkpointing Work in Distributed Transactions
- YugabyteDB Integrates with Arctype SQL Client
- PostgreSQL Timestamps and Timezones: What You Need to Know—and What You Don’t
- PostgreSQL Timestamps and Timezones: How to Navigate the Interval Minefield
- My Yugabyte Journey: From Intern to Full-Time Software Engineer
- A Matter of Time: Evolving Clock Sync for Distributed Databases
- Securing YugabyteDB: Server-to-Server Encryption in Transit
New Videos
- YugabyteDB Friday Tech Talks: Episode 6: Multi-Region Deployment Options
- YugabyteDB Friday Tech Talks: Episode 5: PostgreSQL-Compatible “Smart” Drivers
- Is 2022 “The Year of the Edge?”
- YugabyteDB Friday Tech Talks: Episode 4: Continuous Availability with YugabyteDB
- YugabyteDB Friday Tech Talks: Episode 3: YugabyteDB Sharding Strategies
- A Tale of Two Distributed Systems: Kubernetes and YugabyteDB
- YugabyteDB Friday Tech Talks: Episode 2: YugabyteDB’s SQL Layer
Upcoming Events and Training
- YugabyteDB YCQL Development Fundamentals – February 16, 9:30am – 11:00am (IST) • Training • Online
- Yugabyte Cloud Office Hours – February 16, 8:00am – 9:00am (PT) • Webinar • Online
- YugabyteDB Friday Tech Talks: YSQL Follower Reads – February 18, 9:30am – 10:00am (PT) • Webinar • Online
- Build a Real-time Polling App with Hasura GraphQL and Yugabyte Cloud – February 24, 9:30am – 11:00am (IST) • Training • Online
- Continuous Intelligence Day – February 23 – 24 • Conference • Online
- Do My Microservices Need a Heterogeneous Data Layer? – February 24, 1:00pm – 2:00pm (CET) • Webinar • Online
- YugabyteDB YSQL Development Advanced – March 01, 9:00am – 10:30am (PT) • Training • Online
- From Postgres and MongoDB to the Yuga of Cloud Native Databases – March 03, 10:00am – 11:00am (IST) • Webinar • Online
- Cloud Expo Europe, London 2022 – March 02 – 03 • Conference • London, England
- Distributed SQL Summit London – March 8 • Conference • London, England
- Distributed SQL Summit Asia – March 30 – 31 • Conference • Online
- Postgres Conference Silicon Valley 2022 – April 07 – 08 • Conference • San Jose, CA, USA
- CIO Visions Summit, Orlando – April 10 – 12 • Conference • Orlando, FL, USA
- CIO Visions Summit, North America – April 18 – 22 • Conference • Online
- KubeCon + CloudNativeCon Europe 2022 – May 16 – 20 • Conference • Online
- Cloud Expo Asia, Singapore – October 12 – 13 • Conference • Singapore
- KubeCon + CloudNativeCon North America 2022 – October 24 – 28 • Conference • Online
- CIO Visions Summit, Chicago – November 06 – 08 • Conference • Chicago, IL, USA
Next Steps
Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our Quick Start page to get started.