Start Now

Distributed SQL Tips and Tricks – January 14th, 2022

Marko Rajcevic

Welcome back to our tips and tricks blog! I have the pleasure of recapping distributed SQL questions from around the Internet.

This blog series would not be possible without Dorian Hoxha, Franck Pachot, and  Frits Hoogland. We also thank our incredible user community for not being afraid to ask questions.

Do you have questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. For previous Tips & Tricks posts, check out our archives. Now let’s dive in!

What happens to a downed node when it comes back into the cluster?

There are a number of reasons a node could go down, but this type of event primarily falls into two categories: planned and unplanned downtime. YugabyteDB’s highly available nature, which can be read about here and here, provides stability during this time. Since we replicate your data X amount of times, you have a level of fault tolerance that protects you from a node being down.

When a node goes down, a re-election begins for the remaining tablet-peers in the RAFT group whose leader was on the downed node. The physical copy of the data still remains on that downed node. As such, when a node comes back up, the data is typically in a different state than the remaining nodes. While it is down this node remains a part of the RAFT group for 15 minutes.

What happens during the 15-minute re-election period?

15 minutes is the default time and configurable before the tablet—as a RAFT member—is kicked out and its data deleted. If the downed node comes back up before the 15 minute RAFT timeout, the remaining nodes in the quorum make sure to replicate any data changes to the tablets that have occurred while the node has been down. This is the typical process when going through a rolling restart for updates and configuration flag changes.

If the node is down for longer than 15 minutes, or the value you configured it to, the copies of the data will replicate to the remaining nodes, and the downed node will be kicked out of the quorum. This is more likely if you experience unplanned downtime. In this case, you will want to stand up a new node and get it added to the quorum to replace the one you lost. You should do this immediately if you expect the downed node to be out a while as your system no longer has any flexibility for fault tolerance.

For example, if you have a 3-node cluster with a replication factor of 3, you can only sustain the loss of 1 node without impacting full functionality of the cluster. To learn more about increasing your node failure threshold, read our blog on Fine-Grained Control for High Availability. You can also read more about how the RAFT consensus-based replication protocol works in YugabyteDB.

Can we set different RF values and server counts for the read replica cluster?

Read replicas asynchronously replicate data from your primary cluster to a remote region in order to serve low latency reads to users in those areas. As described in the documentation linked above, read replicas are observer nodes that do not participate in writes, but get a timeline-consistent copy of the data through asynchronous replication from the primary cluster. They do not guarantee strong consistency and ACID compliance. As such, stale reads are possible with an upper bound on the amount of staleness.

Your primary cluster along with any read replicas make up what we call a Universe in YugabyteDB. Read replicas can have any number of servers, as their server count and replication factor is completely separate from your primary cluster. For example, if you have your primary cluster in the US, you can extend to a single region in the EU with a single read replica, or you can have multiple read replica nodes in different regions. Since read replicas have their own replication factor, if you want high availability for that replica you would want to have a higher node count and replication factor than 1.

Why doesn’t YugabyteDB support certain extensions like pg_repack and pg_buffercache?

Certain PostgreSQL extensions such as pg_repack and pg_buffercache deal with the PostgreSQL storage layer directly. Although YugabyteDB reuses the Postgres query layer, we took a completely different approach to the storage layer. Due to this certain extensions, such as the two mentioned above, do not apply.

pg_buffercache is an extension that allows visibility to the buffer cache, which with Postgres sits in shared memory. And while defining this buffer cache with YSQL, it is not actively used as the caching takes place on the tserver. The tserver process has exposed metrics and visibility.

pg_repack is a PostgreSQL extension that lets you remove bloat from tables and indexes. Since YugabyteDB uses a LSM-tree-based storage engine, compactions remove any expired data, so there should be no need to use this extension on YugabyteDB. You can take a look at what extensions are available here, as well as those that come pre-bundled here.

Learn more about our architecture here.

New documentation, blogs, and tutorials

Outside of the Yugabyte blogs called out below, you can also check out our Yugabyte DEV Community Blogs here.

New videos

Upcoming events and training

Next steps

Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our Quick Start page to get started.

Related Posts

Marko Rajcevic

Related Posts

Learn More to Accelerate Your Retail Business

Ready to dive deeper into distributed SQL, YugabyteDB, and Yugabyte Cloud?
Learn at Yugabyte University
Learn More
Browse Yugabyte Docs
Read More
Join the Yugabyte Community
Join Now