
Distributed SQL Tips and Tricks – March 21, 2022

Marko Rajcevic

Welcome back to our distributed SQL tips and tricks blog! I have the pleasure of recapping distributed SQL questions from around the Internet.

This blog series would not be possible without Dorian Hoxha, Franck Pachot, and Frits Hoogland. We also thank our incredible user community for not being afraid to ask questions.

Do you have questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. For previous Tips & Tricks posts, check out our archives. Now let’s dive in.

How is data corruption handled in YugabyteDB compared to PostgreSQL?

Fundamentally, both PostgreSQL and YugabyteDB have their pros and cons because their storage engines are extremely different. With YugabyteDB you get the consistency of PostgreSQL, but with much stronger resiliency. YugabyteDB is resilient to a wide range of failures, including server panics, node failures, and disk corruption. Because the database runs across multiple servers, it tolerates a single disk failure or corruption as a unit. If needed, you can take down a bad server and bring up a new one, all while serving active traffic. A large automotive manufacturer was recently able to keep serving 2 million operations per second despite a server panic.

Regarding disk corruption specifically, the first major difference from PostgreSQL is that YugabyteDB uses an LSM tree with SST files, which are written sequentially and are append-only. Random updates happen in memory only (the first level of the LSM tree is a MemTable), which flushes to an SST file, and SST files compact into new SST files. This reduces the risk of corruption and guarantees that new writes cannot corrupt previously written data. In PostgreSQL, heap tables and B-tree indexes are updated in place with random writes that can touch any block. If corruption occurs at the storage level, it can damage existing data, because new and old data share the same block. This is why the whole database needs to be checked frequently. If a block was corrupted longer ago than the backup retention period, there is no way to get it back.

Dealing with managed services

If you are using a managed service for your Postgres, such as RDS, there is a chance that the same corrupt block exists on the standby, because RDS replicates with storage sync rather than WAL. With YugabyteDB the SST files, once written, are not altered. As a result, new changes cannot corrupt past data. Additionally, YugabyteDB checks data validity as part of compactions, which are triggered as new writes arrive. All nodes perform compactions independently, which gives each of them a chance to verify data integrity by checking checksums as they read the data. Because YugabyteDB replicates at a higher layer (the logical key-value changes in the Raft group), another tablet peer has the correct data (the probability of corruption occurring on two different physical writes is very low) and the corrupt copy can be discarded.

What is the best way to control the total number of shards in my YugabyteDB cluster?

You can control the number of shards—called tablets in YugabyteDB—using multiple methods, depending on the level of control you want. The ysql_num_tablets flag controls the number of tablets per YSQL table. Its default value is -1; when it is not set, the value falls back to ysql_num_shards_per_tserver. The difference between the two is that the former controls the total number of tablets for the table, whereas the latter controls the number of tablets per tserver. For example:

ysql_num_tablets=1

will store the table into a single tablet, whereas if the cluster consists of 3 nodes, where each has a tserver, and:

ysql_num_shards_per_tserver=1

then the table will be split into 3 tablets since there are 3 tservers. So, in this case, the number of tservers determines the tablet count.
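As a sketch of where these flags are set, here is a manually started yb-tserver with the per-tserver flag from the example above (the master address and data directory are placeholder values):

```shell
# Placeholder address and path; with this flag, every YSQL table created
# on the cluster defaults to one tablet per tserver (3 tablets on 3 nodes).
./bin/yb-tserver \
  --tserver_master_addrs=127.0.0.1:7100 \
  --fs_data_dirs=/mnt/data \
  --ysql_num_shards_per_tserver=1
```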

If you use the SPLIT INTO clause in your CREATE TABLE DDL statement, it overrides the values of the flags specified above. If required after the fact, you can always manually split any existing tablets as well. Additionally, YugabyteDB supports dynamic auto-splitting, which handles this for you as your data grows. This feature can be tracked via GitHub here.
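As a minimal sketch of the SPLIT INTO syntax, assuming a hypothetical orders table:

```sql
-- Pre-split the table into 10 tablets at creation time, overriding
-- both ysql_num_tablets and ysql_num_shards_per_tserver.
CREATE TABLE orders (
    order_id  BIGINT,
    customer  TEXT,
    PRIMARY KEY (order_id HASH)
) SPLIT INTO 10 TABLETS;
```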

Remember, since indexes are additional tables in YugabyteDB, this applies to them as well. For YCQL, the syntax is different and YCQL-specific; however, the same process applies. You can see this directly in the code called out here. Such is the beauty of open source.
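For comparison, a YCQL sketch of the same idea uses a table property rather than a DDL clause (the keyspace and table names are hypothetical; check the YCQL docs for your version):

```sql
-- YCQL: pre-split the table into 4 tablets via the tablets table property.
CREATE TABLE store.orders (
    order_id  BIGINT PRIMARY KEY,
    customer  TEXT
) WITH tablets = 4;
```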

Is there a way to enable low latency reads for my cluster without adding servers via read replicas?

There are multiple ways to extend your reads to regions local to your users with YugabyteDB. As the question calls out, read replicas are one of them; however, they require you to spin up additional infrastructure. Adding servers means paying more money, and many companies do not have that flexibility. In such cases, you can enable follower reads to allow low-latency reads from the primary cluster. As we call out in our docs:

By default, YugabyteDB requires reading from the leader to read the latest data. However, for applications that don’t require the latest data, or are working with unchanging data, the cost of contacting a potentially remote leader to fetch the data may be wasteful. Your application may benefit from better latency by reading from a replica that is closer to the client.

Although read replicas are typically used by monolithic applications to accomplish something similar, they require additional infrastructure and use asynchronous replication. Follower reads execute on the primary cluster. Since data in this cluster replicates synchronously, it allows you to read the right value as soon as the data is written to disk. However, please keep in mind that because data changes are still replicated from the leader, there is a chance of stale reads. Similar to the use of read replicas, this will not work for applications that require absolute correctness of data. Setup and examples of follower reads can be found here.
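A minimal YSQL sketch of enabling follower reads for a session follows; the staleness value and the table name are illustrative choices:

```sql
-- Allow reads to be served by followers; this only takes effect
-- in read-only transactions.
SET yb_read_from_followers = true;
SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;

-- Accept data up to 30 seconds stale (illustrative value).
SET yb_follower_read_staleness_ms = 30000;

-- This read can now be served by the closest tablet peer.
SELECT * FROM orders WHERE order_id = 1001;
```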

New documentation, blogs, and tutorials

Outside of the Yugabyte blogs called out below, you can also check out our Yugabyte DEV Community Blogs here.

New videos

Upcoming events and training

Next steps

Ready to start exploring YugabyteDB features?

You have great options: run locally on your laptop (Quick Start), deploy to your favorite cloud provider (Multi-node Cluster Deployment), or sign up for a free Yugabyte Cloud cluster. It’s easy! Start today! 
