Distributed SQL Tips and Tricks – August 18th, 2021

August 18, 2021

Welcome back to our bi-weekly tips and tricks blog where I have the pleasure of recapping some distributed SQL questions from around the Internet. I had a month long hiatus this summer, so we are making it up by adding an extra entry into this week’s edition. As always, this blog series would not be possible without all of the behind the scenes work done by Yugabeings such as Dorian Hoxha and Frits Hoogland. All of us here at Yugabyte also thank our incredible user community for their work with the technology, and not being afraid to ask questions.

Do you have other questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. For previous Tips & Tricks posts, check out our archives. Let’s dive in:

How to approach multi-tenancy in a distributed SQL system

As we move towards a more cloud native world, many companies are aiming to develop SaaS solutions as an alternative to their traditionally self-managed solutions. Along with any SaaS solutions comes the discussion on how to approach multi-tenancy. Below are a few strategies that we have explored, and the recommendations we have for each one. A shout out to Taylor Mull for this clear and concise explanation below.

Row level security – This feature enables database administrators to define a policy on a table such that it can control viewing and manipulation of data on a per user basis. A row level policy can be understood as an additional filter; when a user tries to perform an operation on a table, this filter is applied before any query condition or filtering, and data is shrunk down or access is denied based on the specific policy. In order to enable this, the key is to have a single schema shared across multiple users and then have the tenantID as part of the schema/primary key. Row level security is a good fit for these requirements:

Large number of tenants
Logical separation of tenant’s data is sufficient
Prefers small number of database objects and simplified database management
Shared schema across tenants

Multi schema/Multi DB – This is the default way of setting up multitenancy, allowing for physical separation of data but tends to lead to a higher number of tablets and overhead on the cluster. The two options for implementing this are to either use multiple databases within the same cluster or use multiple schemas. But in both cases the number of tablets scales linearly with the number of tenants, which leads to the high number of tablets. We previously covered the benefits of multi schema vs multi DB in a previous Tips & Tricks post. To summarize, use multi schema/multi DB when:

Physical separation of data is required
You have a small number of tenants or use of multiple clusters

Tablegroups (currently in beta) – This is a group of tables that will be placed on a single colocation tablet. A tablegroup be created at the schema level and multiple groups can be created based on specific data storage and access requirements. This allows for the physical separation of data but also reduces the number of tablets compared to option 2. Please be aware that this feature is currently not available but will be soon. To summarize, tablegroups are useful in these conditions:

Large number of tenants
Physical separation of data is required

Increasing max connections for YSQL

As you may know, a major design goal of YugabyteDB was to make it extremely easy to linearly scale the cluster to add more CPU or storage capacity. A less talked about benefit that results from this is the ability to scale the max connections to the database. This is typically important for users running traditional databases such as PostgreSQL who have had to work around this issue in the past. With YugabyteDB each node comes with a default of 300 maximum connections. Although this default can be increased using the flag --ysql_max_connections, we have found that 300 is the sweet spot. However, if you require additional concurrent connections, we recommend adding additional nodes to the cluster. Each node will increase your max connections by 300. Read more about scaling connections in this post by our CTO and Co-Founder, Karthik Ranganathan, in which he does a detailed comparison between YugabyteDB, Aurora PostgreSQL, and CockroachDB.

Can I perform row level data recovery in YugabyteDB?

There are numerous types of data recovery one must be ready for in case unexpected issues arise. The most popular features that allow you to recover in these scenarios are from backups, snapshots, as well as using techniques such as point-in-time recovery (PITR). Features like backups are not just used in disaster recovery scenarios, they are often used to replicate production in development and testing environments. These are typical in most, if not every, database. However, with backups you are forced to recover to the state of the database at the time of the backup. This can cause you to lose data between the last backup and the time of the issue. To get around this, PITR allows you to restore the state of your cluster from a specific point in time. Every now and again, we get asked about the ability to roll back on a row level. Although we can do BAR operations including PITR at the database level, row level PITR would have to be achieved by implementing a versioning scheme in your table design.

Do writes in YugabyteDB rewrite the entire row, or only the columns that got changed?

New users of YugabyteDB typically have many questions regarding our distributed storage layer and how it handles topics ranging from persistence to performance. One question that comes up frequently, centers around how YugabyteDB handles writes when a column gets changed. As Dorian answers in this community slack thread, each column in YugabyteDB is stored as a separate rocksdb key-value underneath the covers. As a result, only changed columns will be rewritten. This is the case for both YSQL and YCQL tables, as each row is stored as a separate key-value in DocDB. Deeper insights into this layer can be found here.