Distributed SQL Tips and Tricks – August 18th, 2021
Welcome back to our bi-weekly tips and tricks blog where I have the pleasure of recapping some distributed SQL questions from around the Internet. I had a month long hiatus this summer, so we are making it up by adding an extra entry into this week’s edition. As always, this blog series would not be possible without all of the behind the scenes work done by Yugabeings such as Dorian Hoxha and Frits Hoogland. All of us here at Yugabyte also thank our incredible user community for their work with the technology, and not being afraid to ask questions.
Do you have other questions? Make sure to ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. For previous Tips & Tricks posts, check out our archives. Let’s dive in:
How to approach multi-tenancy in a distributed SQL system
As we move towards a more cloud native world, many companies are aiming to develop SaaS solutions as an alternative to their traditionally self-managed solutions. Along with any SaaS solutions comes the discussion on how to approach multi-tenancy. Below are a few strategies that we have explored, and the recommendations we have for each one. A shout out to Taylor Mull for this clear and concise explanation below.
Row level security – This feature enables database administrators to define a policy on a table such that it can control viewing and manipulation of data on a per user basis. A row level policy can be understood as an additional filter; when a user tries to perform an operation on a table, this filter is applied before any query condition or filtering, and data is shrunk down or access is denied based on the specific policy. In order to enable this, the key is to have a single schema shared across multiple users and then have the tenantID as part of the schema/primary key. Row level security is a good fit for these requirements:
- Large number of tenants
- Logical separation of tenant’s data is sufficient
- Prefers small number of database objects and simplified database management
- Shared schema across tenants
Multi schema/Multi DB – This is the default way of setting up multitenancy, allowing for physical separation of data but tends to lead to a higher number of tablets and overhead on the cluster. The two options for implementing this are to either use multiple databases within the same cluster or use multiple schemas. But in both cases the number of tablets scales linearly with the number of tenants, which leads to the high number of tablets. We previously covered the benefits of multi schema vs multi DB in a previous Tips & Tricks post. To summarize, use multi schema/multi DB when:
- Physical separation of data is required
- You have a small number of tenants or use of multiple clusters
Tablegroups (currently in beta) – This is a group of tables that will be placed on a single colocation tablet. A tablegroup be created at the schema level and multiple groups can be created based on specific data storage and access requirements. This allows for the physical separation of data but also reduces the number of tablets compared to option 2. Please be aware that this feature is currently not available but will be soon. To summarize, tablegroups are useful in these conditions:
- Large number of tenants
- Physical separation of data is required
Increasing max connections for YSQL
As you may know, a major design goal of YugabyteDB was to make it extremely easy to linearly scale the cluster to add more CPU or storage capacity. A less talked about benefit that results from this is the ability to scale the max connections to the database. This is typically important for users running traditional databases such as PostgreSQL who have had to work around this issue in the past. With YugabyteDB each node comes with a default of 300 maximum connections. Although this default can be increased using the flag
--ysql_max_connections, we have found that 300 is the sweet spot. However, if you require additional concurrent connections, we recommend adding additional nodes to the cluster. Each node will increase your max connections by 300. Read more about scaling connections in this post by our CTO and Co-Founder, Karthik Ranganathan, in which he does a detailed comparison between YugabyteDB, Aurora PostgreSQL, and CockroachDB.
Can I perform row level data recovery in YugabyteDB?
There are numerous types of data recovery one must be ready for in case unexpected issues arise. The most popular features that allow you to recover in these scenarios are from backups, snapshots, as well as using techniques such as point-in-time recovery (PITR). Features like backups are not just used in disaster recovery scenarios, they are often used to replicate production in development and testing environments. These are typical in most, if not every, database. However, with backups you are forced to recover to the state of the database at the time of the backup. This can cause you to lose data between the last backup and the time of the issue. To get around this, PITR allows you to restore the state of your cluster from a specific point in time. Every now and again, we get asked about the ability to roll back on a row level. Although we can do BAR operations including PITR at the database level, row level PITR would have to be achieved by implementing a versioning scheme in your table design.
Do writes in YugabyteDB rewrite the entire row, or only the columns that got changed?
New users of YugabyteDB typically have many questions regarding our distributed storage layer and how it handles topics ranging from persistence to performance. One question that comes up frequently, centers around how YugabyteDB handles writes when a column gets changed. As Dorian answers in this community slack thread, each column in YugabyteDB is stored as a separate rocksdb key-value underneath the covers. As a result, only changed columns will be rewritten. This is the case for both YSQL and YCQL tables, as each row is stored as a separate key-value in DocDB. Deeper insights into this layer can be found here.
New Documentation, Blogs, Tutorials, and Videos
- Announcing DSS ’21: Third Annual Distributed SQL Summit
- Using Sequelize with YugabyteDB
- Announcing YugabyteDB 2.6
- YugabyteDB Open Source Community Spotlight – July 2021
- Geo-Distribution in YugabyteDB: Engineering Around the Physics of Latency
- Connecting to YugabyteDB with Arctype, a Collaborative SQL Client
- Getting Started with the Kafka Connect for YugabyteDB (beta)
- Distributed SQL Summit Asia Recap: How Xignite Delivers Financial Data at Scale
- Catalysts of Change: Fireside Chat with Paul Gaffney of Kohl’s
- Building Cloud Native GraphQL Applications on Kubernetes with YugabyteDB and Hasura Open Source Software
- YugabyteDB YSQL Development Advanced Training and Certification
- Performance Tuning and Debugging GraphQL Applications with Yugabyte Cloud and Hasura Cloud
- Building Always-On, Scalable Applications with Yugabyte Cloud and Hasura Cloud
- August 24th, 2021 – Building Cloud Native Spring Boot Apps
- August 26th, 2021 – YugabyteDB Fundamentals Community Training and Certification
- August 31st, 2021 – Building Cloud Native Spring Boot Apps
- September 21st, 2021 – September 23rd, 2021 – DSS 2021
- September 30th, 2021 – Verteiltes Transaktionales SQL: Das Nächste Große Ding für Datenbanken
- September 30th, 2021 – Distributed SQL: The Next Big Thing in Transactional Databases
- September 30th, 2021 – YugabyteDB YSQL Development Fundamentals Certification and Training
Ready to start exploring YugabyteDB features? Getting up and running locally on your laptop is fast. Visit our quickstart page to get started.