Q&A: Tablet Splitting, High Availability Behavior, & Sharding vs Partitioning with YugabyteDB
This month I focus on recommended values for tablet splitting, behavior around high availability, client behavior when a node goes down, and sharding versus partitioning of a table.
This question will become less important once automatic tablet splitting is the default. However, it is still essential that you understand the ‘why’ behind the choices.
“…the resharding of data in the cluster by presplitting tables before data is added or by changing the number of tablets at runtime.”
There are different ways to set the number of tablets, or shards, on your YugabyteDB cluster. (You can read about this in our March tips and tricks blog.) Although it is hard to gauge the exact number of tablets, since it depends on your table size and activity, here are some high-level suggestions:
To keep your total tablet count low, place your smaller tables in a colocated tablet and shard only your larger tables across the cluster.
In the case of a colocated tablet, the tablet is replicated based on your Replication Factor (RF). However, there is a single tablet leader. Remember that having those smaller tables colocated may or may not help depending on the amount of activity on those tables, so you still want to test this.
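As a sketch, colocation is opted into when the database is created, and individual tables can opt back out. The exact clause name varies by YugabyteDB version (recent releases use COLOCATION; older ones use COLOCATED), and the table names below are hypothetical, so check the documentation for your version:

```sql
-- Create a database whose tables share a single colocated tablet
-- (older YugabyteDB versions use "WITH COLOCATED = true").
CREATE DATABASE app_db WITH COLOCATION = true;

-- Inside app_db, small tables are colocated by default.
CREATE TABLE lookup_codes (code text PRIMARY KEY, description text);

-- Opt a large table out so it is sharded across the cluster instead.
CREATE TABLE events (
    id bigint,
    payload jsonb,
    PRIMARY KEY (id HASH)
) WITH (COLOCATION = false);
```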
If you choose not to colocate your smaller tables, you can limit each table to a single tablet by specifying SPLIT INTO 1 TABLETS during table creation, or by enabling auto-splitting. With auto-splitting enabled, each small table stays at a single tablet as long as it remains under the tablet_split_low_phase_size_threshold_bytes threshold, which is 512 MB by default.
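For example, a small table can be pinned to a single tablet at creation time (the table and schema here are illustrative):

```sql
-- Keep this small reference table in one tablet
-- instead of the default shard count.
CREATE TABLE country_codes (
    code text,
    name text,
    PRIMARY KEY (code HASH)
) SPLIT INTO 1 TABLETS;
```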
A good place to start for medium-sized tables, whether or not you enable auto-splitting, is 8 tablets per tserver. That works out to 24 total leader tablets in a 3-node cluster with a replication factor (RF) of 3.
The goal here is to keep each tablet under 10GB. If you anticipate this table will grow consistently, we recommend enabling auto-splitting, which allows YugabyteDB to automatically split your table into additional tablets as it grows.
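Following the suggestion above, pre-splitting a medium-sized table into 24 tablets on a 3-node, RF3 cluster would look something like this (the schema is illustrative):

```sql
-- Hash-sharded table pre-split into 24 tablets,
-- i.e. 8 leader tablets per tserver on a 3-node cluster.
CREATE TABLE orders (
    order_id bigint,
    customer_id bigint,
    created_at timestamptz,
    PRIMARY KEY (order_id HASH)
) SPLIT INTO 24 TABLETS;
```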
Typically the larger your table, the more tablets you want to break it into, to avoid hotspots and ensure equal resource usage across the nodes in your cluster.
Keep in mind that you want to keep the number of tablets across your cluster as low as possible, although it is natural for this count to grow as your tables increase in size.
You can pre-split your large tables – which could be anywhere from 8 to 24 tablets per tserver depending on your instance size – to avoid too many splits during bulk loading. However, by enabling auto-splitting, you ensure that after the initial load, YugabyteDB will auto-split from that point on as your data size increases.
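Hash-sharded tables are pre-split with SPLIT INTO as above; range-sharded tables take explicit split points instead. A sketch with hypothetical key boundaries:

```sql
-- Range-sharded table pre-split at explicit primary-key boundaries,
-- producing three tablets: (< 100000), [100000, 200000), (>= 200000).
CREATE TABLE user_events (
    user_id bigint,
    event_time timestamptz,
    PRIMARY KEY (user_id ASC)
) SPLIT AT VALUES ((100000), (200000));
```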
For a more detailed explanation of sharding and the auto-sharding mechanics in YugabyteDB, check out Distributed SQL Sharding: How Many Tablets, and at What Size?
P.S. Keep in mind that indexes are sharded in the same way as tables.
If a Node Becomes Unavailable, Does YugabyteDB Propagate Client Requests to Surviving Nodes, or Do the Clients Have to Handle That Retry Logic?
The answer depends on a few characteristics:
- Are the clients connected directly to the node, or through a load balancer?
- Are you using the PostgreSQL driver, or the YugabyteDB smart driver?
- Are you using connection pooling?
If clients are connected directly to a node that becomes unavailable, the physical connections to that node are invalidated. Your connection pool automatically adjusts its minimum pool requirements against the other nodes in the cluster, creating new connections to the surviving nodes as needed; these are managed transparently. If the node goes down abruptly in the middle of in-flight transactions, those transactions fail and clients can retry them. Alternatively, you can implement the retry logic at the application layer, based on the error code returned. In either case, your connection pool should invalidate the physical connections already established to the failed node.
From a connection string perspective, it is better to pass more than a single endpoint (at least two or three) so that net new client connections are established against the remaining available endpoints. If you list only one endpoint, existing connections are unaffected, but you can’t create net new client connections if that single endpoint is down.
An example of this recommendation using the YugabyteDB smart driver is:
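A minimal sketch, assuming the YugabyteDB JDBC smart driver and the default YSQL port 5433 (the host names are placeholders); the load-balance property tells the driver to discover the cluster topology and spread connections across all live nodes:

```
jdbc:yugabytedb://node1.example.com:5433,node2.example.com:5433,node3.example.com:5433/yugabyte?load-balance=true
```

If the first host in the list is unreachable, the driver tries the remaining endpoints, so new connections can still be established during a node outage.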
NOTE: If using multi-region clusters, you want to list at least one endpoint per region.
Yes. Regardless of whether you partition a table, every table and index in YugabyteDB is sharded across your cluster. Think of each partition as a separate table, because that is how YSQL treats it.
Partitioning and sharding are separate concepts in YugabyteDB that can be combined to enable features such as row-level geo-partitioning for multi-region workloads. As described in Distributed SQL Essentials: Sharding and Partitioning in YugabyteDB, partitioning happens at the query level (YSQL), while sharding happens at the storage level (DocDB).
If you want to prove this, follow an example in our partitioning documentation. Once the partitions are created, go to http://<ip>:9000 and click on the Tablets tab to see all of the tablets created.
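The example in the docs boils down to something like this (column list abbreviated to the essentials):

```sql
-- Parent table, range-partitioned by date at the YSQL layer.
CREATE TABLE order_changes (
    order_id int,
    change_date date
) PARTITION BY RANGE (change_date);

-- Each partition is a regular table underneath,
-- and each one is sharded into its own tablets by DocDB.
CREATE TABLE order_changes_2019_02 PARTITION OF order_changes
    FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');
```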
You can also use the following yb-admin command based on the example in the docs:
yb-admin --master_addresses <ip1>:7100,<ip2>:7100,<ip3>:7100 list_tablets ysql.yugabyte order_changes_2019_02
More tips and tricks, and general information can be found by searching the YugabyteDB Blog, which is updated regularly. You can also check out our DEV Community Blogs. New blogs include:
- Create Alert Configurations For YugabyteDB Anywhere Using APIs
- Comparing the Maximum Availability of YugabyteDB and Oracle Database
- Reference Architecture for Deploying YugabyteDB with VMware Tanzu
- Data Streaming Using YugabyteDB CDC, Kafka, and SnowflakeSinkConnector
- Using MinIO for YugabyteDB Backup and Restore
- Unlocking Azure Storage Options With YugabyteDB CDC
- Optimizing YugabyteDB Memory Tuning for YSQL
All of our recent Distributed SQL Summit (DSS) 2022 sessions are now available to view on-demand. If you missed the online DSS event, or a specific session, simply register and watch.
You can also check out our newest video content on the Yugabyte YouTube channel.
- “Triggers Considered Harmful” Considered Harmful | YugabyteDB Friday Tech Talks | Episode 35
- RBAC Security Features in YugabyteDB | YugabyteDB Friday Tech Talks | Episode 34
- Postgres TABLESPACE in a Cloud Native World | YugabyteDB Friday Tech Talks | Episode 33
- Packed Row Storage Format in YugabyteDB | YugabyteDB Friday Tech Talks | Episode 32
- yb_stats in YugabyteDB | YugabyteDB Friday Tech Talks | Episode 31
- Using Skyvia With YugabyteDB Managed
- Encryption at Rest in YugabyteDB | YugabyteDB Friday Tech Talks | Episode 30
Check out the full list of upcoming YugabyteDB events, including training sessions, conferences, and additional in-person and virtual events:
- DSS Days – In-person events
- Lunch ‘n’ Learn: Introduction to Yugabyte (October 4)
- Cloud Expo Asia, Singapore
- KubeCon + CloudNativeCon North America 2022
- PostgreSQL Conference Europe
- Lunch ‘n’ Learn: Introduction to Yugabyte (November 2)
- CIO Visions Summit, Chicago
- Between the DevOps and the DBA – How Do We Make It a Win-Win?
This tips and tricks blog series would not be possible without the support of fellow Yugabeings like Denis Magda, Dorian Hoxha, Franck Pachot, and Frits Hoogland (to name a few). Thanks also to our incredible user community for not being afraid to ask questions.
For previous Tips & Tricks posts, check out our archives.
Ready to start exploring YugabyteDB features?
You have some great options:
- Run the database locally on your laptop (Quick Start)
- Deploy it to your favorite cloud provider (Multi-node Cluster Deployment)
- Sign up for a free YugabyteDB Managed cluster
It’s easy! Get started today!