When Using YugabyteDB, How Many Shards Should I Break My Table Into?
This question will be less important with automatic tablet splitting set as the default. However, it is still essential you understand the ‘why’ behind the choices.
First, Yugabyte defines tablet splitting as:
“…the resharding of data in the cluster by presplitting tables before data is added or by changing the number of tablets at runtime.”
There are different ways to set the number of tablets, or shards, on your YugabyteDB cluster. (You can read about this in another tips and tricks blog). Although it is hard to gauge the exact number of tablets, since it depends on your table size and activity, here are some high-level suggestions:
Small tables (< 1 GB)
To keep your total tablet count low, you should create a colocated tablet for your smaller tables, while only keeping your larger ones sharded across the cluster.
In the case of a colocated tablet, the tablet is replicated based on your Replication Factor (RF). However, there is a single tablet leader. Remember that having those smaller tables colocated may or may not help depending on the amount of activity on those tables, so you still want to test this.
If you choose not to colocate your smaller tables, you can split each table into a single tablet by setting SPLIT INTO 1 TABLETS during table creation, or by enabling auto-splitting. Auto-splitting will split each small table into a single tablet as long as it is under the threshold of tablet_split_low_phase_size_threshold_byte. This is set to 512 MB by default.
Medium tables (single digit GBs to 100s of GB)
A good place to start for medium-sized tables, whether you want to enable auto-splitting or not, would be 8 tablets per tserver. This would be 24 total leader tablets in a 3 node 3 RF cluster.
The goal here is to keep each tablet under 10GB. If you anticipate this table will grow consistently, we recommend enabling auto-splitting, which allows YugabyteDB to automatically split your table into additional tablets as it grows.
Large tables (>250 GBs)
Typically the larger your table, the more tablets you want to break it into, to avoid hotspots and ensure equal resource usage across the nodes in your cluster.
Keep in mind that you want to keep the number of tablets across your cluster as low as possible, although it is natural for this count to grow as your tables increase in size.
You can pre-split your large tables – which could be anywhere from 8 to 24 tablets per tserver depending on your instance size – to avoid too many splits during bulk loading. However, by enabling auto-splitting, you ensure that after the initial load, YugabyteDB will auto-split from that point on as your data size increases.
For a more detailed explanation of sharding and the auto-sharding mechanics in YugabyteDB, check out Distributed SQL Sharding: How Many Tablets, and at What Size?
P.S. Keep in mind that indexes are sharded in the same way as tables.
Discover More Tips and Tricks
A library of distributed SQL tips and tricks and general “how to” information can be found by searching the YugabyteDB blog, as well as our DEV Community Blogs. Some recent “how to” highlights include:
- Create Alert Configurations For YugabyteDB Anywhere Using APIs
- Comparing the Maximum Availability of YugabyteDB and Oracle Database
- Reference Architecture for Deploying YugabyteDB with VMware Tanzu
- Data Streaming Using YugabyteDB CDC, Kafka, and SnowflakeSinkConnector
- Using MinIO for YugabyteDB Backup and Restore
- Unlocking Azure Storage Options With YugabyteDB CDC
- Optimizing YugabyteDB Memory Tuning for YSQL
Events and Training
Check out the upcoming YugabyteDB events, including all training sessions, conferences, in-person and virtual events, and YugabyteDB Friday Tech Talks (designed for engineers by engineers).
In addition, there is some extremely popular “how to” content on the YugabyteDB YouTube channel.
If You Have Questions About Distributed SQL
If you have questions, make sure to ask them on the YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow. For more tips and tricks, check out our Distributed Tips and Tricks archive.
Ready to start exploring YugabyteDB features? You have some great options to get started. Run the database locally on your laptop (Quick Start), deploy it to your favorite cloud provider (Multi-node Cluster Deployment), sign up for a free YugabyteDB Managed cluster, or request a full-featured trial. It’s easy! Get started today!