Start Now

Q&A: Tablet Splitting, High Availability Behavior, & Sharding vs Partitioning with YugabyteDB

Marko Rajcevic

Distributed SQL Tips and Tricks series– September 2022

Welcome back to the YugabyteDB Tips and Tricks series, where I recap distributed SQL database questions from around the Internet!

This month I focus on recommended values for tablet splitting, behavior around high availability, client behavior when a node goes down, and sharding versus partitioning of a table.

When Using YugabyteDB, How Many Shards Should I Break My Table Into?

This question will be less important with automatic tablet splitting set as the default. However, it is still essential you understand the ‘why’ behind the choices.

First, Yugabyte defines tablet splitting as:

“…the resharding of data in the cluster by presplitting tables before data is added or by changing the number of tablets at runtime.”

There are different ways to set the number of tablets, or shards, on your YugabyteDB cluster. (You can read about this in our March tips and trick blog). Although it is hard to gauge the exact number of tablets, since it depends on your table size and activity, here are some high-level suggestions:

Small tables (< 1 GB)

To keep your total tablet count low, you should create a colocated tablet for your smaller tables, while only keeping your larger ones sharded across the cluster.

In the case of a colocated tablet, the tablet is replicated based on your Replication Factor (RF). However, there is a single tablet leader. Remember that having those smaller tables colocated may or may not help depending on the amount of activity on those tables, so you still want to test this.

If you choose not to colocate your smaller tables, you can split each table into a single tablet by setting SPLIT INTO 1 TABLETS during table creation, or by enabling auto-splitting. Auto-splitting will split each small table into a single tablet as long as it is under the threshold of tablet_split_low_phase_size_threshold_byte. This is set to 512 MB by default.

Medium tables (single digit GBs to 100s of GB)

A good place to start for medium-sized tables, whether you want to enable auto-splitting or not, would be 8 tablets per tserver. This would be 24 total leader tablets in a 3 node 3 RF cluster.

The goal here is to keep each tablet under 10GB. If you anticipate this table will grow consistently, we recommend enabling auto-splitting, which allows YugabyteDB to automatically split your table into additional tablets as it grows.

Large tables (>250 GBs)

Typically the larger your table, the more tablets you want to break it into, to avoid hotspots and ensure equal resource usage across the nodes in your cluster.

Keep in mind that you want to keep the number of tablets across your cluster as low as possible, although it is natural for this count to grow as your tables increase in size.

You can pre-split your large tables – which could be anywhere from 8 to 24 tablets per tserver depending on your instance size – to avoid too many splits during bulk loading. However, by enabling auto-splitting, you ensure that after the initial load, YugabyteDB will auto-split from that point on as your data size increases.

For a more detailed explanation of sharding and the auto-sharding mechanics in YugabyteDB, check out Distributed SQL Sharding: How Many Tablets, and at What Size?

P.S. Keep in mind that indexes are sharded in the same way as tables.

If a Node Becomes Unavailable, Does YugabyteDB Propagate Client Requests to Surviving Nodes, or do the Clients Have to Handle That Retry Logic?

The answer depends on a few characteristics:

  • Are the clients connected directly to the node, or through a load balancer?
  • Are you using the PostgreSQL driver, or the YugabyteDB smart driver?
  • Are you using connection pooling?

If you are connected directly to the node, then the physical connections to that node will be invalidated. Your connection pool will automatically adjust the minimum pool requirements for the other nodes in the cluster.

If required, it creates new connections to the surviving nodes, which will be transparently managed. If the node goes down abruptly in the middle of inflight transactions, those transactions will fail, and clients can retry. Or, you can set the retry logic on the application layer, based on the error code. Again, in this case, your connection pool should invalidate already established physical connections to the failed node.

From a connection string perspective, it is better to pass more than a single endpoint (at least 2/3 endpoints) so that net new client connections get established based on the remaining available endpoints. If you keep only one endpoint, there will not be any impact on the existing connections, but you can’t create net new client connections if that single endpoint is down.

An example of this recommendation using the YugabyteDB smart driver is:

jdbc:yugabytedb://127.0.0.1:5433,127.0.0.2:5433,127.0.03:5433/yugabyte?load-balance=true

NOTE: If using multi-region clusters, you want to have at least one per endpoint per region.

Will a Partitioned Table in YugabyteDB also be Sharded?

Yes, regardless if you partition it or not, every table and index in YugabyteDB will be sharded across your cluster. Think of every partition as a single table, because that is how YSQL looks at it.

Partitioning and sharding are separate concepts in YugabyteDB that can be used together to configure unique concepts such as row-level geo-partitioning for multi-region workloads. As described in Distributed SQL Essentials: Sharding and Partitioning in YugabyteDB, partitioning is at the query level (YSQL), while sharding is at the storage level (DocDB).

If you want to prove this, follow an example in our partitioning documentation. Once the partitions are created, go to http://<ip>:9000 and click on the Tablets tab to see all of the tablets created.

You can also use the following yb-admin command based on the example in the docs:

yb-admin --master_addresses <ip1>:7100,<ip2>:7100,<ip3>:7100 list_tablets ysql.yugabyte order_changes_2019_02

New Documentation, Blogs, and Tutorials

More tips and tricks, and general information can be found by searching the YugabyteDB Blog, which is updated regularly. You can also check out our DEV Community Blogs. New blogs include:

New Videos

All of our recent Distributed SQL Summit (DSS) 2022 sessions are now available to view on-demand. If you missed the online DSS event, or a specific session, simply register and watch.

You can also check out our newest video content on the Yugabyte YouTube channel.

Upcoming Events and Training

Check out the full list of upcoming YugabyteDB events, including training sessions, conferences, and additional in-person and virtual events:

More Questions on Distributed SQL?

This tips and tricks blog series would not be possible without the support of fellow Yugabeings like Denis Magda, Dorian Hoxha, Franck Pachot, and Frits Hoogland (to name a few). Thanks also to our incredible user community for not being afraid to ask questions.

Do you have more questions? If so, ask them on our YugabyteDB Slack channel, Forum, GitHub, or Stack Overflow.

For previous Tips & Tricks posts, check out our archives.

Next Steps

Ready to start exploring YugabyteDB features?

You have some great options:

It’s easy! Get started today!

Related Posts

Marko Rajcevic

Related Posts

Learn More to Accelerate Your Retail Business

Ready to dive deeper into distributed SQL, YugabyteDB, and Yugabyte Cloud?
Learn at Yugabyte University
Learn More
Browse Yugabyte Docs
Read More
Join the Yugabyte Community
Join Now