“YugabyteDB Managed” is now called “YugabyteDB Aeon”. To find out more, visit our launch blog.

What Is the Difference Between ‘Table Partitioning’ and ‘Sharding’ In a Distributed SQL Deployment?

Distributed SQL Tips and Tricks

June 2, 2023

Lets separate this into two separate questions:

What is table partitioning and sharding?
How does YugabyteDB utilize partitioning and sharding?

First, partitioning (or table partitioning) is the process of splitting large tables into smaller, more manageable ones, providing multiple physically independent tables that are logically seen as one. Partitioning has been used for traditional (single node) databases in order to efficiently query large amounts of data. In essence you are grouping multiple subsets of data on a single node based on the column you choose to partition on and a strategy such as range or list.

Sharding (or database sharding) is the process of breaking up large tables, indexes, or partitions into smaller chunks called shards (or tablets in YugabyteDB) that are then distributed across multiple servers based on a hash or range of the primary key. Sharding is similar to partitioning in that you are breaking up a table into smaller pieces. It is different in the sense that you are distributing these pieces across multiple nodes.

In YugabyteDB you can use both concepts, so we understand why it can be confusing. Within YugabyteDB partitioning is a user-defined, SQL-level concept, thus requiring an explicit definition through SQL. Sharding on the other hand, and the load balancing of shards, is a storage level concept that is performed automatically by YugabyteDB based on your replication factor. As your data grows in size, the database will continue to create more shards and distribute them across your cluster. As you add nodes, the database will replicate them to the new nodes. This allows you to horizontally scale out with no effort to meet your high throughput requirements. Because partitioning is a SQL-level property that is dependent on the tablet definition, it is not automatic.

The main use of partitioning in YugabyteDB, or declarative partitioning as it is referred to in PostgreSQL, is to group your data together. For example, YSQL time series workloads can mimic the time-to-live (TTL) found in Cassandra or YugabyteDB’s YCQL API by partitioning on the time column and dropping the oldest partition. Another use case is to implement row-level geo-partitioning, when your data cannot leave a specific region (i.e. GDPR). Partitioning for either of these scenarios is done at the query layer using YSQL, separating itself again from sharding which is done at the storage layer (DocDB). Keep in mind that indexes created on partitioned tables are automatically partitioned.

For deeper insights check out Distributed SQL Essentials: Sharding and Partitioning in YugabyteDB and the demo-driven explanations on Partitioning in YugabyteDB and Sharding in YugabyteDB.

Learn more about YugabyteDB architecture.

Ready to start exploring YugabyteDB features? You have some great options to get started. Run the database locally on your laptop (Quick Start), deploy it to your favorite cloud provider (Multi-node Cluster Deployment), sign up for a free YugabyteDB Managed cluster, or request a full-featured trial. It’s easy! Get started today!

June 2, 2023