When To Use Range Sharding vs. Hash Sharding

Distributed SQL Tips and Tricks Series

July 7, 2023

As mentioned in the question, YugabyteDB supports two methods of sharding data: by hash and by range. You select the sharding method when you create a table or index: for a table, in the PRIMARY KEY definition; for an index, in its column list.

By default, the primary key in YugabyteDB is sharded using HASH. However, you can specify ASC or DESC instead to shard by range, with rows ordered in ascending or descending order. If you're coming from PostgreSQL, note the difference: PostgreSQL's B-Tree indexes are always ordered by range, whereas YugabyteDB defaults to hash.
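As a minimal sketch of how these options look in YSQL, the statements below declare the same key three ways. The table, column, and index names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical tables; only the PRIMARY KEY clause differs between them.

-- No modifier: the primary key column is hash sharded by default.
CREATE TABLE orders_default (
    order_id bigint,
    total    numeric,
    PRIMARY KEY (order_id)
);

-- Same sharding as above, with HASH written out explicitly.
CREATE TABLE orders_hash (
    order_id bigint,
    total    numeric,
    PRIMARY KEY (order_id HASH)
);

-- Range sharded: rows are stored in ascending order of order_id.
CREATE TABLE orders_range (
    order_id bigint,
    total    numeric,
    PRIMARY KEY (order_id ASC)
);

-- Secondary indexes accept the same modifiers on their columns.
CREATE INDEX orders_hash_total_idx ON orders_hash (total DESC);
```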

Single-row selection using the primary key index

Since YugabyteDB is an OLTP database, it runs most effectively on queries that allow single-row selection using the primary key index. These are the types of workloads hash sharding is most effective for. Here, it helps avoid hotspots since the hashing allows the access to be evenly spread over the tablets.
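To illustrate the kind of access pattern described above, here is a small sketch of a point lookup against a hash-sharded table. The `users` table and its columns are assumptions made for this example, not part of the original question.

```sql
-- Hypothetical table, hash sharded on user_id.
CREATE TABLE users (
    user_id bigint,
    name    text,
    PRIMARY KEY (user_id HASH)
);

INSERT INTO users (user_id, name) VALUES (42, 'alice');

-- Single-row selection on the full primary key: the hash of user_id
-- identifies the one tablet that holds the row.
SELECT * FROM users WHERE user_id = 42;

-- Consecutive user_id values hash to unrelated tablets, which is what
-- spreads sequential inserts instead of piling them onto one tablet.
```

Running `EXPLAIN` on the `SELECT` is one way to check that the lookup uses the primary key index rather than a full scan.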

If you run a range scan on a hash-sharded table or index, it will be less efficient, because hashing scatters adjacent key values across tablets and the rows in the range are no longer stored next to each other. For queries that select a range of rows, it is more efficient to shard by range. However, you may have a query that combines both patterns, such as:

SELECT * FROM test_table WHERE id = 'foo' AND time <= NOW() FOR UPDATE SKIP LOCKED LIMIT 1;

For a query like this, you can create a composite primary key where the first column is sharded by HASH, and a second column is added as the clustering key in ASC or DESC order.

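In the case above, the primary key can look something like PRIMARY KEY(id, time), which is equivalent to PRIMARY KEY(id HASH, time ASC). Since HASH is the default for the first column and ASC is the default for the clustering column, the data for this table will be hash sharded on id, and the rows with the same value for id will be clustered in ascending order by time. This makes the query above much more efficient: the equality on id locates a single tablet, and the condition on time becomes a scan over rows that are already stored in order.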
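A possible table definition for this case is sketched below. The column types and the `payload` column are assumptions, since the original question only shows the `id` and `time` columns in the query.

```sql
-- Assumed column types; payload is a hypothetical extra column.
CREATE TABLE test_table (
    id      text,
    time    timestamptz,
    payload text,
    -- id is the hash sharding column; time is the clustering column,
    -- kept in ascending order within each id.
    PRIMARY KEY (id HASH, time ASC)
);

-- The query from above: equality on the hash column plus a range
-- condition on the clustering column.
SELECT * FROM test_table
WHERE id = 'foo' AND time <= NOW()
FOR UPDATE SKIP LOCKED
LIMIT 1;
```

If the query should return the most recent rows first, declaring the clustering column as `time DESC` instead is worth considering, since the stored order would then match the read order.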

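Each method has its pros and cons. Hash sharding spreads reads and writes evenly across tablets but makes range scans inefficient, while range sharding supports efficient range scans but can concentrate load on a single tablet when keys arrive in order, as timestamps do.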

If you are eager to start exploring YugabyteDB, you can test these methods in the examples here.

Additional content on range and hash sharding can be found below: