Better Than PostgreSQL! In-Place Index Updates with YugabyteDB
A PostgreSQL weakness is its update mechanism, which creates a complete copy of the table row and updates all indexes, even if just a single column is changed.
YugabyteDB, as a fork of PostgreSQL, initially followed the same method. However, it has now improved on this by updating only the necessary columns. This enhancement allows for the addition of more indexes, or the indexing of more columns, to improve query performance in more use cases.
This blog examines how YugabyteDB modified PostgreSQL's storage layer and how this change delivers better performance.
Index Challenges in PostgreSQL
When optimizing an SQL query, it’s important to avoid unnecessary work, such as reading a large number of rows that are never returned.
Indexes allow you to read specific ranges of rows from a table based on a WHERE clause or an ORDER BY LIMIT. However, additional predicates may exist that cannot be addressed by an index condition and must be filtered during the scan. A common strategy to prevent unnecessary rows from being read from the table is to apply the filter earlier: when the index covers the columns used by the filter, the filter can be applied directly to the index entries. This method improves the latency of an Index Scan and can even allow an Index Only Scan.
The index order depends on the column ordering in the key and is essential for finding ranges and retrieving sorted results. To incorporate additional columns, add them at the end of the key, or keep them out of the key with the INCLUDE clause. This choice has some consequences for query performance and data manipulation.
When dealing with a UNIQUE index, you cannot add these columns to the key without altering its unique definition, so you must place them in the INCLUDE clause. For non-UNIQUE indexes, you should consider how these columns will be used:
- If they serve as an index condition for another query, including them in the key can be beneficial.
- If updated frequently, it may be better to avoid modifying the key and include them only in the index entry.
In general, it is advisable to avoid including frequently updated columns in the key, as modifying the key involves deleting the current entry and inserting a new one, changing its logical position in the index. Updating a column that is included in the index entry but not in the key does not affect its position and allows for an in-place update. However, it’s worth noting that PostgreSQL does not support in-place updates for tables and indexes.
PostgreSQL’s table and index access methods implement tuple update as a delete and insert.
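For example, here is a minimal sketch of the two placements on a hypothetical orders table (table and column names are illustrative, not from the demo): columns that may serve as index conditions go into the key, while a frequently updated covering column stays out of the key in the INCLUDE clause:

-- customer_id and order_date can be used as index conditions, so they belong to the key
create index orders_by_customer on orders (customer_id, order_date) include (status);

-- a UNIQUE index cannot take extra key columns without changing its uniqueness,
-- so a covering column such as status must go in the INCLUDE clause
create unique index orders_by_reference on orders (order_reference) include (status);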
How YugabyteDB Overcomes Index Limitations
YugabyteDB has overcome PostgreSQL index limitations by supporting in-place updates through the LSM Tree. Previously, the only optimization available was for updates that set columns to the same value. In such cases, the system could skip index maintenance and foreign key verification.
This functionality is controlled by a parameter called yb_skip_redundant_update_ops, which is enabled by default.
When updating to a different value, the index must be updated. However, the introduction of in-place updates in YugabyteDB version 2.23.1 (preview) and 2024.2.1 (stable) has made this process more efficient and, where possible, avoids the delete-insert overhead. This is controlled by a parameter called yb_enable_inplace_index_update, which is enabled by default.
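These settings can be checked or changed per session with the usual SHOW and SET commands. A minimal sketch, assuming the cluster-wide defaults have not been changed:

-- both are expected to return "on" by default
show yb_skip_redundant_update_ops;
show yb_enable_inplace_index_update;

-- disable in-place index updates for the current session only, for example to compare behaviors
set yb_enable_inplace_index_update = off;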
YugabyteDB Index Demo
This small demo shows the benefits of in-place index updates. I’m running this in YugabyteDB 2.23.0 and 2.23.1 to show the difference in the number of rows scanned with and without this new feature.
I created a table with a primary key and three more columns, and an index that uses those three columns in the following locations:
- The key definition
- The include clause
- A where clause for partial indexing
create table demo (
  id bigint primary key,
  in_the_key int,
  in_include int,
  in_where int
);
insert into demo select generate_series(1,10000), 0, 0, 0;
create index on demo ( in_the_key ) include ( in_include ) where in_where = 0;
analyze demo;
This index can be used to filter from the index (Index Cond and Storage Index Filter) without a read request to the table:
explain (analyze, dist, costs off, summary off)
select * from demo where in_the_key=0 and in_include=1 and in_where=0;

                                        QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using demo_in_the_key_in_include_idx on demo (actual time=7.306..7.307 rows=0 loops=1)
   Index Cond: (in_the_key = 0)
   Storage Index Filter: (in_include = 1)
   Storage Index Read Requests: 1
   Storage Index Read Execution Time: 6.874 ms
   Storage Index Rows Scanned: 10000
An index on the single column “in_the_key” would have shown additional table read requests, with Storage Table Rows Scanned: 10000, even if no rows were finally returned.
Thanks to the presence of “in_include” in the index entry, the filter was applied while scanning the index entries.
This index is good for queries, but let’s check the cost of updating the columns. Next, I run updates in the following places:
- The key column
- The include column
- The column used for partial indexing
I run the same update twice, to see the performance when all rows are updated to a new value and when all rows are updated to the same value. Later, I’ll run the same update on PostgreSQL so we can compare.
Updating an Index Key Column
In YugabyteDB 2.23.0, updating 10000 table rows requires deleting the old key and inserting the new one. This results in two index writes (Index Write Requests) per table row (Table Write Requests):
yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_the_key=42;
                               QUERY PLAN
------------------------------------------------------------------------
 Update on demo (actual time=441.937..441.937 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=1.374..325.462 rows=10000 loops=1)
         Storage Table Read Requests: 12
         Storage Table Read Execution Time: 1.304 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Index Write Requests: 20000
   Storage Flush Requests: 10
   Storage Flush Execution Time: 309.678 ms
(9 rows)

yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_the_key=42;
                               QUERY PLAN
-----------------------------------------------------------------------
 Update on demo (actual time=11016.971..11016.972 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=1.245..58.077 rows=10000 loops=1)
         Storage Table Read Requests: 12
         Storage Table Read Execution Time: 1.116 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Index Write Requests: 20000
   Storage Flush Requests: 10010
   Storage Flush Execution Time: 10314.145 ms
(9 rows)
When all keys are updated to a new value, the writes are buffered into only ten flush requests. When they are updated to the same value, it cannot be buffered. This is the “Write to the Same Row” issue I discuss in this blog post about write buffering. The number of flush requests impacts the response time as it involves a Raft consensus over the network.
This has been resolved in YugabyteDB 2.23.1, which avoids requesting the index write to the same key:
yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_the_key=42;
                               QUERY PLAN
------------------------------------------------------------------------
 Update on demo (actual time=519.627..519.628 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=1.196..288.603 rows=10000 loops=1)
         Storage Table Read Requests: 10
         Storage Table Read Execution Time: 2.557 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Index Write Requests: 20000
   Storage Flush Requests: 8
   Storage Flush Execution Time: 273.347 ms
(9 rows)

yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_the_key=42;
                               QUERY PLAN
------------------------------------------------------------------------
 Update on demo (actual time=271.633..271.634 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=1.380..229.685 rows=10000 loops=1)
         Storage Table Read Requests: 10
         Storage Table Read Execution Time: 2.722 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Flush Requests: 8
   Storage Flush Execution Time: 212.110 ms
(8 rows)
Updating an Include Clause Column
In YugabyteDB 2.23.0, updating 10000 table rows deletes and re-inserts the index entry even though its key doesn't change, because the updated column is not in the key. This results in 20000 index writes that cannot be buffered, and one flush request per row in all cases, because the writes target the same key:
yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_include=42;
                               QUERY PLAN
-----------------------------------------------------------------------
 Update on demo (actual time=11054.913..11054.913 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=1.473..56.763 rows=10000 loops=1)
         Storage Table Read Requests: 12
         Storage Table Read Execution Time: 1.348 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Index Write Requests: 20000
   Storage Flush Requests: 10010
   Storage Flush Execution Time: 10352.150 ms
(9 rows)

yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_include=42;
                               QUERY PLAN
-----------------------------------------------------------------------
 Update on demo (actual time=10949.641..10949.641 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=1.545..58.355 rows=10000 loops=1)
         Storage Table Read Requests: 12
         Storage Table Read Execution Time: 1.399 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Index Write Requests: 20000
   Storage Flush Requests: 10010
   Storage Flush Execution Time: 10246.231 ms
(9 rows)
This is improved in 2.23.1, where there’s only one write request per row because it updates the existing index entry in-place and buffers it without the need to immediately flush it:
yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_include=42;
                               QUERY PLAN
------------------------------------------------------------------------
 Update on demo (actual time=470.527..470.528 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=2.213..305.365 rows=10000 loops=1)
         Storage Table Read Requests: 10
         Storage Table Read Execution Time: 3.016 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Index Write Requests: 10000
   Storage Flush Requests: 8
   Storage Flush Execution Time: 271.396 ms
(9 rows)

yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_include=42;
                               QUERY PLAN
------------------------------------------------------------------------
 Update on demo (actual time=280.508..280.508 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=2.010..237.985 rows=10000 loops=1)
         Storage Table Read Requests: 10
         Storage Table Read Execution Time: 3.905 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Flush Requests: 8
   Storage Flush Execution Time: 218.213 ms
(8 rows)
Updating the Where Clause Column of a Partial Index
Finally, to cover all eventualities, I do the same with a column that is neither in the key nor in the include clause, but is used for partial indexing.
The new feature doesn’t change this case. In all versions, the first update removes the index entry from the partial index, resulting in one index write request per table row. The second one has nothing to update in the index:
yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_where=42;
                               QUERY PLAN
------------------------------------------------------------------------
 Update on demo (actual time=437.156..437.157 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=7.610..311.777 rows=10000 loops=1)
         Storage Table Read Requests: 10
         Storage Table Read Execution Time: 5.854 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Index Write Requests: 10000
   Storage Flush Requests: 8
   Storage Flush Execution Time: 273.685 ms
(9 rows)

yugabyte=# explain (analyze, dist, costs off, summary off) update demo set in_where=42;
                               QUERY PLAN
------------------------------------------------------------------------
 Update on demo (actual time=312.628..312.629 rows=0 loops=1)
   ->  Seq Scan on demo (actual time=2.170..269.074 rows=10000 loops=1)
         Storage Table Read Requests: 10
         Storage Table Read Execution Time: 5.221 ms
         Storage Table Rows Scanned: 10000
   Storage Table Write Requests: 10000
   Storage Flush Requests: 8
   Storage Flush Execution Time: 247.952 ms
(8 rows)
How PostgreSQL Performs
To compare and understand how YugabyteDB improves PostgreSQL performance, I’ve run the same queries in PostgreSQL 17:
create table demo (
  id bigint primary key,
  in_the_key int,
  in_include int,
  in_where int
) with (fillfactor=10);
insert into demo select generate_series(1,10000), 0, 0, 0;
create index on demo ( in_the_key ) include ( in_include ) where in_where = 0;
analyze demo;
I used a low fill factor to benefit from HOT updates. This is an optimization in PostgreSQL that mitigates the effects of its update implementation. It reserves some space within the table’s pages, allowing the new copy of a row to fit into the same page. This approach helps avoid the need to update all indexes.
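If you want to verify how many updates were HOT in PostgreSQL, the standard statistics view pg_stat_user_tables exposes counters for it. A quick check (a sketch against the demo table):

-- n_tup_hot_upd counts updates that stayed within the same page and skipped index maintenance
select relname, n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'demo';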
YugabyteDB uses index-organized tables with key-value LSM trees that are not limited by a page size, so it does not require this optimization.
With YugabyteDB, I used the dist option of `explain analyze` to display the index and table read and write operations. PostgreSQL reads and writes full pages, and the buffers and wal options display the pages read from the shared buffers and the writes to the Write-Ahead Log (WAL), which makes the changes durable.
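For reference, these are the two EXPLAIN invocations compared in this post, as used in the demos, on YugabyteDB and PostgreSQL respectively:

-- YugabyteDB: dist adds the distributed storage read, write, and flush request counters
explain (analyze, dist, costs off, summary off) update demo set in_the_key=42;

-- PostgreSQL: buffers and wal show shared-buffer page accesses and WAL records/bytes
explain (analyze, buffers, wal, costs off, summary off) update demo set in_the_key=42;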
The CPU and I/O cost of updating a column in a PostgreSQL table is huge. With a single secondary index, updating one column to a new value in ten thousand rows reads seventy thousand pages and writes thirty thousand WAL records:
postgres=# explain (analyze, buffers, wal, costs off, summary off) update demo set in_the_key=42;
                              QUERY PLAN
----------------------------------------------------------------------
 Update on demo (actual time=45.994..45.994 rows=0 loops=1)
   Buffers: shared hit=71520 read=2 dirtied=55 written=53
   WAL: records=30161 bytes=2128512
   ->  Seq Scan on demo (actual time=0.005..1.483 rows=10000 loops=1)
         Buffers: shared hit=667
 Planning:
   Buffers: shared hit=15
(7 rows)

postgres=# explain (analyze, buffers, wal, costs off, summary off) update demo set in_the_key=42;
                              QUERY PLAN
----------------------------------------------------------------------
 Update on demo (actual time=13.081..13.081 rows=0 loops=1)
   Buffers: shared hit=21334
   WAL: records=10000 bytes=680000
   ->  Seq Scan on demo (actual time=0.005..1.332 rows=10000 loops=1)
         Buffers: shared hit=667
 Planning:
   Buffers: shared hit=18
(7 rows)
The read and write amplification is the same when the updated column is not part of the index key:
postgres=# explain (analyze, buffers, wal, costs off, summary off) update demo set in_include=42;
                              QUERY PLAN
----------------------------------------------------------------------
 Update on demo (actual time=43.145..43.146 rows=0 loops=1)
   Buffers: shared hit=70774 dirtied=27 written=27
   WAL: records=30721 bytes=2177922
   ->  Seq Scan on demo (actual time=0.024..3.238 rows=10000 loops=1)
         Buffers: shared hit=667
         WAL: records=667 bytes=76018
(6 rows)

postgres=# explain (analyze, buffers, wal, costs off, summary off) update demo set in_include=42;
                              QUERY PLAN
----------------------------------------------------------------------
 Update on demo (actual time=13.764..13.765 rows=0 loops=1)
   Buffers: shared hit=20667
   WAL: records=10667 bytes=757352
   ->  Seq Scan on demo (actual time=0.021..2.752 rows=10000 loops=1)
         Buffers: shared hit=667
         WAL: records=667 bytes=77352
(6 rows)
The include clause doesn’t help here in PostgreSQL as it does in YugabyteDB, but partial indexes can reduce the penalty:
postgres=# explain (analyze, buffers, wal, costs off, summary off) update demo set in_where=42;
                              QUERY PLAN
----------------------------------------------------------------------
 Update on demo (actual time=39.436..39.436 rows=0 loops=1)
   Buffers: shared hit=59929
   WAL: records=20851 bytes=1484199
   ->  Seq Scan on demo (actual time=0.025..3.103 rows=10000 loops=1)
         Buffers: shared hit=667
         WAL: records=667 bytes=76018
(6 rows)

postgres=# explain (analyze, buffers, wal, costs off, summary off) update demo set in_where=42;
                              QUERY PLAN
----------------------------------------------------------------------
 Update on demo (actual time=13.810..13.811 rows=0 loops=1)
   Buffers: shared hit=20667
   WAL: records=10667 bytes=757352
   ->  Seq Scan on demo (actual time=0.022..2.863 rows=10000 loops=1)
         Buffers: shared hit=667
         WAL: records=667 bytes=77352
(6 rows)
PostgreSQL storage only inserts and deletes tuples; it never updates them in place. The SQL update is physically implemented by inserting a new version of the complete row and marking the old one for deletion. Adding columns for covering indexes carries the same huge overhead whether they are placed in the include clause or in the key.
Another PostgreSQL limitation is the maximum size of an index entry when adding columns to the index: a B-tree entry must fit within a block (roughly one-third of a page), so the block-based B-tree structure doesn't offer the same flexibility as YugabyteDB's LSM trees.
Optimizing Further From the Application
One optimization to consider is detecting when an update writes the same value to a column, which allows the system to avoid unnecessary updates to indexes and the verification of foreign keys.
However, you might wonder why the database still writes to the table in such cases.
When a column is updated to the same value, the SQL statement still indicates the intention to perform an update. This intention has implications, such as acquiring locks, triggering events, and creating a Change Data Capture record. The database cannot ignore this user intent.
While indexes are not visible to the SQL user, and their maintenance is considered an implementation detail, the database can skip updating them when unnecessary. In contrast, the update operation on the column cannot be skipped, because it reflects a user’s intention.
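As a small illustration (the trigger and function names are made up for this sketch), a row-level trigger still fires when an UPDATE sets a column to its current value, because the statement expresses the intention to update even if nothing changes:

create or replace function notify_update() returns trigger as $$
begin
  raise notice 'row % updated', new.id;
  return new;
end;
$$ language plpgsql;

create trigger demo_update_trigger
  before update on demo
  for each row execute function notify_update();

-- fires the trigger for every row, even when in_include is already 42
update demo set in_include = 42;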
Additional optimization should occur at the application level. For instance, Hibernate can perform a dynamic update, generating an update statement that only includes the changed columns. Given the information discussed in this blog, this approach is recommended when using YugabyteDB. Read YugabyteDB column-level locking, by Vlad Mihalcea, for more reasons to consider DynamicUpdate. The optimization is essential in YugabyteDB but has a limited impact on PostgreSQL, which always writes a new version of the whole row, including columns that are not present in the SQL statement.
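To show what this changes at the SQL level, here is a sketch (not the exact statements Hibernate generates) of a single-column update with and without dynamic updates:

-- with dynamic updates: only the changed column appears in the SET clause,
-- so YugabyteDB can skip index maintenance for the untouched indexed columns
update demo set in_include = 42 where id = 1;

-- without it: every mapped column is set, even those whose value did not change
update demo set in_the_key = 0, in_include = 42, in_where = 0 where id = 1;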
Conclusion
YugabyteDB’s introduction of in-place index updates marks a significant improvement to PostgreSQL’s traditional index and table update mechanics.
PostgreSQL’s update model relies on deleting and reinserting updated rows, resulting in extra work. Using the same implementation in YugabyteDB would involve a write flush per row, which is unnecessary for indexes where non-key columns change frequently. By leveraging in-place updates in the underlying LSM Tree, YugabyteDB reduces unnecessary flushes, network calls, and waits for Raft consensus.
Unlike PostgreSQL, where it is advisable to minimize indexing, YugabyteDB benefits from having indexes that cover all selective predicates, such as Index Condition or Storage Index Filters. When querying a large number of rows, performance can be improved by indexing all columns involved in the query. This makes it possible to utilize an Index Only Scan.