Advisory Locking: Avoiding Serializable Isolation and Achieving Efficient Retention Archiving
This blog discusses a use case for PostgreSQL advisory locks that protects against write skew, and presents a solution for efficient deletes that meet retention requirements. We also cover why we prefer not to partition in this case.
About the Transaction-Limit Service Example
This example comes from a transaction-limit (risk control) service that limits how much an entity can transact.
A simple calendar-month aggregate is not sufficient here because decisions are made on rolling windows (24 hours, 7 days, and 30 days) using hourly buckets. Each incoming transaction must queue behind a specific business key (entity + payment channel + direction), and every decision must be auditable with idempotent request logging.
Limits are calculated from hourly buckets (bucket_ts, hour-truncated) across a rolling 30-day range, and old data must be deleted continuously.
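As a hedged illustration of what that rolling-window read could look like (the column names follow the schema shown later in this post, and the :p_* bind parameters are placeholders, not part of the original service):

```sql
-- Rolling usage for one business key across the 24h / 7d / 30d windows,
-- computed from the hourly buckets (bucket_ts is hour-truncated).
SELECT SUM(sum_amount) FILTER (WHERE bucket_ts >= now()::timestamp - interval '24 hours') AS amount_24h,
       SUM(sum_amount) FILTER (WHERE bucket_ts >= now()::timestamp - interval '7 days')   AS amount_7d,
       SUM(sum_amount)                                                                    AS amount_30d,
       SUM(tx_count)                                                                      AS count_30d
FROM   risk_hourly_agg
WHERE  entity_id      = :p_entity_id
  AND  channel_code   = :p_channel
  AND  direction_code = :p_direction
  AND  bucket_ts >= date_trunc('hour', now()::timestamp - interval '30 days');
```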
Serializable Isolation and Advisory Locks
PostgreSQL and YugabyteDB both provide serializable correctness, but they implement it differently:
- PostgreSQL serializable is SSI-based
- YugabyteDB serializable currently uses 2PL-style locking behavior
The correctness goal is the same, but the runtime behavior differs.
In YSQL, serializable isolation can therefore surface more serialization errors and retries under contention. To avoid the performance penalty of serializable isolation, read committed or snapshot isolation combined with a single transaction-scoped advisory lock on the business aggregate key often provides a cleaner queueing model for hot keys.
Because the limit check reads a 30-day range of hourly buckets, two concurrent requests on the same business key can both read the same pre-update totals but write different hour buckets (for example, adjacent hours). This means no direct write-write row conflict occurs, which is the classic write-skew pattern.
That is why repeatable read without an advisory lock is insufficient here; the business-key advisory lock serializes the check-then-apply decision path.
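For illustration, here is a minimal sketch of the lock acquisition, assuming hashtext() over the concatenated business-key columns is an acceptable way to derive the lock key (any stable mapping of the business key to a bigint works) and :p_* bind parameters as placeholders:

```sql
-- Run the decision transaction at READ COMMITTED (or snapshot) isolation.
-- All requests for the same (entity, channel, direction) queue on this lock,
-- which is released automatically at COMMIT or ROLLBACK.
BEGIN ISOLATION LEVEL READ COMMITTED;

SELECT pg_advisory_xact_lock(
         hashtext(:p_entity_id::text || ':' || :p_channel || ':' || :p_direction));

-- ... read the rolling buckets, decide, upsert the hour bucket, log the request ...

COMMIT;
```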
Partitioning and Retention
This risk control service workload is high-throughput and low-latency. With PostgreSQL-style partitioning, there is no native “global index” across partitions, so query routing and index access happen per partition. That is very different from Oracle patterns, where a global index can keep lookup behavior more centralized.
As partition count grows, the read path can involve more serial remote calls, including cross-node and cross-AZ reads, or even cross-region reads. This pushes up latency and resource usage.
Although PostgreSQL's Parallel Append can add parallelism to these otherwise serial remote calls, its overhead is unlikely to be worth it for OLTP lookups.
Un-pruned daily partitions are usually the worst case for latency; un-pruned weekly partitions are tolerable, and un-pruned monthly partitions carry the lowest lookup overhead. The trade-off with monthly partitions is retention granularity: you can end up carrying around an extra month of data, depending on your drop schedule.
Local Indexes Per Partition
The practical issue here is lookup behavior as the partition count increases. In PostgreSQL/YSQL, partitioned tables have partition-local indexes, so fan-out patterns may require more index probes and more routing than a single unpartitioned index path, as summarized in the table and the sketch below.
| Model | Lookup Path (when the query spans many time ranges) | Operational Impact |
| --- | --- | --- |
| Unpartitioned table (single index) | One index structure handles all rows. | Always one index condition node in the execution plan, with one network round trip. |
| PG/YSQL partitioned table (local indexes) | The planner may need to touch multiple partition indexes when the query spans multiple partitions. | One or more index condition nodes in the execution plan; without parallel query (which has its own overhead), these become multiple network round trips executed serially. |
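For readers less familiar with PostgreSQL-style declarative partitioning, the following sketch (with hypothetical table names, not from the service's actual schema) shows the pattern being contrasted: an index created on the partitioned parent is implemented as a separate local index on every partition, so a lookup that cannot be pruned to a single partition has to probe each of them.

```sql
-- Hypothetical time-partitioned variant of the aggregate table.
CREATE TABLE risk_hourly_agg_part (
    entity_id      UUID      NOT NULL,
    bucket_ts      TIMESTAMP NOT NULL,
    channel_code   TEXT      NOT NULL,
    direction_code TEXT      NOT NULL,
    sum_amount     BIGINT    NOT NULL,
    tx_count       INT       NOT NULL
) PARTITION BY RANGE (bucket_ts);

CREATE TABLE risk_hourly_agg_2026_03 PARTITION OF risk_hourly_agg_part
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
CREATE TABLE risk_hourly_agg_2026_04 PARTITION OF risk_hourly_agg_part
    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

-- This creates one local index per partition, not a single global index.
CREATE INDEX ON risk_hourly_agg_part (entity_id, channel_code, direction_code, bucket_ts);
```

A 30-day lookup for one business key then touches every partition index that overlaps the window, which is the fan-out summarized in the table above.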
Example Schema
Here’s a broad outline of what the DDL might look like for this use-case service.
For the purpose of this blog, the key points are the risk_hourly_agg primary key and the risk_request_log index.
```sql
DROP TABLE IF EXISTS public.risk_request_log;
DROP TABLE IF EXISTS public.risk_hourly_agg;

CREATE TABLE public.risk_hourly_agg (
    entity_id      UUID      NOT NULL,
    bucket_ts      TIMESTAMP NOT NULL,
    channel_code   TEXT      NOT NULL,
    direction_code TEXT      NOT NULL,
    sum_amount     BIGINT    NOT NULL,
    tx_count       INT       NOT NULL,
    PRIMARY KEY ((entity_id, channel_code, direction_code) HASH, bucket_ts ASC)
);

CREATE TABLE public.risk_request_log (
    dedupe_id      UUID      NOT NULL,
    entity_id      UUID      NOT NULL,
    bucket_ts      TIMESTAMP NOT NULL,
    decision       TEXT      NOT NULL,
    channel_code   TEXT      NOT NULL,
    direction_code TEXT      NOT NULL,
    PRIMARY KEY ((entity_id, channel_code, direction_code) HASH, dedupe_id ASC)
);

DROP INDEX IF EXISTS public.risk_request_log_delete_idx;

CREATE INDEX risk_request_log_delete_idx
    ON public.risk_request_log
    USING lsm ((yb_hash_code(bucket_ts) % 9) ASC, bucket_ts ASC)
    INCLUDE (entity_id, channel_code, direction_code, dedupe_id)
    SPLIT AT VALUES ((1), (2), (3), (4), (5), (6), (7), (8));
```
Below is the business decision flow in pseudo DML, shown only to convey intent (since this is not the focus of this blog).
```sql
BEGIN;

SELECT pg_advisory_xact_lock(hash(entity_id, channel_code, direction_code));

WITH limits AS (
    -- Read rolling amount/count usage for the last 30 days from risk_hourly_agg
), apply AS (
    -- If the limits allow, UPSERT the current-hour rollup row
)
INSERT INTO risk_request_log (...)
SELECT ..., CASE WHEN apply_wrote_row THEN 'APPROVED' ELSE 'REJECTED' END;

COMMIT;
```
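For readers who want something closer to runnable SQL, here is a minimal sketch of that flow against the schema above. The bind-parameter names (:p_entity_id, :p_channel, :p_direction, :p_amount, :p_limit_30d, :p_dedupe_id), the use of hashtext() to derive the lock key, and the single 30-day amount limit are all simplifying assumptions; a real service would evaluate the 24-hour and 7-day windows the same way.

```sql
BEGIN;

-- Queue all requests for the same business key behind one transaction-scoped lock.
SELECT pg_advisory_xact_lock(
         hashtext(:p_entity_id::text || ':' || :p_channel || ':' || :p_direction));

WITH limits AS (
    -- Current rolling 30-day usage for this business key.
    SELECT COALESCE(SUM(sum_amount), 0) AS total_amount
    FROM   risk_hourly_agg
    WHERE  entity_id      = :p_entity_id
      AND  channel_code   = :p_channel
      AND  direction_code = :p_direction
      AND  bucket_ts >= date_trunc('hour', now()::timestamp - interval '30 days')
), apply AS (
    -- Upsert the current-hour bucket only if the limit still allows it.
    INSERT INTO risk_hourly_agg (entity_id, bucket_ts, channel_code, direction_code,
                                 sum_amount, tx_count)
    SELECT :p_entity_id, date_trunc('hour', now()::timestamp),
           :p_channel, :p_direction, :p_amount, 1
    FROM   limits
    WHERE  limits.total_amount + :p_amount <= :p_limit_30d
    ON CONFLICT (entity_id, channel_code, direction_code, bucket_ts)
    DO UPDATE SET sum_amount = risk_hourly_agg.sum_amount + EXCLUDED.sum_amount,
                  tx_count   = risk_hourly_agg.tx_count + 1
    RETURNING 1
)
-- Log the decision either way, keyed by the idempotency id.
INSERT INTO risk_request_log (dedupe_id, entity_id, bucket_ts, decision,
                              channel_code, direction_code)
SELECT :p_dedupe_id, :p_entity_id, date_trunc('hour', now()::timestamp),
       CASE WHEN EXISTS (SELECT 1 FROM apply) THEN 'APPROVED' ELSE 'REJECTED' END,
       :p_channel, :p_direction;

COMMIT;
```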
Archiving Path
Our focus here is on the delete statements. Since we opted not to use partitioning, the deletes need to be efficient.
risk_hourly_agg Table Deletes
```sql
DELETE FROM public.risk_hourly_agg
WHERE yb_hash_code(entity_id, channel_code, direction_code) BETWEEN :hash_start AND :hash_end
  AND bucket_ts::timestamp >= :from_ts::timestamp
  AND bucket_ts::timestamp <  :to_ts::timestamp;
```
For deletes from the risk_hourly_agg table, we use the primary key. In this uniqueness model, bucket_ts is part of the PK.
YugabyteDB provides a function, yb_hash_code, that enables efficient range scans across the hash portion of an index or primary key.
Using this with bounded windows lets us archive only the rows we need, without scanning extra rows or tombstones.
The execution plan should look like this:
```
Index Cond: ((yb_hash_code(entity_id, channel_code, direction_code) >= 50967) AND (yb_hash_code(entity_id, channel_code, direction_code) <= 58247) AND (bucket_ts >= '2026-03-28 19:00:00'::timestamp without time zone) AND (bucket_ts < '2026-03-28 20:00:00'::timestamp without time zone))
```
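The :hash_start and :hash_end bounds for the parallel delete workers can be derived by slicing the 0..65535 yb_hash_code space into disjoint ranges. One way to generate them, assuming the 9 workers used in the runbook below (the exact boundaries in the plan above came from a different slicing):

```sql
-- Produce 9 disjoint, contiguous [hash_start, hash_end] ranges covering 0..65535,
-- one per delete worker (65536 / 9 ≈ 7282 hash values per slice).
SELECT slot,
       (slot * 65536) / 9           AS hash_start,
       ((slot + 1) * 65536) / 9 - 1 AS hash_end
FROM generate_series(0, 8) AS slot;
```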
risk_request_log Table Deletes
We have a choice between two delete strategies on the same table (risk_request_log).
- If we delete risk_request_log via its PK, bucket_ts is evaluated as a storage filter. This means broader scans and more repeated tombstone reads until compaction.
- The bucketized index path treats timestamp-based deletes as index conditions, reducing scan scope and tombstone pressure.
Both options are fine; you should make the choice based on whether you want to maintain the index or not.
risk_request_log delete options:
| Option | How the timestamp predicate is applied | Effect over repeated runs |
| --- | --- | --- |
| PK delete on risk_request_log | bucket_ts acts as a storage filter (post-index-row selection). | Can read broader ranges and revisit old tombstones until compaction. The benefit is that no extra index is required. |
| Bucketized index delete on risk_request_log (chosen) | (yb_hash_code(bucket_ts) % 9, bucket_ts) appears in the index conditions. | Targeted window scans with no repeated tombstone pressure. The downside is an extra index that must be maintained. |
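For reference, the PK-path variant (the option not chosen here) would look roughly like the sketch below; because bucket_ts is not part of the risk_request_log primary key, it can only be applied as a storage filter after the hash-range scan:

```sql
-- PK-path delete sketch (not chosen): the hash range targets the PK hash on
-- (entity_id, channel_code, direction_code); bucket_ts becomes a storage filter.
DELETE FROM public.risk_request_log
WHERE yb_hash_code(entity_id, channel_code, direction_code) BETWEEN :hash_start AND :hash_end
  AND bucket_ts >= :from_ts::timestamp
  AND bucket_ts <  :to_ts::timestamp;
```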
For the risk_request_log delete, we selected the bucketized index path, since we want this timestamp index for other reasons anyway and don't mind maintaining it.
The index uses a virtual bucket (an expression) that assigns each timestamp entry to one of nine buckets (0 through 8), enabling linear write scalability on an ever-increasing range column. Read the blog Lift-and-Shift of High Write-Throughput Apps with YugabyteDB to find out more.
```sql
CREATE INDEX risk_request_log_delete_idx
    ON public.risk_request_log
    USING lsm ((yb_hash_code(bucket_ts) % 9) ASC, bucket_ts ASC)
    INCLUDE (entity_id, channel_code, direction_code, dedupe_id)
    SPLIT AT VALUES ((1), (2), (3), (4), (5), (6), (7), (8));
```
With this index, we do not use yb_hash_code on the primary key for this table. Rows are divided into 9 buckets, and we delete one bucket at a time within bounded windows, providing low-latency deletes that avoid scanning more rows than are required for archiving.
```sql
DELETE FROM public.risk_request_log
WHERE (yb_hash_code(bucket_ts) % 9) = :slot
  AND bucket_ts::timestamp >= :from_ts::timestamp
  AND bucket_ts::timestamp <  :to_ts::timestamp;
```
If done correctly, the execution plans should look like this:
From test output:

```
Index Cond: (((yb_hash_code(bucket_ts) % 9) = 6) AND (bucket_ts >= '2026-03-28 19:00:00'::timestamp without time zone) AND (bucket_ts < '2026-03-28 20:00:00'::timestamp without time zone))
```
Archiving Runbook (Advancing Windows)
You can use this runbook as an operating loop: define the time window, run both delete paths for that window, then advance the cursor. A PL/pgSQL sketch of the loop follows the table.
| Phase | Action | Why |
| --- | --- | --- |
| 1. Set Cutoff | Compute cutoff_ts = date_trunc('hour', now()::timestamp - interval '30 days'). | Defines the retention boundary once for the run. |
| 2. Set Window | Choose a window size (start at 180 minutes) and set cursor_ts to the oldest bucket to clean. | Controls transaction size and catch-up speed. |
| 3. Bound Window | For each cycle, set from_ts = cursor_ts and to_ts = LEAST(cursor_ts + window, cutoff_ts). | Guarantees bounded, predictable delete work. |
| 4. Delete Agg Table | Run the risk_hourly_agg delete with 9 workers over 9 disjoint PK hash ranges (0..65535). | Uses PK-aligned access for efficient bounded deletes. |
| 5. Delete Log Table | Run the risk_request_log delete with 9 workers over slots 0..8 on (yb_hash_code(bucket_ts) % 9). | Uses index buckets to reduce tombstone-heavy scans. |
| 6. Advance Cursor | Advance cursor_ts = to_ts only after both table deletes are complete for the window. | Prevents gaps and keeps both tables in sync. |
| 7. Tune and Stop | Stop catch-up at cutoff_ts. For steady state, keep scheduling windows. If transactions are too large, reduce the window size to 60/30/15 minutes. | Provides a stable operating loop and tuning knob. |
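Here is a minimal PL/pgSQL sketch of that loop, under some simplifying assumptions: the deletes run serially here instead of being fanned out to 9 parallel workers, the catch-up starting point (60 days back) is arbitrary, and a DO block runs as a single transaction, whereas a production runbook would commit after each window (for example, from an external scheduler).

```sql
DO $$
DECLARE
    cutoff_ts  timestamp := date_trunc('hour', now()::timestamp - interval '30 days');
    cursor_ts  timestamp := date_trunc('hour', now()::timestamp - interval '60 days');  -- oldest bucket to clean (assumption)
    window_len interval  := interval '180 minutes';
    from_ts    timestamp;
    to_ts      timestamp;
BEGIN
    WHILE cursor_ts < cutoff_ts LOOP
        from_ts := cursor_ts;
        to_ts   := LEAST(cursor_ts + window_len, cutoff_ts);

        -- Phase 4: risk_hourly_agg over 9 disjoint PK hash ranges.
        FOR slot IN 0..8 LOOP
            DELETE FROM public.risk_hourly_agg
            WHERE yb_hash_code(entity_id, channel_code, direction_code)
                      BETWEEN (slot * 65536) / 9 AND ((slot + 1) * 65536) / 9 - 1
              AND bucket_ts >= from_ts
              AND bucket_ts <  to_ts;
        END LOOP;

        -- Phase 5: risk_request_log over bucket slots 0..8.
        FOR slot IN 0..8 LOOP
            DELETE FROM public.risk_request_log
            WHERE (yb_hash_code(bucket_ts) % 9) = slot
              AND bucket_ts >= from_ts
              AND bucket_ts <  to_ts;
        END LOOP;

        -- Phase 6: advance only after both tables are clean for this window.
        cursor_ts := to_ts;
    END LOOP;
END $$;
```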
Conclusion
This data model keeps the write path predictable and gives practical control over retention cleanup without partition-management overhead. Advisory-lock serialization is scoped to the business key, and delete work is controlled by the window width and a fixed parallel fan-out.
Create a YugabyteDB Aeon account for free and discover the fully managed, cloud-native distributed SQL database for transactional applications.