Advisory Locking: Avoiding Serializable Isolation and Achieving Efficient Retention Archiving
This blog discusses a use case for PostgreSQL advisory locks that protects against write skew, and presents a solution for efficient deletes that meet retention requirements. We also cover why we prefer not to partition in this case.
About the Transaction-Limit Service Example
This example comes from a transaction-limit (risk control) service that limits how much an entity can transact.
A simple calendar-month aggregate is not sufficient here because decisions are made on rolling windows (24 hours, 7 days, and 30 days) using hourly buckets. Each incoming transaction must queue behind a specific business key (entity + payment channel + direction), and every decision must be auditable with idempotent request logging.
Limits are calculated from hourly buckets (bucket_ts, hour-truncated) across a rolling 30-day range, and old data must be deleted continuously.
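As a hedged illustration of what that rolling-window read could look like (the column names follow the schema shown later in this post, and the :p_* bind parameters are placeholders, not part of the original service):

```sql
-- Rolling usage for one business key across the 24h / 7d / 30d windows,
-- computed from the hourly buckets (bucket_ts is hour-truncated).
SELECT SUM(sum_amount) FILTER (WHERE bucket_ts >= now()::timestamp - interval '24 hours') AS amount_24h,
       SUM(sum_amount) FILTER (WHERE bucket_ts >= now()::timestamp - interval '7 days')   AS amount_7d,
       SUM(sum_amount)                                                                    AS amount_30d,
       SUM(tx_count)                                                                      AS count_30d
FROM   risk_hourly_agg
WHERE  entity_id      = :p_entity_id
  AND  channel_code   = :p_channel
  AND  direction_code = :p_direction
  AND  bucket_ts >= date_trunc('hour', now()::timestamp - interval '30 days');
```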
Serializable Isolation and Advisory Locks
PostgreSQL and YugabyteDB both provide serializable correctness, but they implement it differently:
- PostgreSQL serializable is SSI-based
- YugabyteDB serializable currently uses 2PL-style locking behavior
The correctness goal is the same, but the runtime behavior differs.
In YSQL, serializable isolation can therefore surface more serialization errors and retries under contention. To avoid the performance penalty of serializable isolation, read committed or snapshot isolation combined with a single transaction-scoped advisory lock on the business aggregate key often provides a cleaner queueing model for hot keys.
Because the limit check reads a 30-day range of hourly buckets, two concurrent requests on the same business key can both read the same pre-update totals but write different hour buckets (for example, adjacent hours). This means no direct write-write row conflict occurs, which is the classic write-skew pattern.
That is why repeatable read without an advisory lock is insufficient here; the business-key advisory lock serializes the check-then-apply decision path.
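For illustration, here is a minimal sketch of the lock acquisition, assuming hashtext() over the concatenated business-key columns is an acceptable way to derive the lock key (any stable mapping of the business key to a bigint works) and :p_* bind parameters as placeholders:

```sql
-- Run the decision transaction at READ COMMITTED (or snapshot) isolation.
-- All requests for the same (entity, channel, direction) queue on this lock,
-- which is released automatically at COMMIT or ROLLBACK.
BEGIN ISOLATION LEVEL READ COMMITTED;

SELECT pg_advisory_xact_lock(
         hashtext(:p_entity_id::text || ':' || :p_channel || ':' || :p_direction));

-- ... read the rolling buckets, decide, upsert the hour bucket, log the request ...

COMMIT;
```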
Partitioning and Retention
This risk control service workload is high-throughput and low-latency. With PostgreSQL-style partitioning, there is no native “global index” across partitions, so query routing and index access happen per partition. That is very different from Oracle patterns, where a global index can keep lookup behavior more centralized.
As partition count grows, the read path can involve more serial remote calls, including cross-node and cross-AZ reads, or even cross-region reads. This pushes up latency and resource usage.
Although PostgreSQL's Parallel Append can add parallelism to these otherwise serial remote calls, its overhead is unlikely to be worth it for OLTP lookups.
Un-pruned daily partitions are usually the worst case for latency; un-pruned weekly partitions are tolerable, and un-pruned monthly partitions carry the lowest lookup overhead. The trade-off with monthly partitions is retention granularity: you can end up carrying around an extra month of data, depending on your drop schedule.
Local Indexes Per Partition
The practical issue here is lookup behavior as the partition count increases. In PostgreSQL/YSQL, partitioned tables have partition-local indexes, so fan-out patterns may require more index probes and more routing than a single unpartitioned index path, as summarized in the table and the sketch below.
| Model | Lookup Path (when the query spans many time ranges) | Operational Impact |
| --- | --- | --- |
| Unpartitioned table (single index) | One index structure handles all rows. | Always one index condition node in the execution plan, with one network round trip. |
| PG/YSQL partitioned table (local indexes) | The planner may need to touch multiple partition indexes when the query spans multiple partitions. | One or more index condition nodes in the execution plan; without parallel query (which has its own overhead), these become multiple network round trips executed serially. |
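For readers less familiar with PostgreSQL-style declarative partitioning, the following sketch (with hypothetical table names, not from the service's actual schema) shows the pattern being contrasted: an index created on the partitioned parent is implemented as a separate local index on every partition, so a lookup that cannot be pruned to a single partition has to probe each of them.

```sql
-- Hypothetical time-partitioned variant of the aggregate table.
CREATE TABLE risk_hourly_agg_part (
    entity_id      UUID      NOT NULL,
    bucket_ts      TIMESTAMP NOT NULL,
    channel_code   TEXT      NOT NULL,
    direction_code TEXT      NOT NULL,
    sum_amount     BIGINT    NOT NULL,
    tx_count       INT       NOT NULL
) PARTITION BY RANGE (bucket_ts);

CREATE TABLE risk_hourly_agg_2026_03 PARTITION OF risk_hourly_agg_part
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
CREATE TABLE risk_hourly_agg_2026_04 PARTITION OF risk_hourly_agg_part
    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

-- This creates one local index per partition, not a single global index.
CREATE INDEX ON risk_hourly_agg_part (entity_id, channel_code, direction_code, bucket_ts);
```

A 30-day lookup for one business key then touches every partition index that overlaps the window, which is the fan-out summarized in the table above.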
Example Schema
Here’s a broad outline of what the DDL might look like for this use-case service.
For the purpose of this blog, the key points are the risk_hourly_agg primary key and the risk_request_log index.
```sql
DROP TABLE IF EXISTS public.risk_request_log;
DROP TABLE IF EXISTS public.risk_hourly_agg;

CREATE TABLE public.risk_hourly_agg (
    entity_id      UUID      NOT NULL,
    bucket_ts      TIMESTAMP NOT NULL,
    channel_code   TEXT      NOT NULL,
    direction_code TEXT      NOT NULL,
    sum_amount     BIGINT    NOT NULL,
    tx_count       INT       NOT NULL,
    PRIMARY KEY ((entity_id, channel_code, direction_code) HASH, bucket_ts ASC)
);

CREATE TABLE public.risk_request_log (
    dedupe_id      UUID      NOT NULL,
    entity_id      UUID      NOT NULL,
    bucket_ts      TIMESTAMP NOT NULL,
    decision       TEXT      NOT NULL,
    channel_code   TEXT      NOT NULL,
    direction_code TEXT      NOT NULL,
    PRIMARY KEY ((entity_id, channel_code, direction_code) HASH, dedupe_id ASC)
);

DROP INDEX IF EXISTS public.risk_request_log_delete_idx;

CREATE INDEX risk_request_log_delete_idx
    ON public.risk_request_log
    USING lsm ((yb_hash_code(bucket_ts) % 9) ASC, bucket_ts ASC)
    INCLUDE (entity_id, channel_code, direction_code, dedupe_id)
    SPLIT AT VALUES ((1), (2), (3), (4), (5), (6), (7), (8));
```
Below is the business decision flow in pseudo DML, shown only to convey intent (since this is not the focus of this blog).
```sql
BEGIN;

SELECT pg_advisory_xact_lock(hash(entity_id, channel_code, direction_code));

WITH limits AS (
    -- Read rolling amount/count usage for the last 30 days from risk_hourly_agg
), apply AS (
    -- If the limits allow, UPSERT the current-hour rollup row
)
INSERT INTO risk_request_log (...)
SELECT ..., CASE WHEN apply_wrote_row THEN 'APPROVED' ELSE 'REJECTED' END;

COMMIT;
```
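For readers who want something closer to runnable SQL, here is a minimal sketch of that flow against the schema above. The bind-parameter names (:p_entity_id, :p_channel, :p_direction, :p_amount, :p_limit_30d, :p_dedupe_id), the use of hashtext() to derive the lock key, and the single 30-day amount limit are all simplifying assumptions; a real service would evaluate the 24-hour and 7-day windows the same way.

```sql
BEGIN;

-- Queue all requests for the same business key behind one transaction-scoped lock.
SELECT pg_advisory_xact_lock(
         hashtext(:p_entity_id::text || ':' || :p_channel || ':' || :p_direction));

WITH limits AS (
    -- Current rolling 30-day usage for this business key.
    SELECT COALESCE(SUM(sum_amount), 0) AS total_amount
    FROM   risk_hourly_agg
    WHERE  entity_id      = :p_entity_id
      AND  channel_code   = :p_channel
      AND  direction_code = :p_direction
      AND  bucket_ts >= date_trunc('hour', now()::timestamp - interval '30 days')
), apply AS (
    -- Upsert the current-hour bucket only if the limit still allows it.
    INSERT INTO risk_hourly_agg (entity_id, bucket_ts, channel_code, direction_code,
                                 sum_amount, tx_count)
    SELECT :p_entity_id, date_trunc('hour', now()::timestamp),
           :p_channel, :p_direction, :p_amount, 1
    FROM   limits
    WHERE  limits.total_amount + :p_amount <= :p_limit_30d
    ON CONFLICT (entity_id, channel_code, direction_code, bucket_ts)
    DO UPDATE SET sum_amount = risk_hourly_agg.sum_amount + EXCLUDED.sum_amount,
                  tx_count   = risk_hourly_agg.tx_count + 1
    RETURNING 1
)
-- Log the decision either way, keyed by the idempotency id.
INSERT INTO risk_request_log (dedupe_id, entity_id, bucket_ts, decision,
                              channel_code, direction_code)
SELECT :p_dedupe_id, :p_entity_id, date_trunc('hour', now()::timestamp),
       CASE WHEN EXISTS (SELECT 1 FROM apply) THEN 'APPROVED' ELSE 'REJECTED' END,
       :p_channel, :p_direction;

COMMIT;
```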
Archiving Path
Our focus here is on the delete statements. Since we opted not to use partitioning, the deletes need to be efficient.
risk_hourly_agg Table Deletes
```sql
DELETE FROM public.risk_hourly_agg
WHERE yb_hash_code(entity_id, channel_code, direction_code) BETWEEN :hash_start AND :hash_end
  AND bucket_ts::timestamp >= :from_ts::timestamp
  AND bucket_ts::timestamp <  :to_ts::timestamp;
```
For deletes from the risk_hourly_agg table, we use the primary key. In this uniqueness model, bucket_ts is part of the PK.
YugabyteDB provides a function, yb_hash_code, that enables efficient range scans across the hash portion of an index or primary key.
Using this with bounded windows lets us archive only the rows we need, without scanning extra rows or tombstones.
The execution plan should look like this:
```
Index Cond: ((yb_hash_code(entity_id, channel_code, direction_code) >= 50967) AND (yb_hash_code(entity_id, channel_code, direction_code) <= 58247) AND (bucket_ts >= '2026-03-28 19:00:00'::timestamp without time zone) AND (bucket_ts < '2026-03-28 20:00:00'::timestamp without time zone))
```
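The :hash_start and :hash_end bounds for the parallel delete workers can be derived by slicing the 0..65535 yb_hash_code space into disjoint ranges. One way to generate them, assuming the 9 workers used in the runbook below (the exact boundaries in the plan above came from a different slicing):

```sql
-- Produce 9 disjoint, contiguous [hash_start, hash_end] ranges covering 0..65535,
-- one per delete worker (65536 / 9 ≈ 7282 hash values per slice).
SELECT slot,
       (slot * 65536) / 9           AS hash_start,
       ((slot + 1) * 65536) / 9 - 1 AS hash_end
FROM generate_series(0, 8) AS slot;
```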
risk_request_log Table Deletes
We have a choice between two delete strategies on the same table (risk_request_log).
- If we delete risk_request_log via its PK, bucket_ts is evaluated as a storage filter. This means broader scans and more repeated tombstone reads until compaction.
- The bucketized index path treats timestamp-based deletes as index conditions, reducing scan scope and tombstone pressure.
Both options are fine; you should make the choice based on whether you want to maintain the index or not.
risk_request_log delete options:
| Option | How the timestamp predicate is applied | Effect over repeated runs |
| --- | --- | --- |
| PK delete on risk_request_log | bucket_ts acts as a storage filter (post-index-row selection). | Can read broader ranges and revisit old tombstones until compaction. The benefit is that no extra index is required. |
| Bucketized index delete on risk_request_log (chosen) | (yb_hash_code(bucket_ts) % 9, bucket_ts) appears in the index conditions. | Targeted window scans with no repeated tombstone pressure. The downside is an extra index that must be maintained. |
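For reference, the PK-path variant (the option not chosen here) would look roughly like the sketch below; because bucket_ts is not part of the risk_request_log primary key, it can only be applied as a storage filter after the hash-range scan:

```sql
-- PK-path delete sketch (not chosen): the hash range targets the PK hash on
-- (entity_id, channel_code, direction_code); bucket_ts becomes a storage filter.
DELETE FROM public.risk_request_log
WHERE yb_hash_code(entity_id, channel_code, direction_code) BETWEEN :hash_start AND :hash_end
  AND bucket_ts >= :from_ts::timestamp
  AND bucket_ts <  :to_ts::timestamp;
```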
For the risk_request_log delete, we selected the bucketized index path, since we want this timestamp index for other reasons anyway and don't mind maintaining it.
The index uses a virtual bucket (an expression) that assigns each timestamp entry to one of nine buckets (0 through 8), enabling linear write scalability on an ever-increasing range column. Read the blog Lift-and-Shift of High Write-Throughput Apps with YugabyteDB to find out more.
```sql
CREATE INDEX risk_request_log_delete_idx
    ON public.risk_request_log
    USING lsm ((yb_hash_code(bucket_ts) % 9) ASC, bucket_ts ASC)
    INCLUDE (entity_id, channel_code, direction_code, dedupe_id)
    SPLIT AT VALUES ((1), (2), (3), (4), (5), (6), (7), (8));
```
With this index, we do not use yb_hash_code on the primary key for this table. Rows are divided into 9 buckets, and we delete one bucket at a time within bounded windows, providing low-latency deletes that avoid scanning more rows than are required for archiving.
```sql
DELETE FROM public.risk_request_log
WHERE (yb_hash_code(bucket_ts) % 9) = :slot
  AND bucket_ts::timestamp >= :from_ts::timestamp
  AND bucket_ts::timestamp <  :to_ts::timestamp;
```
If done correctly, the execution plans should look like this:
From test output:

```
Index Cond: (((yb_hash_code(bucket_ts) % 9) = 6) AND (bucket_ts >= '2026-03-28 19:00:00'::timestamp without time zone) AND (bucket_ts < '2026-03-28 20:00:00'::timestamp without time zone))
```
Archiving Runbook (Advancing Windows)
You can use this runbook as an operating loop: define the time window, run both delete paths for that window, then advance the cursor. A PL/pgSQL sketch of the loop follows the table.
| Phase | Action | Why |
| --- | --- | --- |
| 1. Set Cutoff | Compute cutoff_ts = date_trunc('hour', now()::timestamp - interval '30 days'). | Defines the retention boundary once for the run. |
| 2. Set Window | Choose a window size (start at 180 minutes) and set cursor_ts to the oldest bucket to clean. | Controls transaction size and catch-up speed. |
| 3. Bound Window | For each cycle, set from_ts = cursor_ts and to_ts = LEAST(cursor_ts + window, cutoff_ts). | Guarantees bounded, predictable delete work. |
| 4. Delete Agg Table | Run the risk_hourly_agg delete with 9 workers over 9 disjoint PK hash ranges (0..65535). | Uses PK-aligned access for efficient bounded deletes. |
| 5. Delete Log Table | Run the risk_request_log delete with 9 workers over slots 0..8 on (yb_hash_code(bucket_ts) % 9). | Uses index buckets to reduce tombstone-heavy scans. |
| 6. Advance Cursor | Advance cursor_ts = to_ts only after both table deletes are complete for the window. | Prevents gaps and keeps both tables in sync. |
| 7. Tune and Stop | Stop catch-up at cutoff_ts. For steady state, keep scheduling windows. If transactions are too large, reduce the window size to 60/30/15 minutes. | Provides a stable operating loop and tuning knob. |
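Here is a minimal PL/pgSQL sketch of that loop, under some simplifying assumptions: the deletes run serially here instead of being fanned out to 9 parallel workers, the catch-up starting point (60 days back) is arbitrary, and a DO block runs as a single transaction, whereas a production runbook would commit after each window (for example, from an external scheduler).

```sql
DO $$
DECLARE
    cutoff_ts  timestamp := date_trunc('hour', now()::timestamp - interval '30 days');
    cursor_ts  timestamp := date_trunc('hour', now()::timestamp - interval '60 days');  -- oldest bucket to clean (assumption)
    window_len interval  := interval '180 minutes';
    from_ts    timestamp;
    to_ts      timestamp;
BEGIN
    WHILE cursor_ts < cutoff_ts LOOP
        from_ts := cursor_ts;
        to_ts   := LEAST(cursor_ts + window_len, cutoff_ts);

        -- Phase 4: risk_hourly_agg over 9 disjoint PK hash ranges.
        FOR slot IN 0..8 LOOP
            DELETE FROM public.risk_hourly_agg
            WHERE yb_hash_code(entity_id, channel_code, direction_code)
                      BETWEEN (slot * 65536) / 9 AND ((slot + 1) * 65536) / 9 - 1
              AND bucket_ts >= from_ts
              AND bucket_ts <  to_ts;
        END LOOP;

        -- Phase 5: risk_request_log over bucket slots 0..8.
        FOR slot IN 0..8 LOOP
            DELETE FROM public.risk_request_log
            WHERE (yb_hash_code(bucket_ts) % 9) = slot
              AND bucket_ts >= from_ts
              AND bucket_ts <  to_ts;
        END LOOP;

        -- Phase 6: advance only after both tables are clean for this window.
        cursor_ts := to_ts;
    END LOOP;
END $$;
```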
Conclusion
This data model keeps the write path predictable and gives practical control over retention cleanup without partition-management overhead. Advisory-lock serialization is scoped to the business key, and delete work is controlled by the window width and a fixed parallel fan-out.
Create a YugabyteDB Aeon account for free and discover the fully managed, cloud-native distributed SQL database for transactional applications.