Skip to content

PATTERN Cited by 1 source

CTE-emulated upsert

Definition

CTE-emulated upsert is a Postgres query pattern for "insert-or-update" semantics that replaces the built-in INSERT ... ON CONFLICT DO UPDATE with a data-modifying Common Table Expression (CTE) composing two separate statements: an INSERT ... ON CONFLICT DO NOTHING inside the CTE, followed by a conditional UPDATE gated on whether the CTE inserted anything.

The motivation is a specific Postgres behavior: ON CONFLICT DO UPDATE locks the conflicting row before evaluating the WHERE condition, even when the WHERE would filter the row out and no column would be modified. That row lock mutates tuple metadata with a transaction ID, which forces a COMMIT record, which forces a WAL fsync. At scale (Datadog: 25,000 upserts/sec targeted) the fsync cost per no-op call can exceed the cluster's single-writer sync budget (Source: sources/2026-03-23-datadog-debugging-postgres-upsert-wal).

Template

WITH insert_attempt AS (
    INSERT INTO host_last_ingested (host_id)
    VALUES (:host_id)
    ON CONFLICT DO NOTHING
    RETURNING host_id
)
UPDATE host_last_ingested
   SET last_ingested = now()
 WHERE host_id = :host_id
   AND last_ingested < now() - '1 day'::interval
   AND NOT EXISTS (SELECT FROM insert_attempt);

Three cases, observed directly via pg_walinspect:

Row state What executes WAL records
Row absent CTE inserts; outer UPDATE short-circuits via NOT EXISTS INSERT+INIT + Btree NEWROOT + INSERT_LEAF + HEAP_CONFIRM + COMMIT
Row present, stale (≥1 day old) CTE does nothing; outer UPDATE runs HOT_UPDATE + COMMIT
Row present, fresh (<1 day old) CTE does nothing; outer UPDATE predicate false (none)

The empty-records case is the whole point — the common path writes zero WAL records (Source: sources/2026-03-23-datadog-debugging-postgres-upsert-wal).

Why this avoids the lock-WAL cost

  • ON CONFLICT DO NOTHING does not take a row lock on the conflicting row. If the row exists, the statement simply becomes a no-op without mutating tuple metadata, and therefore emits no WAL record.
  • The outer UPDATE only takes a row lock when its WHERE predicate actually matches a live row that will be modified. Unmatched rows are not locked.
  • No xid is assigned in the fresh-row case; no COMMIT record is required; no fsync is forced.

Trade-off: a small, workload-dependent race

Because no row is locked across the CTE → UPDATE boundary, a concurrent DELETE between the two statements can cause the update to miss a row that existed at the start of the operation:

  1. Tx1 runs INSERT ... ON CONFLICT DO NOTHING: row exists, CTE returns empty.
  2. Tx2 DELETEs the row.
  3. Tx1's outer UPDATE finds no matching row, and the NOT EXISTS guard does not trigger a re-insert.

INSERT ... ON CONFLICT DO UPDATE's implicit row lock would have serialized this. The CTE version does not. The pattern is only correct when the workload tolerates a low-probability dropped update, and the trade-off should be stated explicitly:

  • Datadog's host-liveness tracker tolerates it because host-liveness is inherently imprecise (the deletion threshold is 7 days of inactivity and hosts rarely return after that).
  • Billing counters, idempotency state machines, leader-election state, and other correctness-critical stores do not tolerate it; they should keep the implicit lock and pay the fsync cost (or move to a different design entirely — batch + dedup, queue-backed consumer).

Generalizations

  • Any upsert where the conditional predicate is expected to be false on the common path is a candidate — the goal is always to avoid the per-call lock-then-skip WAL cost.
  • The pattern is Postgres-specific; equivalent reasoning does not transfer to MySQL INSERT ... ON DUPLICATE KEY UPDATE or SQL Server MERGE, which have different locking and redo-log semantics.
  • For multi-row upserts, the same shape applies with a RETURNING set and a correlated outer UPDATE (WHERE ... AND NOT EXISTS (SELECT FROM insert_attempt WHERE host_id = t.host_id)).

Relation to other wiki concepts

Seen in

  • sources/2026-03-23-datadog-debugging-postgres-upsert-wal — Datadog replaces their INSERT ... ON CONFLICT DO UPDATE WHERE last_ingested < ... - '1 day' query (2 WAL records per no-op call, 500 fsyncs/sec at 500 upserts/sec, untenable at 25,000 upserts/sec target) with the CTE-emulated form. WAL syncs post-rollout tracked actual update rate instead of call rate; the race window is accepted because host-liveness tracking is inherently imprecise.
Last updated · 200 distilled / 1,178 read