
CONCEPT

Hot-row problem

The hot-row problem is the data-shape pattern in which one row in a table receives a disproportionate fraction of the workload's writes (or reads), typically because it represents a frequently accessed shared entity (counter, leaderboard entry, global config flag, popular product SKU). On a row-locking engine like InnoDB, concurrent writes to a hot row contend for the same row-level lock, so the writers to that row are serialised.

Canonical example: the hot counter

"It is a common database pattern to increment an INT column when an event happens, such as a download or page view. You can go far with this pattern until bursts of these types of events happen in parallel and you experience contention on a single row." (Source: .)

The pattern maps to SQL like:

UPDATE counters SET count = count + 1 WHERE id = 1;

Under low concurrency, latency is fine. Under a traffic burst against one entity, every writer requests an exclusive (X) record lock on the same row and waits for the previous writer's transaction to release it, so the writes to that row execute one at a time.
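
A rough way to see the cost is a back-of-the-envelope queueing model. The sketch below assumes that all writers arrive at once, that each holds the row lock for a fixed time, and that a burst spread over several rows divides evenly among them. The function name `drain_time_ms` and the 2 ms hold time are illustrative assumptions, not measurements from any real workload.

```python
import math


def drain_time_ms(writers: int, hold_ms: float, rows: int = 1) -> float:
    """Estimate how long a simultaneous burst of writers takes to finish.

    Assumes each writer holds an exclusive row lock for hold_ms and that the
    burst is spread evenly over `rows` independently locked rows.
    """
    if writers < 0 or rows < 1:
        raise ValueError("writers must be >= 0 and rows must be >= 1")
    # Writers targeting the same row run one after another;
    # different rows drain in parallel.
    longest_queue = math.ceil(writers / rows)
    return longest_queue * hold_ms


burst = 500  # hypothetical burst size
print(drain_time_ms(burst, hold_ms=2.0))           # all writers on one hot row
print(drain_time_ms(burst, hold_ms=2.0, rows=16))  # same burst over 16 rows
```

Under these assumptions the single-row burst takes 1000 ms to drain while the 16-row case takes 64 ms. That gap is what the mitigations further down try to recover.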

Distinguishing hot-row from hot-key / hot-partition

| Term | Where it hurts | Typical symptom |
|---|---|---|
| Hot row | Single-DB relational engines (InnoDB, Postgres) | Lock contention, deadlock risk |
| Hot key | Distributed KV stores (DynamoDB, Cassandra) | Per-partition throttling; uneven shard load |
| Hot partition | Sharded systems | One shard CPU-bound while others idle |
| Wide partition | Cassandra-family | One partition's data grows unbounded; compaction / read cost balloon |

The hot-row problem is the relational-OLTP instance. The underlying data-skew shape (one entity is disproportionately popular) is the same as the hot-key and hot-partition problems; the mechanism of failure differs because the storage engine differs.
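
The shared skew shape can be made concrete with a small measurement. This is a minimal sketch, assuming a sample of write traffic is available as a list of row ids. The helper `hottest_row_share` and the sample log are hypothetical, and what share counts as "hot" depends on the workload.

```python
from collections import Counter


def hottest_row_share(write_log):
    """Return (row_id, share) for the most-written row in write_log.

    write_log is any iterable of row ids; share is that row's fraction of
    all writes. Returns (None, 0.0) for an empty log.
    """
    counts = Counter(write_log)
    if not counts:
        return None, 0.0
    row_id, hits = counts.most_common(1)[0]
    return row_id, hits / sum(counts.values())


# Hypothetical sample: row 1 takes 90 of 100 writes.
log = [1] * 90 + [2, 3, 4, 5, 6] * 2
row_id, share = hottest_row_share(log)
print(row_id, share)
```

The same measurement applies to keys or partitions. Only the unit that gets counted changes.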

Why it's common in practice

  • Product surfaces create shared counters. Per-video view count, per-repository download count, per-post like count, per-seller review count. The entity is shared across many viewers or contributors.
  • Bursts are the norm, not the exception. Viral events, launches, fire-sales, and scheduled campaigns create bursts of writes to a single entity.
  • Schema is simple. One row per entity is the obvious modelling choice; the hot-row problem only surfaces under load.

Mitigations

The fix depends on the engine and the cost tolerance:

  • Slotted counter — split the hot row into N rows keyed by a slot column, pick a random slot per write, sum on read. Canonical MySQL/InnoDB fix from GitHub's github.downloads workload.
  • Shard replication for hot keys — replicate the hot row across multiple shards, route writes by hash of caller, sum on read. Equivalent at the sharding layer.
  • Out-of-database aggregation — increment in Redis / Memcached, flush to the OLTP database periodically.
  • Event log + background rollup — write an append-only event, process the log in background to produce the aggregate value. Scales to Netflix Distributed Counter shapes.
  • CRDT counters — PN-Counter or G-Counter for multi-region write convergence without coordination.
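
The slotted-counter idea from the first bullet can be sketched as follows. This is a minimal illustration, not the implementation from the cited source. The table name `slotted_counters`, its columns, and the slot count of 8 are assumptions. SQLite is used only so the sketch runs anywhere: it has no row-level locks, so the code shows the data layout and the write and read paths, not the contention win. The upsert syntax needs SQLite 3.24 or newer; on MySQL the equivalent statement is `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import random
import sqlite3

SLOTS = 8  # assumed slot count; tune to the expected write concurrency

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE slotted_counters (
           record_id INTEGER NOT NULL,
           slot      INTEGER NOT NULL,
           count     INTEGER NOT NULL DEFAULT 0,
           PRIMARY KEY (record_id, slot)
       )"""
)


def increment(record_id: int, by: int = 1) -> None:
    """Add `by` to one randomly chosen slot row for record_id."""
    slot = random.randrange(SLOTS)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """INSERT INTO slotted_counters (record_id, slot, count)
               VALUES (?, ?, ?)
               ON CONFLICT (record_id, slot)
               DO UPDATE SET count = count + excluded.count""",
            (record_id, slot, by),
        )


def read_total(record_id: int) -> int:
    """Sum all slot rows for record_id; 0 if it has never been written."""
    row = conn.execute(
        "SELECT COALESCE(SUM(count), 0) FROM slotted_counters"
        " WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    return row[0]


for _ in range(1000):
    increment(1)
print(read_total(1))
```

Writes touch at most `SLOTS` distinct rows per counter, so on a row-locking engine concurrent writers would mostly land on different rows. The trade-off is that every read becomes a small aggregate over those rows instead of a single-row lookup.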

Seen in

  • — PlanetScale articulates the hot-counter instance on MySQL/InnoDB and presents the slotted-counter fix.

  • — Liz van Dijk (PlanetScale, 2022-09-08) canonicalises the hot-row problem as an explicit benchmark-workload design target. TAOBench's objects + edges schema (concepts/social-graph-objects-and-edges) is deliberately chosen to simulate viral-content scenarios: "Focusing the workload around these two simplified concepts allows the benchmark to simulate typical 'hot row' scenarios that can be particularly challenging for relational databases to handle. Think of what happens when something goes viral: a thundering herd of users comes through to interact with a specific piece of content posted somewhere. On the database level, beyond a sudden surge in connections, this can also translate into various types of locks centered around the backing rows for that piece, which can have rippling effects that ultimately translate to slower content access times for the users on the platform." TAOBench is the first benchmark on this wiki that measures substrate behaviour under hot-row pressure by design, as distinct from sysbench-tpcc's shard-key-aligned access pattern (which has no hot rows by construction).
