Hot-row problem
The hot-row problem is the data-shape pattern where one row in a table receives a disproportionate fraction of the workload's writes (or reads) — typically because it represents a frequently-accessed shared entity (counter, leaderboard entry, global config flag, popular product SKU). On a row-locking engine like InnoDB, hot rows trigger row-level lock contention that serialises all concurrent writers.
Canonical example: the hot counter
"It is a common database pattern to increment an INT
column when an event happens, such as a download or page
view. You can go far with this pattern until bursts of
these types of events happen in parallel and you experience
contention on a single row."
(Source: .)
The pattern maps to a single-row SQL `UPDATE` that increments the counter in place.
Under low concurrency, latency is fine. Under a traffic
burst against one entity, every writer takes an exclusive (X) record
lock on the same row and waits for the previous writer to
release it, so the write workload serialises.
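The source's SQL is not reproduced in this copy. As a minimal sketch of the increment pattern described above, assuming a hypothetical `downloads` table with `id` and `download_count` columns, the statement shape looks roughly like the following. It runs against in-memory SQLite through Python's `sqlite3` module purely so the example is self-contained; SQLite does not take InnoDB record locks, so this illustrates the access pattern rather than the contention itself.

```python
import sqlite3


def make_counter_db() -> sqlite3.Connection:
    """Create a hypothetical one-row-per-entity counter table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE downloads ("
        " id INTEGER PRIMARY KEY,"
        " download_count INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO downloads (id, download_count) VALUES (1, 0)")
    conn.commit()
    return conn


def record_download(conn: sqlite3.Connection, entity_id: int) -> None:
    """Increment in place: every caller for this entity targets the same row."""
    conn.execute(
        "UPDATE downloads SET download_count = download_count + 1 WHERE id = ?",
        (entity_id,),
    )
    conn.commit()


def read_count(conn: sqlite3.Connection, entity_id: int) -> int:
    """Read the counter back from its single row."""
    row = conn.execute(
        "SELECT download_count FROM downloads WHERE id = ?", (entity_id,)
    ).fetchone()
    return row[0]
```

On a row-locking engine, each concurrent call to something like `record_download` for the same `entity_id` would queue behind the previous transaction's lock on that one row.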
Distinguishing hot-row from hot-key / hot-partition
| Term | Where it hurts | Typical symptom |
|---|---|---|
| Hot row | Single-DB relational engines (InnoDB, Postgres) | Lock contention, deadlock risk |
| Hot key | Distributed KV stores (DynamoDB, Cassandra) | Per-partition throttling; uneven shard load |
| Hot partition | Sharded systems | One shard CPU-bound while others idle |
| Wide partition | Cassandra-family | One partition's data grows unbounded; compaction / read cost balloon |
The hot-row problem is the relational-OLTP instance. The underlying data-skew shape (one entity is disproportionately popular) is the same as the hot-key and hot-partition problems; the mechanism of failure differs because the storage engine differs.
Why it's common in practice
- Product surfaces create shared counters. Per-video view count, per-repository download count, per-post like count, per-seller review count. The entity is shared across many viewers or contributors.
- Bursts are the norm, not the exception. Viral events, launches, fire-sales, and scheduled campaigns create bursts of writes to a single entity.
- Schema is simple. One row per entity is the obvious modelling choice; the hot-row problem only surfaces under load.
Mitigations
The fix depends on the engine and the cost tolerance:
- Slotted counter: split the hot row into `N` rows keyed by a `slot` column, pick a random slot per write, sum on read. Canonical MySQL/InnoDB fix from GitHub's `github.downloads` workload.
- Shard replication for hot keys: replicate the hot row across multiple shards, route writes by hash of caller, sum on read. Equivalent at the sharding layer.
- Out-of-database aggregation: increment in Redis / Memcached, flush to the OLTP database periodically.
- Event log + background rollup: write an append-only event, process the log in background to produce the aggregate value. Scales to Netflix Distributed Counter shapes.
- CRDT counters: PN-Counter or G-Counter for multi-region write convergence without coordination.
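The slotted-counter idea from the list above can be sketched as follows. This is an illustration under stated assumptions, not the schema from the cited workload: the `slotted_counters` table, its column names, and `NUM_SLOTS = 8` are all hypothetical, and SQLite (via Python's `sqlite3`) stands in for MySQL only to keep the example runnable.

```python
import random
import sqlite3

NUM_SLOTS = 8  # assumed slot count; would be tuned to expected write concurrency


def make_slotted_db() -> sqlite3.Connection:
    """Create a hypothetical table holding up to NUM_SLOTS rows per entity."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slotted_counters ("
        " record_id INTEGER NOT NULL,"
        " slot INTEGER NOT NULL,"
        " cnt INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (record_id, slot))"
    )
    conn.commit()
    return conn


def increment(conn: sqlite3.Connection, record_id: int, rng=random) -> int:
    """Bump one randomly chosen slot so writers spread across several rows."""
    slot = rng.randrange(NUM_SLOTS)
    # Create the slot row on first use, then increment it.
    conn.execute(
        "INSERT OR IGNORE INTO slotted_counters (record_id, slot, cnt)"
        " VALUES (?, ?, 0)",
        (record_id, slot),
    )
    conn.execute(
        "UPDATE slotted_counters SET cnt = cnt + 1"
        " WHERE record_id = ? AND slot = ?",
        (record_id, slot),
    )
    conn.commit()
    return slot


def read_total(conn: sqlite3.Connection, record_id: int) -> int:
    """Sum all slots on read; an entity with no rows reads as zero."""
    row = conn.execute(
        "SELECT COALESCE(SUM(cnt), 0) FROM slotted_counters WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    return row[0]
```

The trade-off is that a write now contends with roughly one in `NUM_SLOTS` of its peers, while each read pays for a small aggregate. On MySQL, the insert-then-update pair would more likely be a single `INSERT ... ON DUPLICATE KEY UPDATE` statement.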
Seen in
- PlanetScale articulates the hot-counter instance on MySQL/InnoDB and presents the slotted-counter fix.
- Liz van Dijk (PlanetScale, 2022-09-08) canonicalises the hot-row problem as an explicit benchmark-workload design target. TAOBench's `objects` + `edges` schema (concepts/social-graph-objects-and-edges) is deliberately chosen to simulate viral-content scenarios: "Focusing the workload around these two simplified concepts allows the benchmark to simulate typical 'hot row' scenarios that can be particularly challenging for relational databases to handle. Think of what happens when something goes viral: a thundering herd of users comes through to interact with a specific piece of content posted somewhere. On the database level, beyond a sudden surge in connections, this can also translate into various types of locks centered around the backing rows for that piece, which can have rippling effects that ultimately translate to slower content access times for the users on the platform." TAOBench is the first benchmark on this wiki that measures substrate behaviour under hot-row pressure by design, as distinct from `sysbench-tpcc`'s shard-key-aligned access pattern (which has no hot rows by construction).