

Partition skew (data skew)

Definition

Partition skew — also known as data skew — is the streaming-broker failure mode where records are distributed unevenly across a topic's partitions, so some partitions see dramatically higher record rates, byte rates, or consumer lag than others. Under skew, adding more partitions stops helping: the scarce resource becomes the single hottest partition, not aggregate cluster capacity.

Amdahl's Law for streaming

Redpanda canonicalises the framing (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):

"Think of this as the data equivalent of Amdahl's Law: data skew is the enemy of parallelization, limiting the benefits of scaling out by using more partitions. If 90% of your data goes through a single partition, then whether you have 10 partitions or 50 won't really make a difference since that single overworked partition is your limiting factor."

Amdahl's Law (parallel computing): if fraction α of a workload is inherently serial, maximum speedup is 1 / α regardless of core count. A workload that is 10% serial caps at 10× speedup no matter how many cores you add.

Applied to streaming: if fraction α of records hash to a single hot partition P, then aggregate topic throughput is capped at throughput(P) / α. Scaling the partition count from N to 2N only helps the non-skewed 1-α fraction; P (and therefore the bottleneck) doesn't split. More partitions don't help when the hot partition is already the bottleneck.
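The cap can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, assuming a hypothetical per-partition limit of 50,000 records/s and a 90% hot fraction:

```python
# Sketch: Amdahl-style throughput cap under partition skew.
# All numbers are hypothetical; per_partition_tput is the maximum
# record rate one partition can sustain end to end.

def capped_throughput(per_partition_tput: float, alpha: float,
                      n_partitions: int) -> float:
    """Aggregate topic throughput when fraction `alpha` of records
    hash to a single hot partition P.

    P saturates first: at aggregate rate R it sees alpha * R, so
    R <= throughput(P) / alpha. The remaining (1 - alpha) fraction
    spreads over the other n - 1 partitions, which imposes a second
    (usually much looser) cap.
    """
    hot_cap = per_partition_tput / alpha
    cold_cap = per_partition_tput * (n_partitions - 1) / (1 - alpha)
    return min(hot_cap, cold_cap)

# 90% of records on one partition that sustains 50k records/s:
# going from 10 to 50 partitions changes nothing.
assert capped_throughput(50_000, 0.9, 10) == capped_throughput(50_000, 0.9, 50)
```

With α = 0.9 the topic tops out near 50,000 / 0.9 ≈ 55,600 records/s regardless of partition count, which is the "10 partitions or 50 won't really make a difference" claim in numeric form.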

Where skew comes from

The partition for a record is chosen by the client-side partitioner:

  • Unkeyed records — partitioned by the sticky partitioner (Kafka default since 2.4) or classic round-robin. Neither produces skew — rotation or sticky-window batching spreads records uniformly.
  • Keyed records — partitioned by hash(key) mod N. Skew arises when key frequencies are non-uniform — some keys appear in many more records than others. If one key dominates, all its records concentrate on one partition.

Common real-world key-skew sources:

  • Tenant id in a multi-tenant system where one tenant is orders of magnitude larger (whale tenant).
  • Sensor id / device id in IoT where most devices are low-traffic but a few are chatty.
  • Customer id in B2B e-commerce where enterprise customers dwarf SMB.
  • A constant key from a legacy code path (e.g., a key serialized to an empty string) hashing every record to one fixed partition. A truly null key instead falls through to the sticky/round-robin path in Kafka's default partitioner, so it spreads rather than pins.
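The whale-tenant concentration can be simulated in a few lines. CRC32 stands in for the client partitioner's hash (Kafka's default uses murmur2, but any deterministic hash shows the same effect), and the tenant names and 90/10 split are hypothetical:

```python
# Sketch: how one "whale" key concentrates records on one partition.
import zlib
from collections import Counter

def partition_for(key: str, n_partitions: int) -> int:
    # Stand-in for hash(key) mod N; real clients use murmur2.
    return zlib.crc32(key.encode()) % n_partitions

# Hypothetical multi-tenant traffic: tenant "whale" is 90% of records.
records = ["whale"] * 9_000 + [f"tenant-{i}" for i in range(1_000)]
counts = Counter(partition_for(k, 16) for k in records)

hottest = max(counts.values())
# The hot partition carries at least the whale's 9,000 records,
# i.e. >= 90% of traffic, no matter how many partitions exist.
assert hottest >= 9_000
```

Rerunning with 32 or 64 partitions leaves `hottest` unchanged: all of the whale's records still land on exactly one partition.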

Three-pronged mitigation

Redpanda's post enumerates three operator responses:

  1. Use the sticky partitioner whenever possible. For workloads that don't require keyed partitioning, sticky distributes records across partitions in batch-sized chunks — guaranteed uniform long-term distribution.

  2. Only use keyed partitioners when strictly required. The post names CDC as the canonical justification — per-row ordering matters, so records for the same row must land on the same partition. Most application workloads don't actually need ordering at the key level.

  3. When keys are required, pick high-cardinality keys. Verbatim:

    "If keys are essential, and you have the option, try to pick keys that have the highest cardinality (those with the highest number of distinct values) for a good chance of distributing the keys well over the number of partitions." Canonicalised as the pattern patterns/high-cardinality-partition-key.
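A quick sketch of why cardinality matters. The key formats and the 32-partition topic are made up, and CRC32 again stands in for the partitioner's hash:

```python
# Sketch: key cardinality vs spread over partitions.
import zlib

def spread(keys, n_partitions=32):
    """Number of partitions that receive at least one record."""
    return len({zlib.crc32(k.encode()) % n_partitions for k in keys})

low  = [f"region-{i % 4}" for i in range(10_000)]   # 4 distinct keys
high = [f"device-{i}"     for i in range(10_000)]   # 10,000 distinct keys

assert spread(low) <= 4            # 4 keys can reach at most 4 of 32 partitions
assert spread(high) > spread(low)  # high cardinality reaches far more
```

The low-cardinality key set can never use more partitions than it has distinct values, so 28 of the 32 partitions sit idle by construction.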

Detection

Partition skew is invisible at the aggregate-cluster-metrics altitude. You need per-partition visibility — specifically:

  • Per-partition incoming record rate (from vectorized_kafka_partition_leader_msgs or equivalent).
  • Per-partition byte rate (vectorized_kafka_partition_leader_bytes).
  • Per-partition consumer lag (from the consumer-group offsets).

A topic with 90th-percentile partition throughput ≫ 50th-percentile is skewed. A topic where one partition has consistently growing lag while others are caught up is consumer-side skewed even if producer-side distribution was even (a slow consumer on one partition).
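The p90-vs-p50 check sketched above can be scripted against scraped per-partition rates. The rate dicts below are fabricated sample data, and the 3× ratio threshold is an illustrative choice, not a standard:

```python
# Sketch: flag producer-side skew from per-partition record rates.
import statistics

def is_skewed(rates_by_partition: dict[int, float],
              ratio: float = 3.0) -> bool:
    """True when the p90 partition rate dwarfs the median rate."""
    rates = sorted(rates_by_partition.values())
    p90 = statistics.quantiles(rates, n=10)[8]  # 9th decile cut point
    p50 = statistics.median(rates)
    return p90 >= ratio * p50

even   = {p: 1_000.0 for p in range(12)}
skewed = {**{p: 1_000.0 for p in range(11)}, 11: 50_000.0}

assert not is_skewed(even)
assert is_skewed(skewed)
```

The same check applied to per-partition consumer lag instead of record rate catches the consumer-side variant, where one partition's lag grows while the rest stay caught up.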

Why consumer skew compounds

The post notes:

"If your application doesn't use partitions evenly, it could start to bottleneck on skewed partitions — if not at the producers, then perhaps at the consumers, which can then lead to processing imbalances at the broker level."

Three-layer compounding (producer ⇒ consumer ⇒ broker): uneven produce rates ⇒ uneven partition sizes ⇒ uneven consumer work ⇒ uneven broker fetch-request patterns ⇒ noisy-neighbor-style contention on whichever broker leads the hot partitions.

Relationship to other skew concepts

  • concepts/hot-key — key-level hotspot; partition skew is key-hotspot's partition-level manifestation.
  • Sharding has the same problem at a different granularity — skewed shard keys produce hot shards. The mitigation (high-cardinality shard keys) is the same primitive.
  • concepts/shard-key-cardinality — the cardinality threshold heuristic (10× to 100× more distinct key values than shards) applies equally to partitioning: a key column with 50 distinct values cannot spread across 128 partitions; at most 50 partitions can ever receive data.
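The cardinality heuristic reduces to a one-line check. The function name and the 10× floor are illustrative, not from any library:

```python
# Sketch of the 10x-100x cardinality heuristic for partition keys.

def key_cardinality_ok(distinct_keys: int, n_partitions: int,
                       min_ratio: int = 10) -> bool:
    """Heuristic: want at least ~10x (ideally ~100x) more distinct
    key values than partitions for an even hash distribution."""
    return distinct_keys >= min_ratio * n_partitions

assert not key_cardinality_ok(50, 128)   # the 50-values/128-partitions case
assert key_cardinality_ok(5_000, 128)    # ~39x: comfortably above the floor
```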
