CONCEPT Cited by 2 sources
Keyed partitioner¶
Definition¶
A keyed partitioner is the Kafka / Redpanda producer-side
partitioning strategy used when a record has a non-null key:
the partition is chosen deterministically by
partition = hash(key) mod N, where N is the topic's
partition count. Records with the same key always land on the
same partition (for a given partition count), which preserves
per-key ordering end-to-end.
Contrasts with the sticky partitioner, which applies only to unkeyed records and rotates between partitions for batching efficiency.
Why it exists — ordering guarantee¶
Kafka partitions are append-only ordered logs. Records within a partition are strictly ordered by offset; records across partitions are not ordered with respect to each other. When a workload requires ordering at the key granularity — e.g. "process updates for user 42 in the order they happened" — the only way to get that ordering through Kafka is to ensure all updates for user 42 land on the same partition.
Hashing the key is the mechanism. As long as key-to-partition mapping is stable (N unchanged), records with the same key co-locate.
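A minimal sketch of the mechanism. Note that Kafka's default partitioner actually uses murmur2 over the serialized key bytes; the md5 here is just a stand-in for any stable hash, and the key/partition-count values are illustrative:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Simplified keyed partitioner: stable hash of the key, mod partition count.

    Kafka's real default partitioner hashes the serialized key with murmur2;
    md5 is used here only as a stand-in for a stable hash function.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Same key -> same partition, deterministically, for a fixed partition count.
assert partition_for("user-42", 12) == partition_for("user-42", 12)
```

The determinism is the whole point: the mapping only changes if the partition count N changes, which is why repartitioning a topic breaks the co-location guarantee.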
Canonical use cases¶
Redpanda names CDC as the headline justification (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):
"Only use keyed partitioners when strictly necessary according to the application requirements (CDC use cases are a good example of this)."
CDC semantics require per-row ordering: a row's INSERT must be
processed before its UPDATE, which must be processed before its
DELETE. Scattering these three events across partitions would
let a consumer process the UPDATE before the INSERT, which
breaks downstream state machines. Hashing by primary key
routes all events for a row to the same partition, preserving
per-row ordering.
Other common justified uses:
- Per-user event streams — login / click / purchase events for a user processed in order for session reconstruction.
- Per-device telemetry — sensor readings processed in order for anomaly detection.
- Per-account financial transactions — balance updates processed in order for correctness.
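The CDC case above can be sketched concretely: keying each change event by the row's primary key means the INSERT, UPDATE, and DELETE for one row all hash to the same partition (stable-hash stand-in for murmur2; the key and partition count are illustrative):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big") % num_partitions

# Three CDC events for one row, keyed by its primary key.
events = [("pk-1001", "INSERT"), ("pk-1001", "UPDATE"), ("pk-1001", "DELETE")]

# All three map to a single partition, so a consumer reading that partition
# sees INSERT -> UPDATE -> DELETE in offset order.
partitions = {partition_for(key, 16) for key, _ in events}
assert len(partitions) == 1
```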
Why it's dangerous — batch-size dilution + skew¶
Two distinct failure modes that the sticky partitioner specifically avoids:
Batch-size dilution¶
With the keyed partitioner, every produce call goes to the
partition chosen by the hash. A single producer sending to many
keys hits many partitions per second. Each partition sees only
1/N of the producer's record rate, so each partition's batch
fills N× more slowly — effective-batch-size collapses.
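The dilution is easy to put numbers on. A rough sketch, using Kafka's default `batch.size` of 16 KiB and assumed (illustrative) record size and producer rate:

```python
# Time to fill one 16 KiB batch (Kafka's default batch.size) at a given
# per-partition record rate. Record size and producer rate are assumptions
# chosen for illustration, not measurements.
BATCH_SIZE = 16_384      # bytes, Kafka default batch.size
RECORD_SIZE = 1_024      # bytes per record (assumed)
PRODUCER_RATE = 10_000   # records/sec from one producer (assumed)

def batch_fill_ms(per_partition_rate: float) -> float:
    records_per_batch = BATCH_SIZE / RECORD_SIZE
    return 1000 * records_per_batch / per_partition_rate

sticky = batch_fill_ms(PRODUCER_RATE)       # all traffic fills one batch: 1.6 ms
keyed = batch_fill_ms(PRODUCER_RATE / 32)   # spread over N=32 partitions: 51.2 ms
# With a typical linger.ms of 5-10 ms, the keyed batches ship mostly unfilled.
```

At these rates the sticky batch fills in 1.6 ms while each keyed batch needs 51.2 ms, so `linger.ms` expires first and the producer ships small batches.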
From Kinley 2024-11-19 part 1:
"Increased partition count doesn't affect batch size when the sticky partitioner is in use since that partitioner writes all messages to a single partition (read: batch) until a size threshold is reached based on batch.size."
The implicit corollary: with the keyed partitioner, increased partition count does reduce batch size.
Partition skew¶
Keyed partitioning balances load only when traffic is uniformly distributed across the key space. (Ordering still holds under skew; it is the load balance that breaks.) In practice, real-world key distributions are heavily skewed: one tenant is 10× bigger than the rest, one device sends 100× more messages, one customer id dominates the dataset. See concepts/partition-skew-data-skew for the full framing.
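A small simulation of the skew failure mode, using the same stable-hash stand-in for the real partitioner (the workload shape is an assumption for illustration):

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big") % num_partitions

# Assumed skewed workload: one tenant produces as much as 100 small tenants combined.
keys = ["tenant-big"] * 1000
keys += [f"tenant-{i}" for i in range(100) for _ in range(10)]

load = Counter(partition_for(k, 8) for k in keys)
hot = max(load.values())
# Whichever partition "tenant-big" hashes to carries at least half of the
# 2000 records; adding partitions does not help, the hot key stays hot.
assert hot >= 1000
```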
Mitigation: pick high-cardinality keys when keys are unavoidable — canonicalised as patterns/high-cardinality-partition-key.
The three-partitioner decision tree¶
| Workload | Partitioner |
|---|---|
| No ordering contract, plain event stream | Sticky (default in Kafka 2.4+) |
| Per-key ordering required (CDC, per-user / per-device / per-entity event stream) | Keyed, with high-cardinality key |
| Per-key ordering required but key cardinality is low | Keyed, but accept skew OR manually compose a composite key to inflate cardinality |
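The table above can be read as a two-question decision function; a sketch (the return labels are illustrative, not API names):

```python
def choose_partitioner(needs_per_key_ordering: bool, high_cardinality_key: bool) -> str:
    """Sketch of the three-row decision tree above."""
    if not needs_per_key_ordering:
        # Plain event stream: the sticky partitioner (default for unkeyed
        # records since Kafka 2.4) maximises batching.
        return "sticky"
    if high_cardinality_key:
        return "keyed"
    # Low-cardinality key: either accept skew or compose a composite key.
    return "keyed + composite key (or accept skew)"

assert choose_partitioner(False, False) == "sticky"
assert choose_partitioner(True, True) == "keyed"
```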
The post's summary recommendation:
"Use the uniform-sticky partitioner whenever possible to balance writes over all partitions. Only use keyed partitioners when strictly necessary according to the application requirements."
Composite-key trick for low-cardinality workloads¶
When the natural key is low-cardinality (e.g. tenant_id with
50 tenants) but ordering is required per that key, a common
trick is to compose a higher-cardinality key that still
preserves the ordering that matters: append a shard index to
the natural key, giving a composite key (tenant_id, shard_id)
with shard_id drawn from [0, K).
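A minimal sketch of the composite-key construction. The shard count `K`, the `entity_id` sub-key, and the key encoding are all assumptions; the point is only that the shard is derived stably from something finer-grained than the tenant:

```python
import hashlib

K = 8  # shards per tenant (assumed)

def composite_key(tenant_id: str, entity_id: str) -> str:
    """Compose a higher-cardinality key from a low-cardinality tenant id.

    The shard is a stable hash of a finer-grained sub-key (here an assumed
    entity_id), so all events for one entity share a key and stay ordered;
    a tenant's traffic spreads across up to K keys, hence up to K partitions.
    """
    shard = int.from_bytes(hashlib.md5(entity_id.encode()).digest()[:4], "big") % K
    return f"{tenant_id}:{shard}"

# Same entity -> same composite key -> same partition -> ordering preserved.
assert composite_key("tenant-7", "order-123") == composite_key("tenant-7", "order-123")
```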
This spreads each tenant's traffic across K partitions while
still preserving ordering within (tenant_id, shard_id).
Downstream consumers that need per-tenant ordering now have to
re-merge K streams — a cost, but often worth the producer-side
relief.
Seen in¶
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — canonical wiki source. Names CDC as the canonical justification; three-pronged partitioning guidance; the high-cardinality-key discipline for when keys are unavoidable.
- sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1 — canonicalises the batch-size-dilution argument against keyed partitioning (factor 4 of the seven-factor effective-batch-size framework).
Related¶
- systems/kafka, systems/redpanda — Kafka-API brokers.
- concepts/sticky-partitioner — the alternative for unkeyed records; retires the batch-dilution concern when keys aren't needed.
- concepts/kafka-partition — the unit a keyed partitioner chooses.
- concepts/partition-skew-data-skew — the failure mode keyed partitioning creates.
- concepts/change-data-capture — canonical use case.
- concepts/shard-key-cardinality — why high-cardinality keys are needed.
- concepts/hot-key — the worst-case skew manifestation.
- patterns/high-cardinality-partition-key — the operational pattern.