

Keyed partitioner

Definition

A keyed partitioner is the Kafka / Redpanda producer-side partitioning strategy used when a record has a non-null key: the partition is chosen deterministically by partition = hash(key) mod N, where N is the topic's partition count (Kafka's default partitioner uses murmur2 for this hash). Records with the same key always land on the same partition (for a given partition count), which preserves per-key ordering end-to-end.

Contrasts with the sticky partitioner, which applies only to unkeyed records and rotates between partitions for batching efficiency.

Why it exists — ordering guarantee

Kafka partitions are append-only ordered logs. Records within a partition are strictly ordered by offset; records across partitions are not ordered with respect to each other. When a workload requires ordering at the key granularity — e.g. "process updates for user 42 in the order they happened" — the only way to get that ordering through Kafka is to ensure all updates for user 42 land on the same partition.

Hashing the key is the mechanism. As long as key-to-partition mapping is stable (N unchanged), records with the same key co-locate.
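The mechanism can be sketched in a few lines of Python. This is a toy illustration: Kafka's default partitioner hashes the serialized key with murmur2, and zlib.crc32 is used here only as a stand-in deterministic hash.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Toy keyed partitioner: partition = hash(key) mod N.

    Kafka's real default uses murmur2 on the key bytes; crc32 is a
    stand-in here to illustrate the hash-mod-N scheme.
    """
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, for as long as N stays fixed:
assert pick_partition(b"user-42", 12) == pick_partition(b"user-42", 12)
```

The "for a given partition count" caveat in the definition is visible here: change num_partitions and the mapping for existing keys generally changes, which is why adding partitions to a keyed topic breaks key co-location.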

Canonical use cases

Redpanda names CDC as the headline justification (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):

"Only use keyed partitioners when strictly necessary according to the application requirements (CDC use cases are a good example of this)."

CDC semantics require per-row ordering: a row's INSERT must be processed before its UPDATE, which must be processed before its DELETE. Scattering these three events across partitions would let a consumer process the UPDATE before the INSERT, which breaks downstream state machines. Hashing by primary key routes all events for a row to the same partition, preserving per-row ordering.
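The per-row routing argument can be checked directly with a small sketch (pick_partition is a hypothetical stand-in for the producer's keyed partitioner, using crc32 instead of Kafka's murmur2; the key format is an assumption):

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions  # stand-in keyed partitioner

# Three CDC events for the same row, all keyed by its primary key:
cdc_events = [
    (b"orders:1001", "INSERT"),
    (b"orders:1001", "UPDATE"),
    (b"orders:1001", "DELETE"),
]
partitions = {pick_partition(key, 16) for key, _ in cdc_events}
assert len(partitions) == 1  # all three land on one partition, in order
```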

Other common justified uses:

  • Per-user event streams — login / click / purchase events for a user processed in order for session reconstruction.
  • Per-device telemetry — sensor readings processed in order for anomaly detection.
  • Per-account financial transactions — balance updates processed in order for correctness.

Why it's dangerous — batch-size dilution + skew

The keyed partitioner has two distinct failure modes, both of which the sticky partitioner specifically avoids:

Batch-size dilution

With the keyed partitioner, every produce call goes to the partition chosen by the hash. A single producer sending to many keys hits many partitions per second. Each partition sees only 1/N of the producer's record rate, so each partition's batch fills more slowly and the effective batch size collapses.
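A quick simulation makes the dilution concrete (crc32 as a stand-in hash; the record and partition counts are illustrative assumptions):

```python
import zlib
from collections import Counter

def pick_partition(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions  # stand-in keyed partitioner

N = 32
RECORDS = 10_000  # one producer's output over some interval

# Many distinct keys spread the producer's records across all N partitions:
per_partition = Counter(
    pick_partition(f"key-{i}".encode(), N) for i in range(RECORDS)
)

# Each partition sees only ~RECORDS / N records, so a batch.size worth of
# data accumulates ~N times more slowly than if everything went to one
# partition (which is what the sticky partitioner does between rotations).
assert max(per_partition.values()) < RECORDS // 8
```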

From Kinley 2024-11-19 part 1:

"Increased partition count doesn't affect batch size when the sticky partitioner is in use since that partitioner writes all messages to a single partition (read: batch) until a size threshold is reached based on batch.size."

The implicit corollary: with the keyed partitioner, increased partition count does reduce batch size.

Partition skew

Keyed partitioning balances load only when the keys are uniformly distributed across the key space. In practice, real-world key distributions are heavily skewed: one tenant is 10× bigger than the rest, one device sends 100× more messages, one customer id dominates the dataset. See concepts/partition-skew-data-skew for the full framing.
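A hot-key sketch shows the failure mode, assuming one tenant produces 90% of the traffic (crc32 as a stand-in hash):

```python
import zlib
from collections import Counter

def pick_partition(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions  # stand-in keyed partitioner

# 9,000 records from one hot tenant, 1,000 spread across small tenants:
keys = [b"tenant-hot"] * 9_000 + [f"tenant-{i}".encode() for i in range(1_000)]
load = Counter(pick_partition(k, 16) for k in keys)

# Hashing cannot split a single key: one partition absorbs >= 90% of the load.
assert max(load.values()) >= 9_000
```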

Mitigation: pick a high-cardinality key when a keyed partitioner is unavoidable (canonicalised as patterns/high-cardinality-partition-key).

The three-partitioner decision tree

  • No ordering contract, plain event stream → sticky (default in Kafka 2.4+)
  • Per-key ordering required (CDC, per-user / per-device / per-entity event stream) → keyed, with a high-cardinality key
  • Per-key ordering required, but key cardinality is low → keyed; either accept skew or compose a composite key to inflate cardinality

The post's summary recommendation:

"Use the uniform-sticky partitioner whenever possible to balance writes over all partitions. Only use keyed partitioners when strictly necessary according to the application requirements."

Composite-key trick for low-cardinality workloads

When the natural key is low-cardinality (e.g. tenant_id with 50 tenants) but ordering is required per that key, a common trick is to compose a higher-cardinality key that preserves ordering:

synthetic_key = tenant_id + ":" + shard_id
where shard_id = hash(row_pk) mod K

This spreads each tenant's traffic across K partitions while still preserving ordering within (tenant_id, shard_id). Downstream consumers that need per-tenant ordering now have to re-merge K streams — a cost, but often worth the producer-side relief.
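A sketch of the composite key, assuming K = 8 shards per tenant (synthetic_key and shard_id follow the pseudocode above; the tenant_id and row_pk formats, and crc32 as the hash, are assumptions):

```python
import zlib

K = 8  # shards per tenant; tune to the desired producer-side spread

def shard_id(row_pk: bytes) -> int:
    # Deterministic per row: a row always maps to the same shard,
    # so ordering survives within (tenant_id, shard_id).
    return zlib.crc32(row_pk) % K

def synthetic_key(tenant_id: str, row_pk: bytes) -> bytes:
    return f"{tenant_id}:{shard_id(row_pk)}".encode()

# Same row -> same synthetic key -> same partition:
assert synthetic_key("tenant-7", b"row-123") == synthetic_key("tenant-7", b"row-123")

# One tenant's rows now spread over K distinct keys instead of one:
keys = {synthetic_key("tenant-7", f"row-{i}".encode()) for i in range(1_000)}
assert len(keys) == K
```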
