PATTERN
High-cardinality partition key¶
Problem¶
A streaming workload requires per-key ordering (e.g. CDC per row, per-user event ordering, per-device telemetry). That forces use of the keyed partitioner — records with the same key must co-locate on the same partition to preserve ordering.
But keyed partitioning is hash-based, which distributes records uniformly only if key values are diverse enough to fill the hash space. When the natural key has low cardinality, or when some key values are disproportionately frequent, records concentrate on a few partitions — producing partition skew with the resulting Amdahl's-Law cap on throughput.
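The cap is easy to see with a toy partitioner. This sketch uses md5 purely as a deterministic stand-in for a keyed partitioner's hash (Kafka's default uses murmur2); the key sets are illustrative:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for a keyed partitioner's hash (Kafka's default keyed
    # partitioner uses murmur2; md5 here is only for illustration).
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_partitions

# 5 distinct key values over 16 partitions: at most 5 partitions can
# ever receive traffic, no matter how uniform the hash is.
statuses = ["new", "active", "paused", "closed", "deleted"]
used = {partition_for(s, 16) for s in statuses}
print(len(used))  # <= 5

# A high-cardinality key fills the hash space and all 16 partitions.
many = {partition_for(f"user-{i}", 16) for i in range(10_000)}
print(len(many))  # 16
```

No hash function can rescue a key whose value count is below the partition count; the ceiling is set by the key, not the partitioner.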
Solution¶
When keyed partitioning is required, choose a partition key with the highest possible cardinality and as uniform a frequency distribution as possible. Redpanda's verbatim guidance (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda):
"If keys are essential, and you have the option, try to pick keys that have the highest cardinality (those with the highest number of distinct values) for a good chance of distributing the keys well over the number of partitions."
Cardinality heuristics (adapted from shard-key cardinality):
- Strong: key cardinality ≥ 100× partition count. `user_id` in a 100M-user system across 128 partitions: ~1M keys per partition, smooth distribution.
- Weak: key cardinality ≥ 10× partition count. Still workable but sensitive to frequency skew.
- Broken: key cardinality ≤ partition count. Some partitions are guaranteed to receive zero traffic regardless of partitioner quality.
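The heuristics above can be expressed as a quick health check. The thresholds come straight from the list; the function name and the "marginal" band between 1× and 10× are illustrative assumptions:

```python
def key_health(key_cardinality: int, partition_count: int) -> str:
    """Classify a candidate partition key by the cardinality heuristics."""
    if key_cardinality <= partition_count:
        return "broken"    # some partitions guaranteed idle
    ratio = key_cardinality / partition_count
    if ratio >= 100:
        return "strong"
    if ratio >= 10:
        return "weak"      # workable, but sensitive to frequency skew
    return "marginal"      # 1x-10x band: an assumption, not in the source

print(key_health(100_000_000, 128))  # strong
print(key_health(2_000, 128))        # weak
print(key_health(100, 128))          # broken
```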
Examples by data shape¶
Natural high-cardinality keys — prefer these¶
- Primary keys (`user_id`, `order_id`, `device_id`, `row_pk` in a relational system). Guaranteed unique per row.
- Natural IDs (UUID, snowflake, nanoid). Cardinality = row count.
- Compound keys where each component adds orthogonal dimensions: `(user_id, session_id)`.
Low-cardinality keys — avoid or compose¶
- `country_code` (~200 values). Caps distribution at 200 partitions; skew toward dominant countries (US, CN, IN).
- `tenant_id` in multi-tenant systems where one whale tenant dominates.
- `subscription_tier` (~5 values). Trivially broken.
- `status` (~3–10 values). Trivially broken.
Composite-key trick for low-cardinality ordering¶
When ordering is required by a low-cardinality column (e.g. `tenant_id` for per-tenant ordering) but traffic is skewed:
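A minimal sketch of one way to build such a composite key. The bucket count `K`, the hash choice, and the key format are illustrative assumptions, not prescribed by the source:

```python
import hashlib

K = 8  # synthetic keys per tenant; tune to spread a whale tenant

def composite_key(tenant_id: str, row_pk: str, k: int = K) -> str:
    # Bucket by the row's primary key so every record for a given row
    # gets a stable key (preserving per-row ordering), while the
    # tenant's traffic fans out over up to k distinct partition keys.
    bucket = int(hashlib.md5(row_pk.encode()).hexdigest(), 16) % k
    return f"{tenant_id}:{bucket}"
```

The record is then produced with `composite_key(tenant, pk)` as its key, and the keyed partitioner hashes the synthetic key as usual.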
This spreads each tenant's traffic across K synthetic keys, preserving (tenant_id, row_pk)-level ordering within each stream while breaking the single-partition bottleneck at the tenant level. Downstream consumers that need strict per-tenant ordering must re-merge the K streams in order, a cost that is typically worth the producer-side throughput win.
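The consumer-side re-merge can be sketched as a k-way merge, assuming each record carries a monotonically increasing per-tenant sequence number (an assumption; the source does not specify the merge mechanism):

```python
import heapq

def merge_tenant_streams(streams):
    """k-way merge of per-bucket record streams back into per-tenant
    order, keyed on a per-record sequence number.

    Each stream is an iterable of (seq, record) pairs, already ordered
    within its own bucket (true if the producer assigns seq
    monotonically and per-key order is preserved per partition)."""
    return [rec for _, rec in heapq.merge(*streams)]

# Two buckets of one tenant's records, interleaved by sequence number.
s1 = [(1, "a"), (4, "d"), (5, "e")]
s2 = [(2, "b"), (3, "c"), (6, "f")]
print(merge_tenant_streams([s1, s2]))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

A real consumer would do this incrementally over live partitions rather than materializing a list, but the ordering logic is the same.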
Detection¶
Partition skew is invisible in aggregate cluster metrics; watch per-partition rates:
- Per-partition produce rate (`vectorized_kafka_partition_leader_msgs` or equivalent).
- Per-partition byte rate.
- Per-partition consumer lag (concepts/kafka-consumer-lag-metric).
A topic where 90th-percentile partition throughput is substantially higher than 50th-percentile has skew. Investigate the key distribution.
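The p90/p50 check can be sketched as a small helper over sampled per-partition rates (the helper and its inputs are illustrative; the source does not prescribe a threshold):

```python
from statistics import quantiles

def skew_ratio(per_partition_rates):
    """p90 / p50 of per-partition produce rates.

    ~1.0 means an even distribution; substantially above 1 means a few
    partitions carry a disproportionate share of the traffic."""
    qs = quantiles(per_partition_rates, n=10, method="inclusive")
    p90, p50 = qs[8], qs[4]  # 9 cut points: index 4 = p50, index 8 = p90
    return p90 / p50

print(skew_ratio([100] * 16))            # 1.0 — even topic
print(skew_ratio([10] * 14 + [500, 600]))  # >> 1 — two hot partitions
```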
When this pattern doesn't apply¶
- Workloads that don't require per-key ordering — use the sticky partitioner instead. No key means no skew from key-frequency distribution.
- Workloads with a single hot key by design (e.g. all records for a session ordered together). Accept the single-partition bottleneck; throughput ceiling is fundamental to the contract.
- Composite keys that break ordering — the trick above requires a consumer capable of re-merging K streams. Simple consumers reading one partition at a time lose the strict tenant-level ordering.
Relationship to sharding¶
This is the streaming analog of shard-key cardinality in database sharding. Same primitive (pick a high-cardinality column to spread rows uniformly across physical partitions); same failure mode (low-cardinality keys concentrate rows on few shards, breaking horizontal scalability); same mitigations (composite keys, synthetic keys derived from primary keys).
Seen in¶
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda — canonical wiki source. High-cardinality-key rule as the third mitigation for partition skew.
Related¶
- systems/kafka, systems/redpanda — Kafka-API brokers.
- concepts/keyed-partitioner — the partitioner this pattern applies to.
- concepts/partition-skew-data-skew — the failure mode this pattern mitigates.
- concepts/shard-key-cardinality — adjacent primitive in database sharding.
- concepts/sticky-partitioner — the alternative when keys aren't required.
- concepts/hot-key — the finer-grain manifestation of key skew.
- concepts/kafka-partition — the unit being targeted.