
PATTERN

Deterministic key hash for partition affinity

Problem. A Kafka consumer receives a stream of aggregate-shaped messages (counts, sums, mergeable sketches) and wants to merge messages that share the same logical key before writing to downstream storage — reducing write volume. But the consumer only sees one partition at a time, so messages with the same key spread across partitions can't be coalesced without cross-partition coordination.

Solution. Set the Kafka message key to a deterministic hash of the logical key (usually a tuple of tenant ID + other group-by columns). Kafka's default partitioning guarantees same-key-same-partition-per-topic — every message for a given logical key lands on the same partition. A single consumer reading that partition then sees all the messages for that key in its batch buffer and can coalesce them locally, with no cross-consumer coordination.

(Source: sources/2026-04-21-planetscale-storing-time-series-data-in-sharded-mysql-to-power-query-insights.)

Mechanism

  1. Pick the logical key. The tuple of columns the downstream aggregation groups by — (tenant_id, fingerprint) for Insights, (user_id, event_type) for an event aggregator, (shard_id, metric_name) for a metrics fanin.
  2. Hash the tuple deterministically. Any hash that maps identical tuples to identical bytes works (MD5, xxHash, FNV); what matters is determinism, not cryptographic strength.
  3. Set the Kafka message key to the hash. Kafka's default producer uses hash(key) % partition_count to pick a partition — this gives every logical key stable partition affinity.
  4. Consumer coalesces in-memory. Each consumer reads one partition, groups messages in its batch buffer by the same logical key, and writes one row per group to downstream storage.
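
Steps 1–3 can be sketched as follows. The function names and the MD5 choice are illustrative (Kafka's Java client actually uses a murmur2-based default partitioner), but the shape is the same: deterministic hash of the key tuple, then modulo partition count.

```python
import hashlib

def kafka_key(tenant_id: str, fingerprint: str) -> bytes:
    """Deterministic hash of the logical key tuple (steps 1-2).

    MD5 here is for determinism only, not security; xxHash or FNV
    would work just as well.
    """
    return hashlib.md5(f"{tenant_id}:{fingerprint}".encode()).digest()

def partition_for(key: bytes, partition_count: int) -> int:
    """Approximates the default partitioner: hash(key) % partition_count."""
    return int.from_bytes(key[:8], "big") % partition_count

# Same logical key -> same key bytes -> same partition, on every producer.
k1 = kafka_key("tenant-42", "select from users where id = ?")
k2 = kafka_key("tenant-42", "select from users where id = ?")
assert k1 == k2
assert partition_for(k1, 12) == partition_for(k2, 12)
```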

Canonical PlanetScale application

Rafer Hazen, 2023-08-10: "Aggregate query data is mapped to Kafka partitions by setting the Kafka key to a deterministic hash of the database identifier and the query fingerprint. Because of this, all messages for a given database/query pattern will arrive in the same partition and we can merge aggregate Kafka messages in memory for each consumer batch to avoid unnecessary database writes. In practice, we've found that in-memory coalescing decreases database writes by about 30%–40%."

Load-bearing properties

  • Correctness under rebalance. The key-partition mapping is determined by Kafka's partitioner, not by which consumer instance is assigned to a partition. A consumer rebalance reassigns partitions to different consumers but preserves the key→partition mapping. Logical keys stay on their partition through a rebalance.
  • Self-reinforcing under load. Bigger batches coalesce more (concepts/in-memory-coalescing-by-kafka-key): a backlog spike that drives batch size from 200 to 1,000 yields a higher coalesce rate, which reduces DB write pressure, which lets the consumer burn down the backlog faster. The system naturally sheds write amplification when it's most needed.
  • No cross-consumer coordination. Consumers operate on independent partitions. No shared state, no locks, no Raft.
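
A minimal sketch of the consumer-side coalesce (step 4 above), assuming messages carry commutative aggregates; the field names `count` and `total_ms` are hypothetical, not PlanetScale's schema.

```python
from collections import defaultdict

def coalesce(batch):
    """Merge all batch messages sharing a logical key into one row.

    Assumes values are commutative/associative aggregates (counts and
    sums here), so merge order within the batch does not matter.
    """
    merged = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for msg in batch:
        agg = merged[msg["key"]]
        agg["count"] += msg["count"]
        agg["total_ms"] += msg["total_ms"]
    return merged  # one downstream write per logical key, not per message

batch = [
    {"key": ("t1", "q1"), "count": 3, "total_ms": 12.0},
    {"key": ("t1", "q1"), "count": 2, "total_ms": 8.0},
    {"key": ("t2", "q9"), "count": 1, "total_ms": 4.0},
]
rows = coalesce(batch)  # three messages collapse into two writes
```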

Composition with ordering guarantees

Kafka guarantees in-order delivery within a partition. For aggregate telemetry this is not critical: commutative + associative merges (sum, count, sketch-merge) give the same answer regardless of merge order. But for order-sensitive consumers (state-machine replays, event sourcing) the same partition-affinity pattern does double duty: same-key-same-partition also gives you same-key-in-order.
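
A one-liner makes the order-independence concrete: for a commutative, associative merge such as summation, every possible delivery order collapses to the same aggregate.

```python
from functools import reduce
from itertools import permutations

# Merge three per-interval counts in every possible delivery order.
counts = [5, 3, 7]
results = {reduce(lambda a, b: a + b, order) for order in permutations(counts)}
assert results == {15}  # a single unique result: merge order is irrelevant
```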

When this fails

  • One tenant dominates. A single hot key produces a hot partition. Insights' per-interval unique-pattern cap at the VTGate instrumentation site bounds the worst case.
  • Re-partitioning changes the mapping. Changing the partition count changes the hash(key) % N assignment. Consumers must either be paused during the re-partition or handle the temporary same-key-two-partitions window via idempotent writes.
  • Consumer batch budget exceeded. If a single batch has to hold all messages for all active keys before flush, the consumer's memory budget bounds the effective coalesce window. PlanetScale's ~200-message typical batch + up-to-1000-in-spike budget is sized to the per-shard write rate (5k writes/s ÷ ~20 Hz ≈ 250 messages/batch average).
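
The re-partition hazard is easy to demonstrate: the same key hash maps to different partitions under different partition counts, which is why writes during the switchover window must be idempotent. The numbers here are arbitrary.

```python
def partition_for(key_hash: int, partition_count: int) -> int:
    # Kafka-style assignment: hash(key) % N.
    return key_hash % partition_count

key_hash = 123456789
before = partition_for(key_hash, 12)  # partition under the old count
after = partition_for(key_hash, 16)   # partition under the new count
assert before != after  # this key briefly lives on two partitions
```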

Comparison to producer-side aggregation

An alternative is aggregating on the producer side (VTGate in PlanetScale's case) before emitting to Kafka. PlanetScale's VTGate already does this: it emits one aggregate message per fingerprint per 15-second interval. The consumer-side coalesce is second-order aggregation on top, merging 15-second aggregates into minute/hour rollups. The two-layer aggregation is natural because the per-15s-per-VTGate-instance aggregate is still high-cardinality when a pattern is served by many VTGate instances.
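
The second-order rollup can be sketched as a fold of per-15s aggregates into minute buckets; the field names and timestamps are illustrative, not PlanetScale's schema.

```python
from collections import defaultdict

def minute_rollup(messages):
    """Fold producer-side 15-second aggregates into per-minute rows."""
    rows = defaultdict(int)
    for m in messages:
        minute = m["ts"] - (m["ts"] % 60)  # truncate to the minute bucket
        rows[(m["key"], minute)] += m["count"]
    return rows

msgs = [  # three 15s aggregates for one key, all inside the same minute
    {"key": ("tenant-1", "q1"), "ts": 60, "count": 4},
    {"key": ("tenant-1", "q1"), "ts": 75, "count": 6},
    {"key": ("tenant-1", "q1"), "ts": 90, "count": 1},
]
rows = minute_rollup(msgs)
assert rows[(("tenant-1", "q1"), 60)] == 11  # one minute row, not three writes
```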
