Skip to content

CONCEPT Cited by 1 source

Partition colocation (cross-topic)

Partition colocation (cross-topic) is the property that same-index partitions of multiple topics are assigned to the same processing instance / task. Topic-A partition 0 and Topic-B partition 0 end up on the same host so that a record with key K in either topic is handled by the same node — which is the enabling condition for:

  • Cross-topic cache locality — one local cache keyed by K sees the same keyspace for both streams (Expedia's use case).
  • Cross-topic deduplication without a distributed store.
  • Cross-topic joins implemented at the application layer (not the framework's built-in join) — e.g. one processor that updates local state from topic A and reads from topic B.
  • Single-instance ordering guarantees across related topics for any keyspace-scoped invariant.

Necessary conditions

  1. Identical partition count across the topics.
  2. Same key strategy — both topics partition by the same logical key (so same K → same index via hash).
  3. Consumer-framework-level assignment that places same-index partitions on the same instance. This is the condition that can silently fail.

In plain systems/kafka — with hand-rolled consumers — condition (3) is the developer's responsibility (pin the two consumers to the same partition on the same host). In systems/kafka-streams, the framework handles assignment — but only within the scope of a sub-topology.

The Kafka Streams rule

Kafka Streams only guarantees partition colocation across topics if those topics are consumed within the same sub-topology and have the same number of partitions.

(Source: sources/2025-11-11-expedia-kafka-streams-sub-topology-partition-colocation.)

Two topic consumers that are structurally disjoint — no shared operator or state store between them — form separate sub-topologies; their partitions are assigned independently, and cross-topic colocation breaks. The partition-assignor's co-partitioning check is per-sub-topology, not per-application.

The Expedia failure mode

  • Two topics, same partition count, similar keys.
  • Two independent processing paths; no shared state.
  • Local Guava cache on each instance expected to dedupe an expensive external-API call across both streams.
  • In production: identical keys landed on different instances; each maintained its own cache; the API was called redundantly.
  • Structural root cause: the app had formed two sub-topologies by construction.

The fix

Attach a shared Kafka Streams state store to both branches (patterns/shared-state-store-as-topology-unifier). Kafka Streams detects the shared dependency, merges the two sub-topologies into one unified topology, and the assignor then places same-index partitions of both topics on the same task — colocation restored, cache locality restored, duplicate API calls eliminated.

Relationship to other colocation concepts

  • Colocation sharding ("colos") is the database-side cousin: group tables that share a shard key into a physical colocation so that transactions and joins scoped to a single shard-key value land on one shard. Kafka Streams' cross-topic partition colocation is the same idea projected onto a stream-processing substrate — pick a partition key, then ensure everything that cares about that key lives on one node.
  • concepts/locality-aware-scheduling is the scheduler-side form: place tasks where their input data already lives. Kafka Streams' sub-topology co-partitioning is static (assignor plans it on rebalance) rather than dynamic.

Seen in

Last updated · 200 distilled / 1,178 read