CONCEPT
Partition colocation (cross-topic)¶
Partition colocation (cross-topic) is the property that
same-index partitions of multiple topics are assigned to the
same processing instance / task. Topic-A partition 0 and
Topic-B partition 0 end up on the same host so that a record
with key K in either topic is handled by the same node — which
is the enabling condition for:
- Cross-topic cache locality — one local cache keyed by K sees the same keyspace for both streams (Expedia's use case).
- Cross-topic deduplication without a distributed store.
- Cross-topic joins implemented at the application layer (not the framework's built-in join) — e.g. one processor that updates local state from topic A and reads from topic B.
- Single-instance ordering guarantees across related topics for any keyspace-scoped invariant.
Necessary conditions¶
1. Identical partition count across the topics.
2. Same key strategy — both topics partition by the same logical key, so the same K maps to the same index via the hash.
3. Consumer-framework-level assignment that places same-index partitions on the same instance. This is the condition that can silently fail.
In plain systems/kafka, with hand-rolled consumers, condition (3) is the developer's responsibility: pin the consumers so that same-index partitions of both topics land on the same host. In systems/kafka-streams, the framework handles assignment, but only within the scope of a sub-topology.
The Kafka Streams rule¶
Kafka Streams only guarantees partition colocation across topics if those topics are consumed within the same sub-topology and have the same number of partitions.
(Source: sources/2025-11-11-expedia-kafka-streams-sub-topology-partition-colocation.)
Two consumed topics that are structurally disjoint in the topology (no shared operator or state store between them) form separate sub-topologies; their partitions are assigned independently, and cross-topic colocation breaks. The partition assignor's co-partitioning check is per-sub-topology, not per-application.
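A toy model of the failure: the host names and the round-robin strategy below are illustrative, not Kafka Streams' actual assignor, but they show why two independent assignments over the same instances need not line up.

```python
def assign_round_robin(partitions, instances):
    """Round-robin partitions over instances, as an assignor would
    within one sub-topology, ignoring every other sub-topology."""
    return {p: instances[i % len(instances)] for i, p in enumerate(partitions)}

instances = ["host-1", "host-2", "host-3"]

# Two disjoint sub-topologies are assigned separately. If the second
# assignment starts from a different instance (e.g. to balance other
# load), same-index partitions land on different hosts.
a = assign_round_robin([("A", i) for i in range(6)], instances)
b = assign_round_robin([("B", i) for i in range(6)],
                       instances[1:] + instances[:1])  # rotated order

broken = [i for i in range(6) if a[("A", i)] != b[("B", i)]]
assert broken == list(range(6))  # colocation silently lost at every index
```

Nothing fails loudly here: both topics are fully assigned and processing proceeds, which is exactly what makes condition (3) a silent failure.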
The Expedia failure mode¶
- Two topics, same partition count, similar keys.
- Two independent processing paths; no shared state.
- Local Guava cache on each instance expected to dedupe an expensive external-API call across both streams.
- In production: identical keys landed on different instances; each maintained its own cache; the API was called redundantly.
- Structural root cause: the app had formed two sub-topologies by construction.
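The cache assumption can be sketched with a toy per-instance dedupe cache (the class and counter are illustrative stand-ins for the Guava cache and the external API):

```python
class Instance:
    """One processing instance with its own local cache."""
    def __init__(self):
        self.cache = {}       # stands in for the local Guava cache
        self.api_calls = 0

    def handle(self, key):
        if key not in self.cache:
            self.api_calls += 1            # expensive external API call
            self.cache[key] = f"result:{key}"
        return self.cache[key]

# Colocated: both topics' records for key K reach the same instance,
# so the second lookup is a cache hit.
colocated = Instance()
colocated.handle("K"); colocated.handle("K")
assert colocated.api_calls == 1

# Not colocated: each stream lands on a different instance; each
# maintains its own cache, and the API is called redundantly.
i1, i2 = Instance(), Instance()
i1.handle("K"); i2.handle("K")
assert i1.api_calls + i2.api_calls == 2
```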
The fix¶
Attach a shared Kafka Streams state store to both branches (patterns/shared-state-store-as-topology-unifier). Kafka Streams detects the shared dependency, merges the two sub-topologies into one unified topology, and the assignor then places same-index partitions of both topics on the same task — colocation restored, cache locality restored, duplicate API calls eliminated.
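The post-merge placement model can be sketched as follows, under the simplifying assumption that one task exists per partition index and owns that index's partition in every co-partitioned topic (names and counts are illustrative):

```python
def assign_merged(num_partitions, instances, topics=("A", "B")):
    """One task per partition index; each task bundles the same-index
    partition of every co-partitioned topic in the merged sub-topology,
    so colocation holds by construction."""
    return {
        i: {"host": instances[i % len(instances)],
            "partitions": [(t, i) for t in topics]}
        for i in range(num_partitions)
    }

tasks = assign_merged(6, ["host-1", "host-2", "host-3"])

# The task, not the individual topic-partition, is the unit of
# placement: same-index partitions of A and B always share a host.
for i, task in tasks.items():
    assert ("A", i) in task["partitions"]
    assert ("B", i) in task["partitions"]
```

This is why the shared state store works as a fix: it gives the two branches a structural dependency, which collapses them into one sub-topology whose tasks look like the ones above.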
Relationship to other colocation concepts¶
- Colocation sharding ("colos") is the database-side cousin: group tables that share a shard key into a physical colocation so that transactions and joins scoped to a single shard-key value land on one shard. Kafka Streams' cross-topic partition colocation is the same idea projected onto a stream-processing substrate — pick a partition key, then ensure everything that cares about that key lives on one node.
- concepts/locality-aware-scheduling is the scheduler-side form: place tasks where their input data already lives. Kafka Streams' sub-topology co-partitioning is static (assignor plans it on rebalance) rather than dynamic.
Seen in¶
- sources/2025-11-11-expedia-kafka-streams-sub-topology-partition-colocation — Expedia production case: rule named, failure mode described, shared-state-store fix demonstrated.
Related¶
- concepts/sub-topology
- systems/kafka-streams
- concepts/cache-locality
- patterns/shared-state-store-as-topology-unifier
- patterns/colocation-sharding — DB-side analog
- concepts/locality-aware-scheduling — scheduler-side analog