Skip to content

CONCEPT Cited by 1 source

Sub-topology (Kafka Streams)

A sub-topology in systems/kafka-streams is a connected component of the topology DAG under the graph of shared operators and state stores. It is the unit Kafka Streams uses to (a) assign tasks and (b) reason about co-partitioning across topics.

How sub-topologies form

Kafka Streams derives sub-topologies implicitly when the application is built. Two source nodes (topic consumers) that share any downstream operator, state store, or sink end up in the same sub-topology. Two source nodes with no shared operator or state between them — even when declared on the same StreamsBuilder — form separate sub-topologies.

This means an application can unintentionally split itself across sub-topologies by virtue of its operator graph topology, without the developer noticing. The shape of the processor DAG — not the number of topics or the partition counts — is what the assignor sees.

What sub-topologies decide

  1. Task granularity. One task per (sub-topology, partition-index).
  2. Partition colocation guarantee. Topics in the same sub-topology with the same partition count are co-partitioned — partition i of topic A and partition i of topic B go to the same task, hence the same instance.
  3. Repartition boundaries. Operators like repartition() / through() / some aggregations split the topology into consecutive sub-topologies connected via intermediate topics.
  4. State-store scope. A state store is owned by exactly one sub-topology; attaching it to processors in two places forces them into the same sub-topology.

Why it matters — the Expedia lesson

The core rule: Kafka Streams only guarantees partition colocation across topics if those topics are consumed within the same sub-topology and have the same number of partitions. Identical partition counts + identical key strategies across two topics are necessary but not sufficient. If the two topic consumers form separate sub-topologies, their partitions are assigned independently, and same-key → same-instance no longer holds across topics (Source: sources/2025-11-11-expedia-kafka-streams-sub-topology-partition-colocation).

Forcing sub-topology unification

The direct fix is to introduce a shared dependency between the two branches. The cleanest mechanism is a Kafka Streams state store attached to both processors — see patterns/shared-state-store-as-topology-unifier. Other routes:

  • Explicit join (KStream.join(...)) — co-partitions the inputs by construction.
  • Merge (KStream.merge(...)) — brings both streams into one downstream path.
  • Shared processor on both branches.

The why behind each is the same: give the assignor a reason to place the two topics in the same sub-topology.

Seen in

Last updated · 200 distilled / 1,178 read