CONCEPT Cited by 1 source

Kafka partition

Definition

A Kafka partition is the unit of parallelism, ordering, and replication in Apache Kafka. Every topic is split into N partitions; each partition is an ordered, append-only log (concepts/distributed-log) and is stored as R replicas (R being the replication factor), placed on different brokers. Records within a partition are ordered by a monotonically increasing offset; records in different partitions have no defined order relative to each other.

Physical layout — replica is the storage unit

Kozlovski's Kafka-101 framing:

"A simple analogy is that just how the basic storage unit in an operating system is a file, the basic storage unit in Kafka is a replica (of a partition). Each replica is nothing more than a few files itself, each of which embody the log data structure and sequentially form a larger log. Each record in the log is denoted by a specific offset, which is simply a monotonically-increasing number." (Source: sources/2024-05-09-highscalability-kafka-101)

So for a topic with partition count P and replication factor R, Kafka stores P × R replicas across the broker fleet. Each replica is a sequence of segment files on one broker's disk, and each record is addressed by (topic, partition, offset).
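The layout arithmetic and the addressing scheme can be sketched in a few lines. This is an illustrative model, not Kafka code: `RecordAddress`, `replica_count`, and `segment_for_offset` are hypothetical names, and the segment lookup assumes each segment is identified by the offset of its first record (its base offset), with the list of base offsets sorted ascending.

```python
from bisect import bisect_right
from typing import NamedTuple, Sequence


class RecordAddress(NamedTuple):
    """Hypothetical model of the (topic, partition, offset) address."""
    topic: str
    partition: int
    offset: int


def replica_count(partitions: int, replication_factor: int) -> int:
    """Total replicas stored across the fleet: P x R."""
    return partitions * replication_factor


def segment_for_offset(base_offsets: Sequence[int], offset: int) -> int:
    """Return the base offset of the segment that would hold `offset`.

    Assumes `base_offsets` is sorted ascending. This sketch does not know
    the log end offset, so it cannot reject offsets past the last record.
    """
    idx = bisect_right(base_offsets, offset) - 1
    if idx < 0:
        raise ValueError("offset is below the first retained segment")
    return base_offsets[idx]


addr = RecordAddress(topic="orders", partition=2, offset=1500)
total = replica_count(partitions=6, replication_factor=3)
segment = segment_for_offset([0, 1000, 2500], addr.offset)
```

With 6 partitions and a replication factor of 3, `total` is 18 replicas, and offset 1500 falls in the segment whose base offset is 1000.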

What the partition primitive buys

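  • Parallelism — producers distribute records across partitions (by key-hash or round-robin); consumers in a group divide partitions among themselves. Throughput scales with partition count, and because each partition is read by at most one consumer in a group, partition count also caps the group's useful consumer parallelism.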
  • Ordering semantics — ordering is per-partition only. Total ordering across a topic requires partition count 1; any horizontal scale requires giving up that ordering. This is the core trade-off.
  • Replication unit — replication is scoped to the partition, not to the topic, so different partitions of the same topic can have different leaders on different brokers.
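How consumers in a group divide partitions can be illustrated with a simplified round-robin split. This is a sketch under stated assumptions, not Kafka's actual assignor logic: `assign_round_robin` is a hypothetical helper, and real assignment is negotiated through the group protocol rather than computed locally like this.

```python
from typing import Dict, List


def assign_round_robin(partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partition indices 0..partitions-1 to consumers in turn.

    Simplified stand-in for a group assignor: each partition goes to
    exactly one consumer, so consumers beyond the partition count get
    an empty list and sit idle.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


balanced = assign_round_robin(4, ["c1", "c2", "c3"])
oversized_group = assign_round_robin(2, ["c1", "c2", "c3"])
```

In the second call the group has three consumers but the topic has only two partitions, so `c3` receives no partitions. Adding consumers past the partition count adds no read throughput.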

Record-key → partition assignment

Keyed records hash to a partition index (hash(key) % partition_count), so records with the same key land on the same partition for as long as the partition count stays fixed; changing the partition count changes the mapping for most keys. This is the foundation that sub-topology-scoped colocation in Kafka Streams is built on (see sources/2025-11-11-expedia-kafka-streams-sub-topology-partition-colocation).
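The key-to-partition mapping can be sketched as follows. The hash here is an assumption chosen for portability: `zlib.crc32` stands in for the hash function, whereas Kafka's default Java partitioner uses murmur2 on the serialized key, so the partition numbers this sketch produces will not match a real cluster. `partition_for_key` is a hypothetical helper name.

```python
import zlib


def partition_for_key(key: bytes, partition_count: int) -> int:
    """Map a serialized key to a partition index in [0, partition_count).

    Uses CRC32 as a stand-in hash; it is deterministic across runs and
    processes, which is the property the mapping depends on.
    """
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    return zlib.crc32(key) % partition_count


# The same key always maps to the same partition for a fixed count.
first = partition_for_key(b"user-42", 6)
second = partition_for_key(b"user-42", 6)

# With a different partition count, some keys map elsewhere.
keys = [f"user-{i}".encode() for i in range(100)]
moved = [k for k in keys if partition_for_key(k, 6) != partition_for_key(k, 8)]
```

`first` and `second` are equal, while `moved` lists the sample keys whose partition differs between a 6-partition and an 8-partition topic. That remapping is why per-key ordering and colocation guarantees assume a stable partition count.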

Seen in
