
PATTERN Cited by 1 source

Append-only log as substrate

Pattern

Build higher-level data-plane and control-plane systems on an append-only log (concepts/distributed-log) rather than on a mutable datastore. Producers append to the tail; consumers read from any point; the log is the authoritative record.
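The contract described above (append at the tail, read from any offset, never mutate) can be sketched in a few lines. This is a minimal in-memory illustration, not Kafka's implementation; the class `AppendOnlyLog` and its method names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory log: records are only ever added at the tail."""

    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record at the tail and return the offset it was given."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> List[Any]:
        """Return up to max_records starting at offset; the log is not changed."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return self._records[offset:offset + max_records]

    @property
    def end_offset(self) -> int:
        """Offset that the next appended record will receive."""
        return len(self._records)


log = AppendOnlyLog()
first = log.append({"event": "created", "id": 1})
second = log.append({"event": "updated", "id": 1})

from_start = log.read(0)        # a reader starting at the head sees everything
from_second = log.read(second)  # a reader starting later sees only the suffix
```

There is deliberately no update or delete method: the only way to change what readers see is to append another record.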

Canonical instance: Apache Kafka topics. Each Kafka partition is a log, and the components Kafka builds on top of its own topics (Kafka Streams, Kafka Connect, __consumer_offsets, KRaft's __cluster_metadata) apply the "append-only log as substrate" pattern at multiple levels of the same platform.

Why it works

Kozlovski's Kafka-101 framing:

"The data in the system is stored in topics. The fundamental basis of a topic is the log — a simple ordered data structure which stores records sequentially. […] It's immutable and has O(1) writes and reads (as long as they're from the tail or head). Therefore the speed of accessing its data doesn't degrade the larger the log gets and, due to its immutability, it's efficient for concurrent reads." (Source: sources/2024-05-09-highscalability-kafka-101)

The pattern's load-bearing properties:

  • Decoupling producers and consumers. Producers don't wait for consumers; consumers don't coordinate with producers; many consumers can read the same log at their own pace. This is the feature Kafka used to displace the earlier generation of message-bus systems, in which consuming a message deleted it.
  • Replayability. A consumer that crashes restarts from the last committed offset; a new consumer can start from the beginning or any chosen offset; a new application can derive a new view by replaying the log.
  • Ordering semantics at controlled granularity. Total order per log (per Kafka partition); no total order across partitions — the ordering primitive is explicit and local.
  • Alignment with HDD physics (see concepts/hdd-sequential-io-optimization). The log is linear; HDDs are fast for linear access; the pattern makes cheap storage fast.
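The first three properties can be made concrete with a small simulation. The sketch below assumes a plain Python list standing in for one partition; the `Consumer` class and `partition_for` helper are hypothetical, and `zlib.crc32` is used only as a stand-in hash (it is not the hash Kafka's default partitioner uses).

```python
import zlib
from typing import List, Tuple


class Consumer:
    """Hypothetical consumer: tracks its own position in a shared log."""

    def __init__(self, log: List[str], committed: int = 0) -> None:
        self._log = log
        self.committed = committed   # offset to resume from after a restart
        self._position = committed   # next offset this instance will read

    def poll(self, max_records: int = 10) -> List[str]:
        batch = self._log[self._position:self._position + max_records]
        self._position += len(batch)
        return batch

    def commit(self) -> None:
        self.committed = self._position


log = [f"event-{i}" for i in range(5)]

# Decoupling: two consumers read the same log at their own pace.
fast, slow = Consumer(log), Consumer(log)
fast_seen = fast.poll(10)
slow_seen = slow.poll(2)
slow.commit()
slow.poll(2)  # read but never committed: simulates a crash mid-batch

# Replayability: a restart resumes from the last committed offset,
# and a brand-new application can replay from the beginning.
restarted = Consumer(log, committed=slow.committed)
redelivered = restarted.poll(10)
new_view = Consumer(log).poll(10)


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in partitioner: the same key always maps to the same partition."""
    return zlib.crc32(key.encode()) % num_partitions


# Ordering: records with the same key land in one partition, in append order.
partitions: List[List[Tuple[str, int]]] = [[] for _ in range(3)]
for key, value in [("a", 1), ("b", 1), ("a", 2), ("a", 3), ("b", 2)]:
    partitions[partition_for(key, 3)].append((key, value))
a_values = [v for k, v in partitions[partition_for("a", 3)] if k == "a"]
```

Nothing the consumers do changes the log itself, which is why the slow consumer's crash costs only a re-read of the uncommitted records.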

Canonical instances within Kafka

  • Topics — the base data plane.
  • __consumer_offsets (consumer-group progress): itself a Kafka topic, so metadata about the log is stored in a log.
  • __cluster_metadata — KRaft's cluster-metadata log (concepts/kraft-metadata-log). ZooKeeper was the prior mechanism; KRaft moves control-plane metadata into a log too.
  • Kafka Connect worker configuration, task state, and offsets — all stored in Kafka topics.
  • Kafka Streams changelog topics for fault-tolerant state stores — the log is the durability substrate for derived state.

Four distinct subsystems in Kafka, beyond plain topics, all built on the same primitive.
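The idea behind changelog-backed state stores, that derived state is disposable because the log can rebuild it, can be illustrated without any Kafka dependency. The sketch below is an assumption-laden toy: `StateStore`, its methods, and the use of `None` as a delete marker are hypothetical names for this example, not the Kafka Streams API.

```python
from typing import Dict, List, Optional, Tuple

# A changelog entry is (key, value); a value of None marks a deletion.
Changelog = List[Tuple[str, Optional[int]]]


class StateStore:
    """Hypothetical key-value store whose every change is appended to a log."""

    def __init__(self, changelog: Changelog) -> None:
        self._changelog = changelog
        self._state: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        self._changelog.append((key, value))  # record the change in the log
        self._state[key] = value              # then update the local view

    def delete(self, key: str) -> None:
        self._changelog.append((key, None))
        self._state.pop(key, None)

    def get(self, key: str) -> Optional[int]:
        return self._state.get(key)

    @classmethod
    def restore(cls, changelog: Changelog) -> "StateStore":
        """Rebuild the local view by replaying the changelog from the start."""
        store = cls(changelog)
        for key, value in changelog:
            if value is None:
                store._state.pop(key, None)
            else:
                store._state[key] = value
        return store


changelog: Changelog = []
store = StateStore(changelog)
store.put("clicks", 1)
store.put("clicks", 2)
store.put("views", 7)
store.delete("views")

# Simulate losing the in-memory state: only the changelog survives.
recovered = StateStore.restore(changelog)
```

The in-memory dictionary is a cache of the log; losing it costs replay time rather than data.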

Broader applicability

Seen in
