Append-only log as substrate¶
Pattern¶
Build higher-level data-plane and control-plane systems on an append-only log (concepts/distributed-log) rather than on a mutable datastore. Producers append to the tail; consumers read from any point; the log is the authoritative record.
Canonical instance: Apache Kafka topics. Kafka's partitions are logs; Kafka-on-top-of-Kafka (Kafka Streams, Kafka Connect, __consumer_offsets, KRaft's __cluster_metadata) is the "append-only log as substrate" pattern applied at multiple levels of the same platform.
Why it works¶
Kozlovski's Kafka-101 framing:
"The data in the system is stored in topics. The fundamental basis of a topic is the log — a simple ordered data structure which stores records sequentially. […] It's immutable and has O(1) writes and reads (as long as they're from the tail or head). Therefore the speed of accessing its data doesn't degrade the larger the log gets and, due to its immutability, it's efficient for concurrent reads." (Source: sources/2024-05-09-highscalability-kafka-101)
The pattern's load-bearing properties:
- Decoupling producers and consumers. Producers don't wait for consumers; consumers don't coordinate with producers; many consumers can read the same log at their own pace. This is the feature Kafka used to displace the classical message-bus generation of systems (where consuming a message deleted it).
- Replayability. A consumer that crashes restarts from the last committed offset; a new consumer can start from the beginning or any chosen offset; a new application can derive a new view by replaying the log.
- Ordering semantics at controlled granularity. Total order per log (per Kafka partition); no total order across partitions — the ordering primitive is explicit and local.
- Alignment with HDD physics (see concepts/hdd-sequential-io-optimization). The log is linear; HDDs are fast for linear access; the pattern makes cheap storage fast.
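The properties above fall out of a very small data structure. A minimal in-memory sketch (illustrative names, not a Kafka API): appends go to the tail, records are never mutated, and each consumer tracks its own offset, so many consumers read the same log independently and can replay from any point.

```python
# Hypothetical in-memory sketch of the log primitive: an immutable,
# ordered sequence with tail appends and offset-based reads.
class Log:
    def __init__(self):
        self._records = []  # records are never modified after append

    def append(self, record):
        """O(1): write to the tail; return the record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Read forward from any offset; cost does not grow with log size."""
        return self._records[offset:offset + max_records]


log = Log()
for event in ["created", "updated", "deleted"]:
    log.append(event)

# Two consumers at different offsets see the same immutable records:
fast_consumer = log.read(offset=0)   # replays from the beginning
late_consumer = log.read(offset=2)   # joins late, reads only the tail
```

Because reads are just slices at an offset, a crashed consumer resumes from its last committed offset and a brand-new consumer starts at offset 0, with no coordination between them.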
Canonical instances within Kafka¶
- Topics — the base data plane.
- __consumer_offsets — progress of consumer groups; a Kafka topic, metadata-about-the-log in the log.
- __cluster_metadata — KRaft's cluster-metadata log (concepts/kraft-metadata-log). ZooKeeper was the prior mechanism; KRaft moves control-plane metadata into a log too.
- Kafka Connect worker configuration, task state, and offsets — all stored in Kafka topics.
- Kafka Streams changelog topics for fault-tolerant state stores — the log is the durability substrate for derived state.
Four distinct subsystems within Kafka, all built on the same primitive.
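The __consumer_offsets recursion can be sketched in a few lines. Here consumer-group progress is not kept in a separate mutable store; each commit is appended as a record to another log, and the latest record per (group, partition) key is authoritative. (Kafka realizes "latest per key wins" with a compacted topic; the names and dict layout below are illustrative, not Kafka's wire format.)

```python
# Hypothetical sketch: consumer progress stored as records in a second log.
offsets_log = []  # the "meta" log: a plain append-only list of commit records

def commit(group, partition, offset):
    """Committing progress is just another append."""
    offsets_log.append({"group": group, "partition": partition, "offset": offset})

def fetch_committed(group, partition):
    """Fold over the log; the last commit for this key is authoritative."""
    committed = None
    for record in offsets_log:
        if record["group"] == group and record["partition"] == partition:
            committed = record["offset"]
    return committed

commit("analytics", 0, 10)
commit("analytics", 0, 25)   # a later commit supersedes the earlier one
commit("billing", 0, 7)
```

The mutable-looking operation ("update my offset") is expressed as an append, which is the same move KRaft makes for cluster metadata and Kafka Connect makes for worker config and task state.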
Broader applicability¶
- Event sourcing. The pattern outside Kafka: persist the stream of domain events to a log, derive current state by folding over events.
- CDC transport. Debezium + Kafka Connect stream database changes into a log (see sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform).
- WAL shipping. Litestream / LiteFS ship WAL/LTX segments as a log to object storage (sources/2026-02-04-flyio-litestream-writable-vfs).
- Database replication. PostgreSQL logical replication is a log shipped to subscribers.
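The event-sourcing case above reduces to a fold: persist domain events, derive current state by replaying them. A minimal sketch, with an illustrative account model and event names not taken from the source:

```python
# Event-sourcing sketch: the log of events is the durable record;
# current state is a fold over it, and any new view can be built
# later by replaying the same events.
from functools import reduce

events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def apply(balance, event):
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # unknown events are ignored, keeping replays safe

balance = reduce(apply, events, 0)  # fold: 0 + 100 - 30 + 5 = 75
```

CDC transport, WAL shipping, and logical replication are the same shape with the database, rather than the application, producing the events.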
Seen in¶
- sources/2024-05-09-highscalability-kafka-101 — canonical statement of Kafka-as-log-substrate, including the log-for-metadata and log-for-progress recursions inside Kafka.
Related¶
- systems/kafka
- systems/kafka-connect — uses internal Kafka topics for config/state/offsets.
- systems/kafka-streams — uses Kafka topics as state-store changelogs.
- systems/kraft — KRaft's __cluster_metadata is a Kafka log.
- concepts/distributed-log — the primitive this pattern lifts.
- concepts/hdd-sequential-io-optimization — the physics win that makes the pattern cheap.
- concepts/kraft-metadata-log — canonical control-plane instance.
- patterns/leader-based-partition-replication — how the log is actually replicated in Kafka.