CONCEPT Cited by 3 sources
The log is the truth, the database is a cache¶
Definition¶
The truth is the log; the database is a cache of a subset of the log. This framing, popularised by Martin Kleppmann's 2015 "Turning the database inside out", inverts the usual database-first mental model: the authoritative source of record for an organisation's data is a replayable, immutable, append-only log of events, and every database, search index, cache, feature store, or warehouse downstream is a materialised view of some subset of that log.
Canonical citation on the wiki¶
Alex Gallego (Redpanda founder) cites this framing as the explicit premise for starting Redpanda (Source: Gallego 2025-04-03):
"The systems community implemented a re-playable log of events with microbatching. Think of it as microservices, consuming and producing to stable APIs like RabbitMQ, Apache Kafka®, Redpanda. When people were done with it, it felt like turning a database inside out, where most businesses looked like control plane databases or simply views of the log."
"The truth is the log. The database is a cache of a subset of the log."
"I started Redpanda with the premise that batch was a historical artefact due to a lack of mental tools and, in part, a lack of intuitive industrial streaming implementations that offer a new way of reasoning about the world."
Consequences of the inversion¶
When the log is authoritative and the database is derived:
- Schema evolution is log-first. Schema changes become events in the log, not schema migrations against a master store; downstream materialisations pick up the new shape when they're ready.
- Point-in-time reconstruction is free. Any view can be rebuilt by replaying from offset 0; backfills and new consumers are architecturally symmetric.
- Systems become views. ClickHouse, Snowflake, Elasticsearch, Redis, Postgres read replicas — all are "cached projections" of some topic-level subset.
- Operational boundary flips. The log is the durability substrate; downstream stores are optimisation / query-shape concerns. This is why streaming brokers historically wanted to be the system of record, not a buffer in front of one.
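The replay and rebuild symmetry above can be sketched with a minimal in-memory toy. The `Log` and `KeyValueView` names here are illustrative inventions, not any broker's API:

```python
from dataclasses import dataclass, field

@dataclass
class Log:
    """Append-only, replayable event log: the source of truth."""
    events: list = field(default_factory=list)

    def append(self, event) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the appended event

    def replay(self, from_offset: int = 0):
        yield from self.events[from_offset:]

class KeyValueView:
    """A materialised view: a cache of a subset of the log."""
    def __init__(self, log: Log):
        self.log = log
        self.state = {}
        self.offset = 0  # how far into the log this view has read

    def catch_up(self):
        for event in self.log.replay(self.offset):
            self.state[event["key"]] = event["value"]
            self.offset += 1

    def rebuild(self):
        # Point-in-time reconstruction: discard the cache, replay from offset 0.
        self.state, self.offset = {}, 0
        self.catch_up()

log = Log()
log.append({"key": "user:1", "value": "alice"})
log.append({"key": "user:1", "value": "alice_v2"})

view = KeyValueView(log)
view.catch_up()
assert view.state == {"user:1": "alice_v2"}

# A brand-new consumer backfilling from offset 0 converges to the same state:
rebuilt = KeyValueView(log)
rebuilt.rebuild()
assert rebuilt.state == view.state
```

The point of the sketch is the symmetry: "catch up" for an existing view and "backfill" for a new one are the same operation, differing only in starting offset.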
Two-layer materialisation: log + lakehouse¶
The 2020s version of the inversion adds a second axis: not just databases as caches of the log, but Apache Iceberg tables as the cold-storage projection of the log, with the hot streaming topic as the real-time tail. Iceberg topics, where a single logical entity is simultaneously a Kafka topic and an Iceberg table, are the structural realisation of this: you get log-as-truth plus lakehouse-as-query-engine with no external ETL (patterns/streaming-broker-as-lakehouse-bronze-sink).
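A toy sketch of the two-layer read path, assuming a snapshot offset marks where the cold table ends and the hot tail begins (all names here are invented for illustration, not Iceberg or Kafka APIs):

```python
# One logical log, two physical projections: a "cold" table holds events
# up to a snapshot offset; the "hot" topic tail holds everything after it.
# A reader wanting full history stitches backfill (table scan) + tail.
events = [{"offset": i, "payload": f"e{i}"} for i in range(10)]

SNAPSHOT_OFFSET = 7                    # last offset compacted into the table
cold_table = events[:SNAPSHOT_OFFSET]  # lakehouse projection of the log
hot_tail = events[SNAPSHOT_OFFSET:]    # real-time tail on the broker

def read_full_history():
    yield from cold_table  # backfill from the cold projection
    yield from hot_tail    # then switch to tailing the stream

assert [e["offset"] for e in read_full_history()] == list(range(10))
```

Because both layers are projections of the same log, the hand-off point is just an offset; no ETL job has to reconcile two independent copies of the data.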
Relationship to continuous computation¶
If the log is truth, then "batch" is just continuous computation observed through a temporal window. Gallego's next move — canonicalised as concepts/continuous-computation-convergence — is that modern query engines (Databricks, Snowflake, BigQuery) absorb the batch-vs-streaming complexity once they can read the same Iceberg table, so you use the lakehouse for backfill and the low-latency stream for tailing.
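The claim that batch is continuous computation observed through a window can be shown with a trivial sketch: the same aggregation computed as a one-shot scan and as an incremental fold over the same window of the log must agree (illustrative code, not any engine's API):

```python
# Batch and streaming as two observations of the same log: a batch job
# scans a window of the log once; a streaming job folds the same events
# incrementally as they arrive. Over the same window they must agree.
log = [3, 1, 4, 1, 5, 9, 2, 6]

def batch_sum(log, start, end):
    return sum(log[start:end])       # one-shot scan over a window

def streaming_sum(log, start, end):
    acc = 0
    for event in log[start:end]:     # fold events one at a time
        acc += event
    return acc

assert batch_sum(log, 0, 8) == streaming_sum(log, 0, 8) == 31
```

This is why a query engine that can read the same Iceberg table for backfill and the stream for tailing collapses the batch-vs-streaming distinction into a choice of window.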
Seen in¶
- sources/2025-04-03-redpanda-autonomy-is-the-future-of-infrastructure — Gallego's founder-voice citation of Kleppmann's 2015 "Turning the database inside out" as the premise for Redpanda, framed across the batch → streaming → continuous-computation trajectory.
- sources/2025-10-28-redpanda-governed-autonomy-the-path-to-enterprise-agentic-ai — 2025-10-28 ADP companion post applies the framing at the agent-interaction altitude. Every agent interaction (prompt + input + context retrieval + tool call + output + action) becomes a first-class durable event on the streaming log; the log is the truth for agent decisions, and every governance surface (audit, lineage, replay, SLO enforcement, end-to-end tracing) is a view over the log. See patterns/durable-event-log-as-agent-audit-envelope.
- sources/2026-03-30-redpanda-under-the-hood-redpanda-cloud-topics-architecture — applies the framing inside the broker's own storage architecture. On a Cloud Topic the Raft log of placeholder batches is the source of truth for which records exist; the object-storage payload is addressable cache. If a placeholder commits but the L0 file is lost, the broker knows a record existed at that offset but cannot serve its bytes, which is operationally equivalent to a cache miss on unrecoverable data. If placeholder replication fails, the record "doesn't exist" even if the bytes are in S3. This is the canonical in-broker instance of the Kleppmann framing.
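The truth/cache split on a Cloud Topic can be modelled with a toy read path. The data structures and key names below are invented for illustration and simplify the real design (a placeholder here maps one offset to one object, rather than a batch):

```python
# Toy model: the Raft log of committed placeholders decides which offsets
# exist; the object store only holds payload bytes (the cache layer).
raft_log = {}      # offset -> object-store key (committed placeholder)
object_store = {}  # key -> payload bytes

def produce(offset, key, payload, replicate_ok=True):
    object_store[key] = payload  # bytes land in object storage first
    if replicate_ok:
        raft_log[offset] = key   # only the placeholder commit makes it "exist"

def read(offset):
    if offset not in raft_log:
        return "no such record"  # bytes in S3 alone do not count
    key = raft_log[offset]
    if key not in object_store:
        return "known record, bytes lost"  # cache miss on unrecoverable data
    return object_store[key]

produce(0, "l0/a", b"hello")
produce(1, "l0/b", b"orphan", replicate_ok=False)  # placeholder never commits
produce(2, "l0/c", b"gone")
del object_store["l0/c"]                           # L0 file lost

assert read(0) == b"hello"
assert read(1) == "no such record"
assert read(2) == "known record, bytes lost"
```

The two failure cases map directly onto the prose above: existence is decided by the metadata log, availability by the payload cache.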
Related¶
- systems/kafka
- systems/redpanda
- systems/redpanda-connect
- systems/redpanda-cloud-topics — canonical in-broker instance (placeholder-batch log = truth, S3 L0/L1 files = cache).
- concepts/continuous-computation-convergence
- concepts/iceberg-topic
- concepts/placeholder-batch-metadata-in-raft — the mechanism realising "log is truth, object store is cache" inside the broker.
- patterns/streaming-broker-as-lakehouse-bronze-sink
- patterns/object-store-batched-write-with-raft-metadata — the write-path pattern embodying this framing in the broker's own storage.