L0/L1 file compaction for object-store streaming¶
Definition¶
L0/L1 file compaction is the two-tier object-storage file
layout used by streaming systems that write primary record data
directly to object storage: L0 (Level 0) files are optimised
for ingest speed and PUT-cost economics (cross-partition
coalescing), while a background Reconciler process rewrites L0
data into L1 (Level 1) files optimised for historical reads
(per-partition colocation, offset-sorted, much larger). This
mirrors the LSM-tree L0 → L1 terminology but operates at
object-storage-file granularity rather than at the
in-memory-memtable / on-disk-SSTable granularity.
The layout exists because write-optimal and read-optimal object-storage file shapes are different, and both matter.
Canonical wiki instance¶
Introduced in the 2026-03-30 Redpanda Cloud Topics architecture deep-dive.
L0 files — ingest-optimised¶
"We batch incoming data in memory for a short window defined by time (e.g., 0.25 seconds) or size (e.g., 4MB). We collect this data across all partitions and topics simultaneously. We do this specifically to minimize the cost of object storage; by aggregating smaller writes into larger batches, we significantly reduce the number of PUT requests sent to S3."
"Upload: We flush this batch directly to cloud object storage. We call this an L0 (Level 0) File."
Key properties:
- Multi-partition, multi-topic — one L0 file carries data for many partitions and many topics, whatever landed in that batch window.
- Single PUT per batch window — minimises S3/GCS/ADLS PUT request cost (see concepts/small-file-problem-on-object-storage).
- Low write latency — no in-broker sort / reorganisation before upload.
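The ingest side described above can be sketched as a small batcher; the `put_object` callable, record shape, and field names are illustrative assumptions, not Redpanda's actual API:

```python
import time

class L0Batcher:
    """Sketch of an L0 batcher: records from all partitions and topics
    are coalesced into one buffer and flushed as a single PUT when
    either the time window or the size cap is reached."""

    def __init__(self, put_object, max_age_s=0.25, max_bytes=4 * 1024 * 1024):
        self.put_object = put_object   # callable(bytes); one PUT per flush
        self.max_age_s = max_age_s
        self.max_bytes = max_bytes
        self.buffer = []               # (topic, partition, payload), mixed
        self.size = 0
        self.opened_at = None

    def append(self, topic, partition, payload):
        if self.opened_at is None:
            self.opened_at = time.monotonic()
        self.buffer.append((topic, partition, payload))
        self.size += len(payload)
        if (self.size >= self.max_bytes
                or time.monotonic() - self.opened_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # One L0 object per window: multi-topic, multi-partition, unsorted.
        body = b"".join(f"{t}/{p}:".encode() + d for t, p, d in self.buffer)
        self.put_object(body)
        self.buffer, self.size, self.opened_at = [], 0, None
```

Note that nothing is sorted or regrouped before upload: the L0 object is whatever landed in the window, which is exactly what keeps write latency low.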
L1 files — read-optimised¶
"The Reconciler continuously optimizes the storage layout. It reads the L0 files and reorganizes the data, grouping messages that belong to the same partition and writing them into L1 (Level 1) Files."
"L1 Files are: * Much larger: Optimized for high-throughput object storage reading. * Co-located: All data for a specific partition range is physically together. * Sorted: Organized by offset."
Key properties:
- Per-partition colocation — each L1 file holds contiguous data for a "specific partition range", not the mixed-partition shape of L0.
- Offset-sorted — enables seek-then-stream on historical reads without repeated lookups.
- Much larger — fewer, larger files optimise for object-storage GET throughput (per-GET latency is dominated by fixed per-request overhead, not byte rate; bigger files amortise that overhead across more bytes).
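The Reconciler's regrouping step can be sketched as follows; the record shape and function name are illustrative assumptions, not Redpanda's actual code:

```python
from collections import defaultdict

def reconcile(l0_files):
    """Sketch of the L1 rewrite: take a batch of L0 files, each a list
    of (topic, partition, offset, payload) records in arrival order and
    mixed across partitions, and regroup them into per-partition,
    offset-sorted L1 runs."""
    by_partition = defaultdict(list)
    for l0 in l0_files:
        for topic, partition, offset, payload in l0:
            by_partition[(topic, partition)].append((offset, payload))
    # One L1 run per partition: colocated and sorted by offset, so a
    # historical reader can seek once and stream sequentially.
    return {
        key: [payload for _, payload in sorted(records)]
        for key, records in by_partition.items()
    }
```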
Garbage collection¶
"Once L0 data is successfully moved into L1, it's eligible for garbage collection and will eventually be removed."
L0 files are short-lived: write-once, read-possibly-on-cache-miss, delete-after-compaction. L1 files are long-lived and governed by the topic's retention policy.
Why this layout exists: the scattered-read problem¶
If a system wrote L0 files only, tailing consumers would be fine (they'd be hitting the memory cache, see concepts/last-reconciled-offset), but historical consumers would face a scattered-read problem:
"However, if a consumer falls behind and needs to read from storage (a cache miss), reading from L0 can be inefficient. Because L0 files contain data from many different partitions batched together, reading a single partition's history would require 'scattered reads' across many different files."
To read 1 GB of one partition's history, the consumer might have to fetch a small slice from each of a hundred different L0 files, discarding the ~99% of each file that belongs to other partitions. The L1 rewrite eliminates this cost by physically colocating a partition's data.
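The scattered-read penalty can be made concrete with back-of-envelope arithmetic; every number below is an illustrative assumption, not a measured value:

```python
# Cost of a 1 GB historical read before vs after the L1 rewrite.
l0_size = 4 * 1024 * 1024       # assumed 4 MB L0 objects
partitions_per_l0 = 100         # evenly mixed, so ~1% is ours
history = 1 * 1024 ** 3         # 1 GB of one partition's history

# From L0: each object contributes only its per-partition slice.
useful_per_l0 = l0_size // partitions_per_l0
l0_gets = history // useful_per_l0        # objects touched
l0_bytes_fetched = l0_gets * l0_size      # if whole objects are fetched

# From L1: contiguous per-partition data in large objects.
l1_size = 256 * 1024 * 1024               # assumed L1 object size
l1_gets = history // l1_size
```

Under these assumptions the L0 path touches tens of thousands of objects and moves roughly 100x the useful bytes, while the L1 path needs a handful of sequential GETs.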
Relationship to the small-file problem¶
The L0 layout directly addresses the write side of the small-file problem: many brokers × many partitions × frequent flush would produce a flood of tiny PUTs if each partition uploaded independently. Multi-partition coalescing in a single L0 file is the write-path mitigation.
The L1 rewrite addresses the read side of the same axis from the opposite direction: L0 files at per-batch-window size (≤4 MB) would still be small for high-throughput historical reads. The L1 rewrite concatenates many L0 slices into "much larger" per-partition files — fewer, bigger GETs.
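A rough write-side calculation shows why the coalescing matters; the partition count, flush rate, and per-PUT price are all illustrative assumptions:

```python
# Monthly PUT cost: per-partition uploads vs one coalesced L0 per window.
partitions = 1_000
windows_per_s = 4               # one flush per 0.25 s window
put_price = 0.005 / 1_000       # assumed S3-style dollars per PUT

naive_puts_per_s = partitions * windows_per_s   # each partition uploads
coalesced_puts_per_s = windows_per_s            # one L0 file per window

seconds_per_month = 30 * 24 * 3600
naive_cost = naive_puts_per_s * seconds_per_month * put_price
coalesced_cost = coalesced_puts_per_s * seconds_per_month * put_price
```

With these numbers the per-partition scheme pays on the order of a thousand times more in PUT requests, which is the axis the L0 layout collapses.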
Relationship to LSM compaction¶
The L0/L1 terminology is borrowed from LSM trees, but the compaction semantics are different:
| Aspect | LSM L0 → L1 | Cloud Topics L0 → L1 |
|---|---|---|
| Unit | In-memory memtable → on-disk SSTable | Object-storage file → object-storage file |
| Trigger | Memtable full | Periodic background Reconciler |
| Dedup / merge | Key-level merge, tombstone resolution | Partition-level regrouping, offset sort |
| Amplification | Write amplification across multiple levels | Each byte rewritten once in a single L0 → L1 pass |
| Ordering | Key order (row sort) | Offset order (time-of-arrival sort) |
Cloud Topics' L0/L1 is best thought of as a reshuffle of already-ordered per-partition data rather than a key-merge compaction.
Metadata storage¶
"Metadata for L1 files are stored in a shared metadata tier that's backed by an internal topic and a key-value store. This ensures that the system maintains a robust, consistent view of where your optimized data resides. This includes updating metadata as the underlying data is rewritten by compaction, and removed as the retention policy kicks in."
L0 metadata lives in the per-partition Raft log (as placeholder batches); L1 metadata lives in the shared metadata tier. The split matches the cadence of state change: L0 state changes per-produce (fast, high-volume), L1 state changes per-compaction or per-retention (slow, low-volume).
Seen in¶
- sources/2026-03-30-redpanda-under-the-hood-redpanda-cloud-topics-architecture — canonical wiki source introducing the L0/L1 file layout for object-store-primary streaming.
Related¶
- systems/redpanda-cloud-topics — the canonical production implementation.
- systems/redpanda — the broker.
- systems/aws-s3 — the object-storage substrate whose per-PUT pricing + per-GET overhead shape the L0/L1 trade-offs.
- concepts/small-file-problem-on-object-storage — the problem L0 batching solves on write; the problem L1 compaction solves on read.
- concepts/placeholder-batch-metadata-in-raft — the per-produce metadata side of the same design.
- concepts/last-reconciled-offset — the per-partition watermark routing reads between L0 and L1.
- concepts/batching-latency-tradeoff — 0.25 s / 4 MB window is the ingest-side lever.
- concepts/iceberg-snapshot-expiry — analogous lifecycle-management on the lakehouse side.
- concepts/compression-compaction-cpu-cost — the CPU-side framing of background rewrite work.
- patterns/background-reconciler-for-read-path-optimization — the broader pattern family.
- patterns/object-store-batched-write-with-raft-metadata — the write-path pattern that produces L0 files.