PATTERN Cited by 1 source
WAL before lakehouse publish
Definition
WAL before lakehouse publish is the pattern of interposing a latency-optimized write-ahead log (WAL) between the ingestion endpoint and the final lakehouse storage layer (e.g., Delta tables). The WAL provides the low-latency durability guarantee and the client acknowledgement, while the downstream commit to the lakehouse (which may involve transaction coordination, file compaction, and metadata updates) proceeds asynchronously.
Why it exists
Lakehouse writes (Delta, Iceberg) are not instantaneous — they involve:
- Acquiring transaction locks or performing optimistic concurrency checks
- Writing Parquet data files
- Committing metadata (Delta log / manifest)
- Potentially triggering compaction or liquid clustering
If a streaming service waited for a full lakehouse commit before acknowledging the producer, end-to-end latency would be unacceptable for real-time workloads. The WAL decouples the durability SLO (milliseconds) from the queryability SLO (seconds).
Mechanism (Zerobus instantiation)
- Producer pushes data via gRPC bidirectional stream.
- Zerobus writes to a latency-optimized WAL.
- Once the data is durable in the WAL, Zerobus returns the highest committed offset on the stream (an asynchronous ack loop).
- Client purges its in-flight buffer up to that offset.
- Asynchronously, Delta Kernel Rust reads from the WAL and commits to Delta tables.
This mirrors the classic WAL commit-before-ack invariant applied at the service boundary rather than within a single database.
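The flow above can be sketched as a small in-memory simulation. This is a minimal illustration under stated assumptions, not Zerobus's implementation: `WriteAheadLog`, `Producer`, `ingest`, and `drain` are hypothetical names, a Python list stands in for both the durable log and the Delta table, and a single stream is assumed so that client offsets and WAL offsets line up.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WriteAheadLog:
    """In-memory stand-in for a durable, append-only log (hypothetical)."""
    records: list = field(default_factory=list)
    drained_upto: int = -1  # highest offset already committed downstream

    def append(self, batch):
        """Append records and return the highest durable offset."""
        self.records.extend(batch)
        return len(self.records) - 1

    def undrained(self):
        """Return (records not yet committed downstream, last offset)."""
        start = self.drained_upto + 1
        return self.records[start:], len(self.records) - 1


@dataclass
class Producer:
    """Client side: keeps records in flight until the server acks them."""
    next_offset: int = 0
    in_flight: deque = field(default_factory=deque)  # (offset, record) pairs

    def send(self, records):
        """Buffer records for possible retry and return them for sending."""
        for record in records:
            self.in_flight.append((self.next_offset, record))
            self.next_offset += 1
        return list(records)

    def on_ack(self, acked_offset):
        """Purge everything at or below the acknowledged offset."""
        while self.in_flight and self.in_flight[0][0] <= acked_offset:
            self.in_flight.popleft()


def ingest(wal, batch):
    """Server side: the ack value exists only after the WAL append returns."""
    return wal.append(batch)


def drain(wal, table):
    """Async committer: copy undrained WAL records into the table."""
    pending, last = wal.undrained()
    if pending:
        table.extend(pending)  # stands in for a lakehouse commit
        wal.drained_upto = last
    return len(pending)


# Usage: the ack arrives before the table has seen any data.
wal, table, producer = WriteAheadLog(), [], Producer()
ack = ingest(wal, producer.send(["a", "b", "c"]))
producer.on_ack(ack)   # client buffer is now empty
drain(wal, table)      # later, the committer publishes to the table
```

The point of the sketch is the ordering: `on_ack` can run as soon as `ingest` returns, while `drain` runs on its own schedule. A real committer would also need to handle retries and WAL truncation, which are omitted here.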
Trade-offs
| Advantage | Cost |
|---|---|
| Low-latency ack (ms-level durability) | Data not immediately queryable in lakehouse |
| Producer buffer can be freed quickly | Two-phase durability (WAL → Delta) adds complexity |
| Decouples ingestion rate from commit rate | WAL must be sized to absorb bursts in which ingestion outpaces the drain to Delta |
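The sizing cost in the last row can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes constant ingest and drain rates, which real workloads will not have; the function names and the example figures are illustrative, not taken from the source.

```python
def wal_backlog(ingest_rate, drain_rate, burst_seconds):
    """Data that accumulates in the WAL during a burst at constant rates."""
    surplus = max(ingest_rate - drain_rate, 0)
    return surplus * burst_seconds


def catch_up_seconds(backlog, drain_rate, steady_ingest_rate):
    """Time to drain the backlog once ingest falls back to a steady rate."""
    headroom = drain_rate - steady_ingest_rate
    if headroom <= 0:
        raise ValueError("drain rate must exceed steady ingest to catch up")
    return backlog / headroom


# Illustrative figures in MB/s: a 60 s burst at 500 against a drain of 200.
backlog_mb = wal_backlog(500, 200, 60)              # 18000 MB, i.e. 18 GB
recovery_s = catch_up_seconds(backlog_mb, 200, 100)  # 180.0 seconds
```

Under these assumptions the WAL needs at least the backlog figure in capacity, plus whatever retention margin the operator wants, and data written at the end of the burst stays unqueryable for roughly the catch-up time.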
Seen in
- systems/zerobus-ingest — latency-optimized WAL with async offset acks via gRPC bidirectional streaming; Delta Kernel Rust for final commit (Source: sources/2026-06-11-databricks-ingesting-the-milky-way-petabyte-scale-with-zerobus-ingest)
Related
- concepts/wal-write-ahead-logging — the underlying durability primitive
- systems/delta-lake — the lakehouse target
- systems/delta-kernel — Rust library performing the final commit
- patterns/stream-connection-as-ordering-unit — the ordering layer that feeds the WAL