CONCEPT Cited by 1 source

Flush interval / freshness decoupling¶

Flush interval / freshness decoupling is the architectural property where the cadence at which a streaming system flushes data to a lakehouse tier (Iceberg/Parquet) is independent of the query freshness guarantee delivered to consumers. The freshness gap is covered by reading the most recent, un-flushed records directly from the streaming tier at query time.

The traditional coupling¶

In a conventional streaming-to-lakehouse pipeline, the flush interval is the freshness guarantee:

- Flush every 30 seconds → the dashboard shows data ≤30 s old, but the pipeline produces tiny Parquet files (500 KB), poor compression, high S3 request cost, catalog bloat, and continuous compaction.
- Flush every 2 hours → the pipeline produces nicely sized files (32+ MB), but the dashboard is up to 2 hours stale.
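The file-count side of this trade-off can be made concrete with a little arithmetic. The sketch below assumes one file per flush per partition, which is a simplification for illustration rather than something the source states; `files_per_day` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_per_day(flush_interval_s: int) -> int:
    """Files written per day, assuming one file per flush per partition."""
    return SECONDS_PER_DAY // flush_interval_s


# In the coupled design, worst-case staleness equals the flush interval.
for label, interval_s in [("30 s", 30), ("2 h", 2 * 60 * 60)]:
    print(
        f"flush every {label}: staleness up to {interval_s} s, "
        f"{files_per_day(interval_s)} files/day per partition"
    )
```

Under that assumption the 30-second cadence writes 2880 files per partition per day against 12 for the 2-hour cadence, which is where the request cost, catalog growth, and compaction load come from.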

How decoupling works¶

A bridge-query-capable engine reads recent records from the topic at query time, so the flush cadence can be set for optimal file layout (large row groups, good compression, minimal S3 GETs, healthy Iceberg metadata) rather than for freshness. The topic covers the gap (Source: sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql).
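As a rough illustration of the read path described above, the following sketch merges flushed rows with the unflushed tail of a topic. It is not Redpanda SQL's implementation: `Record`, `bridge_read`, and the offset-based high-water mark are hypothetical, and the sketch assumes offsets increase monotonically within a partition and that the lakehouse tier holds every record up to its highest flushed offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    offset: int  # position within the topic partition
    value: str


def bridge_read(flushed: list[Record], topic: list[Record]) -> list[Record]:
    """Return flushed rows followed by the unflushed tail of the topic."""
    # Highest offset already materialized in the lakehouse tier (-1 if empty).
    high_water = max((r.offset for r in flushed), default=-1)
    # Read only records past that point from the topic, so nothing that was
    # already flushed is returned twice.
    tail = [r for r in topic if r.offset > high_water]
    return flushed + tail


# The topic still retains offsets 1 and 2, which were already flushed.
flushed = [Record(0, "a"), Record(1, "b"), Record(2, "c")]
topic = [Record(1, "b"), Record(2, "c"), Record(3, "d"), Record(4, "e")]

result = bridge_read(flushed, topic)
print([r.offset for r in result])  # [0, 1, 2, 3, 4]
```

Because the query stitches the two tiers together at the high-water mark, the result is fresh up to the latest topic record regardless of when the last flush happened.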

Practical rule of thumb¶

throughput × lag target ≈ target file size

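A topic doing 100 KiB/s with a 1-hour lag target accumulates about 352 MiB per hour, so a 64 MiB flush threshold is reached first (roughly every 11 minutes) and the pipeline produces analytics-friendly files. The same topic with a 1-minute lag target accumulates only about 5.9 MiB before each flush, so it produces small files no matter how high the flush threshold is set.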
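The rule of thumb can be checked numerically. The sketch below assumes a flush fires on whichever comes first, the lag target or the size threshold; `expected_file_size` is a hypothetical helper and not a setting or API of any particular system.

```python
KIB = 1024
MIB = 1024 * KIB


def expected_file_size(
    throughput_bps: float, lag_target_s: float, size_threshold_bytes: float
) -> tuple[float, str]:
    """Return (file size in bytes, which trigger fired first)."""
    # Bytes that accumulate if only the lag target triggers the flush.
    accumulated = throughput_bps * lag_target_s
    if accumulated >= size_threshold_bytes:
        return size_threshold_bytes, "size"
    return accumulated, "time"


hourly = expected_file_size(100 * KIB, 60 * 60, 64 * MIB)
minutely = expected_file_size(100 * KIB, 60, 64 * MIB)

for name, (size, trigger) in [("1-hour lag", hourly), ("1-minute lag", minutely)]:
    print(f"{name}: {size / MIB:.2f} MiB per file, flushed on {trigger}")
```

With a 1-hour lag target the size threshold fires first and files come out at 64 MiB; with a 1-minute lag target the timer fires first and files stay under 6 MiB, so raising the threshold further changes nothing.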

Seen in¶

systems/redpanda-sql — bridge queries cover the freshness gap
sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql — primary source