PATTERN Cited by 1 source

Flush cadence for file layout, not freshness

Set the streaming-to-Iceberg flush interval to produce optimally sized Parquet files (32–100+ MB), not to meet the analytics freshness SLA. A bridge-query-capable engine covers the freshness gap by reading unflushed records directly from the topic at query time.

Pattern shape

┌─────────────────────────────────────────────┐
│  Traditional: flush every 30s for freshness │
│  → 500 KB files, constant compaction        │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│  With bridge query: flush every 1–6 hours   │
│  → 32–64 MB files, no compaction needed     │
│  → Topic covers the freshness gap           │
└─────────────────────────────────────────────┘

When to apply

You have a streaming-to-lakehouse pipeline landing data in Iceberg.
Your analytics freshness requirement is sub-minute but your optimal file size demands multi-hour flush intervals.
Your query engine supports bridge queries or equivalent transparent two-tier reads.

Configuration (Redpanda)

Raise lag target (here to 1 hour, for an example topic named orders): rpk topic alter-config orders --set redpanda.iceberg.target.lag.ms=3600000
(Optional) Raise flush threshold (here to 64 MiB): rpk cluster config set datalake_translator_flush_bytes 67108864
Rule of thumb: throughput × lag ≈ target file size
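
The rule of thumb can be turned around to pick a lag for a desired file size. The sketch below is illustrative only. The helper name lag_ms_for_target is hypothetical, the throughput is inferred from the 30-second / 500 KB figures in the pattern shape, and it assumes "throughput" means the rate of data feeding a single output file, ignoring any size reduction from Parquet compression.

```python
def lag_ms_for_target(throughput_bytes_per_sec: float, target_file_bytes: float) -> int:
    """Solve throughput × lag ≈ target file size for lag, in milliseconds."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return round(target_file_bytes / throughput_bytes_per_sec * 1000)


# Assumption: a 30 s flush yields ~500 KB files, so throughput ≈ 500 KB / 30 s.
throughput = 500_000 / 30  # roughly 16.7 KB/s feeding each file

# Lag needed to reach a 64 MB file at that throughput.
lag_ms = lag_ms_for_target(throughput, 64_000_000)
print(lag_ms)  # 3840000 ms, i.e. 64 minutes

# Conversely, the file size produced by a 1-hour lag (3600000 ms).
size_at_one_hour = throughput * 3600
print(round(size_at_one_hour / 1_000_000))  # 60 (MB)
```

A value computed this way could be used for redpanda.iceberg.target.lag.ms. In practice, measured throughput should replace the inferred figure.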

Benefits

Fewer S3 GETs per query (10 × 100 MB vs 1,000 × 1 MB for the same bytes scanned)
Better compression ratios from larger column-encoding windows
Less Iceberg catalog metadata bloat
Effective predicate pushdown (concepts/predicate-pushdown) from substantial row groups
Compaction service can run rarely or not at all
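
The request-count benefit is simple arithmetic. Here is a minimal sketch, assuming each file costs a fixed number of GETs. The function name get_requests and the gets_per_file parameter are hypothetical, and real Parquet readers typically issue more than one request per file (for example, one for the footer and more for column chunks), which widens the gap further.

```python
import math


def get_requests(scanned_bytes: int, file_bytes: int, gets_per_file: int = 1) -> int:
    """Estimate GET requests needed to scan data stored in equal-sized files."""
    files = math.ceil(scanned_bytes / file_bytes)
    return files * gets_per_file


ONE_GB = 1_000_000_000
print(get_requests(ONE_GB, 100_000_000))  # 10 requests with 100 MB files
print(get_requests(ONE_GB, 1_000_000))    # 1000 requests with 1 MB files
```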

Seen in

systems/redpanda-sql — bridge queries make this pattern viable
sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql — primary source

patterns/transparent-hot-cold-tier-query — the query-side complement
patterns/streaming-broker-as-lakehouse-bronze-sink — the ingestion substrate