PATTERN Cited by 1 source
Flush cadence for file layout, not freshness
Set the streaming-to-Iceberg flush interval to produce optimally-sized Parquet files (32–100+ MB) rather than to meet the analytics freshness SLA, because a bridge-query-capable engine covers the freshness gap by reading un-flushed records directly from the topic at query time.
Pattern shape
┌─────────────────────────────────────────────┐
│ Traditional: flush every 30s for freshness │
│ → 500 KB files, constant compaction │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│ With bridge query: flush every 1–6 hours │
│ → 32–64 MB files, no compaction needed │
│ → Topic covers the freshness gap │
└─────────────────────────────────────────────┘
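As a rough illustration of the two boxes above, the sketch below scales the 500 KB file from a 30-second flush to a 1-hour flush at the same ingest rate, and counts the files written per day at each cadence. It assumes constant throughput and ignores any change in Parquet compression; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def scaled_file_bytes(bytes_at_base: int, base_interval_s: int, new_interval_s: int) -> int:
    """File size at a new flush interval, assuming constant ingest throughput."""
    return bytes_at_base * new_interval_s // base_interval_s


def files_per_day(flush_interval_s: int) -> int:
    """Number of files one writer produces per day at a fixed flush interval."""
    return SECONDS_PER_DAY // flush_interval_s


small = 500_000  # ~500 KB per 30 s flush, the "traditional" case in the diagram
hourly = scaled_file_bytes(small, base_interval_s=30, new_interval_s=3600)

print(hourly)               # 60000000 bytes, about 60 MB
print(files_per_day(30))    # 2880
print(files_per_day(3600))  # 24
```

Under these assumptions, the same stream that yields 2,880 small files per day at a 30-second cadence yields 24 files of about 60 MB each at a 1-hour cadence.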
When to apply
- You have a streaming-to-lakehouse pipeline landing data in Iceberg.
- Your analytics freshness requirement is sub-minute but your optimal file size demands multi-hour flush intervals.
- Your query engine supports bridge queries or equivalent transparent two-tier reads.
Configuration (Redpanda)
- Raise lag target:
  rpk topic alter-config orders --set redpanda.iceberg.target.lag.ms=3600000
- (Optional) Raise flush threshold:
  rpk cluster config set datalake_translator_flush_bytes 67108864
- Rule of thumb:
  throughput × lag ≈ target file size
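The rule of thumb can be rearranged to estimate a lag value from a measured ingest rate. The sketch below is a minimal illustration, not a Redpanda tool. It assumes throughput is measured for whatever unit produces a single file (for example, one partition) and that sizes are pre-compression bytes, so the Parquet files on disk will typically be smaller. The helper names are hypothetical.

```python
def lag_ms_for_target(throughput_bytes_per_s: float, target_file_bytes: int) -> int:
    """Lag (ms) at which one flush accumulates roughly target_file_bytes."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return round(target_file_bytes / throughput_bytes_per_s * 1000)


def bytes_for_lag(throughput_bytes_per_s: float, lag_ms: int) -> int:
    """Approximate bytes accumulated per flush at a given lag."""
    return round(throughput_bytes_per_s * lag_ms / 1000)


throughput = 16 * 1024           # assumed ingest rate: 16 KiB/s
target = 64 * 1024 * 1024        # 67108864 bytes, the flush threshold used above

print(lag_ms_for_target(throughput, target))  # 4096000 ms, about 68 minutes
print(bytes_for_lag(throughput, 3_600_000))   # 58982400 bytes at a 1-hour lag
```

At this assumed rate, the 1-hour lag from the configuration above accumulates about 56 MiB per flush, just under the 64 MiB threshold.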
Benefits
- Fewer S3 GETs per query (10 × 100 MB vs 1000 × 1 MB, same bytes scanned)
- Better compression ratios from larger column-encoding windows
- Less Iceberg catalog metadata bloat
- More effective predicate pushdown (concepts/predicate-pushdown) from substantial row groups
- Compaction service can run rarely or not at all
Seen in
- systems/redpanda-sql — bridge queries make this pattern viable
- sources/2026-06-23-redpanda-bridge-queries-in-redpanda-sql — primary source
Related patterns
- patterns/transparent-hot-cold-tier-query — the query-side complement
- patterns/streaming-broker-as-lakehouse-bronze-sink — the ingestion substrate