Bridge Queries in Redpanda SQL¶

Summary¶

This post introduces bridge queries — a Redpanda SQL feature that lets a single SQL statement transparently read from both an Iceberg table (historical, cold data) and the underlying Redpanda topic (live, hot data) via one virtual table. The architectural insight is that bridge queries decouple the Iceberg flush interval from the freshness requirement: the topic covers the freshness gap at query time, so the flush cadence can be optimized purely for Parquet file layout (large, well-compressed files) rather than for data recency.

Key takeaways¶

Bridge queries unify two storage tiers in one SQL statement. The engine routes historical rows from Iceberg and recent rows from the topic; no UNION, no dual-write complexity, no client awareness of tiers. The query author sees a single table.
The flush interval is no longer driven by freshness. Previously, to keep analytics dashboards fresh to ≤1 minute, you had to flush from topic to Iceberg every minute — producing tiny Parquet files. With bridge queries, the topic itself covers the recency gap, so you can flush every 1–6 hours for optimal file sizes.
Larger Parquet files are dramatically more efficient. Benefits cited: better column-encoding compression ratios, fewer S3 GET requests per query, less Iceberg catalog metadata overhead, more effective predicate pushdown (row group statistics work better on substantial row groups), and far less compaction work.
The stitching is deduplication-free. Because the Redpanda broker itself performed the Iceberg conversion, it has authoritative knowledge of what data is in which tier. The bridge is a simple concatenation, not a dedup join — which keeps it high-performance.
Configuration is two knobs. redpanda.iceberg.target.lag.ms (per-topic, controls how long before data must reach Iceberg) and datalake_translator_flush_bytes (cluster-wide, size threshold for flushing a Parquet file). Rule of thumb: throughput × lag ≈ target file size.
Partition cardinality must stay low. The default partition spec is (hour(redpanda.timestamp)). Higher cardinality (minute-level, high-cardinality field) multiplies file count and negates the large-flush benefit. The post recommends day(redpanda.timestamp) for most workloads.
Quantitative example. Going from 500 KB files every 30 seconds to 32 MB files a few times per hour yields visible improvements in query latency and S3 cost.
Availability note. Bridge queries require Redpanda SQL (GA for BYOC AWS in 2026-05-27). Self-managed Enterprise availability targeted for late 2026 / early 2027.

Systems and concepts extracted¶

Systems¶

systems/redpanda-sql — the Postgres-wire MPP query engine that executes bridge queries
systems/redpanda-iceberg-topics — the Iceberg Topics substrate providing dual-tier storage
systems/apache-iceberg — the open table format used for the cold tier
systems/apache-parquet — the columnar file format Iceberg Topics write

Concepts¶

concepts/bridge-query — the specific construct: one SQL statement spanning topic + Iceberg
concepts/flush-interval-freshness-decoupling — the architectural unlock: flush for file layout, not freshness
concepts/small-file-problem-on-object-storage — the pathology bridge queries help avoid
concepts/two-tier-stream-iceberg-query-bridge — the broader concept this concretises
concepts/iceberg-partition-cardinality — keeping file count low by limiting partition spec granularity
concepts/predicate-pushdown — effective only with substantial row groups (large files)
concepts/iceberg-topic — the dual-nature topic that makes bridge queries possible

Patterns¶

patterns/transparent-hot-cold-tier-query — the query-side pattern
patterns/flush-cadence-for-file-layout-not-freshness — the storage-side pattern enabled by bridge queries
patterns/low-partition-cardinality-for-large-files — operational guidance for Iceberg partition specs

Operational numbers¶

Metric	Value
Default lag target	1 min (`redpanda.iceberg.target.lag.ms`)
Recommended lag target for bridge queries	1 hour (3,600,000 ms)
Default translator flush size	32 MiB
Optional raised flush size	64 MiB
Small file example	500 KB every 30 s
Large file example	32 MB every few hours
Throughput rule of thumb	throughput × lag ≈ target file size
Default partition spec	`(hour(redpanda.timestamp))`
Self-managed availability	Late 2026 / early 2027

Caveats¶

Bridge queries are available only on Redpanda SQL (BYOC AWS, GA since 2026-05-27). Self-managed support is months out.
The post does not disclose bridge-query latency overhead vs pure-Iceberg queries.
No discussion of consistency semantics (read-after-write guarantees) for the bridge stitching.
The "deduplication-free" claim depends on Redpanda being the sole writer to both tiers; if external writers push to the Iceberg table, guarantees may differ.

Source¶

Original: https://www.redpanda.com/blog/bridge-queries-in-redpanda-sql
Raw markdown: raw/redpanda/2026-06-23-bridge-queries-in-redpanda-sql-1b6c02e8.md