Redpanda SQL¶
Redpanda SQL is Redpanda's Postgres-protocol query engine that runs inside the customer's Redpanda BYOC cluster and lets a single SQL statement query both live Redpanda streaming topics and historical Apache Iceberg tables without any ETL pipeline or connector fleet. The engine is built on MPP technology from Oxla, a C++ analytical engine Redpanda acquired in 2025; Redpanda SQL is the GA productisation of that engine inside the Redpanda Data Platform.
General Availability: 2026-05-27 for Redpanda BYOC on AWS, consumption-based plans only (per the "Redpanda SQL is GA" launch post). The private preview / activation flow (mid-December 2025) was disclosed in the 2025-10-28 ADP launch post (Gallego founder-voice). GCP BYOC and BYOVPC support: "coming soon". Self-Managed deployment: targeted for 2H FY27.
Product page: redpanda.com/sql. Activation docs: docs.redpanda.com/cloud-data-platform/sql.
Architecture¶
                    ┌────────────────────────────────────────┐
                    │          Customer VPC (BYOC)           │
                    │                                        │
    psql / DBeaver  │   ┌──────────────────────────────┐     │
    DataGrip / SQL  │──▶│ Redpanda SQL (Oxla MPP)      │     │
    Studio ─Postgres│   │  - C++ engine                │     │
    wire            │   │  - Postgres wire protocol    │     │
                    │   │  - In-place reads only       │     │
                    │   └──────────────────────────────┘     │
                    │          │                 │           │
                    │          ▼                 ▼           │
                    │   ┌──────────────┐ ┌───────────────┐   │
                    │   │ Redpanda     │ │ Iceberg       │   │
                    │   │ brokers      │ │ tables in     │   │
                    │   │ (live tier)  │ │ S3 / GCS      │   │
                    │   │              │ │ (cold tier)   │   │
                    │   └──────────────┘ └───────────────┘   │
                    │          ▲                             │
                    │          │ Iceberg Topics dual-write   │
                    │          │ (broker is producer for     │
                    │          │ both tiers simultaneously)  │
                    └────────────────────────────────────────┘
Four load-bearing properties (verbatim where attributed):
- In-cluster, in-VPC. "Redpanda SQL runs on the same infrastructure as your brokers, inside your VPC, and every query accesses data in-place, in both the hot (stream) and cold (Iceberg table) tiers. Nothing is sent to a third-party compute service." The data doesn't move; the analytical compute moves to the data. Canonical wiki instance of concepts/in-cluster-streaming-sql and patterns/in-vpc-query-engine-on-streaming-substrate.
- Postgres wire protocol. "It speaks Postgres. You connect with psql, DBeaver, DataGrip, or the SQL Studio built into Redpanda Console — whatever you already have open." No new drivers, no new client SDK, no new query language. Inherits the entire Postgres-driver ecosystem across every language for free. Canonical wiki instance of concepts/postgres-wire-protocol-as-streaming-sql-surface — protocol compatibility = ecosystem inheritance, applied to the analytical surface (the same architectural move Redpanda made for the broker side via Kafka-wire protocol).
- Transparent two-tier bridge. "If you're using Redpanda Iceberg Topics, which store your streaming data in both a live tier and a Parquet/Iceberg cold tier in S3 or GCS simultaneously, Redpanda SQL bridges the two tiers transparently. The engine figures out an optimized read path across both. (And you don't have to care.)" A single SQL statement reads across both tiers; the consumer doesn't decide which tier holds which records. Canonical wiki instance of concepts/two-tier-stream-iceberg-query-bridge and patterns/transparent-hot-cold-tier-query. Note: this depends on the Iceberg Topics simultaneous-write substrate (broker writes records to both the log and Parquet/Iceberg) — not on a Lambda-architecture batch + stream merge on the consumer side.
- MPP execution from Oxla. "Redpanda SQL is built on MPP (Massively Parallel Processing) engine technology from Oxla, which Redpanda acquired in 2025. Like Redpanda Streaming, the engine is written in C++. It was designed to run analytical workloads at scale with extreme memory efficiency, amplifying OLAP query throughput, and it's managed entirely by Redpanda." Same C++ implementation language as the streaming broker; same thread-per-core / Seastar lineage at the runtime altitude. Oxla's compute-storage-separated architecture and Postgres wire protocol carry forward. The Oxla brand surfaces only in the activation flow; users see "Redpanda SQL".
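Because the engine speaks the Postgres wire protocol, any stock Postgres driver should be able to connect. The sketch below builds a standard libpq-style connection URI with the Python standard library and shows how a query could be issued through psycopg (v3). The host, port, database name, and credentials are hypothetical placeholders, and psycopg is an illustrative driver choice: the launch post names only psql, DBeaver, DataGrip, and SQL Studio as clients.

```python
from urllib.parse import quote, urlencode


def build_conninfo(host, port, dbname, user, password, sslmode="require"):
    """Return a libpq-style connection URI usable by any Postgres driver."""
    # Percent-encode credentials so characters like '/' or '@' stay unambiguous.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{auth}@{host}:{port}/{quote(dbname, safe='')}?{query}"


def run_query(conninfo, sql, params=None):
    """Run one statement with psycopg (v3) and return all rows."""
    import psycopg  # third-party; imported lazily so the URI helper stays stdlib-only

    with psycopg.connect(conninfo) as conn:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()


# Hypothetical endpoint values, for illustration only.
CONNINFO = build_conninfo(
    "sql.example.internal", 5432, "redpanda", "analyst", "s3cr3t/pw"
)
```

With a reachable endpoint, `run_query(CONNINFO, "SELECT * FROM orders LIMIT 10")` would return rows from the `orders` topic, assuming that topic has been surfaced as a table.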
What it queries¶
A topic in Redpanda is a table in Redpanda SQL — the namespace collapse is the load-bearing simplification: "Your topics are tables. You write SQL. You get results."
Two sources of records:
- Live broker tier — records currently in the broker's log segments, including records that arrived three milliseconds ago.
- Iceberg cold tier — Parquet files in S3 or GCS, registered in the Iceberg catalog (REST or file-based). Includes records that arrived three years ago.
The engine plans a unified read path across both tiers. "Either way: same table, same query, same endpoint, same result." The bridging is invisible to the SQL writer.
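The launch post does not say how the engine splits a read between tiers, so the following is only a toy mental model, not Redpanda's implementation. It assumes a single partition, records identified by offset, and a simple rule: everything up to the highest offset present in the cold tier is served from there, and the live tier fills in only the newer tail. The `Record` type and `unified_read` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    offset: int
    value: str


def unified_read(cold, live):
    """Merge cold-tier and live-tier records into one offset-ordered result.

    Assumption: dual-write means the tiers may overlap, so offsets at or below
    the highest cold-tier offset come from the cold tier and the live tier
    contributes only the tail beyond that boundary.
    """
    cold_sorted = sorted(cold, key=lambda r: r.offset)
    boundary = cold_sorted[-1].offset if cold_sorted else -1
    tail = sorted((r for r in live if r.offset > boundary), key=lambda r: r.offset)
    return cold_sorted + tail


# Offset 2 exists in both tiers; it must appear only once in the result.
cold_tier = [Record(0, "a"), Record(1, "b"), Record(2, "c")]
live_tier = [Record(2, "c"), Record(3, "d"), Record(4, "e")]
rows = unified_read(cold_tier, live_tier)
print([r.offset for r in rows])
```

The point of the model is the deduplicated, gap-free result the SQL writer sees; a real engine could equally route by timestamp or snapshot freshness.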
What's NOT shipped at GA¶
The launch post explicitly contrasts Redpanda SQL with predefined-query streaming SQL substrates:
- No materialised views to predefine. "There are no materialized views to predefine, nor a proprietary storage tier that shields data from other tools. No streaming pipelines to build before the data arrives."
- No proprietary storage tier. Data lives in standard Iceberg + standard Redpanda topics; any other Iceberg-compatible engine can also read the cold tier.
- No new drivers / no new query language / no SDK. "If your team writes SQL today, your apps, humans, and agents can query Redpanda tomorrow."
- No streaming pipelines. Direct contrast with Flink / Spark jobs that have to be built before query results are available. "Connect a client, write a query, get results."
The post explicitly names ksqlDB as a competitive foil on the predefined-vs-ad-hoc axis: "ksqlDB is a handy tool, but it requires you to decide what questions you're going to ask before the events arrive, which requires a level of foresight that most data quality problems, incident postmortems, or agent-driven analytics work suggest we do not actually have." Canonicalised on the wiki as concepts/ad-hoc-vs-predefined-streaming-sql.
Workloads (named at GA launch)¶
Five canonical workloads from the launch post:
1. Streaming-app debugging¶
The wiki-canonical example: "SELECT * FROM orders WHERE status = 'failed' AND timestamp > NOW() - INTERVAL '30 minutes'. Results in seconds. Faster mean time to resolve (MTTR). Incident closed." Replaces the "Option B" incident-response workflow of dumping Kafka topics to JSON and running Python filtering scripts or spinning up ad-hoc Spark jobs. The launch post's Option C becomes the canonical comparison: "40 minutes doing work that a SQL query should do in four seconds."
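During an incident the lookback window usually changes from query to query, so it helps to parameterise the launch post's example rather than edit the string by hand. The helper below is a minimal sketch: `failed_orders_query` is a hypothetical name, the `%s` placeholders follow psycopg's parameter style, and passing a `timedelta` relies on the driver adapting it to a Postgres interval. Whether Redpanda SQL accepts bound interval parameters is not stated in the source and would need checking.

```python
from datetime import timedelta


def failed_orders_query(window):
    """Return (sql, params) for failed orders within the trailing window."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    # Values are bound as parameters instead of being spliced into the SQL text.
    sql = (
        "SELECT * FROM orders "
        "WHERE status = %s AND timestamp > NOW() - %s"
    )
    return sql, ("failed", window)


sql, params = failed_orders_query(timedelta(minutes=30))
print(sql)
```

The returned pair can be handed directly to a driver call such as `cursor.execute(sql, params)`.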
2. Real-time operational analytics¶
Fraud models, recommendation engines, live leaderboards, inventory systems, intrusion detection — "production applications that make a decision on every event, as it arrives." The structural argument is freshness gap closure: warehouse queries see minutes-old data; Redpanda SQL queries see milliseconds-old data. The fraud / personalisation / intrusion-detection examples are all decisions whose value collapses when the freshness gap is non-zero.
3. Ad-hoc analytics¶
Analyst queries against live + historical data in one statement. The launch positioning: "Redpanda SQL queries live topic data and Iceberg history in the same statement, giving analysts an up-to-the-millisecond view without waiting for the pipeline to catch up."
4. Compliance queries / data-residency-bound analytics¶
"Regulated data that cannot egress to an external SaaS provider can now be queried directly within your VPC, without procuring a separate query engine or moving data across providers, regions, or network zones. The data stays in your environment. Your data doesn't need to travel to be queryable." Extends BYOC data ownership from the storage tier to the analytical-compute tier; the data residency story now covers the full query path.
5. AI agents (the headline target)¶
"A human analyst writes one query, reads the result, writes another. An AI agent fans out across dozens of tables and writes hundreds of queries simultaneously: comparing time windows, validating patterns, and exploring hypotheses in parallel. Agents need data that arrived seconds ago to make good decisions — not a pipeline snapshot from several minutes ago that may no longer reflect what's actually happening." Canonical wiki instance of concepts/agent-driven-query-fan-out — the structural reframing of analytical query workloads as the consumer transitions from human-paced serial questioning to agent-paced parallel hypothesis exploration.
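The fan-out pattern described above can be sketched with a thread pool that issues many independent statements at once and collects each result or error separately. This is an illustration of the workload shape only: `fan_out` and `fake_run_query` are hypothetical names, the `payments` and `missing_table` tables are invented, and the stand-in executor exists so the sketch runs without a cluster. A real agent would pass a function that executes SQL over a Postgres-wire connection.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(run_query, statements, max_workers=16):
    """Run statements concurrently; return {statement: rows or exception}."""

    def safe(sql):
        try:
            return run_query(sql)
        except Exception as exc:  # one failed hypothesis should not sink the batch
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe, statements))  # map preserves input order
    return dict(zip(statements, results))


def fake_run_query(sql):
    """Stand-in executor so the sketch runs without a cluster."""
    if "missing_table" in sql:
        raise LookupError("relation does not exist")
    return [(len(sql),)]


hypotheses = [
    "SELECT count(*) FROM orders WHERE status = 'failed'",
    "SELECT count(*) FROM payments WHERE status = 'declined'",
    "SELECT count(*) FROM missing_table",
]
results = fan_out(fake_run_query, hypotheses)
```

Capturing exceptions per statement matters for this workload: an agent exploring hypotheses in parallel will routinely issue some queries that fail, and the rest of the batch should still return.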
Place in the Redpanda Data Platform¶
Redpanda SQL completes the platform's three-product lineup:
| Product | What it does |
|---|---|
| Redpanda Streaming | Streams data — high-throughput, low-latency event streaming (Kafka API) |
| Redpanda Connect | Connects streams to other systems — 300+ connectors, filter, enrich, route |
| Redpanda SQL (this page) | Makes data queryable — ad hoc SQL against live and historical Iceberg data |
Verbatim positioning: "One architecture. One operational model. One vendor. And one fewer conversation about which tool handles the analytics layer." This is Redpanda's positioning answer to Kora + Flink + Tableflow (Confluent) and to Kafka + ETL + Snowflake (the assembly model).
Activation flow¶
For existing Redpanda BYOC AWS customers on consumption-based plans:
- "It's just three steps and no cluster restart. You do not need to open a ticket. Just log in to Redpanda Cloud and follow the setup flow."
- Activation surfaces the "Oxla" brand once during the flow; product UI is "Redpanda SQL".
- No new cluster needed; SQL engine deploys onto the existing BYOC infrastructure.
Activation carries two implicit decisions: consent to the Oxla branding surfacing once during setup, and promotion of topics to tables (the documented step in the docs.redpanda.com walkthrough: surface a topic into the SQL namespace before querying).
Relationship to Oxla (sibling page)¶
systems/oxla holds the engine substrate page: acquisition history, MPP / C++ / Postgres-wire / compute-storage-separation properties, federated-query positioning. Redpanda SQL (this page) holds the GA product page: productised deployment in BYOC, integration with Iceberg Topics, three-step activation, workload positioning.
The relationship: Oxla is the engine technology; Redpanda SQL is the productisation. Users interact with "Redpanda SQL"; the "Oxla" brand appears only in the activation flow. The 2025-10-28 ADP launch post disclosed Oxla and the rpk oxla CLI; the 2026-05-27 GA post canonicalises "Redpanda SQL" as the user-facing product name.
Caveats & open questions (from 2026-05-27 GA disclosure)¶
- No quantitative numbers. No latency percentiles (p50 / p99), throughput benchmarks, scale tests, or comparison numbers vs Snowflake / Databricks / Trino / BigQuery / Flink SQL on equivalent Iceberg substrates.
- GA scope narrow. AWS BYOC consumption-plan only. GCP BYOC, BYOVPC, and Self-Managed are roadmap items.
- Read-path mechanism not detailed. The post repeats "the engine figures out an optimized read path across both" without naming the routing primitive (per partition, per offset range, per timestamp, per snapshot freshness).
- Cross-tier transactionality not specified. Isolation level for queries spanning live broker records and Iceberg snapshot records is not disclosed.
- SQL feature coverage unspecified. Postgres wire protocol ≠ Postgres SQL feature coverage. Window functions, CTEs, JOIN types, time-window operators, JSON/vector/full-text-search extensions, streaming-specific operators (TUMBLE, HOP, SESSION) are unaddressed.
- DML / DDL semantics unspecified. Whether INSERT / UPDATE / DELETE / CREATE TABLE work, whether queries are read-only, and what schema-evolution semantics interact with Iceberg topic schema evolution is not addressed.
- Materialised views absent at GA. Roadmap intent unclear.
- Multi-cluster federation scope at GA. The 2025-10-28 ADP post framed Oxla as supporting "federated queries spanning Apache Iceberg, Apache Kafka topics, and a broad suite of legacy data sources" — whether this is GA-shipped or roadmap is not clarified.
- Compute isolation between SQL and streaming workloads not addressed. SQL runs on the same BYOC infrastructure; back-pressure / noisy-neighbor / per-tenant resource shaping unspecified.
- Cost model. Consumption-based plan, no $/GB-scanned or $/query disclosed.
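Since wire-protocol compatibility says nothing about which SQL features are implemented, one practical way to close the coverage question for a given cluster is to probe it empirically. The sketch below runs a few small standard Postgres statements through a caller-supplied `execute` function and records which ones succeed. `probe_features`, the probe names, and `fake_execute` are hypothetical; the stub stands in for a real connection and its answers say nothing about what Redpanda SQL actually supports.

```python
PROBES = {
    "cte": "WITH t AS (SELECT 1 AS x) SELECT x FROM t",
    "window_function": "SELECT row_number() OVER (ORDER BY x) FROM (SELECT 1 AS x) AS s",
    "json_operator": "SELECT '{\"a\": 1}'::json ->> 'a'",
}


def probe_features(execute, probes=PROBES):
    """Run each probe through execute(sql); map feature name to True/False."""
    support = {}
    for name, sql in probes.items():
        try:
            execute(sql)
            support[name] = True
        except Exception:
            # Any error is treated as "not supported"; inspect it when debugging.
            support[name] = False
    return support


def fake_execute(sql):
    """Stand-in for a real connection: pretends window functions are missing."""
    if " OVER " in sql:
        raise RuntimeError("window functions not supported")


support = probe_features(fake_execute)
print(support)
```

Against a real endpoint, `execute` would be a thin wrapper around a driver cursor, and the probe set could be extended to JOIN types or DML statements to answer the other open questions in this list.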
Source¶
- Original (GA launch): https://www.redpanda.com/blog/redpanda-sql-ga
- Sibling Oxla announcement (2025-10-28): https://www.redpanda.com/blog/introducing-the-agentic-data-plane
Seen in¶
- 2026-05-27 — sources/2026-05-27-redpanda-redpanda-sql-is-ga-the-query-engine-that-skips-the-pipeline — GA launch post; canonical disclosure of in-cluster + Postgres-wire + transparent-two-tier + MPP-from-Oxla properties + five-workload framing.
- 2025-10-28 — sources/2025-10-28-redpanda-introducing-the-agentic-data-plane — Oxla acquisition + ADP launch + early-preview disclosure (mid-December 2025); first disclosure that the SQL substrate would be the third pillar of the Redpanda Data Platform.
- 2025-10-28 — sources/2025-10-28-redpanda-governed-autonomy-the-path-to-enterprise-agentic-ai — companion Governed autonomy post with federated-query positioning across Iceberg + Kafka topics + legacy sources.
Related¶
- Engine substrate: systems/oxla
- Streaming substrate: systems/redpanda · systems/redpanda-iceberg-topics · systems/redpanda-byoc
- Sibling platform products: systems/redpanda-connect · systems/redpanda-agents-sdk · systems/redpanda-agentic-data-plane
- Wire protocol substrate: systems/postgresql
- Substrate format: systems/apache-iceberg
- Competitive foils: systems/apache-flink · systems/apache-spark · systems/snowflake · systems/databricks · systems/google-bigquery (ksqlDB also named in body, no wiki page yet)
- Concepts: concepts/in-cluster-streaming-sql · concepts/two-tier-stream-iceberg-query-bridge · concepts/postgres-wire-protocol-as-streaming-sql-surface · concepts/agent-driven-query-fan-out · concepts/ad-hoc-vs-predefined-streaming-sql · concepts/zero-etl-operational-analytical · concepts/iceberg-topic · concepts/byoc-data-ownership-for-iceberg · concepts/compute-storage-separation
- Patterns: patterns/in-vpc-query-engine-on-streaming-substrate · patterns/transparent-hot-cold-tier-query
- Company: companies/redpanda