REDPANDA 2026-05-27

Redpanda — Redpanda SQL is GA: the query engine that skips the pipeline

A Redpanda Blog launch post (2026-05-27) announcing the General Availability of Redpanda SQL, a Postgres-protocol query engine that runs inside the customer's Redpanda BYOC cluster and lets a single SQL statement query both live Redpanda streaming topics and historical Apache Iceberg tables without any ETL pipeline or connector fleet. Tier-3 source (vendor blog, marketing-oriented framing) included on borderline-architecture-content grounds: the post discloses the engine's structural shape (in-VPC deployment, MPP/C++ from the Oxla acquisition, Postgres protocol, transparent two-tier read path bridging the Iceberg-topic live tier and Parquet/Iceberg cold tier), the positioning vs Flink / Spark / ksqlDB / warehouse-side ingest, and the agent-driven fan-out workload the platform is targeting.

This is the GA materialisation of the Oxla acquisition first disclosed on 2025-10-28 in the Agentic Data Plane launch post and the Governed autonomy companion. Timeline: mid-December 2025 preview → 2026-05-27 GA for Redpanda BYOC AWS customers on consumption-based plans; GCP BYOC, BYOVPC, and Self-Managed deployment are scoped for later.

Summary

Redpanda SQL ships the third pillar of the Redpanda Data Platform ("Streaming, Connect, and SQL"), consolidating what was historically a three-vendor stack (broker + connector layer + warehouse / engine) into one cluster. The GA architecture has four load-bearing properties: (1) In-cluster engine: SQL runs on the same BYOC infrastructure as the brokers and Iceberg storage, inside the customer's VPC, so "the data doesn't move" and "every query accesses data in-place, in both the hot (stream) and cold (Iceberg table) tiers." (2) Postgres wire protocol (via Oxla, the C++ MPP engine Redpanda acquired in 2025): clients connect with psql, DBeaver, DataGrip, or Redpanda Console's SQL Studio. "It's just Postgres." No new drivers, no new query language, no new SDK to install. (3) Transparent two-tier bridge: a single SQL statement reads across the live streaming tier and the Parquet/Iceberg cold tier of an Iceberg Topic; "the engine figures out an optimized read path across both. (And you don't have to care.)" (4) MPP execution: "built on MPP (Massively Parallel Processing) engine technology from Oxla… written in C++", designed for analytical-workload throughput and "extreme memory efficiency" over OLAP queries.

Five named workloads anchor the launch: streaming-app debugging ("SELECT * FROM orders WHERE status = 'failed' AND timestamp > NOW() - INTERVAL '30 minutes'. Results in seconds"), real-time operational analytics (fraud, recommendations, leaderboards, inventory, intrusion detection), ad-hoc analytics (closing the warehouse-side freshness gap on incident investigations), compliance queries (regulated data stays in-VPC, "no separate query engine to procure"), and agent-driven query fan-out (humans query serially, agents in parallel: "hundreds of queries simultaneously: comparing time windows, validating patterns, exploring hypotheses in parallel").
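Because the engine speaks the Postgres wire protocol, the streaming-app debugging workload can in principle be driven from any Postgres client library. The following is a minimal sketch, not taken from the post: the host, port, database, and user values are hypothetical placeholders, and psycopg (version 3, a third-party driver) is only one assumed client choice. Only the query text comes from the launch post.

```python
# Sketch only: connection details are hypothetical placeholders,
# not values documented by Redpanda.

# Query text quoted in the launch post for streaming-app debugging.
DEBUG_QUERY = (
    "SELECT * FROM orders "
    "WHERE status = 'failed' "
    "AND timestamp > NOW() - INTERVAL '30 minutes'"
)


def build_conninfo(host: str, port: int, dbname: str, user: str) -> str:
    """Build a libpq-style keyword/value connection string."""
    return f"host={host} port={port} dbname={dbname} user={user}"


def fetch_failed_orders(conninfo: str) -> list:
    """Run the debugging query through a standard Postgres driver.

    psycopg is imported lazily so the helpers above work without it.
    """
    import psycopg  # third-party: pip install "psycopg[binary]"

    with psycopg.connect(conninfo) as conn:
        return conn.execute(DEBUG_QUERY).fetchall()
```

A caller would pass something like `build_conninfo("sql.example.internal", 5432, "redpanda", "analyst")` to `fetch_failed_orders`, substituting the endpoint and credentials shown for the actual cluster. Authentication and TLS settings are left out because the post does not describe them.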

Key takeaways

  1. In-cluster MPP query engine on streaming + Iceberg substrate. Redpanda SQL is the first canonical wiki instance of an MPP analytical engine deployed alongside the streaming broker inside the customer's VPC — sibling to but architecturally distinct from external warehouse engines (Snowflake, Databricks, BigQuery) that pull data via ingestion pipelines, and from in-broker materialised-view engines (ksqlDB, Flink SQL, Materialize) that compile predefined queries. "Redpanda SQL runs on the same infrastructure as your brokers, inside your VPC, and every query accesses data in-place, in both the hot (stream) and cold (Iceberg table) tiers." Canonicalises in-cluster streaming SQL and patterns/in-vpc-query-engine-on-streaming-substrate.
  2. Transparent hot+cold tier bridge inside one SQL statement. The engine's signature property: "If you're using Redpanda Iceberg Topics, which store your streaming data in both a live tier and a Parquet/Iceberg cold tier in S3 or GCS simultaneously, Redpanda SQL bridges the two tiers transparently. The engine figures out an optimized read path across both." The data could have arrived three years ago or three milliseconds ago — "same table, same query, same endpoint, same result." Canonicalises concepts/two-tier-stream-iceberg-query-bridge and patterns/transparent-hot-cold-tier-query — a structural alternative to the Lambda-architecture "two paths cooperating" shape on the consumer side. The dual-tier substrate this depends on (Iceberg Topics' live + cold simultaneous write) was already canonicalised on the wiki (Source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available); what's new here is the in-cluster query engine that reads across both transparently.
  3. Postgres wire protocol as the universal SQL surface for streaming. "It speaks Postgres. You connect with psql, DBeaver, DataGrip, or the SQL Studio built into Redpanda Console — whatever you already have open." No new drivers, no new query language, no new client SDK. Inherits the entire Postgres-driver ecosystem across every language. Canonicalises concepts/postgres-wire-protocol-as-streaming-sql-surface — the analogue of how Kafka-wire-protocol compatibility on the broker side gave Redpanda its drop-in client compatibility. The same architectural move applied to the SQL surface: protocol compatibility = ecosystem inheritance for free.
  4. Built on Oxla, the MPP engine acquired in 2025. "Redpanda SQL is built on MPP (Massively Parallel Processing) engine technology from Oxla, which Redpanda acquired in 2025. Like Redpanda Streaming, the engine is written in C++. It was designed to run analytical workloads at scale with extreme memory efficiency, amplifying OLAP query throughput, and it's managed entirely by Redpanda." This completes the Oxla → Redpanda SQL GA arc disclosed Oct 2025 (2025-10-28 ADP launch); the integration is hidden behind the Redpanda Console product surface but the C++/MPP technology substrate is unchanged. The Oxla brand surfaces only in the activation flow.
  5. Ad hoc, not predefined: explicit foil to ksqlDB. "There are no materialized views to predefine, nor a proprietary storage tier that shields data from other tools. No streaming pipelines to build before the data arrives. ksqlDB is a handy tool, but it requires you to decide what questions you're going to ask before the events arrive, which requires a level of foresight that most data quality problems, incident postmortems, or agent-driven analytics work suggest we do not actually have. Redpanda SQL is fully ad hoc." Canonicalises concepts/ad-hoc-vs-predefined-streaming-sql, the structural axis between predefined-query streaming SQL (ksqlDB, Flink SQL on the streaming side) and ad-hoc streaming SQL (Redpanda SQL, Materialize follow-up queries). Both classes have legitimate use; the launch post argues that incident postmortems, data-quality debugging, and agent-driven analytics fundamentally cannot know the queries up-front, so the ad-hoc class is structurally required for those workloads.
  6. Data doesn't move; data stays in-VPC. Compliance is a load-bearing argument: "Regulated data that cannot egress to an external SaaS provider can now be queried directly within your VPC, without procuring a separate query engine or moving data across providers, regions, or network zones. The data stays in your environment. Your data doesn't need to travel to be queryable." This is the BYOC data ownership invariant already canonicalised on the wiki (Source: sources/2025-04-03-redpanda-autonomy-is-the-future-of-infrastructure and concepts/byoc-data-ownership-for-iceberg), extended one step further by colocating the query engine inside the same VPC, not just the storage and brokers. The data residency story now includes the analytical compute surface, removing the prior "send data to external query SaaS" compliance gap.
  7. Agent-driven query fan-out as the canonical workload. "A human analyst writes one query, reads the result, writes another. An AI agent fans out across dozens of tables and writes hundreds of queries simultaneously: comparing time windows, validating patterns, and exploring hypotheses in parallel." The post adds that agents "need data that arrived seconds ago to make good decisions", not "a pipeline snapshot from several minutes ago that may no longer reflect what's actually happening." Canonicalises concepts/agent-driven-query-fan-out, the structural reframing of analytical query workloads as the consumer transitions from human-paced serial questioning to agent-paced parallel hypothesis-exploration. The platform property required is freshness + scale + minimal infrastructure footprint together, three pieces that warehouse-side ingestion architectures cannot deliver simultaneously without over-provisioning.
  8. Three products, one platform: Streaming + Connect + SQL. The launch reframes the Redpanda Data Platform from a streaming vendor into a complete data-platform vendor: "Streaming, Connect, and SQL comprise the Redpanda Data Platform. […] One architecture. One operational model. One vendor. And one fewer conversation about which tool handles the analytics layer." Three independently-designed products integrated under one cluster: Streaming moves data, Connect wires sources/destinations (300+ connectors), SQL makes the data queryable. This is Redpanda's positioning answer to Confluent's Kora + Flink + Tableflow and to the Kafka + ETL + Snowflake assembly model.
  9. GA scope narrow, expansion roadmap explicit. GA is AWS BYOC on consumption-based (usage-based billing) plans only. "GCP BYOC and BYOVPC support are coming soon… Self-managed deployment is targeted for 2H FY27." The activation flow is three steps with no cluster restart: "No new cluster. No broker restart." Existing BYOC-on-AWS customers on usage-based plans can activate from the cluster overview page without filing a ticket.
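The agent-driven fan-out workload described in the takeaways can be illustrated with a small client-side sketch. It is an assumption-laden illustration, not something the post provides: `window_queries` and `fan_out` are hypothetical helper names, the `timestamp` column follows the post's orders example, and `run_query` stands in for whatever callable actually sends SQL to the engine (for instance a wrapper around a Postgres driver).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def window_queries(table: str, minutes: List[int]) -> List[str]:
    """Build one COUNT query per look-back window.

    The table name is interpolated directly, so it must come from
    trusted code, never from user input.
    """
    return [
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE timestamp > NOW() - INTERVAL '{int(m)} minutes'"
        for m in minutes
    ]


def fan_out(
    queries: List[str],
    run_query: Callable[[str], Any],
    max_workers: int = 16,
) -> Dict[str, Any]:
    """Issue all queries concurrently and map each query to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        results = list(pool.map(run_query, queries))
    return dict(zip(queries, results))
```

Threads are used here because each query spends its time waiting on the network; how many concurrent sessions the engine accepts per cluster is not stated in the post, so `max_workers` is an arbitrary placeholder.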

Architecture & numbers

| Property | Value / disclosure |
|---|---|
| GA date | 2026-05-27 |
| Preview/private-beta | mid-December 2025 (per 2025-10-28 ADP announcement) |
| Acquisition | Oxla acquired by Redpanda in 2025 (sources/2025-10-28-redpanda-introducing-the-agentic-data-plane) |
| GA deployment scope | Redpanda BYOC on AWS, consumption-based (usage-based billing) plans |
| Roadmap | GCP BYOC + BYOVPC: "coming soon" (no date); Self-Managed: 2H FY27 |
| Engine substrate | C++ MPP (Massively Parallel Processing); designed for analytical/OLAP workload throughput |
| Wire protocol | PostgreSQL |
| Compatible clients (named) | psql, DBeaver, DataGrip, Redpanda Console SQL Studio |
| Hot tier substrate | Iceberg Topics live tier (broker log) |
| Cold tier substrate | Parquet/Iceberg files in S3 or GCS (Iceberg Topics simultaneous-write) |
| Read-path optimisation | "engine figures out an optimized read path across both" tiers (mechanism not disclosed) |
| Activation | Three steps, no cluster restart; cluster overview page in Redpanda Cloud |
| Materialised views | None at GA: "There are no materialized views to predefine" (positioning vs ksqlDB) |
| Tooling residence | Engine managed entirely by Redpanda; "Oxla" name appears only in the activation flow |

No quantitative latency, throughput, or scale numbers are disclosed in the post. No comparison benchmarks vs Snowflake / Databricks / Trino / BigQuery / Flink SQL on the same Iceberg substrate. No disclosure of degree-of-parallelism, supported SQL feature coverage, or transactionality semantics across the two tiers.

Systems extracted

  • NEW systems/redpanda-sql — the GA product itself; canonical wiki page as a system distinct from Oxla (the engine technology) and from Redpanda Streaming / Connect (the sibling products).
  • systems/oxla — the C++ MPP query engine substrate; wiki page already exists from the 2025-10-28 ADP / Governed autonomy posts; extended here with the Redpanda SQL GA face.
  • systems/redpanda — the streaming broker; SQL is now an in-cluster peer to Streaming.
  • systems/redpanda-byoc — the deployment-model wrapper Redpanda SQL GA initially ships inside; SQL extends BYOC's data-locality story to the analytical-compute surface.
  • systems/redpanda-iceberg-topics — the live + cold dual-tier substrate Redpanda SQL queries transparently across.
  • systems/redpanda-connect — peer product in the Redpanda Data Platform triad (sources/destinations).
  • systems/apache-iceberg — the historical-table format Redpanda SQL queries directly without ingestion.
  • systems/postgresql — the wire protocol Redpanda SQL implements; source of universal client compatibility.
  • systems/apache-flink — competitive foil (Flink SQL / Flink streaming jobs as the "hammer looking for nail" incumbent).
  • systems/apache-spark — competitive foil (Spark jobs for one-shot streaming-data investigation).
  • ksqlDB — competitive foil at the predefined-vs-ad-hoc axis (named in body, not yet a wiki page).
  • systems/snowflake, systems/databricks, systems/google-bigquery — competitive foils at the warehouse-with-ingestion-pipeline axis.

Concepts extracted

  • NEW concepts/in-cluster-streaming-sql — analytical SQL engine deployed inside the same cluster as the streaming broker and storage substrate; in-VPC compute locality.
  • NEW concepts/two-tier-stream-iceberg-query-bridge — single SQL statement reads transparently across the live broker tier and the cold Iceberg tier; engine plans an optimised read across both.
  • NEW concepts/postgres-wire-protocol-as-streaming-sql-surface — the architectural choice of speaking Postgres-wire from the streaming-platform's analytical surface, inheriting the entire Postgres-driver ecosystem at zero protocol-design cost.
  • NEW concepts/agent-driven-query-fan-out — the workload shape where consumers fan out hundreds of parallel queries instead of writing one-then-reading-then-writing-the-next; structural motivation for ad-hoc + low-latency + high-fanout query substrates.
  • NEW concepts/ad-hoc-vs-predefined-streaming-sql — the structural axis between predefined-query streaming SQL (ksqlDB, Flink SQL) and ad-hoc streaming SQL (Redpanda SQL); both classes have legitimate use, and incident-postmortem / data-debugging / agent-fan-out workloads cannot know the queries up front.
  • concepts/zero-etl-operational-analytical — extended with the broker-substrate face. Prior canonical instances were Moonlink-on-Lakebase + Aurora-zero-ETL→Redshift; Redpanda SQL is the first canonical instance where the operational substrate is a streaming broker rather than a transactional database.
  • concepts/iceberg-topic — extended with the SQL-readable face; the Iceberg topic's "streaming data in both a live tier and a Parquet/Iceberg cold tier in S3 or GCS simultaneously" is what makes the transparent-two-tier-query feasible.
  • concepts/byoc-data-ownership-for-iceberg — extended; data-ownership invariant now includes the analytical-compute surface, not only storage and brokers.
  • concepts/compute-storage-separation — already canonical; Oxla is a compute-storage-separated MPP engine.

Patterns extracted

  • NEW patterns/in-vpc-query-engine-on-streaming-substrate — deploy the analytical query engine inside the customer's VPC, colocated with the streaming broker and Iceberg storage, so queries access data in-place without egressing the VPC.
  • NEW patterns/transparent-hot-cold-tier-query — single SQL statement reads across both tiers of a stream + Iceberg substrate via an engine that plans a unified read path; consumer doesn't care which tier holds which records.
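To make the transparent hot/cold pattern concrete, here is a toy model of one way a unified read could avoid double-counting records that exist in both tiers. The post does not disclose Redpanda SQL's actual mechanism, so everything below is an assumption: the `Record` type, the `bridged_scan` name, and the split rule (cold tier serves offsets up to the last offset committed to the Iceberg snapshot, hot tier serves everything after) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    offset: int  # position within a single topic partition
    status: str


def bridged_scan(
    cold: Iterable[Record],
    hot: Iterable[Record],
    committed_offset: int,
    predicate: Callable[[Record], bool],
) -> List[Record]:
    """Hypothetical merge rule, NOT Redpanda's disclosed mechanism.

    Offsets <= committed_offset are read from the cold tier only and
    later offsets from the hot tier only, so overlapping records are
    returned exactly once.
    """
    from_cold = [r for r in cold if r.offset <= committed_offset and predicate(r)]
    from_hot = [r for r in hot if r.offset > committed_offset and predicate(r)]
    return sorted(from_cold + from_hot, key=lambda r: r.offset)
```

An offset watermark is only one candidate split; a real planner could equally route by timestamp or by snapshot freshness per partition, which the post leaves open.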

Caveats & open questions

  • Tier-3 marketing-roundup framing. GA-launch post primarily positions against Flink/Spark/ksqlDB and the warehouse-ingest model; mechanism depth is intentionally light. Architectural primitives are real, but execution-layer details (planner shape, unified-tier read path mechanism, query-result-cache, parallelism-degree autoscaling, predicate-pushdown into Iceberg metadata) are not disclosed.
  • No quantitative numbers. No latency percentiles (p50/p99), throughput benchmarks, scale tests, or comparison numbers vs Snowflake / Databricks / Trino / BigQuery / Flink SQL on equivalent Iceberg substrates. Compute-cost vs warehouse comparison unaddressed.
  • GA scope narrow. AWS BYOC consumption-plan only; GCP BYOC, BYOVPC, and Self-Managed are roadmap items. Customers on committed-volume plans aren't covered.
  • Read path across two tiers not detailed. The post repeats "the engine figures out an optimized read path across both" without naming the mechanism. Open: does the engine route per partition based on snapshot freshness? Per offset range? Per timestamp? How are in-flight unsynced records consistent with the Iceberg snapshot?
  • No transactionality semantics across tiers disclosed. What isolation level does a query see when records are mid-write to the live tier but not yet committed to Iceberg? Read-committed? Snapshot? Eventually-consistent?
  • Materialised views explicitly absent at GA. The post positions this as a feature (no predefined-query lock-in) rather than a gap. Whether MVs are roadmap or rejected is unclear.
  • Streaming-update / DML semantics unspecified. The post says "You write SQL" but doesn't disclose whether INSERT / UPDATE / DELETE work, whether DDL is supported (CREATE TABLE, schema evolution), or whether queries are read-only.
  • SQL feature coverage unspecified. Postgres wire protocol ≠ Postgres SQL feature coverage. Window functions, CTEs, JOIN types, time-window operators, JSON operators, vector / full-text-search extensions, and query-language extensions for streaming-specific operators (TUMBLE, HOP, SESSION) are unaddressed.
  • Multi-cluster federation not addressed. Redpanda SQL queries inside one BYOC cluster. Whether queries can span topics across Redpanda clusters or non-Redpanda Iceberg catalogs (the federation framing in the 2025-10-28 Governed autonomy post — "federated queries spanning Apache Iceberg, Apache Kafka topics, and a broad suite of legacy data sources") at GA is not clarified.
  • Compute-isolation model unspecified. SQL runs "on the same infrastructure as your brokers" — is the analytical compute resource-isolated from the streaming compute? Is there a separate node pool? How are noisy-neighbor and back-pressure resolved?
  • Cost model. Consumption-based plan; no $/GB-scanned or $/query disclosed.
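Since wire-protocol compatibility says nothing about which SQL features are accepted, one way to close the coverage question empirically is to probe the engine with small standard-SQL statements. The sketch below is hypothetical throughout: `PROBES` and `probe_features` are illustrative names, the probe statements are generic SQL rather than anything Redpanda documents, and `run_query` is any callable that sends SQL text to the engine and raises on failure.

```python
from typing import Callable, Dict

# Generic standard-SQL probes. Whether Redpanda SQL accepts each one
# is exactly what the post leaves unspecified.
PROBES: Dict[str, str] = {
    "cte": "WITH t AS (SELECT 1 AS x) SELECT x FROM t",
    "window_function": (
        "SELECT x, ROW_NUMBER() OVER (ORDER BY x) "
        "FROM (SELECT 1 AS x) AS s"
    ),
    "left_join": (
        "SELECT a.x FROM (SELECT 1 AS x) AS a "
        "LEFT JOIN (SELECT 1 AS x) AS b ON a.x = b.x"
    ),
}


def probe_features(run_query: Callable[[str], object]) -> Dict[str, bool]:
    """Map each feature name to True if its probe ran without raising."""
    supported: Dict[str, bool] = {}
    for name, sql in PROBES.items():
        try:
            run_query(sql)
            supported[name] = True
        except Exception:
            # Any driver or server error is treated as "not supported".
            supported[name] = False
    return supported
```

A probe that runs without error shows only that the statement parses and executes, not that its semantics match PostgreSQL, so results are a starting point for evaluation rather than a compatibility guarantee.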
