PATTERN Cited by 1 source
In-VPC query engine on streaming substrate¶
Deploy the analytical query engine inside the customer's VPC, colocated with the streaming broker and Iceberg storage, so SQL queries access data in-place without egressing the VPC or routing through a third-party SaaS query service. The data doesn't move; the analytical compute moves to the data. First wiki canonicalisation 2026-05-27 via Redpanda SQL.
Pattern shape¶
Customer cloud account / VPC
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  ┌─────────────────────┐    ┌────────────────────────┐      │
│  │ Streaming broker    │◀──▶│ In-cluster query       │      │
│  │ (Redpanda)          │    │ engine (Oxla MPP)      │      │
│  │                     │    │ - Postgres wire        │      │
│  │ Live tier:          │    │ - reads in-place       │      │
│  │ log segments        │    └────────────────────────┘      │
│  └─────────────────────┘                ▲                   │
│             │                           │                   │
│             ▼                           │                   │
│  ┌─────────────────────────────────────────────────┐        │
│  │ Cold tier: Parquet/Iceberg in S3 / GCS          │        │
│  │ (within customer's cloud account)               │        │
│  └─────────────────────────────────────────────────┘        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │ Postgres wire over private link / IAM-authenticated
                               │
                      ┌────────┴─────────┐
                      │ SQL clients      │
                      │ (psql, DBeaver,  │
                      │ apps, agents)    │
                      └──────────────────┘
* No data leaves the customer's VPC for query execution.
* Vendor's control plane (separate VPC) manages the engine but
doesn't see data.
Components¶
The pattern composes four primitives:
- A BYOC deployment model for the streaming broker (e.g. Redpanda BYOC) — the data plane already runs inside the customer's VPC.
- A storage substrate also in the customer's VPC — broker log segments + object storage (S3 / GCS / ADLS) + an Iceberg catalog (REST or file-based) all in the customer's account.
- A query engine that deploys onto the same BYOC infrastructure — colocated with the broker, sharing the same VPC. The query engine doesn't need its own separate cluster; it shares the BYOC cluster's compute envelope.
- A wire protocol clients can already speak, typically Postgres wire (concepts/postgres-wire-protocol-as-streaming-sql-surface), so the existing client ecosystem connects without engine-specific drivers.
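Because the surface is Postgres wire, a client only needs a standard libpq-style connection URI. The following is a minimal sketch of building one; the hostname, port, database name, and user are hypothetical placeholders, not values from the Redpanda disclosure.

```python
from urllib.parse import quote, urlencode


def build_postgres_uri(host: str, port: int, database: str, user: str,
                       sslmode: str = "verify-full") -> str:
    """Build a libpq-style connection URI for a Postgres-wire endpoint."""
    # Percent-encode user and database so reserved characters survive the URI.
    query = urlencode({"sslmode": sslmode})
    return (
        f"postgresql://{quote(user, safe='')}@{host}:{port}"
        f"/{quote(database, safe='')}?{query}"
    )


# Hypothetical private-link endpoint and credentials: substitute real values.
uri = build_postgres_uri("sql.internal.example", 5432, "analytics", "report_reader")
print(uri)
```

A URI of this form can be passed to psql or any libpq-based driver, which is the sense in which no engine-specific driver is needed.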
Verbatim disclosure (Redpanda 2026-05-27)¶
"Redpanda SQL runs inside your existing BYOC cluster, on the same infrastructure as your streaming data brokers and Iceberg storage."
"Redpanda SQL runs on the same infrastructure as your brokers, inside your VPC, and every query accesses data in-place, in both the hot (stream) and cold (Iceberg table) tiers. Nothing is sent to a third-party compute service, which is critical if you have compliance requirements (and just as important for strong cybersecurity hygiene), working within your existing infosec-approved environment."
"Regulated data that cannot egress to an external SaaS provider can now be queried directly within your VPC, without procuring a separate query engine or moving data across providers, regions, or network zones. The data stays in your environment. Your data doesn't need to travel to be queryable."
(Source: sources/2026-05-27-redpanda-redpanda-sql-is-ga-the-query-engine-that-skips-the-pipeline)
Why the pattern matters¶
Three structural payoffs:
- Closes the analytical-compute gap in BYOC compliance stories. Without this pattern, BYOC keeps storage in the customer's VPC, but analytical queries still require shipping data to an external SaaS warehouse, which leaves a gap in the data-residency / no-egress story. With this pattern, the analytical compute also runs in the customer's VPC, closing the gap.
- Eliminates per-query data egress costs. Cross-account / cross-region / cross-cloud egress is metered at $/GB and becomes a structural cost driver at scale. Querying inside the VPC removes that egress, which is a meaningful TCO lever for high-volume workloads.
- Reduces the cybersecurity surface area. Data that doesn't leave the customer's network is data that doesn't traverse a third-party network. The 2026-05-27 launch frames this as "strong cybersecurity hygiene" alongside compliance.
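The egress payoff comes down to simple arithmetic. The sketch below assumes, as a simplification, that each query ships the bytes it scans across a metered boundary; the workload figures and the $/GB rate are hypothetical and chosen only to make the calculation concrete.

```python
def monthly_egress_cost(queries_per_day: int, gb_scanned_per_query: float,
                        usd_per_gb: float, days: int = 30) -> float:
    """Cost of moving every scanned gigabyte across a metered boundary."""
    return queries_per_day * gb_scanned_per_query * usd_per_gb * days


# Hypothetical workload: 2,000 queries/day, 5 GB scanned each.
# Hypothetical rate: $0.02/GB for the external path, $0/GB inside the VPC.
external = monthly_egress_cost(2000, 5.0, usd_per_gb=0.02)
in_vpc = monthly_egress_cost(2000, 5.0, usd_per_gb=0.0)
print(f"external: ${external:,.2f}/month, in-VPC: ${in_vpc:,.2f}/month")
```

With these made-up inputs the external path costs $6,000 per month and the in-VPC path costs nothing in egress. Real pricing varies by provider and boundary type.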
Trade-offs¶
| Axis | In-VPC query engine | External warehouse / SaaS query |
|---|---|---|
| Data residency / compliance | Data never egresses VPC | Data egress required for query |
| Egress cost | None (intra-VPC) | Cross-cloud / cross-region $/GB |
| Independent compute autoscaling | Bounded by BYOC cluster shape | Independently autoscaling SaaS |
| Operator footprint | One cluster | Two: streaming + warehouse |
| Vendor lock-in | Single vendor for both substrates | Best-of-breed per substrate |
| Security blast radius | Smaller — query bugs can't leak data out of VPC | Larger — query path crosses vendor's network |
The pattern is a strong fit when (a) data residency, compliance, or egress cost matters, (b) the analytical workload fits within the cluster's shared-compute envelope, and (c) the streaming + SQL substrate combination covers the bulk of the analytical workload (i.e. best-of-breed warehouse compute isn't required for this query class).
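The three fit criteria can be written down as a small checklist. This is only an illustrative sketch; the type and field names are hypothetical, and the booleans stand in for judgments an architect would make by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    residency_or_egress_sensitive: bool     # criterion (a)
    fits_shared_compute_envelope: bool      # criterion (b)
    streaming_sql_covers_query_class: bool  # criterion (c)


def unmet_criteria(profile: WorkloadProfile) -> list[str]:
    """Return labels of the criteria the workload fails; empty means a strong fit."""
    checks = {
        "a": profile.residency_or_egress_sensitive,
        "b": profile.fits_shared_compute_envelope,
        "c": profile.streaming_sql_covers_query_class,
    }
    return [label for label, ok in checks.items() if not ok]


# A regulated workload whose heavy joins exceed the shared cluster's capacity.
print(unmet_criteria(WorkloadProfile(True, False, True)))
```

An empty result suggests the in-VPC engine is a reasonable default; any returned label points at the column of the trade-off table that deserves a closer look.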
Implementation considerations¶
- Resource isolation between query and streaming workloads. The query engine and streaming broker share infrastructure, so back-pressure, noisy-neighbor effects, and per-tenant resource shaping must be handled. Implementation depth varies; the 2026-05-27 Redpanda GA disclosure doesn't detail this.
- Engine deployment automation. The pattern requires the vendor's BYOC control plane to deploy / upgrade / monitor the query engine across many customer VPCs. Redpanda SQL ships as "three steps and no cluster restart" activation in the Redpanda Console UI.
- Authentication / authorization. Postgres wire supports SCRAM / mTLS / IAM-bridged auth. The pattern composes with Redpanda's on-behalf-of (OBO) agent authorization framing for governed agent fan-out.
- Cross-VPC query federation. The pattern as canonicalised in Redpanda SQL GA is single-cluster scope; whether the engine can federate across multiple BYOC clusters or non-Redpanda Iceberg catalogs depends on the engine's federated-query capability (Oxla disclosed federated-query support in the 2025-10-28 ADP framing; GA scope is unclear).
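The source does not describe how Redpanda SQL isolates query and streaming workloads. As one generic illustration of the resource-isolation concern above, and not the product's mechanism, the sketch below caps concurrent queries with an admission gate and rejects excess queries rather than queueing them. The class name and limit are hypothetical.

```python
import threading
from contextlib import contextmanager


class QueryAdmissionGate:
    """Caps concurrent queries so colocated streaming work keeps headroom."""

    def __init__(self, max_concurrent_queries: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent_queries)

    @contextmanager
    def admit(self):
        # Non-blocking acquire: reject immediately instead of queueing,
        # which pushes back-pressure onto the SQL client.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("query rejected: concurrency limit reached")
        try:
            yield
        finally:
            self._slots.release()


gate = QueryAdmissionGate(max_concurrent_queries=2)
with gate.admit():
    with gate.admit():
        try:
            with gate.admit():  # third concurrent query exceeds the cap
                pass
        except RuntimeError as exc:
            print(exc)
```

Rejecting instead of queueing keeps the gate simple and bounds memory; a production system would more likely combine this with CPU and memory limits or separate node pools.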
Canonical wiki instance¶
- systems/redpanda-sql + systems/redpanda-byoc (2026-05-27 GA) — the canonical wiki instance: Postgres-wire MPP query engine (Oxla) deployed inside the Redpanda BYOC cluster, querying live topics + Iceberg cold tier in place.
Adjacent patterns¶
- Storage-only BYOC (the predecessor pattern). Brokers + storage in the customer's VPC, but query / analytics handled externally. This pattern extends storage-only BYOC by adding the analytical-compute surface to the in-VPC scope.
- patterns/transparent-hot-cold-tier-query — the read-side pattern that this in-VPC deployment pattern operationally composes with. The two patterns address different axes: this one is about where the engine lives; the other is about what it reads.
- External warehouse + zero-ETL ingestion (e.g. Aurora → Redshift zero-ETL, Lakehouse Sync) — the alternative shape that preserves the BYOC storage residency but requires external analytical compute. Different security / cost / governance trade-offs.
Seen in¶
- 2026-05-27 — sources/2026-05-27-redpanda-redpanda-sql-is-ga-the-query-engine-that-skips-the-pipeline — first wiki canonicalisation; Redpanda SQL GA materialises the pattern over the BYOC + Iceberg-Topics substrate.
- 2025-10-28 — sources/2025-10-28-redpanda-introducing-the-agentic-data-plane — pre-disclosure of the architectural shape via the Oxla acquisition framing (rpk oxla runs in-cluster; ADP positioning).
Caveats¶
- GA scope narrow. Redpanda SQL GA is AWS BYOC consumption-plan only; GCP BYOC / BYOVPC / Self-Managed are roadmap.
- Resource isolation model not disclosed. Whether SQL workloads and streaming workloads run on isolated node pools, share a scheduler with QoS classes, or compete unbounded for resources is not detailed in the 2026-05-27 launch.
- Multi-cluster federation scope unclear. The pattern is canonicalised at single-cluster scope; cross-cluster / cross-catalog federation is implied by Oxla's design but GA-scope federation is not detailed.
- One-vendor lock-in axis. The compactness benefit (one cluster, one vendor) trades against the flexibility of best-of-breed components.