Redpanda¶
Redpanda is a Kafka-API-compatible streaming broker written from
the ground up in C++, built on the thread-per-core
Seastar framework with Raft-based replication.
Because Redpanda implements Kafka's wire protocol, every Kafka
client (Java KafkaProducer, librdkafka, kafka-python, etc.)
interacts with Redpanda identically to Apache Kafka — including
the producer-side batching, partitioning, and acknowledgment
semantics.
Canonical wiki entry points:
- Kozlovski's 2024-05-09 Kafka 101 names Redpanda in the industry-trajectory survey as one of three alternative Kafka-API implementations.
- James Kinley's 2024-11-19 Batch tuning in Redpanda part 1 is Redpanda's first-principles explainer on producer-side batching — the canonical source for the wiki's seven-factor effective-batch-size framework.
Architecture (stub — expand)¶
- Thread-per-core runtime. Built on Seastar; each CPU core owns a shard of partitions and runs a single thread with cooperative task scheduling. No shared state across cores.
- Raft replication. Per-partition Raft groups replace Kafka's ISR-based replication model. Leader election is bounded by Raft election timeouts rather than by a ZooKeeper/KRaft control plane.
- Kafka wire protocol. Full client compatibility. Producer semantics (linger.ms, batch.size, buffer.memory, max.in.flight.requests.per.connection, sticky partitioner, acks=0/1/all) match Kafka client-side behaviour identically.
- Tiered storage. Offload historical segments to object stores (S3, GCS). Stub; deferred to future source ingests.
- Iceberg topics (2024; GA 25.1, 2025-04-07, multi-cloud on AWS/Azure/GCP). Topic-level integration with Apache Iceberg — a single logical entity is both a Kafka-protocol topic and an Iceberg table. See systems/redpanda-iceberg-topics for the substantive entry; concepts/iceberg-topic for the concept.
Redpanda 25.3 (2025-11, preview)¶
The 25.3 release preview post (2025-11-06) introduces four headline features across three architectural axes:
- Shadowing — a broker-internal, byte-for-byte, offset-preserving hot-standby clone of a source cluster in a different region. Replaces MirrorMaker2 and the prior Redpanda Migrator for Redpanda-to-Redpanda cross-region DR. Verbatim: "Shadowing is built into the Redpanda broker itself and uses the standard Kafka API to link clusters. No MirrorMaker 2 or Redpanda Migrator connectors are used under the hood." RPO/RTO in seconds. Canonicalises offset-preserving replication and broker-internal cross-cluster replication on the wiki; the composed pattern is patterns/offset-preserving-async-cross-region-replication. The 2026-04-21 Shadow Linking deep-dive walks the mechanism at per-broker-task altitude (canonicalising parallel broker replication tasks as the implementation mechanism), scale-validates at 2.5 GiB/s / 2.5 M msg/s / <10k msg lag / ~4 ms RPO, and introduces two refinements — per-topic failover granularity (patterns/topic-level-granular-dr-failover) and reciprocal active-passive (patterns/reciprocal-active-passive-via-parallel-shadow-links via parallel shadow links on both clusters).
- Cloud Topics (Beta) — a per-topic storage tier where record bytes land directly on object storage (S3 / ADLS / GCS) while topic metadata stays in-broker (Raft-replicated). "Virtually eliminates the cross-AZ network traffic associated with data replication" — canonicalises concepts/cross-az-replication-bandwidth-cost and concepts/latency-critical-vs-latency-tolerant-workload. Positioned against Confluent's "Kora-powered … standard/ dedicated … Freight … plus separate Confluent WarpStream engine (BYOC)" multi-cluster shape. Canonical pattern: patterns/per-topic-storage-tier-within-one-cluster.
- Iceberg Topics + Google BigLake — Redpanda's Iceberg Topics can now register streaming tables to Google BigLake metastore (with Dataplex governance), making GCP the fourth managed REST catalog integration alongside Unity Catalog, Snowflake Open Catalog (Polaris), and AWS Glue.
- Microsoft SQL Server CDC input for Redpanda Connect — the fifth per-engine CDC input in the Redpanda Connect family (after Postgres / MySQL / MongoDB / Spanner). Rides on MSSQL's native change tables. Available in Redpanda Connect 4.67.5 (enterprise). Vendor benchmark: ~40 MB/s ingest + 3:15 initial snapshot on a 5M-row table vs ~14.5 MB/s and 8:04 for an unnamed alternative.
(Source: sources/2025-11-06-redpanda-253-delivers-near-instant-disaster-recovery-and-more)
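Why offset preservation matters for failover can be shown with a toy model. The sketch below is not Redpanda's implementation; `ToyLog` and both mirror functions are hypothetical names invented for illustration. It compares a copy that re-produces records (so the target assigns fresh offsets) with a copy that keeps the source offsets, using a log whose offsets have a gap.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """A toy partition log mapping offset -> record value (illustration only)."""
    records: dict[int, str] = field(default_factory=dict)
    next_offset: int = 0

    def append(self, value: str) -> int:
        """Append like a normal producer: the log assigns the next offset."""
        offset = self.next_offset
        self.records[offset] = value
        self.next_offset = offset + 1
        return offset

    def append_at(self, offset: int, value: str) -> None:
        """Append while keeping the offset the record had on the source."""
        self.records[offset] = value
        self.next_offset = max(self.next_offset, offset + 1)


def mirror_by_reproducing(source: ToyLog) -> ToyLog:
    """Consume-and-produce copy: offsets are reassigned densely on the target."""
    target = ToyLog()
    for offset in sorted(source.records):
        target.append(source.records[offset])
    return target


def mirror_preserving_offsets(source: ToyLog) -> ToyLog:
    """Offset-preserving copy: every record keeps its source offset."""
    target = ToyLog()
    for offset in sorted(source.records):
        target.append_at(offset, source.records[offset])
    return target


source = ToyLog()
for value in ["a", "b", "c", "d"]:
    source.append(value)
del source.records[1]  # simulate a gap, e.g. a record removed by compaction

committed = 2  # a consumer group's committed offset on the source
reproduced = mirror_by_reproducing(source)
preserved = mirror_preserving_offsets(source)
```

In this toy, the committed offset points at the same record only on the offset-preserving copy, so a consumer can resume after failover without an offset-translation step.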
Iceberg topics (lakehouse-native Bronze sink)¶
Redpanda's Iceberg topics let a topic double as an Iceberg table without any external ETL job: producers write records via the normal Kafka producer API; the broker transparently projects records into columnar Parquet on object storage and updates an external Iceberg REST catalog (Databricks Unity, Snowflake Polaris). Downstream Iceberg-aware engines — ClickHouse, Snowflake, Databricks, Trino, Spark, Flink — query the tables directly.
Architecturally, this positions Redpanda as the Bronze tier of a Medallion Architecture lakehouse without an intermediate integration cluster (Kafka Connect / Redpanda Connect / custom Airflow jobs). Canonical wiki pattern: patterns/streaming-broker-as-lakehouse-bronze-sink.
(Source: sources/2025-01-21-redpanda-implementing-the-medallion-architecture-with-redpanda)
Iceberg Topics GA (25.1, 2025-04-07)¶
Iceberg Topics were promoted from preview to General Availability in Redpanda's 25.1 release, with simultaneous availability on AWS, Azure, and GCP. The GA post (sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available) discloses nine named properties that distinguish the GA surface from the pre-GA preview: four table-management capabilities (custom hierarchical bucketed partitioning, built-in dead-letter queues, full Iceberg-spec-compliant schema evolution, automatic snapshot expiry) and five catalog-integration capabilities (secure REST catalog sync via OIDC+TLS, transactional writes, automatic table discovery and registration, built-in object-store catalog fallback, tunable workload management for the snapshot-to-topic lag ceiling). The GA disclosure retires two prior wiki caveats (snapshot-expiry ownership is broker-owned; Iceberg-spec schema evolution is fully supported); small-file compaction ownership remains an open question.
The 25.1 release bundles several additional features adjacent to the streaming substrate: native consumer group lag metrics (Prometheus-exposed; replaces a previously documented PromQL compute), Protobuf schema normalization in the Schema Registry, SASL/PLAIN authentication, unified Console+cluster identity with fine-grained RBAC, and platform-centric versioning for Kubernetes deployments with FluxCD removal to reduce conflicts with customer FluxCD installations.
Kafka-API-compatible batching semantics¶
From the 2024-11-19 batch-tuning explainer:
"Just as in Apache Kafka, a batch in Redpanda is a group of one or more messages written to the same partition, which are bundled together and sent to a broker in a single request. Rather than each message being sent and acknowledged separately, requiring multiple calls to Redpanda, the client buffers messages for a short time, optionally compresses the whole batch, and then sends them later as a single request." (Source: sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1)
The three producer knobs — linger.ms, batch.size,
buffer.memory — compose identically on Redpanda and Kafka. The
seven-factor effective-batch-size framework (message rate,
batch.size, linger.ms, partitioning, producer fan-out, client
buffer memory, backpressure) applies identically to both systems.
The CPU-saturation latency inversion (concepts/batching-latency-tradeoff)
— where increasing linger.ms reduces tail latency under broker
saturation by shrinking the internal work-queue backlog — is
canonicalised from the Redpanda explainer but is equally true of
Kafka brokers.
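A rough way to see how several of the seven factors interact is a back-of-the-envelope estimate of bytes per batch. The function below is a simplified sketch, not a formula from the explainer: it assumes traffic is spread evenly over producers and partitions, and it ignores compression, buffer.memory, and backpressure. The function name and parameters are hypothetical.

```python
def estimate_batch_bytes(
    messages_per_sec: float,
    avg_message_bytes: int,
    partitions: int,
    producers: int,
    linger_ms: float,
    batch_size_bytes: int,
) -> float:
    """Rough bytes per batch for one producer writing to one partition."""
    # Each producer sees only its share of traffic, spread over all partitions.
    per_stream_rate = messages_per_sec / (producers * partitions)
    # Messages that accumulate for one partition during one linger window;
    # a batch always carries at least one message.
    messages_per_window = max(1.0, per_stream_rate * linger_ms / 1000.0)
    # The batch is sent when linger expires or batch.size fills, whichever is first.
    return min(float(batch_size_bytes), messages_per_window * avg_message_bytes)


# 100k msg/s of 1000-byte messages, 10 producers, 100 partitions, 16 KiB batch.size.
short_linger = estimate_batch_bytes(100_000, 1000, 100, 10, 5, 16_384)
long_linger = estimate_batch_bytes(100_000, 1000, 100, 10, 100, 16_384)
capped = estimate_batch_bytes(100_000, 1000, 100, 10, 1000, 16_384)
```

Under these assumptions a 5 ms linger yields single-message batches because fan-out across producers and partitions dilutes the per-partition rate, while a very long linger is capped by batch.size. A sticky partitioner would concentrate unkeyed records on one partition at a time and produce larger batches than this even-spread model suggests.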
High availability: multi-region stretch clusters¶
From the 2025-02-11 stretch-clusters post, the canonical wiki statement of Redpanda's region-spanning HA/DR shape:
"A multi-region Redpanda cluster is a deployment topology that allows customers to run a single Redpanda cluster across multiple data centers or multiple cloud regions. It's often referred to as a stretch cluster, where a single cluster stretches across multiple geographic regions with data distributed across all deployment regions. Data is replicated synchronously via raft protocol between brokers distributed across multiple regions."
(Source: sources/2025-02-11-redpanda-high-availability-deployment-multi-region-stretch-clusters)
Redpanda's stretch-cluster shape — single control plane, one per-partition Raft group spanning regions — achieves RPO=0 on region loss via automatic Raft re-election in surviving regions. The canonical wiki concept is concepts/multi-region-stretch-cluster; the pattern is patterns/multi-region-raft-quorum; the alternative lower-cost shape is MirrorMaker2 between two independent clusters (non-zero RPO).
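Raft needs a majority of a partition's replicas to elect a leader and commit writes, so whether a stretch cluster rides out a region loss depends on how replicas are placed. The sketch below checks that condition for a single partition; the function name and the region labels are hypothetical, and it models only the majority rule, not Redpanda's placement logic.

```python
def survives_any_single_region_loss(replicas_per_region: dict[str, int]) -> bool:
    """True if a Raft majority remains after losing any one region."""
    total = sum(replicas_per_region.values())
    majority = total // 2 + 1
    # Losing a region removes all replicas placed there.
    return all(total - lost >= majority for lost in replicas_per_region.values())


three_regions = survives_any_single_region_loss({"east": 1, "central": 1, "west": 1})
two_regions = survives_any_single_region_loss({"east": 2, "west": 1})
five_replicas = survives_any_single_region_loss({"east": 2, "central": 2, "west": 1})
```

With replication factor 3 spread over three regions, any single region can fail; with the same factor over two regions, losing the region that holds two replicas leaves no majority.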
Four operator knobs mitigate cross-region cost — all canonicalised from the same post:
- Leader pinning (enterprise feature): pin partition leaders to client-proximal regions (patterns/client-proximal-leader-pinning).
- acks=1: producer-side durability relaxation; leader-only ack. Composes with concepts/acks-producer-durability.
- Follower fetching: consumers read from the closest replica rather than the leader; Kafka-API KIP-392 (patterns/closest-replica-consume).
- Remote read replica topic: object-storage-backed read-only mirror on a separate cluster; zero load on origin brokers.
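Two of these knobs are set on the client. The sketch below builds librdkafka-style configuration dictionaries (the property names `acks` and `client.rack` are standard Kafka client settings); the helper names, broker address, group, and region labels are hypothetical. Follower fetching also needs rack IDs configured on the brokers, which is not shown here.

```python
def producer_config(bootstrap: str, leader_only_ack: bool) -> dict[str, str]:
    """Producer settings; acks=1 waits for the partition leader only."""
    return {
        "bootstrap.servers": bootstrap,
        "acks": "1" if leader_only_ack else "all",
    }


def consumer_config(bootstrap: str, group_id: str, client_region: str) -> dict[str, str]:
    """Consumer settings; client.rack tells the broker where the client sits
    so that KIP-392 follower fetching can pick the closest replica."""
    return {
        "bootstrap.servers": bootstrap,
        "group.id": group_id,
        "client.rack": client_region,
    }


# Hypothetical values for illustration only.
relaxed = producer_config("broker.example.internal:9092", leader_only_ack=True)
nearby = consumer_config("broker.example.internal:9092", "orders-readers", "us-east")
```

The trade-off is the usual one: acks=1 avoids the cross-region round trip on the produce path at the cost of weaker durability if the leader fails before followers catch up.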
Deployment uses region-as-rack via Ansible, reusing the same rack-awareness machinery as multi-AZ. Performance testing substrate: OMB + tc inter-broker latency injection. Kubernetes-operator gap as of the post: multi-region stretch is not supported on K8s (only VMs / bare metal / cloud compute / Redpanda Cloud).
Agent infrastructure (2025-04)¶
As of the 2025-04-03 founder-voice autonomy post, Redpanda positions the broker as the durable-log substrate for enterprise AI agents — agent-to-agent communication, human-in-the-loop workflows, trace capture, evaluation replay, message sampling, collaborative threads, time-travel debugging all backed by the distributed log. The canonical wiki statement of this positioning is "the truth is the log" — Alex Gallego's citation of Kleppmann's 2015 "database inside out" framing as Redpanda's founding premise.
Three product-surface components ship alongside the broker:
- systems/redpanda-connect: the ~300-connector integration layer, exposed as MCP servers via rpk connect mcp-server.
- systems/redpanda-agents-sdk: the Python SDK + rpk connect agent CLI that glues durable-execution, MCP tool discovery, and Redpanda Connect pipelines into a Rails-style developer experience.
- systems/redpanda-byoc: the Bring Your Own Cloud deployment model whose Data Plane Atomicity invariant keeps the entire agent pipeline (broker + connectors + MCP + small-model inference) inside the customer's firewall. This is the structural substrate for sending models to the data.
(Source: sources/2025-04-03-redpanda-autonomy-is-the-future-of-infrastructure)
Agentic Data Plane (2025-10-28 productization)¶
Seven months after the autonomy-essay founder-voice framing, Gallego's 2025-10-28 Introducing the Agentic Data Plane names the commercial packaging of enterprise autonomy: Agentic Data Plane (ADP) — "a unified runtime and control plane that safely exposes enterprise data to AI agents". Four layers composed over the existing Redpanda streaming substrate:
- (A) Streaming: the Redpanda broker itself, substrate for durable execution, HITL async mailboxes, durable model replay, and observability event capture.
- (B) Query engine: Oxla, a newly-acquired C++ distributed query engine with PostgreSQL wire protocol, separated compute-storage, and Iceberg-native workload targeting. "SQL is the best mechanism to filter and aggregate while the model summarizes." Early preview mid-December 2025; rolling integration into the product.
- (C) Connectors: the existing 300+ Redpanda Connect catalog rebadged as ADP's integration layer.
- (D) Governance: net-new global policy + observability layer enforcing governed agent data access. Concrete substrate: "OBO to task-based authentication, DLP hooks, per-agent consent workflows, and immutable audit trails with configurable retention." The first shipped feature is "Remote MCP + authentication + authorization for OBO (on-behalf-of) workloads with IdP integration", a canonical wiki instance of OBO agent authorization.
Product roadmap announced in the same post:
- Agent templates for common enterprise data sources (Git for code repos, Jira, GDrive).
- Declarative Agent Runtime as opinionated layer above Redpanda Agents SDK.
- Oxla acquisition: integrated operationally via the rpk oxla CLI.
Gallego's governance-first framing inverts typical agent-product marketing verbatim: "The fear from CIOs is not the code of the agent itself, it is governance. In simple terms, it is access controls: can I trust that data is accessed by the right things? And observability: when things go wrong, can I understand what happened?" — canonicalised on the wiki as concepts/governed-agent-data-access.
Redpanda SQL (Oxla productisation, GA 2026-05-27)¶
The third pillar of the Redpanda Data Platform — "Streaming, Connect, and SQL" — reaches GA on 2026-05-27 (Source: 2026-05-27 Redpanda SQL is GA). Redpanda SQL is the productised GA face of the Oxla MPP query engine acquired 2025-10-28; the acquisition → mid-December 2025 preview → 2026-05-27 GA arc is complete.
GA scope: Redpanda BYOC on AWS, consumption-based plans only. GCP BYOC + BYOVPC: "coming soon". Self-Managed: 2H FY27. Activation is three steps with no cluster restart from the Redpanda Console cluster overview page.
Four GA properties (full canonicalisation on systems/redpanda-sql):
- In-cluster, in-VPC: concepts/in-cluster-streaming-sql / patterns/in-vpc-query-engine-on-streaming-substrate. Redpanda SQL runs on the same BYOC infrastructure as the brokers and Iceberg storage, inside the customer's VPC; "every query accesses data in-place". This closes the analytical-compute gap in BYOC compliance stories: before Redpanda SQL, BYOC kept storage in-VPC but analytical queries required egress to a third-party warehouse.
- Postgres wire protocol: concepts/postgres-wire-protocol-as-streaming-sql-surface. "It's just Postgres." Connect with psql, DBeaver, DataGrip, or Redpanda Console SQL Studio. The same architectural move Redpanda made with Kafka wire protocol on the broker side, applied to the SQL surface.
- Transparent two-tier query bridge: concepts/two-tier-stream-iceberg-query-bridge / patterns/transparent-hot-cold-tier-query. A single SQL statement reads transparently across the live broker tier and the Iceberg Topics cold-tier Parquet files; the engine plans the unified read path. Substrate-dependent on Iceberg Topics' simultaneous-write property.
- MPP execution from Oxla: "Massively Parallel Processing" C++ engine; same implementation language as the streaming broker; designed for OLAP throughput with extreme memory efficiency.
Five workload classes named at launch (full enumeration on systems/redpanda-sql): streaming-app debugging, real-time operational analytics, ad-hoc analytics, compliance queries, agent-driven query fan-out (concepts/agent-driven-query-fan-out — humans serial, agents parallel; "hundreds of queries simultaneously").
Explicit foil against ksqlDB at the ad-hoc vs predefined axis: "ksqlDB is a handy tool, but it requires you to decide what questions you're going to ask before the events arrive."
The launch reframes the Redpanda Data Platform from a streaming vendor into a complete data-platform vendor: "One architecture. One operational model. One vendor." — the positioning answer to Confluent's Kora + Flink + Tableflow and to Kafka + ETL + Snowflake.
Performance tuning checklist (2025-04)¶
The 2025-04-23 "Need for speed: 9 tips to supercharge Redpanda" post (Source: sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda) provides an omnibus performance-tuning checklist for Redpanda clusters, organised across three dependency layers:
Infrastructure (tips 1–2) — deploy on local
NVMe; run brokers on dedicated hardware with
no noisy neighbors; give Redpanda
95% of available resources (leave 5% for OS / k8s host).
When NVMe isn't available (SSD, spinning disks, SAN, remote
storage), enable broker-side
write caching — always paired with acks=all to preserve the
quorum-memory durability guarantee. Canonical framing of
write-caching as the hardware-shortfall mitigation
(complementing the earlier Kinley 2024-11-26 organisational
framing).
Data architecture (tips 3, 8, 9) — partition skew kills parallelisation (Amdahl's Law): use the sticky partitioner for unkeyed records; only use keyed partitioning when required (CDC); pick high-cardinality keys when keys are unavoidable (patterns/high-cardinality-partition-key). Don't compress compacted topics unless you accept the decompress/recompress CPU tax (concepts/compression-compaction-cpu-cost). Use tiered storage not just for capacity but for orders-of-magnitude faster decommission and recommission — data already in object storage doesn't need to re-replicate.
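The effect of key cardinality on parallelism can be approximated by asking what share of records lands on the busiest partition: if one partition receives a fraction f of the traffic, the usable parallel speedup is at most 1/f. The sketch below is an illustration under stated assumptions. It uses CRC32 as a stand-in hash (Kafka clients typically use murmur2), and the function names and key sets are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition with a stand-in hash (not the real client hash)."""
    return zlib.crc32(key.encode()) % partitions


def busiest_share(keys: list[str], partitions: int) -> float:
    """Fraction of all records that land on the most loaded partition."""
    counts = Counter(partition_for(key, partitions) for key in keys)
    return max(counts.values()) / len(keys)


def speedup_ceiling(keys: list[str], partitions: int) -> float:
    """Upper bound on parallel speedup: the busiest partition is the serial part."""
    return 1.0 / busiest_share(keys, partitions)


# Three distinct keys can occupy at most three of the twelve partitions.
low_cardinality = ["eu", "us", "apac"] * 10_000
# Many distinct keys spread across partitions far more evenly.
high_cardinality = [f"user-{i}" for i in range(30_000)]

low_ceiling = speedup_ceiling(low_cardinality, 12)
high_ceiling = speedup_ceiling(high_cardinality, 12)
```

With only three keys, adding partitions beyond three buys nothing, which is the partition-skew argument in miniature.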
Application design (tips 4–7) — tune
producer batching via
linger.ms + batch.size; tune
consumer fetches with a
four-parameter matrix (fetch.min.bytes, fetch.max.wait.ms,
max.partition.fetch.bytes, max.poll.records) pivoted on
low-latency vs high-throughput regime; control
offset-commit cost — each commit
is a write to __consumer_offsets, so
auto.commit.interval.ms ≥ 1 s (low-ms is "right out") and
one consumer group per service. Compress on the client, not
the broker (patterns/client-side-compression-over-broker-compression);
prefer ZSTD or LZ4 as the
codec-CPU-vs-ratio sweet
spot.
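The consumer-side advice can be sketched as two configuration profiles plus a quick estimate of commit load. The property names are standard Kafka consumer settings; the specific values are illustrative assumptions, not figures from the post, and `commit_writes_per_sec` is a hypothetical helper that assumes every consumer auto-commits once per interval.

```python
# Illustrative values only: tune against your own latency and throughput targets.
LOW_LATENCY_FETCH = {
    "fetch.min.bytes": 1,            # return as soon as any data is available
    "fetch.max.wait.ms": 10,         # do not hold the fetch open for long
    "max.partition.fetch.bytes": 1_048_576,
    "max.poll.records": 100,
}

HIGH_THROUGHPUT_FETCH = {
    "fetch.min.bytes": 1_048_576,    # wait for a sizeable chunk before returning
    "fetch.max.wait.ms": 500,        # allow the broker time to accumulate data
    "max.partition.fetch.bytes": 8_388_608,
    "max.poll.records": 5_000,
}


def commit_writes_per_sec(consumers: int, auto_commit_interval_ms: int) -> float:
    """Approximate writes per second to __consumer_offsets from auto-commit."""
    return consumers * 1000 / auto_commit_interval_ms


aggressive = commit_writes_per_sec(consumers=200, auto_commit_interval_ms=5)
relaxed = commit_writes_per_sec(consumers=200, auto_commit_interval_ms=1000)
```

In this example, 200 consumers committing every 5 ms generate 40,000 offset writes per second, against 200 per second at a 1 s interval, which is why low-millisecond commit intervals are discouraged.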
Kubernetes deployment¶
Redpanda supports two deployment paths on Kubernetes: the Helm chart (simple, template-driven, limited lifecycle automation) or the production-grade Redpanda Operator (CRD-driven, managed upgrades + dynamic configuration + lifecycle automation + multi-tenancy). The operator is the default recommendation.
The operator was consolidated over the course of 2025. Prior state: two separate operators, an internal one for Redpanda Cloud + BYOC and a customer-facing one for Self-Managed. The customer operator initially bundled FluxCD internally to wrap the Helm chart, the canonical wiki instance of the bundled-GitOps-dependency anti-pattern. The 2025 consolidation retired that structure across three branches:
- v2.3.x: FluxCD optional (spec.chartRef.useFlux).
- v2.4.x (Jan 2025): FluxCD disabled by default.
- v25.1.x: FluxCD + Helm-chart wrapping removed; unified operator serving both Cloud and Self-Managed. Adopts the version-aligned compatibility scheme (operator/chart version matches Redpanda core version with ±1 minor window), retiring the compatibility matrix.
Canonical pattern: patterns/unified-operator-for-cloud-and-self-managed.
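The version-aligned scheme replaces a lookup table with a rule that can be checked mechanically. The sketch below is one possible reading of "same version, ±1 minor": it assumes the window applies only within the same major version, which the guide does not spell out, and both function names are hypothetical.

```python
def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v25.1.3' or '25.2.x'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def is_compatible(operator_version: str, redpanda_version: str) -> bool:
    """Assumed rule: same major version and minor versions at most one apart."""
    op_major, op_minor = parse_major_minor(operator_version)
    rp_major, rp_minor = parse_major_minor(redpanda_version)
    return op_major == rp_major and abs(op_minor - rp_minor) <= 1


adjacent = is_compatible("v25.1.3", "25.2.1")
too_far = is_compatible("v25.1.3", "25.3.0")
```

How the window behaves across a major boundary (for example 25.3 against 26.1) is left open here and should be confirmed against the operator documentation.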
Deployment-shape limitation: per the 2025-02-11 stretch-cluster post, "Self-Managed on K8s currently supports only multi-AZ deployments" — multi-region stretch is VMs / bare metal / cloud compute / Redpanda Cloud only.
(Source: sources/2025-05-06-redpanda-a-guide-to-redpanda-on-kubernetes)
FIPS compliance (broker-level, 2025-05-20)¶
As of Redpanda's 2025-05-20 Implementing FIPS compliance in Redpanda post, Redpanda brokers can operate under a FIPS cryptographic boundary for deployments into US federal / regulated environments.
- Substrate: OpenSSL 3.0.9, FIPS 140-2 validated; 140-3 validation under NIST review at post publication. Late-2025 upgrade target: OpenSSL 3.1.2 (FIPS 140-3 validated) ahead of 140-2 sunset. Both the redpanda broker binary and the rpk CLI consume the validated module.
- Artefact distribution: two packages install alongside base Redpanda: redpanda-fips (OpenSSL FIPS module) and redpanda-rpk-fips (FIPS-compliant rpk). RPM + Debian at post publication.
- Config dial: three-state fips_mode in redpanda.yaml: disabled / enabled / permissive, plus openssl_config_file + openssl_module_directory paths. enabled is the production setting; permissive is a dev-ergonomics affordance allowing broker-level FIPS logic without requiring OS-level FIPS (canonical warning: "anything crypto-related that relies on the operating system (such as sourcing entropy) may not be in full compliance").
- Enforcement: broker startup fail-fast. "Redpanda will log an error and exit if the underlying operating system isn't properly configured." No silent downgrade; the cluster either passes the boundary at startup or hard-fails. Structurally stronger than the logging-then-enforcement progressive-rollout shape by design, because regulated workloads have no warn-only regime.
- OS precondition (RHEL 8+): fips-mode-setup --enable → reboot → fips-mode-setup --check reports "FIPS mode is enabled". Only then does fips_mode: enabled in Redpanda succeed.
- Deployment automation: Redpanda Ansible Collection accepts -e "enable_fips=true" -e "fips_mode=enabled" on the provision-cluster.yml playbook to pull FIPS binaries and write the FIPS redpanda.yaml.
- Boundary scope at publication: self-managed RPM / Debian only. Redpanda Cloud, Kubernetes deployments, and systems/redpanda-connect are on the roadmap. This is a canonical wiki instance of the FIPS boundary being narrower than a product's full deployment surface because validated-module distribution is deployment-shape-specific.
License-gated enterprise feature.
(Source: sources/2025-05-20-redpanda-implementing-fips-compliance-in-redpanda)
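The interaction between the three-state dial and the OS precondition can be modelled as a small decision function. This is a sketch of the behaviour described above, not Redpanda's startup code: the outcome labels and function names are hypothetical. The flag file `/proc/sys/crypto/fips_enabled` is the standard Linux kernel indicator of OS-level FIPS mode.

```python
from enum import Enum
from pathlib import Path


class FipsMode(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    PERMISSIVE = "permissive"


def os_fips_enabled(flag_path: str = "/proc/sys/crypto/fips_enabled") -> bool:
    """Read the kernel's FIPS flag; a missing file counts as not enabled."""
    path = Path(flag_path)
    return path.exists() and path.read_text().strip() == "1"


def startup_decision(mode: FipsMode, os_fips: bool) -> str:
    """Model of the fail-fast rule: 'enabled' without OS FIPS refuses to start."""
    if mode is FipsMode.DISABLED:
        return "start"
    if os_fips:
        return "start-fips"
    if mode is FipsMode.PERMISSIVE:
        # Broker-level FIPS logic runs, but OS-sourced crypto may not comply.
        return "start-fips-with-warning"
    return "exit-with-error"


strict_without_os = startup_decision(FipsMode.ENABLED, os_fips=False)
dev_without_os = startup_decision(FipsMode.PERMISSIVE, os_fips=False)
```

The only combination that hard-fails is `enabled` on a host without OS-level FIPS, which matches the "no warn-only regime" framing for production.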
GCP outage response (Redpanda Cloud, 2025-06-20 retrospective)¶
Redpanda's 2025-06-20 retrospective on the 2025-06-12 GCP global outage (Source: sources/2025-06-20-redpanda-behind-the-scenes-redpanda-clouds-response-to-the-gcp-outage) discloses Redpanda Cloud's substrate posture during a cascading cloud-provider event. The post's load-bearing disclosures:
- Cell-based architecture as an explicit product principle. "Redpanda Cloud clusters do not externalize their metadata or any other critical services. All the services needed to write and read data, manage topics, ACLs, and other Kafka entities are co-located, with Redpanda core leading the way with its single-binary architecture." Explicit contrast: "other products boasting centralized metadata and a diskless architecture likely experienced the full weight of this global outage."
- Availability SLA structural composition. The 99.99% SLA
(with ≥99.999% design target) decomposes to six concrete
substrate choices:
- Replication factor ≥ 3 enforced on all topics (customers cannot lower, only increase).
- Local NVMe primary storage + async tiered storage as fallback, not primary. Object-store errors don't block writes.
- Redundant Kafka API + Schema Registry + Kafka HTTP Proxy.
- No critical-path external dependencies beyond VPC + compute nodes + locally-attached disks (with PSC-enabled deployments as the named exception).
- Continuous chaos + load testing of each cluster tier.
- Release-engineering discipline with feedback-control-loop-guarded phased rollouts: "we try to close our feedback control loops by watching Redpanda metrics as the phased rollout progresses and stopping when user-facing issues are detected."
- Private Service Connect (PSC) is the named dependency exception. When PSC is enabled, it becomes part of the critical path for read / write. Canonicalises the one deployment shape where the "no external dependencies" claim does not strictly hold.
- Deliberate disk reserve — unused + used-but-reclaimable NVMe space kept available for reclamation during tiered-storage stress.
- Hedged observability — self-hosted data, third-party for dashboarding and alerting; the 2024 migration paid off during the 2025-06-12 cascading outage where the third-party was partially affected but self-hosted substrate stayed queryable.
- Single node lost during the outage: staging cluster in us-central-1. "An uncommon interaction between internal infrastructure components" produced a node failure with no replacement until GCP recovered ~2 hours later. One cluster out of hundreds.
- Customer stack context changes the urgency calculus. "For some of them, GCP's Pub/Sub served as the data source for their Redpanda BYOC clusters, so they needed to recover that first." Because Redpanda sat downstream of GCP-native sources, those customers had to restore the upstream first, so even a hypothetically affected Redpanda cluster would not have been their most urgent problem.
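The feedback-control-loop-guarded rollout named in the SLA list can be sketched as a loop that upgrades one phase at a time and stops as soon as a health signal turns bad. This is a generic illustration, not Redpanda Cloud's release tooling; the function name, the callbacks, and the cluster labels are hypothetical.

```python
from typing import Callable, Sequence


def phased_rollout(
    phases: Sequence[Sequence[str]],
    upgrade: Callable[[str], None],
    healthy: Callable[[], bool],
) -> list[str]:
    """Upgrade clusters phase by phase; halt after the first unhealthy check."""
    upgraded: list[str] = []
    for phase in phases:
        for cluster in phase:
            upgrade(cluster)
            upgraded.append(cluster)
        # Close the control loop: inspect metrics before widening the blast radius.
        if not healthy():
            break
    return upgraded


applied: list[str] = []
health_checks = iter([True, False, True])  # the second phase surfaces a problem
result = phased_rollout(
    [["staging-1"], ["prod-1", "prod-2"], ["prod-3"]],
    upgrade=applied.append,
    healthy=lambda: next(health_checks),
)
```

In this run the third phase is never touched, so the problem found after the second phase stays contained to the clusters already upgraded.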
Profile-guided optimization (26.1, 2026-04-02)¶
Redpanda Streaming 26.1 enabled clang PGO for the broker binary, delivering ~10-15% overall efficiency improvement on small-batch CPU-intensive workloads — announced as a one-line feature in the 2026-03-31 26.1 launch post, then unpacked mechanism-by-mechanism in the 2026-04-02 engineering deep-dive (Source: sources/2026-04-02-redpanda-supercharging-streaming-with-profile-guided-optimization).
Measured wins on the canonical small-batch regression benchmark:
- ~50% reduction in p50 latency.
- Up to 47% reduction in p999 latency.
- 15% reduction in CPU reactor utilization.
The amplification asymmetry (15% CPU → 47% p999) is the canonical batching-under-saturation shape — less CPU per request → shorter broker queue depth → disproportionately lower tail latency.
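A textbook M/M/1 queue shows why a modest CPU saving can produce a much larger latency saving near saturation. The sketch below is a toy model with made-up inputs, not a reproduction of Redpanda's benchmark: mean time in system is S / (1 - ρ), where S is service time and ρ is utilization.

```python
def mm1_mean_latency(service_time: float, arrival_rate: float) -> float:
    """Mean time in an M/M/1 system: service_time / (1 - utilization)."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("queue is unstable at utilization >= 1")
    return service_time / (1.0 - utilization)


def latency_reduction(base_service_time: float, arrival_rate: float, cpu_saving: float) -> float:
    """Fractional latency drop when each request needs cpu_saving less service time."""
    before = mm1_mean_latency(base_service_time, arrival_rate)
    after = mm1_mean_latency(base_service_time * (1.0 - cpu_saving), arrival_rate)
    return 1.0 - after / before


# Same 15% per-request saving, applied at 90% and at 30% utilization.
near_saturation = latency_reduction(1.0, 0.9, 0.15)
lightly_loaded = latency_reduction(1.0, 0.3, 0.15)
```

In this model the same 15% saving cuts mean latency by roughly 64% at 90% utilization but only about 20% at 30% utilization, which is the same direction of asymmetry as the measured CPU-versus-p999 numbers.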
Diagnostic methodology: Redpanda used
top-down
microarchitecture analysis via Linux perf stat --topdown to
identify the workload as 51% frontend-bound on baseline —
"definitely on the higher end, even for database or distributed
applications." PGO reduced frontend-bound to 37.9%, with 6
percentage points shifting to retiring (useful work) and 7 to
backend-bound (revealed next bottleneck). Canonical example of
TMA-guided
optimisation target selection.
PGO vs BOLT evaluation: Redpanda evaluated both PGO and LLVM BOLT and chose PGO citing stability:
"PGO is a proven and widely deployed technology, so with this in mind and considering some outstanding BOLT bugs, we decided to stick with PGO."
BOLT performance was "similar to PGO. Most of the time, it came
in just slightly behind." Redpanda hit LLVM bug
llvm-project#169899
during their BOLT evaluation — the first wiki-canonical non-Meta
BOLT brittleness datum (contrast with Meta's fleet-scale success
via BOLT + Strobelight).
Combining both gave "another small bump in performance"; the
post preserves the option of "adding BOLT on top of PGO at some
point."
Mechanisms applied: PGO enables hot-cold code splitting, basic-block reordering, and profile-driven inlining — all targeting instruction-cache locality. BOLT's heatmap visualisation tool confirmed the PGO-optimised binary packs hot functions tightly at the binary's start; the baseline distributed them across the binary.
See the canonical apply pattern at patterns/pgo-for-frontend-bound-application and the per-binary (non-fleet-scale) variant of patterns/feedback-directed-optimization-fleet-pipeline as the substrate framing.
Seen in¶
- sources/2026-06-02-redpanda-how-omninode-uses-redpanda-to-scale-ai-agent-workflows: 2026-06-02 Redpanda Blog guest post by Jonah Gray, founder/CEO of OmniNode. Tier-3 source, included on architecture-content grounds (substantive disclosure on contract-driven topic naming + extractor topology, with a candid scope-boundary on configuration drift). Canonicalises Redpanda's "single-binary, fits in 8 GB development profile, same compose file everywhere" affordability as the load-bearing property that lets the broker exist identically in local dev, CI, dev containers, and the homelab runtime: "if the broker is operationally heavy, teams eventually stop running it locally. They fake the bus, mock topic creation, or maintain a second development path that doesn't actually validate topic identity." This is the wiki's first canonical disclosure of broker affordability as a discipline-enabling property: the architectural argument that operational lightness is not a nice-to-have but a necessary condition for the contract-discipline that catches silent wiring failures. Migration path disclosed: Redis Streams (XADD/XREADGROUP behind transport abstraction) → Redpanda at the 5 → 12 repos / 100+ event types crossing point, with coordination capabilities (not throughput) as the named trigger. The OmniNode-side architecture content is canonicalised on its own page; this Seen-in framing is on the Redpanda affordances the migration depended on.
- sources/2025-06-20-redpanda-behind-the-scenes-redpanda-clouds-response-to-the-gcp-outage: 2025-06-20 production-incident retrospective on the 2025-06-12 GCP global outage. Canonicalises cell-based architecture as an explicit Redpanda Cloud product principle, with the single-binary broker as the cell-level unit. Discloses local-NVMe-primary + async-tiered-storage-secondary shape, deliberate disk reserve as reliability buffer, hedged observability stack (self-hosted data + third-party UI) that degraded rather than blinded, and feedback-control-loop-guarded phased rollouts as the discipline for closing change-management control loops. One affected cluster (staging, us-central-1, single node lost, ~2 hours to replace) out of hundreds. SEV4 incident closed with no customer impact. Canonicalises the butterfly effect as a first-class system-design primitive from chaos theory.
- sources/2025-05-20-redpanda-implementing-fips-compliance-in-redpanda: 2025-05-20 FIPS-compliance configuration walkthrough. Canonicalises Redpanda's broker-level FIPS substrate on the wiki: OpenSSL 3.0.9 / 3.1.2 as the validated module; the disabled / enabled / permissive tri-state dial; the startup fail-fast enforcement shape; the two-package artefact split; the RHEL fips-mode-setup OS precondition; the Ansible Collection enable_fips=true opt-in. Opens the Redpanda security-substrate axis on the wiki.
- sources/2025-05-06-redpanda-a-guide-to-redpanda-on-kubernetes: 2025-05-06 product-altitude guide to Redpanda's Kubernetes deployment evolution: Helm chart vs Redpanda Operator trade-off; the two-to-one operator consolidation across internal Cloud and customer Self-Managed deployments; the FluxCD-bundling reversal (v2.3.x optional → v2.4.x default-off → v25.1.x removed) canonicalising the bundled-GitOps-dependency anti-pattern; the v2.4.x → v25.1.x version jump adopting the version-aligned compatibility scheme (operator/chart version matches Redpanda core version, ±1 minor window). Pattern: patterns/unified-operator-for-cloud-and-self-managed.
- sources/2025-04-23-redpanda-need-for-speed-9-tips-to-supercharge-redpanda: 2025-04-23 omnibus performance-tuning checklist. Nine tips covering infrastructure + data architecture + application design. Canonicalises partition skew as Amdahl's Law; the four-parameter consumer fetch-tuning matrix; the save-button analogy for offset commits and RPO-as-commit-frequency link; ZSTD vs LZ4 codec trade-off; the compression+compaction CPU tax; and tiered storage as rebalance accelerator. Introduces concepts/keyed-partitioner, patterns/high-cardinality-partition-key, and patterns/client-side-compression-over-broker-compression. Extends the prior Kinley 2024-11 batch-tuning framing with the hardware-shortfall trigger for write caching.
- sources/2025-04-03-redpanda-autonomy-is-the-future-of-infrastructure: Alex Gallego's founder-voice autonomy essay + Agents SDK launch + $100M Series D announcement. Canonicalises Redpanda as the durable-log substrate for enterprise agents; canonicalises Data Plane Atomicity as BYOC's central design tenet; canonicalises log-is-truth-database-is-cache as Redpanda's founding premise via Kleppmann's 2015 paper. Introduces systems/redpanda-agents-sdk, systems/redpanda-byoc, patterns/mcp-as-centralized-integration-proxy, and patterns/dynamic-content-filtering-in-mcp-pipeline.
- sources/2025-02-11-redpanda-high-availability-deployment-multi-region-stretch-clusters — canonical wiki source for Redpanda's multi-region stretch-cluster HA/DR shape. RPO=0 + Raft re-election on region loss; replication-factor sets region-failure tolerance; four-knob mitigation matrix (leader pinning, `acks=1`, follower fetching, remote read replica); region-as-rack Ansible deployment template; OMB + `tc` simulation technique; stretch-cluster vs MirrorMaker2 async as the canonical consistency-vs-availability axis exposition.
- sources/2025-01-21-redpanda-implementing-the-medallion-architecture-with-redpanda — canonical wiki source for Redpanda as lakehouse Bronze sink. Walks the Medallion Architecture three-tier pattern and positions Iceberg topics as the mechanism that makes the broker serve as the Bronze layer directly — no Kafka Connect cluster, no Python-on-Airflow job, no Redpanda Connect pipeline. Also names Flink's Iceberg sink connector as enabling real-time Bronze→Silver→Gold transitions via patterns/stream-processor-for-real-time-medallion-transitions. Tier-3 pedagogy altitude; no latency/throughput/cost numbers.
- sources/2024-11-26-redpanda-batch-tuning-in-redpanda-to-optimize-performance-part-2 — operations-manual companion by James Kinley (2024-11-26). The canonical wiki source for (1) Redpanda's four Prometheus private metrics (`vectorized_storage_log_written_bytes`, `vectorized_storage_log_batches_written`, `vectorized_scheduler_queue_length`) + one public (`redpanda_cpu_busy_seconds_total`) for broker-side effective-batch-size observability; (2) the 4 KB NVMe page-alignment target floor for effective batch size (≥ 16 KB ideal); (3) write caching as the broker-side durability-relaxation feature (ack-on-memory + background flush, equivalent to legacy Kafka durability); (4) a real Redpanda Cloud BYOC customer retrospective showing three-round linger tuning dropping p99 128 ms → 17 ms and consolidating 2 Tier-7 clusters → 1 cluster at ~2.2× per-cluster throughput.
- sources/2024-11-19-redpanda-batch-tuning-in-redpanda-for-optimized-performance-part-1 — first-principles batch-tuning explainer by James Kinley (2024-11-19). Canonicalises producer-side batching substrate on the wiki with the seven-factor effective-batch-size framework. Kafka-API-compatible framing applies identically to Apache Kafka.
- sources/2024-05-09-highscalability-kafka-101 — named among the three canonical alternative implementations of the Kafka API: "Notable competitors include RedPanda, which re-wrote Kafka in C++..." Paired with Kora (Confluent's cloud-native engine) and WarpStream (S3-heavy).
- sources/2025-03-18-redpanda-3-powerful-connectors-for-real-time-change-data-capture — introduces the Redpanda Connect integration layer shipping a family of CDC input connectors (`postgres_cdc`, `mysql_cdc`, `mongodb_cdc`, `gcp_spanner_cdc`) as the Kafka-Connect alternative. Canonicalises parallel snapshot of a single large table as the Redpanda differentiator against Debezium. Ecosystem composition datum: Redpanda Connect feeds Redpanda broker without a separate Kafka Connect cluster.
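
As a rough illustration of the broker-side observability described in the 2024-11-26 batch-tuning entry above, the sketch below estimates average effective batch size from two scrapes of the `vectorized_storage_log_written_bytes` and `vectorized_storage_log_batches_written` counters and compares it with the 4 KB floor and 16 KB ideal. Treating the ratio of the two counter deltas as the effective batch size is an assumption of this sketch, not a formula quoted from the source; the `CounterSample` type, the function names, and the bucket labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_BYTES = 4 * 1024    # 4 KB NVMe page-alignment floor cited above
IDEAL_BYTES = 16 * 1024  # ">= 16 KB ideal" target cited above


@dataclass(frozen=True)
class CounterSample:
    """One scrape of the two cumulative counters (hypothetical container)."""
    written_bytes: float    # vectorized_storage_log_written_bytes
    batches_written: float  # vectorized_storage_log_batches_written


def effective_batch_size(prev: CounterSample, curr: CounterSample) -> Optional[float]:
    """Average bytes per batch between two scrapes.

    Returns None when no batches were written or a counter went backwards
    (for example after a broker restart), since the ratio is meaningless then.
    """
    d_bytes = curr.written_bytes - prev.written_bytes
    d_batches = curr.batches_written - prev.batches_written
    if d_batches <= 0 or d_bytes < 0:
        return None
    return d_bytes / d_batches


def classify(avg_bytes: Optional[float]) -> str:
    """Bucket an average batch size against the 4 KB and 16 KB thresholds."""
    if avg_bytes is None:
        return "no-traffic"
    if avg_bytes < PAGE_BYTES:
        return "below-floor"
    if avg_bytes < IDEAL_BYTES:
        return "acceptable"
    return "ideal"


if __name__ == "__main__":
    before = CounterSample(written_bytes=1_000_000, batches_written=500)
    after = CounterSample(written_bytes=1_000_000 + 2048 * 100, batches_written=600)
    avg = effective_batch_size(before, after)
    print(avg, classify(avg))
```

In a real deployment the same ratio would more likely be computed in PromQL over a rate window and broken down per topic; the Python form only makes the arithmetic explicit.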
Related¶
- systems/kafka — upstream API Redpanda implements. Producer batching semantics shared.
- systems/confluent-kora — Confluent's cloud-native sibling.
- systems/warpstream — S3-heavy sibling; different architectural endpoint from Redpanda's thread-per-core broker shape.
- systems/prometheus — Redpanda exposes broker metrics via public + private Prometheus endpoints.
- systems/grafana — canonical visualisation layer for the effective-batch-size dashboard.
- systems/nvme-ssd — the 4 KB page-alignment substrate that dictates Redpanda's recommended minimum effective batch size.
- companies/redpanda — vendor.
- concepts/effective-batch-size — the seven-factor framework canonicalised from Redpanda's explainer.
- concepts/sticky-partitioner — Kafka-client partitioner behaviour.
- concepts/fixed-vs-variable-request-cost — substrate economics for batching.
- concepts/batching-latency-tradeoff — normal-vs-saturated regime.
- concepts/producer-backpressure-batch-growth — saturation inflates producer batches.
- concepts/broker-effective-batch-size-observability — the Prometheus-metric cookbook for measuring effective batch size.
- concepts/small-batch-nvme-write-amplification — why batches below 4 KB are catastrophic on NVMe.
- concepts/broker-write-caching — Redpanda's broker-side durability-relaxation feature.
- concepts/per-topic-batch-diagnosis — the aggregate-hides-offender observability discipline.
- patterns/batch-over-network-to-broker — canonical pattern both Kafka and Redpanda implement.
- patterns/prometheus-effective-batch-size-dashboard — the five-PromQL-query Grafana dashboard.
- patterns/iterative-linger-tuning-production-case — the three-round linger-tuning playbook canonicalised from Redpanda's BYOC customer case study.
- patterns/broker-write-caching-as-client-tuning-substitute — when to enable write caching rather than tune producers.
- systems/openmessaging-benchmark — the benchmark substrate Redpanda uses for multi-region performance testing.
- concepts/multi-region-stretch-cluster — Redpanda's region-spanning HA/DR shape via per-partition Raft groups.
- concepts/leader-pinning — write-path locality dial on stretch clusters (enterprise feature).
- concepts/follower-fetching — read-path locality (KIP-392).
- concepts/remote-read-replica-topic — object-storage-backed read-only mirror; zero load on origin brokers.
- concepts/mirrormaker2-async-replication — the async two-cluster alternative shape.
- concepts/cross-region-bandwidth-cost — the replication cost hazard on stretch clusters.
- patterns/multi-region-raft-quorum — the canonical pattern for stretch-cluster replication.
- patterns/client-proximal-leader-pinning — the pattern leader pinning realises.
- patterns/closest-replica-consume — the pattern follower fetching realises.
- patterns/tc-latency-injection-for-geo-simulation — the simulation technique Redpanda uses for stretch-cluster benchmarks.
- systems/redpanda-connect — the ~300-connector integration layer.
- systems/redpanda-byoc — the Bring Your Own Cloud deployment model Redpanda's agent infrastructure depends on.
- systems/redpanda-agents-sdk — the 2025-04-03 agents SDK built on top of the broker as durable-log substrate.
- systems/model-context-protocol — the integration-layer standard Redpanda Connect exposes via `rpk connect mcp-server`.
- concepts/data-plane-atomicity — the BYOC design tenet.
- concepts/log-as-truth-database-as-cache — Redpanda's founding premise (Kleppmann 2015).
- concepts/autonomy-enterprise-agents — the 2025 positioning of Redpanda as agent-era infrastructure.
- concepts/durable-execution — the agent-workflow property the broker underwrites.
- patterns/mcp-as-centralized-integration-proxy — the Redpanda Connect MCP-server shape.
- sources/2024-12-03-redpanda-redpanda-243-extends-lakehouses-with-streaming-data-cdc — Redpanda 24.3 release roundup (2024-12-03). Origin-point source for multiple primitives the wiki now tracks: Iceberg Topics beta (self-managed Enterprise + Redpanda Cloud BYOC — GA follow-up at 25.1, 2025-04-07); Mountable Topics — zero-data-loss unmount/remount of unused tiered-storage topics via `rpk` CLI, canonicalised as concepts/mountable-tiered-storage-topic + patterns/hibernate-unused-topics-on-tiered-storage; leader pinning + follower fetching as multi-region / multi-AZ locality dials, announced as duals; `postgres_cdc` beta in Redpanda Connect framed as "optimized for Redpanda Connect's native Go (vs. Debezium's Java)" and "the beginning of a larger CDC effort"; Redpanda Migrator gains offset translation (pre-Shadowing answer to cross-cluster consumer failover); `rpk connect --secrets` for runtime secret interpolation from AWS Secrets Manager / Azure Key Vault / GCP Secret Manager / Redis (patterns/external-secrets-manager-interpolation); plus Azure Marketplace launch + Customer-Managed VNets on Azure for BYOC + 99.99% uptime SLA + Terraform provider public beta.
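
The stretch-cluster entries on this page note that replication factor sets region-failure tolerance under per-partition Raft groups. A minimal sketch of that arithmetic follows, using only the standard Raft majority rule (a group stays available while more than half of its replicas are reachable). The function names and the `placement` mapping of region name to replica count are hypothetical, and the sketch ignores rack awareness, leader pinning, and any Redpanda-specific placement logic.

```python
from typing import Mapping


def quorum_size(replicas: int) -> int:
    """Smallest majority of a Raft group with the given number of replicas."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority remains reachable."""
    return replicas - quorum_size(replicas)


def survives_region_loss(placement: Mapping[str, int], lost_region: str) -> bool:
    """True if the partition keeps a majority after losing every replica in one region.

    `placement` maps region name to replica count (hypothetical layout).
    """
    total = sum(placement.values())
    remaining = total - placement.get(lost_region, 0)
    return remaining >= quorum_size(total)


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(rf, quorum_size(rf), tolerated_failures(rf))
    # One replica per region across three regions: any single region can fail.
    print(survives_region_loss({"us-east": 1, "us-west": 1, "eu": 1}, "eu"))
    # Two of three replicas in one region: losing that region loses the majority.
    print(survives_region_loss({"us-east": 2, "us-west": 1}, "us-east"))
```

With one replica per region, a replication factor of 3 tolerates one region loss and 5 tolerates two, which is why odd replication factors spread evenly across regions are the usual starting point.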