REDPANDA 2025-05-06

Redpanda — A guide to Redpanda on Kubernetes

Summary

Redpanda (unsigned, 2025-05-06) publishes a product-altitude guide to the evolution of Redpanda's Kubernetes deployment story — from an early Helm chart, through two separate Kubernetes Operators (internal Cloud + customer-facing Self-Managed), to a unified Redpanda Operator being rolled out across 2025 that serves both audiences. Frames the operator-vs-Helm trade-off; discloses the FluxCD bundling mistake and its reversal; canonicalises a version-aligned compatibility scheme (operator / Helm chart version = Redpanda-core version); and positions the unified operator as the canonical production-grade deployment path going forward.

Batch-skip override, per an explicit user full-ingest instruction. The raw frontmatter carried ingested: true and skip_reason: batch-skip (marketing/tutorial slug pattern). The post is ingested in full on architectural-substance grounds: it contains three load-bearing architectural claims (FluxCD-bundling-is-anti-pattern; two-operators-to-one-operator consolidation; version-alignment-retires-compatibility-matrix) that are structurally useful for the wiki's Kubernetes-operator corpus beyond Redpanda.

Key takeaways

  • Redpanda on Kubernetes has two supported deployment paths: Helm chart (simple, limited lifecycle automation) or Redpanda Operator (production-grade, CRD-driven, handles upgrades + dynamic config + lifecycle). Operator is the default recommendation.
  • At publication (May 2025) there are still two separate Redpanda Operators: one for internal Redpanda Cloud operations, and a customer-facing one for Self-Managed deployments. Maintaining divergent operators became a cost; the 2025 unification effort merges them. "Currently, we have two Redpanda Operators: one for us and one for customers — but it's about to get a whole lot simpler."
  • The customer-focused operator initially bundled FluxCD — a GitOps controller similar to ArgoCD — which wrapped Redpanda's Helm chart internally to accelerate operator development. This turned into an anti-pattern: "customers were confused — bundling made our initial version of the customer-facing operator quite different from others in the Kubernetes ecosystem. Additionally, some customers were already using FluxCD, which conflicted with our bundled version." The fix: remove FluxCD as a hard dependency. (Source: sources/2025-05-06-redpanda-a-guide-to-redpanda-on-kubernetes)
  • Three-branch rollout for the unification:
      • v2.3.x: FluxCD optional. spec.chartRef.useFlux toggles it.
      • v2.4.x: FluxCD disabled by default (Jan 2025). Same toggle.
      • v25.1.x: no FluxCD, no Helm-chart dependency. Beta. Version scheme now tracks the Redpanda core version.
  • The version number jump from v2.4.x to v25.1.x is deliberate: signals the new version-aligned compatibility scheme. Operator/Helm chart version number matches the Redpanda-core version — "determine compatibility at a glance without checking a compatibility chart." Each operator/chart version is also compatible with the Redpanda version immediately above and below it (±1 minor window). (Source: sources/2025-05-06-redpanda-a-guide-to-redpanda-on-kubernetes)
  • Five-axis Operator vs Helm comparison disclosed verbatim: managed upgrades + rollback, dynamic configuration (via CRDs — Helm cannot do runtime config without redeployment), advanced health checks / metrics, lifecycle automation (scaling, failover, cleanup), multi-tenancy management (one CRD vs separate Helm releases). Helm still supported for teams wanting a simple deploy path and template-based configuration.
  • Limitation on K8s-only deployment shapes: per the prior 2025-02-11 stretch-cluster post, "Self-Managed on K8s currently supports only multi-AZ deployments in all the cloud providers" — multi-region stretch clusters are not supported on K8s. This guide doesn't revisit that limitation.
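
The FluxCD toggle in the three-branch rollout above can be sketched as a small manifest builder. This is an illustrative sketch, not Redpanda's tooling: the function name, the BRANCH_SUPPORTS_FLUX table, and the apiVersion/kind values are assumptions to be checked against the CRD installed in the target cluster. Only the spec.chartRef.useFlux field and the per-branch behaviour come from the guide.

```python
import json

# Branches named in the guide, and whether each still exposes the FluxCD
# toggle. v25.1.x has no FluxCD integration at all.
BRANCH_SUPPORTS_FLUX = {"v2.3": True, "v2.4": True, "v25.1": False}


def redpanda_manifest(name: str, branch: str, use_flux: bool = False) -> dict:
    """Build a Redpanda custom resource as a plain dict (hypothetical helper).

    The apiVersion and kind below are assumptions for illustration.
    """
    if branch not in BRANCH_SUPPORTS_FLUX:
        raise ValueError(f"unknown operator branch: {branch}")
    if use_flux and not BRANCH_SUPPORTS_FLUX[branch]:
        raise ValueError(f"{branch} has no FluxCD integration to enable")

    spec: dict = {"clusterSpec": {}}
    if BRANCH_SUPPORTS_FLUX[branch]:
        # Set the toggle explicitly instead of relying on the branch default,
        # which differs between v2.3.x and v2.4.x.
        spec["chartRef"] = {"useFlux": use_flux}

    return {
        "apiVersion": "cluster.redpanda.com/v1alpha2",  # assumption
        "kind": "Redpanda",  # assumption
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output can be piped to it.
    print(json.dumps(redpanda_manifest("redpanda", "v2.4"), indent=2))
```

Pinning useFlux explicitly on the v2.3.x and v2.4.x branches keeps a manifest's behaviour stable across an operator upgrade that flips the default.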

Systems introduced or extended

  • systems/redpanda-operator — new canonical page. The customer-facing (and, as of 25.1, unified) Kubernetes Operator for Redpanda. Manages lifecycle of Redpanda clusters via CRDs; handles rolling upgrades with reconciliation, dynamic configuration, per-cluster health + metrics, scaling, failover, cleanup, multi-tenancy. Historically depended on FluxCD + Redpanda's Helm chart internally; v25.1.x drops both dependencies.
  • systems/fluxcd — new minimal system page. CNCF GitOps controller (similar to systems/argocd); Redpanda's customer operator used it internally to wrap Helm-chart lifecycle management before 25.1 removed the dependency.
  • Redpanda — extended with a new "Kubernetes deployment" section canonicalising the Helm-vs-Operator split and the 25.1 FluxCD-removal + version-alignment changes.
  • systems/helm — extended with a new Seen in entry canonicalising Redpanda's Helm chart as the simpler-but-less-automated alternative to the Redpanda Operator, plus the operator-wrapping-Helm-chart pattern and its eventual removal in v25.1.x.

Concepts canonicalised

  • concepts/bundled-gitops-dependency-anti-pattern — new canonical page. Bundling a GitOps controller (e.g. FluxCD, ArgoCD) as a hidden dependency of a Kubernetes Operator makes the operator diverge from ecosystem norms, conflicts with customers' own GitOps installations, and couples the operator's release cadence to the bundled tool's. Redpanda's customer operator canonicalises this pattern with its 2024-era reversal: FluxCD made operator development faster but produced customer confusion and conflict with pre-existing FluxCD deployments; the fix was to make the bundled tool optional, then disabled by default, then removed entirely.
  • concepts/version-aligned-compatibility-scheme — new canonical page. Versioning a support tool (operator, Helm chart, client library, migration tool) such that its version number matches the underlying system's version is the explicit retirement of a compatibility-matrix document. Redpanda 25.1 canonicalises this: "This means you can now determine compatibility at a glance without checking a compatibility chart. And, each operator and Helm chart version will also be compatible with the Redpanda version above and below it." Trade-off: forces a version-number discontinuity when adopting the scheme (v2.4.x → v25.1.x jump).
  • concepts/kubernetes-operator-pattern — extended with a new Seen-in canonicalising the Redpanda-Operator production trajectory (Helm-chart-wrapping → FluxCD-optional → FluxCD-removed → version-aligned).
  • concepts/custom-resource-definition (referenced) — CRDs are the mechanism the Redpanda Operator uses for dynamic configuration, distinguishing it from the Helm chart (which requires redeploy on every config change).
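
The difference between CRD-driven dynamic configuration and a Helm redeploy can be illustrated with a toy reconcile step. The guide does not describe the operator's actual reconcile loop, so everything here (ClusterState, reconcile, the config keys used in the test) is a hypothetical in-memory model of the general operator pattern, not Redpanda's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Toy stand-in for a running cluster's live configuration."""
    config: dict = field(default_factory=dict)


def reconcile(desired: dict, observed: ClusterState) -> list[str]:
    """Bring observed.config in line with desired; return the actions taken.

    Only keys that differ are touched. A second call with the same desired
    state returns an empty list, which is the idempotence an operator's
    reconcile loop relies on.
    """
    actions = []
    # Apply new or changed settings.
    for key, value in desired.items():
        if observed.config.get(key) != value:
            observed.config[key] = value
            actions.append(f"set {key}={value}")
    # Remove settings that are no longer declared.
    for key in list(observed.config):
        if key not in desired:
            del observed.config[key]
            actions.append(f"unset {key}")
    return actions
```

In this model the desired dict plays the role of the custom resource's spec: editing it and re-running reconcile changes only the affected keys, whereas the Helm path described above re-renders and redeploys the whole release.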

Patterns canonicalised

  • patterns/unified-operator-for-cloud-and-self-managed — new canonical page. A single Kubernetes Operator codebase serving both the vendor's managed-cloud deployment and the customer's self-managed deployment. Avoids the cost of maintaining two divergent operators with largely overlapping reconciliation logic. Redpanda's 2025 consolidation is the canonical wiki instance. Precondition: the vendor's managed cloud runs on Kubernetes (not a separately architected substrate), and the operator's reconcile logic generalises across managed and self-managed shapes.

Operational numbers

  • Three operator branches active at publication: v2.3.x (FluxCD optional), v2.4.x (FluxCD disabled by default, Jan 2025), v25.1.x (no FluxCD, beta).
  • ±1 minor compatibility window on the version-aligned scheme — e.g. v25.1.x Operator supports Redpanda 24.3.x, 25.1.x, 25.2.x.
  • Three cloud providers supported for Kubernetes deployment: AWS, Google Cloud, Azure.
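
The ±1 window on the version-aligned scheme reduces to an adjacency check over an ordered list of release lines. The sketch below assumes that ordering: RELEASE_LINES is inferred from the example above (24.3, 25.1, 25.2) plus one earlier line, and the helper names are hypothetical. The real list should come from Redpanda's release notes.

```python
# Ordered Redpanda release lines. ASSUMPTION: inferred from the guide's
# example, not an authoritative list; extend it from the release notes.
RELEASE_LINES = ["24.2", "24.3", "25.1", "25.2"]


def release_line(version: str) -> str:
    """Reduce 'v25.1.4' or '25.1.4' to its release line, '25.1'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return f"{major}.{minor}"


def is_compatible(operator_version: str, redpanda_version: str) -> bool:
    """True if Redpanda is on the operator's release line or an adjacent one.

    Raises ValueError if either version is not in RELEASE_LINES.
    """
    op = RELEASE_LINES.index(release_line(operator_version))
    rp = RELEASE_LINES.index(release_line(redpanda_version))
    return abs(op - rp) <= 1
```

Adjacency is computed on list position instead of on the numeric minor version because the line before 25.1 is 24.3, not 25.0.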

Caveats

  • Product-guide altitude. Body is a product tour of the Redpanda Kubernetes deployment landscape, not an architectural retrospective or incident write-up. Mechanism depth is intentionally limited — the post does not walk the operator's reconcile loop, CRD schema, webhook admission pattern, or leader-election mechanism.
  • No production numbers. No disclosure of how many customers run the operator vs Helm chart; no fleet distribution of FluxCD-bundled-operator issues vs normal adoption; no upgrade-success percentiles; no reconcile-loop timings.
  • "Confusion" metric unquantified. The claim that customers "were confused" by FluxCD bundling, and that the bundled version conflicted with existing FluxCD deployments, is asserted without frequency or support-ticket data. Structural plausibility is high, but the magnitude of the pain is described only qualitatively.
  • FluxCD removal migration path underspecified. Customers running on v2.3.x with FluxCD enabled have to migrate; the post gestures at docs but doesn't enumerate the steps, downtime window, or rollback posture.
  • Deprecation schedule opaque. "At some point, the 2.3.x and 2.4.x branches will be deprecated, but there's no date for this yet" — leaves customers with a planning gap.
  • Unified-operator cutover mechanism unspecified. The merge between internal Cloud operator and customer operator is framed as simplification but the substrate of that merge (which code paths survive? which are rewritten? what's the cross-team review process?) is not disclosed.
  • Tier-3 product-marketing framing. "Say hello to simplified versioning" + CTA close ("Redpanda University", "upcoming free course") are vendor-blog conventions; substance is the architectural trajectory beneath.
  • Comparison table is vendor-framed. The five-axis Helm vs Operator comparison is correct in broad strokes but Helm's automation ceiling depends on tooling layered on top (Helmfile, ArgoCD with Helm-chart apps, FluxCD's HelmRelease CRD); post doesn't explore these alternatives.
  • Multi-region K8s limitation not revisited. The prior stretch-cluster post disclosed that "Self-Managed on K8s currently supports only multi-AZ deployments in all the cloud providers"; this deployment guide doesn't say whether the unified operator changes that constraint.
