
PATTERN Cited by 1 source

Unified operator for cloud and self-managed

Problem

A vendor that offers the same Kubernetes-based product both as a managed cloud service and as a customer-run Self-Managed deployment faces a choice for its Kubernetes Operator: ship one operator, ship two, or ship two that share code.

Shipping two diverges quickly. The internal Cloud operator has requirements that do not apply to the customer operator (multi-tenant fleet management, cross-cluster orchestration, billing integration, telemetry to the vendor's observability stack). The customer-facing operator has requirements that do not apply to the internal one (support for customer-managed network policies, pluggable logging, integration with customer-owned secret managers). The two operators drift apart over time, roughly doubling the maintenance burden for reconciliation logic with no offsetting benefit.

Shipping one unified operator is the target state, but it requires the vendor to pay a non-trivial consolidation cost upfront.

Solution

Build one Kubernetes Operator codebase that serves both the vendor's managed cloud deployment and the customer's Self-Managed deployment, with capabilities gated on deployment mode. The shared core handles the product's lifecycle (install, upgrade, reconcile, health, backup); mode-specific extensions handle cloud-only capabilities (fleet orchestration, billing hooks) or Self-Managed-only capabilities (custom network-policy injection).
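The gating described above can be sketched in Go. This is a minimal illustration rather than the Redpanda operator's actual design: DeploymentMode, Extension, extensionsFor and the step names are all hypothetical, and the shared core is collapsed into a single "core-lifecycle" step.

```go
package main

import "fmt"

// DeploymentMode is a hypothetical flag telling the operator where it runs.
type DeploymentMode string

const (
	ModeCloud       DeploymentMode = "cloud"
	ModeSelfManaged DeploymentMode = "self-managed"
)

// Cluster stands in for the product's custom resource.
type Cluster struct {
	Name string
}

// Extension is a mode-specific step that runs after the shared core.
type Extension interface {
	Name() string
	Reconcile(c Cluster) error
}

// namedStep is a no-op Extension used only to keep the sketch short.
type namedStep string

func (s namedStep) Name() string              { return string(s) }
func (s namedStep) Reconcile(c Cluster) error { return nil }

// extensionsFor selects the extensions enabled for a deployment mode.
func extensionsFor(mode DeploymentMode) ([]Extension, error) {
	switch mode {
	case ModeCloud:
		return []Extension{namedStep("fleet-orchestration"), namedStep("billing-hooks")}, nil
	case ModeSelfManaged:
		return []Extension{namedStep("network-policy-injection")}, nil
	default:
		return nil, fmt.Errorf("unknown deployment mode %q", mode)
	}
}

// Reconcile runs the shared core first, then the mode-specific extensions,
// and returns the names of the steps it ran, in order.
func Reconcile(mode DeploymentMode, c Cluster) ([]string, error) {
	exts, err := extensionsFor(mode)
	if err != nil {
		return nil, err
	}
	ran := []string{"core-lifecycle"} // shared by both modes
	for _, e := range exts {
		if err := e.Reconcile(c); err != nil {
			return ran, fmt.Errorf("extension %s: %w", e.Name(), err)
		}
		ran = append(ran, e.Name())
	}
	return ran, nil
}

func main() {
	for _, m := range []DeploymentMode{ModeCloud, ModeSelfManaged} {
		ran, err := Reconcile(m, Cluster{Name: "demo"})
		fmt.Println(m, ran, err)
	}
}
```

In this sketch the core never branches on the mode itself; only the extension list differs, which keeps mode-specific behaviour out of the shared reconciliation path.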

Canonical wiki instance

Redpanda's 2025 unified Redpanda Operator (v25.1.x) is the canonical wiki instance. Prior state: Redpanda shipped two operators — an internal Cloud operator for Redpanda Cloud / BYOC, and a separate customer-facing operator for Self-Managed deployments. The customer operator bundled FluxCD as an internal dependency (the bundled-GitOps-dependency anti-pattern), which made merging it with the internal operator hard. Unification required three preparatory moves:

  1. Make FluxCD optional (v2.3.x — spec.chartRef.useFlux).
  2. Disable FluxCD by default (v2.4.x, Jan 2025).
  3. Remove FluxCD and the Helm-chart wrapping entirely (v25.1.x beta). At this point the customer operator's internals look much more like the internal Cloud operator's, and the two can be merged.

(Source: sources/2025-05-06-redpanda-a-guide-to-redpanda-on-kubernetes)

Framing verbatim: "This paved the way for a unified operator across our Cloud and our Self-Managed customers while simplifying the codebase tremendously, which also makes it easier to add new features down the road."

Preconditions

  • Managed cloud runs on Kubernetes. If the vendor's cloud uses a separately-architected substrate (VMs, bare metal, serverless), unification is moot.
  • Reconciliation logic generalises. The core lifecycle operations (create cluster, rolling upgrade, scale, failover, backup) must be expressible with the same reconciliation model for both deployment modes. If cloud operations fundamentally differ (e.g. cloud uses a pull-based config-drift model, Self-Managed uses push-based admin webhooks), unification is forced to paper over the difference.
  • Anti-pattern deps removed. Bundled GitOps controllers, heavy Helm-chart-wrapping, or other structural divergence between cloud and Self-Managed shapes must be cleaned up first.
  • Product-management alignment. Features Cloud needs (metering, tenancy, cross-cluster orchestration) must not be ruled out of the shared codebase on the grounds that customers don't need them.

Trade-offs

Wins

  • Roughly halved reconciliation-logic maintenance surface: one set of reconciliation logic to maintain instead of two.
  • Feature parity by construction — features shipped for Cloud appear for Self-Managed too (and vice versa) unless gated.
  • Easier migration between cloud and Self-Managed deployments. Same operator, same CRDs, same upgrade semantics.
  • Single testing matrix for the product's K8s substrate.

Costs

  • Consolidation cost upfront. The work of merging two mature codebases is substantial and has to be justified against its opportunity cost.
  • Cloud-only features leak into Self-Managed. Customers see CRD fields or flags that don't apply to them; feature-gating discipline becomes mandatory.
  • Release coupling. The operator's release cadence now gates both Cloud rollouts and Self-Managed releases. A Cloud-urgent fix ships to every Self-Managed customer too.
  • Security boundary homogenisation. The Cloud operator runs in the vendor's trust domain; the Self-Managed operator runs in the customer's. Shared code must not assume vendor-only capabilities (e.g. access to the vendor's control plane).
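One way to enforce the feature-gating discipline mentioned in the costs above is to check, per deployment mode, which spec fields a resource is allowed to set. The Go sketch below is illustrative only: the field paths, the fieldModes table and DisallowedFields are hypothetical and do not come from any real CRD.

```go
package main

import (
	"fmt"
	"sort"
)

// DeploymentMode is a hypothetical flag telling the operator where it runs.
type DeploymentMode string

const (
	ModeCloud       DeploymentMode = "cloud"
	ModeSelfManaged DeploymentMode = "self-managed"
)

// fieldModes lists the modes allowed to set each gated field.
// Fields that are not listed are allowed in every mode.
// All field paths here are made up for illustration.
var fieldModes = map[string][]DeploymentMode{
	"spec.billing.accountID":    {ModeCloud},
	"spec.fleet.region":         {ModeCloud},
	"spec.networkPolicy.custom": {ModeSelfManaged},
}

// DisallowedFields returns, sorted, the fields in setFields that the
// given mode is not permitted to set.
func DisallowedFields(mode DeploymentMode, setFields []string) []string {
	var bad []string
	for _, f := range setFields {
		allowed, gated := fieldModes[f]
		if !gated {
			continue // ungated field: valid everywhere
		}
		ok := false
		for _, m := range allowed {
			if m == mode {
				ok = true
				break
			}
		}
		if !ok {
			bad = append(bad, f)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	set := []string{"spec.replicas", "spec.billing.accountID", "spec.networkPolicy.custom"}
	fmt.Println("self-managed rejects:", DisallowedFields(ModeSelfManaged, set))
	fmt.Println("cloud rejects:", DisallowedFields(ModeCloud, set))
}
```

A check like this could sit in an admission webhook or at the start of reconciliation, so that a Self-Managed resource carrying a cloud-only field is rejected explicitly instead of being silently ignored.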

Alternative approaches

  • Two operators, shared libraries. Keep both operators as separate binaries but extract common reconciliation logic into a library both depend on. This retains deployment-mode-specific tuning and keeps binaries small, at the cost of stricter discipline around the shared library's interface.
  • One operator, cloud-only; Self-Managed ships Helm chart only. Used by vendors whose Self-Managed audience is smaller and tolerates lower operational automation. The Helm chart is the Self-Managed path.
  • Open-source base + proprietary cloud extension. The vendor open-sources the Self-Managed operator and builds cloud-specific features as a closed-source fork or layer on top.