
SYSTEM Cited by 14 sources

Kubernetes

Kubernetes is the dominant open-source container orchestrator (CNCF graduated, originally from Google's Borg lineage). Stub page — expand as sources cite specific subsystems.

Relevant subcomponents already on this wiki:

  • systems/kube-proxy — default L4 service load balancer (iptables / IPVS / eBPF).
  • systems/coredns — cluster DNS for service name resolution.
  • Services / EndpointSlices — API objects that expose a set of pods; watched by control planes like Databricks' systems/databricks-endpoint-discovery-service.

Default service networking (for context)

  1. Client resolves svc-name.namespace.svc.cluster.local via CoreDNS → returns a ClusterIP (virtual IP).
  2. Packet hits the node; kernel rules (configured by kube-proxy) rewrite the dst to one of the pod IPs per basic L4 policy (round-robin etc.).
  3. Pod replies to the client.

This pattern has known limits for L7 protocols with long-lived connections (gRPC, HTTP/2 streaming) — see concepts/layer-7-load-balancing and patterns/proxyless-service-mesh.
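
A minimal sketch of why connection-level balancing skews load for long-lived connections. It is a toy model, not kube-proxy's actual algorithm: round-robin stands in for whatever choice the kernel rules make, and the function names, pod names, and request counts are all illustrative assumptions.

```python
from collections import Counter
from itertools import cycle


def l4_per_connection(num_clients, backends, requests_per_client):
    """Pick a backend once per connection; every request on it follows."""
    chooser = cycle(backends)  # round-robin stands in for the kernel's choice
    load = Counter({b: 0 for b in backends})
    for _ in range(num_clients):
        backend = next(chooser)  # decided at connect time only
        load[backend] += requests_per_client
    return dict(load)


def l7_per_request(num_clients, backends, requests_per_client):
    """Pick a backend for every request, as an L7-aware balancer could."""
    chooser = cycle(backends)
    load = Counter({b: 0 for b in backends})
    for _ in range(num_clients * requests_per_client):
        load[next(chooser)] += 1
    return dict(load)


backends = ["pod-a", "pod-b", "pod-c", "pod-d"]
l4 = l4_per_connection(3, backends, 1000)
l7 = l7_per_request(3, backends, 1000)
print(l4)  # three pods carry 1000 requests each, pod-d carries none
print(l7)  # 750 requests per pod
```

With three clients each holding one long-lived connection, the per-connection model leaves one of four pods idle, while the per-request model spreads the same 3,000 requests evenly.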

Seen in

  • sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks runs hundreds of stateless gRPC services per cluster and thousands of clusters across multiple regions; their intelligent LB work explicitly addresses the default model's limitations.
  • sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development — The systems/aws-sagemaker-hyperpod training operator (a custom K8s operator) improves on default Kubernetes Job fault-recovery semantics for distributed GPU training: it restarts only the affected resources instead of the whole job (see patterns/partial-restart-fault-recovery) and monitors for stalled batches and non-numeric loss; teams codify recovery policies via YAML.
  • sources/2024-08-08-figma-migrated-onto-k8s-in-less-than-12-months — Figma migrated their compute platform from ECS to EKS (managed Kubernetes) in under 12 months. Enumerated motivations for choosing Kubernetes over staying on ECS: StatefulSets for stateful workloads like etcd; Helm charts for OSS like systems/temporal; graceful node cordon-and-drain; CNCF auto-scaling (systems/keda, systems/karpenter); service-mesh ecosystem; reduced vendor lock-in; easier hiring. Operational shape: three active clusters per environment (patterns/multi-cluster-active-active-redundancy), proven in a CoreDNS-destruction incident that cost 1/3 of requests instead of a full outage.
  • sources/2026-03-18-aws-ai-powered-event-response-for-amazon-eks — Kubernetes as the discovery target for AWS DevOps Agent. The agent queries the Kubernetes API as one of two discovery paths (static resource state + OTel-derived runtime relationships) to build a live dependency graph for incident investigation. Graph nodes are standard K8s API objects — Pods / Deployments / Services / ConfigMaps / Ingress / NetworkPolicies — carrying their full label / annotation / resource-spec / health-check / env-var metadata. See concepts/telemetry-based-resource-discovery for how the static graph fuses with runtime telemetry.
  • sources/2026-03-23-aws-generali-malaysia-eks-auto-mode — Kubernetes under the EKS Auto Mode (managed-data-plane) variant at Generali Malaysia. Canonical wiki reference for the compound K8s operating discipline on managed-cluster services: stateless-only pods + pods treated as immutable + Helm as standardised deployment mechanism + HPA traffic-driven auto-scaling. Managed-cluster upgrade churn (Bottlerocket AMI replacement weekly) is mediated by Pod Disruption Budgets + Node Disruption Budgets + off-peak maintenance windows (patterns/disruption-budget-guarded-upgrades). Peer-AWS-service integration surface: GuardDuty / Inspector / Network Firewall / Secrets Manager + External Secrets Operator / Managed Grafana.
  • sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — Kubernetes as the compilation target for LLM-inference deployment primitives. The 2026-04-06 HyperPod Inference Operator post names two mechanisms that compile down to Kubernetes-native primitives: (1) multi-instance-type fallback expressed as a prioritised list (["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]) on a CRD, implemented via requiredDuringSchedulingIgnoredDuringExecution to restrict the set + preferredDuringSchedulingIgnoredDuringExecution with descending weights to order within it — see concepts/instance-type-fallback; (2) the raw nodeAffinity surface is exposed directly in InferenceEndpointConfig for custom scheduling (excluding Spot, preferring AZs, custom labels). Two first-class CRDs under inference.sagemaker.aws.amazon.com/v1: InferenceEndpointConfig (bring-your-own-model) and JumpStartModel (managed model catalog). Verification example: kubectl get pods -n hyperpod-inference-system.
  • Brian Morrison II, 2023-09-27. Kubernetes as the substrate for stateful database-as-a-service at fleet scale. PlanetScale runs "hundreds of thousands of databases" as Vitess clusters on Kubernetes, orchestrated by the custom PlanetScale Vitess Operator (concepts/kubernetes-operator-pattern). Canonicalises the operator-over-StatefulSet architectural choice (patterns/custom-operator-over-statefulset): plain pods with direct cloud PVC attachment instead of StatefulSets, because VTGate already handles pod-identity via the topology server. Enforces minimum 3 AZs per region for paid production databases (patterns/multi-az-vitess-cluster) — "we don't even support cloud regions with less than three availability zones."
  • sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — Kubernetes as the rolling-upgrade substrate for a gossip-based NoSQL fleet. Yelp runs > 1,000 Cassandra nodes on Kubernetes via a custom Cassandra operator (architecture documented in an earlier 2020 "Orchestrating Cassandra on Kubernetes with operators" post). Canonical wiki instance of Kubernetes init containers used to sequence two simultaneous state changes (new pod IP + new Cassandra version) into two distinct gossip-observable events — the init container runs the old (3.11) version on the new IP just long enough for gossip to converge, then flips to the new (4.1) main container; see concepts/init-container-ip-gossip-pre-migration and CASSANDRA-19244. Presented at KubeCon 2025. Companion pattern: patterns/pre-flight-flight-post-flight-upgrade-stages — three-stage automation script (kubectl + CLI + PR creation) runs in auto-proceed or per-step-confirmation mode.
  • Kubernetes as a latency-variability substrate for latency-sensitive middleware (PgBouncer connection pooling). Zalando's Postgres Operator team enumerates three Kubernetes-native latency hazards and their operator-level fixes: (1) kube-proxy iptables non-uniformity producing 977m / 995m / 585m / 993m CPU skew across four PgBouncer pods; (2) sibling-HT softirq contention doubling p99 latency when two PgBouncer so_reuseport workers land on the same physical core; (3) cross-AZ network RTT from availability-zone-spread pods. Mitigation path: CPU Manager static policy for exclusive physical-core pinning, plus the big-pooler-with-affinity + small-pooler-HA escape hatch when operator-provided defaults aren't enough. Companion operator page: zalando postgres operator. First wiki canonicalisation of Kubernetes as a platform for latency-sensitive stateful-adjacent workloads — complements the Databricks gRPC-skew and PlanetScale database-fleet Seen-ins with a connection-pooler altitude.
  • Kubernetes as the substrate for an end-to-end load-test automation platform. Zalando Payments' Load Test Conductor orchestrates Kubernetes Deployments + AWS ECS services simultaneously; Kubernetes CronJob is the scheduler for recurring load-test runs (see patterns/scheduled-cron-triggered-load-test); the test-cluster → NodePool-A (load infra) + NodePool-B (services under test) layout demonstrates node-pool-based segregation of load-generators from system-under-test to prevent the generator's resource needs from starving the system being measured. The CronJob-triggered-load-test pattern adds a canonicalisation point for Kubernetes scheduling primitives in a test-automation context.
  • Kubernetes as the substrate for scheduled cron-based scaling of a stateful zone-aware workload. Zalando Lounge runs a 3-AZ Elasticsearch cluster on Kubernetes via the custom es-operator against an EDS CRD with StatefulSet highest-ordinal scale-in semantics underneath. Cron-based scale-in/scale-out (patterns/scheduled-cron-based-scaling) mutates the EDS replicas field; es-operator reconciles. The incident catalogs four failure modes: (1) zone-aware drain livelock when the highest-ordinal pod is alone in an AZ; (2) ctx-cancellation ignored in one retry loop in the drain code (morning scale-out could not preempt stuck nightly scale-in); (3) zombie cluster.routing.allocation.exclude._ip entries when drain is interrupted between mark and cleanup; (4) organizational blast-radius drift — the "quick fix" missed an experimental project's separate scale-down cronjob. Canonical wiki instance of a Kubernetes 1.28 control-plane upgrade silently altering pod-to-zone distribution under ephemeral-storage cross-zone drift, invalidating the implicit assumption of the scaling plan. Closing lesson: "Read the code."
  • sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster — Kubernetes API + etcd as the scaling bottleneck for a large ingress data plane. At ~180 Skipper pods per cluster × 200 clusters, each independently polling the API for Ingress + RouteGroup, etcd was "overwhelmed" and API-server CPU was throttled — "our clusters lost the ability to schedule new pods effectively." The failure surface is the scheduler, not the request path. Canonical wiki instance of concepts/control-plane-fan-out-to-kubernetes-api. Remediation: insert Route Server as a single coalescing proxy between Skipper and the API (patterns/control-plane-proxy-with-etag-cache) that polls the API every 3 s and serves Skippers with HTTP ETag / 304 (concepts/etag-conditional-polling). Zalando explicitly rejected Kubernetes Informers as the alternative: an informer-based design keeps the N× fan-out shape at change events — "Since it's a sudden increase in traffic and HPA won't be able to catch up and scale Kubernetes API and etcd." Rolled out via FalsePre (shadow diff of routing tables) → Exec flag (patterns/three-mode-rollout-off-shadow-exec), tier by tier; zero GMV loss; Skipper HPA ceiling raised to 300 pods per cluster.
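
The coalescing-proxy remediation in the Route Server entry above (one upstream poll, many conditional readers answered with ETag / 304) can be sketched roughly as follows. This is an illustration under stated assumptions, not Zalando's implementation: the `RouteCache` class, its method names, the SHA-256 ETag scheme, and the sample routing table are all hypothetical.

```python
import hashlib
import json


class RouteCache:
    """Hypothetical coalescing cache: one upstream poll, many readers."""

    def __init__(self, fetch_routes):
        self._fetch_routes = fetch_routes  # callable returning the routing table
        self._body = b""
        self._etag = None
        self.upstream_polls = 0

    def refresh(self):
        """Poll the upstream API once (the source describes a 3 s interval)."""
        routes = self._fetch_routes()
        self.upstream_polls += 1
        body = json.dumps(routes, sort_keys=True).encode()
        self._body = body
        # Assumption: the ETag is a content hash, so it changes only with the table.
        self._etag = '"' + hashlib.sha256(body).hexdigest() + '"'

    def handle_get(self, if_none_match=None):
        """Return (status, etag, body); 304 with no body if the client is current."""
        if if_none_match is not None and if_none_match == self._etag:
            return 304, self._etag, b""
        return 200, self._etag, self._body


table = {"ingress/shop": ["10.0.0.1:9090"]}
cache = RouteCache(lambda: table)

cache.refresh()
status1, etag1, body1 = cache.handle_get()                     # first fetch
status2, _, body2 = cache.handle_get(if_none_match=etag1)      # unchanged table

table["ingress/shop"].append("10.0.0.2:9090")
cache.refresh()
status3, etag3, _ = cache.handle_get(if_none_match=etag1)      # table changed
print(status1, status2, status3, cache.upstream_polls)
```

However many clients call `handle_get`, the upstream is polled only when `refresh` runs, and clients holding the current ETag receive an empty 304 instead of the full table.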

Go binary-size story

Kubernetes is both a victim and a beneficiary of the Go binary-size engineering documented in sources/2026-02-18-datadog-how-we-reduced-agent-go-binaries-up-to-77-percent.

Separately, systems/containerd (the default K8s container runtime) was the root cause of the Agent's 245-MiB plugin-import regression — Datadog's upstream containerd build-tag fix propagates to every Go program using containerd, directly or transitively.

Nodeless variants via Virtual Kubelet

Not every managed-K8s product uses real Nodes. Fly.io's Fly Kubernetes (FKS) runs K3s as the API plane and plugs in a Virtual Kubelet provider that forwards every Pod-create request into Fly Machines (Firecracker micro-VMs). There is no Node object in the cluster at all — see concepts/nodeless-kubernetes and concepts/micro-vm-as-pod. Fly maps the rest of the K8s primitives to existing Fly.io primitives 1:1 (patterns/primitive-mapping-k8s-to-cloud): containerd/CRI → flyd + Firecracker + Fly init; CNI → internal IPv6 WireGuard mesh; Services → the Fly Proxy; CoreDNS retained at beta. Source: sources/2024-03-07-flyio-fly-kubernetes-does-more-now. Same pattern underlies AWS Fargate-on-EKS and Azure AKS Virtual Nodes; the wiki's canonical K8s + Virtual-Kubelet example is FKS.
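
A toy sketch of the "Pod-create becomes micro-VM-create" translation described above. It does not use the real Virtual Kubelet interface or Fly.io's Machines API: `MachineRequest`, `MicroVMProvider`, and their fields are hypothetical names, and the one-container-per-Pod restriction is an assumption made to keep the mapping one Pod to one VM.

```python
from dataclasses import dataclass, field


@dataclass
class MachineRequest:
    """Hypothetical micro-VM create request; not Fly.io's real API shape."""
    name: str
    image: str
    env: dict = field(default_factory=dict)


class MicroVMProvider:
    """Toy stand-in for a Virtual Kubelet provider: Pods in, micro-VMs out."""

    def __init__(self):
        self.machines = {}  # (namespace, pod name) -> MachineRequest

    def create_pod(self, pod):
        """Translate a Pod manifest (as a dict) into one micro-VM request."""
        meta = pod["metadata"]
        containers = pod["spec"]["containers"]
        if len(containers) != 1:
            # Assumption for this sketch: one container per Pod, one VM per Pod.
            raise ValueError("sketch supports single-container Pods only")
        container = containers[0]
        request = MachineRequest(
            name=meta["name"],
            image=container["image"],
            env={e["name"]: e["value"] for e in container.get("env", [])},
        )
        self.machines[(meta.get("namespace", "default"), meta["name"])] = request
        return request

    def delete_pod(self, namespace, name):
        """Forget the micro-VM backing a Pod; True if one existed."""
        return self.machines.pop((namespace, name), None) is not None


provider = MicroVMProvider()
vm = provider.create_pod({
    "metadata": {"name": "web", "namespace": "shop"},
    "spec": {"containers": [{"name": "app", "image": "nginx:1.27",
                             "env": [{"name": "PORT", "value": "8080"}]}]},
})
print(vm)
```

The provider keeps no Node-level state at all, which mirrors the nodeless shape: the API plane only ever sees Pods, and the backend only ever sees VM requests.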

Seen in (Kubernetes as Firecracker-µVM substrate for AI-agent execution)

  • sources/2026-04-24-atlassian-rovo-dev-driven-development — Atlassian Fireworks (2026-04-24) is a canonical wiki instance of K8s as the scheduling / networking / declarative-API substrate under Firecracker micro-VMs — the same architectural move systems/fly-kubernetes makes in public cloud, applied internally for AI-agent workload execution. The engineering composition it enables: accept OCI container images via the K8s-shaped API surface, boot them as hardware-isolated Firecracker VMs instead of shared-kernel containers, and layer in per-µVM eBPF network-policy enforcement, Envoy ingress, Raft persistence for scheduler state, and node agents that manage VM lifecycle (100 ms warm starts, live migration, snapshot filesystem restore, sidecar sandboxes). Canonicalised as concepts/hardware-isolated-microvm-on-kubernetes. First wiki disclosure that K8s-at-Atlassian runs on the "shared AWS scms Kubernetes cluster" and that per-developer dev shards on this cluster are the canonical integration-test substrate for AI-agent-written code — the K8s primitive the agentic development workflow rides on.