SYSTEM Cited by 14 sources
Kubernetes¶
Kubernetes is the dominant open-source container orchestrator (CNCF graduated, originally from Google's Borg lineage). Stub page — expand as sources cite specific subsystems.
Relevant subcomponents already on this wiki:
- systems/kube-proxy — default L4 service load balancer (iptables / IPVS / eBPF).
- systems/coredns — cluster DNS for service name resolution.
- Services / EndpointSlices — API objects that expose a set of pods; watched by control planes like Databricks' systems/databricks-endpoint-discovery-service.
Default service networking (for context)¶
- Client resolves `svc-name.namespace.svc.cluster.local` via CoreDNS → returns a `ClusterIP` (virtual IP).
- Packet hits the node; kernel rules (configured by kube-proxy) rewrite the dst to one of the pod IPs per basic L4 policy (round-robin etc.).
- Pod replies to the client.
This pattern has known limits for L7 protocols with long-lived connections (gRPC, HTTP/2 streaming) — see concepts/layer-7-load-balancing and patterns/proxyless-service-mesh.
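A minimal sketch of why connection-level (L4) balancing skews under long-lived connections. It is a toy simulation, not kube-proxy's actual algorithm: the function name `l4_balance`, the pod names, and the traffic numbers are all hypothetical, and plain round-robin stands in for whatever policy the kernel rules apply.

```python
import itertools
from collections import Counter


def l4_balance(connections, pods):
    """Assign each *connection* (not each request) to a pod, round-robin.

    connections: list of (client_name, request_count) tuples; every request
    sent on a connection follows that connection to its pod.
    Returns a Counter of requests handled per pod.
    """
    load = Counter({pod: 0 for pod in pods})
    picker = itertools.cycle(pods)
    for _client, request_count in connections:
        pod = next(picker)          # the choice is made once, at connect time
        load[pod] += request_count  # all later requests stick to that pod
    return load


pods = ["pod-a", "pod-b", "pod-c"]  # hypothetical backends

# Short-lived traffic: one request per connection, 300 connections.
short_lived = [("client", 1)] * 300

# Long-lived gRPC-style traffic: three channels with uneven request volume.
long_lived = [("heavy", 280), ("light-1", 10), ("light-2", 10)]

short_load = l4_balance(short_lived, pods)
long_load = l4_balance(long_lived, pods)
print("short-lived:", dict(short_load))
print("long-lived: ", dict(long_load))
```

With one request per connection the load evens out across pods. With three long-lived channels the same policy pins all of the heavy client's requests to a single pod, which is the skew that request-level (L7) balancing is meant to remove.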
Seen in¶
- sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks runs hundreds of stateless gRPC services per cluster and thousands of clusters across multiple regions; their intelligent LB work explicitly addresses the default model's limitations.
- sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development — systems/aws-sagemaker-hyperpod training operator (custom K8s operator) improves on default Kubernetes Job fault-recovery semantics for distributed GPU training: restart only the affected resources instead of the whole job (see patterns/partial-restart-fault-recovery); monitors for stalled batches and non-numeric loss; teams codify recovery policies via YAML.
- sources/2024-08-08-figma-migrated-onto-k8s-in-less-than-12-months — Figma migrated their compute platform from ECS to EKS (managed Kubernetes) in under 12 months. Enumerated motivations for choosing Kubernetes over staying on ECS: StatefulSets for stateful workloads like etcd; Helm charts for OSS like systems/temporal; graceful node cordon-and-drain; CNCF auto-scaling (systems/keda, systems/karpenter); service-mesh ecosystem; reduced vendor lock-in; easier hiring. Operational shape: three active clusters per environment (patterns/multi-cluster-active-active-redundancy) proven in a CoreDNS-destruction incident that cost 1/3 of requests instead of full outage.
- sources/2026-03-18-aws-ai-powered-event-response-for-amazon-eks — Kubernetes as the discovery target for AWS DevOps Agent. The agent queries the Kubernetes API as one of two discovery paths (static resource state + OTel-derived runtime relationships) to build a live dependency graph for incident investigation. Graph nodes are standard K8s API objects — Pods / Deployments / Services / ConfigMaps / Ingress / NetworkPolicies — carrying their full label / annotation / resource-spec / health-check / env-var metadata. See concepts/telemetry-based-resource-discovery for how the static graph fuses with runtime telemetry.
- sources/2026-03-23-aws-generali-malaysia-eks-auto-mode — Kubernetes under the EKS Auto Mode (managed-data-plane) variant at Generali Malaysia. Canonical wiki reference for the compound K8s operating discipline on managed-cluster services: stateless-only pods + pods treated as immutable + Helm as standardised deployment mechanism + HPA traffic-driven auto-scaling. Managed-cluster upgrade churn (Bottlerocket AMI replacement weekly) is mediated by Pod Disruption Budgets + Node Disruption Budgets + off-peak maintenance windows (patterns/disruption-budget-guarded-upgrades). Peer-AWS-service integration surface: GuardDuty / Inspector / Network Firewall / Secrets Manager + External Secrets Operator / Managed Grafana.
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — Kubernetes as the compilation target for LLM-inference deployment primitives. The 2026-04-06 HyperPod Inference Operator post names two mechanisms that compile down to Kubernetes-native primitives: (1) multi-instance-type fallback expressed as a prioritised list (`["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]`) on a CRD, implemented via `requiredDuringSchedulingIgnoredDuringExecution` to restrict the set + `preferredDuringSchedulingIgnoredDuringExecution` with descending weights to order within it — see concepts/instance-type-fallback; (2) the raw `nodeAffinity` surface is exposed directly in `InferenceEndpointConfig` for custom scheduling (excluding Spot, preferring AZs, custom labels). Two first-class CRDs under `inference.sagemaker.aws.amazon.com/v1`: `InferenceEndpointConfig` (bring-your-own-model) and `JumpStartModel` (managed model catalog). Verification example: `kubectl get pods -n hyperpod-inference-system`.
- — Brian Morrison II, 2023-09-27. Kubernetes as the substrate for stateful database-as-a-service at fleet scale. PlanetScale runs "hundreds of thousands of databases" as Vitess clusters on Kubernetes, orchestrated by the custom PlanetScale Vitess Operator (concepts/kubernetes-operator-pattern). Canonicalises the operator-over-StatefulSet architectural choice (patterns/custom-operator-over-statefulset): plain pods with direct cloud PVC attachment instead of StatefulSets, because VTGate already handles pod-identity via the topology server. Enforces minimum 3 AZs per region for paid production databases (patterns/multi-az-vitess-cluster) — "we don't even support cloud regions with less than three availability zones."
- sources/2026-04-07-yelp-zero-downtime-cassandra-4x-upgrade — Kubernetes as the rolling-upgrade substrate for a gossip-based NoSQL fleet. Yelp runs > 1,000 Cassandra nodes on Kubernetes via a custom Cassandra operator (architecture documented in an earlier 2020 "Orchestrating Cassandra on Kubernetes with operators" post). Canonical wiki instance of Kubernetes init containers used to sequence two simultaneous state changes (new pod IP + new Cassandra version) into two distinct gossip-observable events — the init container runs the old (3.11) version on the new IP just long enough for gossip to converge, then flips to the new (4.1) main container; see concepts/init-container-ip-gossip-pre-migration and CASSANDRA-19244. Presented at KubeCon 2025. Companion pattern: patterns/pre-flight-flight-post-flight-upgrade-stages — three-stage automation script (`kubectl` + CLI + PR creation) runs in auto-proceed or per-step-confirmation mode.
- — Kubernetes as a latency-variability substrate for latency-sensitive middleware (PgBouncer connection pooling). Zalando's Postgres Operator team enumerates three Kubernetes-native latency hazards and their operator-level fixes: (1) kube-proxy iptables non-uniformity producing 977/995/585/993 m CPU skew across four PgBouncer pods; (2) sibling-HT softirq contention doubling p99 latency when two PgBouncer `so_reuseport` workers land on the same physical core; (3) cross-AZ network RTT from availability-zone-spread pods. Mitigation path: CPU Manager static policy for exclusive physical-core pinning, plus the big-pooler-with-affinity + small-pooler-HA escape hatch when operator-provided defaults aren't enough. Companion operator page: zalando postgres operator. First wiki canonicalisation of Kubernetes as a platform for latency-sensitive stateful-adjacent workloads — complements the Databricks gRPC-skew and PlanetScale database-fleet Seen-ins with a connection-pooler altitude.
- — Kubernetes as the substrate for an end-to-end load-test automation platform. Zalando Payments' Load Test Conductor orchestrates Kubernetes Deployments + AWS ECS services simultaneously; Kubernetes CronJob is the scheduler for recurring load-test runs (see patterns/scheduled-cron-triggered-load-test); the test-cluster → NodePool-A (load infra) + NodePool-B (services under test) layout demonstrates node-pool-based segregation of load-generators from system-under-test to prevent the generator's resource needs from starving the system being measured. The CronJob-triggered-load-test pattern adds a canonicalisation point for Kubernetes scheduling primitives in a test-automation context.
- — Kubernetes as the substrate for scheduled cron-based scaling of a stateful zone-aware workload. Zalando Lounge runs a 3-AZ Elasticsearch cluster on Kubernetes via the custom es-operator against an EDS CRD with StatefulSet highest-ordinal scale-in semantics underneath. Cron-based scale-in/scale-out (patterns/scheduled-cron-based-scaling) mutates the EDS replicas field; es-operator reconciles. The incident catalogs four failure modes: (1) zone-aware drain livelock when the highest-ordinal pod is alone in an AZ; (2) ctx-cancellation ignored in one retry loop in the drain code (morning scale-out could not preempt stuck nightly scale-in); (3) zombie `cluster.routing.allocation.exclude._ip` entries when drain is interrupted between mark and cleanup; (4) organizational blast-radius drift — the "quick fix" missed an experimental project's separate scale-down cronjob. Canonical wiki instance of a Kubernetes 1.28 control-plane upgrade silently altering pod-to-zone distribution under ephemeral-storage cross-zone drift, invalidating the implicit assumption of the scaling plan. Closing lesson: "Read the code."
- sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster — Kubernetes API + etcd as the scaling bottleneck for a large ingress data plane. At ~180 Skipper pods per cluster × 200 clusters, each independently polling the API for Ingress + RouteGroup, etcd was "overwhelmed" and API-server CPU was throttled — "our clusters lost the ability to schedule new pods effectively." The failure surface is the scheduler, not the request path. Canonical wiki instance of concepts/control-plane-fan-out-to-kubernetes-api. Remediation: insert Route Server as a single coalescing proxy between Skipper and the API (patterns/control-plane-proxy-with-etag-cache) that polls the API every 3 s and serves Skippers with HTTP ETag / 304 (concepts/etag-conditional-polling). Zalando explicitly rejected Kubernetes Informers as the alternative: an informer-based design keeps the N× fan-out shape at change events — "Since it's a sudden increase in traffic and HPA won't be able to catch up and scale Kubernetes API and etcd." Rolled out via False → Pre (shadow diff of routing tables) → Exec flag (patterns/three-mode-rollout-off-shadow-exec), tier by tier; zero GMV loss; Skipper HPA ceiling raised to 300 pods per cluster.
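The Route Server entry above describes a coalescing proxy that polls the Kubernetes API on a fixed interval and answers many clients with HTTP ETag / 304. The sketch below illustrates that shape in memory only. It is not Zalando's implementation: the class `RouteCache`, its methods, the SHA-256-based ETag, and the sample route data are all hypothetical.

```python
import hashlib
import json


class RouteCache:
    """Hypothetical coalescing cache: one upstream poller, many clients."""

    def __init__(self, fetch_routes):
        # fetch_routes: callable returning a JSON-serialisable routing table.
        self._fetch_routes = fetch_routes
        self._body = b""
        self._etag = None
        self.upstream_calls = 0

    def refresh(self):
        """Poll the upstream API once (the source describes a 3 s interval)."""
        self.upstream_calls += 1
        body = json.dumps(self._fetch_routes(), sort_keys=True).encode()
        self._body = body
        # A strong ETag derived from the content: unchanged routes keep the tag.
        self._etag = '"' + hashlib.sha256(body).hexdigest() + '"'

    def handle_get(self, if_none_match=None):
        """Serve a client poll: 304 with an empty body if its copy is current."""
        if if_none_match is not None and if_none_match == self._etag:
            return 304, self._etag, b""
        return 200, self._etag, self._body


routes = {"shop": ["10.0.0.1"]}          # stand-in for Ingress/RouteGroup data
cache = RouteCache(lambda: routes)

cache.refresh()
status1, etag1, body1 = cache.handle_get()                      # first poll
status2, _, body2 = cache.handle_get(if_none_match=etag1)       # nothing changed

routes["shop"].append("10.0.0.2")        # upstream change
cache.refresh()
status3, etag3, _ = cache.handle_get(if_none_match=etag1)       # stale client
print(status1, status2, status3, cache.upstream_calls)
```

However many clients call `handle_get`, the upstream is only touched by `refresh`, so upstream load depends on the polling interval rather than on the number of clients.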
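The HyperPod Inference Operator entry above describes compiling a prioritised instance-type list into a required node-affinity term (to restrict the set) plus preferred terms with descending weights (to order within it). A minimal sketch of that translation follows. The function name, the weight spacing, and the use of the well-known `node.kubernetes.io/instance-type` label as the match key are assumptions; the source does not say which label or weights the operator actually emits.

```python
def instance_type_affinity(instance_types, top_weight=100, step=10):
    """Build a nodeAffinity dict from a prioritised instance-type list.

    Assumption: nodes carry the well-known instance-type label below.
    Kubernetes requires preferred-term weights in the range 1-100, so
    weights are clamped at 1 (very long lists would tie at the bottom).
    """
    if not instance_types:
        raise ValueError("need at least one instance type")
    label = "node.kubernetes.io/instance-type"

    # Hard constraint: only nodes whose type is in the list are eligible.
    required = {
        "nodeSelectorTerms": [{
            "matchExpressions": [
                {"key": label, "operator": "In", "values": list(instance_types)}
            ]
        }]
    }

    # Soft ordering: earlier entries get higher weights.
    preferred = []
    for rank, itype in enumerate(instance_types):
        preferred.append({
            "weight": max(1, top_weight - rank * step),
            "preference": {
                "matchExpressions": [
                    {"key": label, "operator": "In", "values": [itype]}
                ]
            },
        })

    return {"nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": required,
        "preferredDuringSchedulingIgnoredDuringExecution": preferred,
    }}


affinity = instance_type_affinity(
    ["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]
)
weights = [
    term["weight"]
    for term in affinity["nodeAffinity"][
        "preferredDuringSchedulingIgnoredDuringExecution"
    ]
]
print(weights)
```

The required term keeps pods off any instance type outside the list, while the preferred weights only bias the scheduler, so a lower-priority type is still used when the first choice has no capacity.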
Go binary-size story¶
Kubernetes is both a victim and a beneficiary of the Go binary-size engineering documented in sources/2026-02-18-datadog-how-we-reduced-agent-go-binaries-up-to-77-percent:
- Victim (transitive imports of Kubernetes). The Datadog Agent trace-agent accidentally pulled 526 `k8s.io/*` packages / ≥30 MiB via one function in one Agent package; Datadog's fix was a package split (#32174) — a surface-area hint that Kubernetes' own dep graph is wide enough to be consequential downstream.
- Beneficiary (adopts method-DCE). Datadog's kubernetes/kubernetes PR #132177 removed a `reflect.MethodByName` offender (concepts/reflect-methodbyname-linker-pessimism); Kubernetes contributors picked up the same method-DCE enablement and report 16-37 % binary-size reductions across Kubernetes binaries. Canonical downstream beneficiary of the patterns/upstream-the-fix pattern in a Go-toolchain setting.
Separately, systems/containerd (the default K8s container runtime) was the root cause of the Agent's 245-MiB plugin-import regression — Datadog's upstream containerd build-tag fix propagates to every Go program using containerd, directly or transitively.
Nodeless variants via Virtual Kubelet¶
Not every managed-K8s product uses real Nodes. Fly.io's Fly Kubernetes (FKS) runs K3s as the API plane and plugs in a Virtual Kubelet provider that forwards every Pod-create request into Fly Machines (Firecracker micro-VMs). There is no Node object in the cluster at all — see concepts/nodeless-kubernetes and concepts/micro-vm-as-pod. Fly maps the rest of the K8s primitives to existing Fly.io primitives 1:1 (patterns/primitive-mapping-k8s-to-cloud): containerd/CRI → flyd + Firecracker + Fly init; CNI → internal IPv6 WireGuard mesh; Services → the Fly Proxy; CoreDNS retained at beta. Source: sources/2024-03-07-flyio-fly-kubernetes-does-more-now. Same pattern underlies AWS Fargate-on-EKS and Azure AKS Virtual Nodes; the wiki's canonical K8s + Virtual-Kubelet example is FKS.
Seen in (Kubernetes as Firecracker-µVM substrate for AI-agent execution)¶
- sources/2026-04-24-atlassian-rovo-dev-driven-development — Atlassian Fireworks (2026-04-24) is a canonical wiki instance of K8s as the scheduling / networking / declarative-API substrate under Firecracker micro-VMs — the same architectural move systems/fly-kubernetes makes in public cloud, applied internally for AI-agent workload execution. The engineering composition it enables: accept OCI container images via the K8s-shaped API surface, boot them as hardware-isolated Firecracker VMs instead of shared-kernel containers, and layer in per-µVM eBPF network-policy enforcement, Envoy ingress, Raft persistence for scheduler state, and node agents that manage VM lifecycle (100ms warm starts, live migration, snapshot filesystem restore, sidecar sandboxes). Canonicalised as concepts/hardware-isolated-microvm-on-kubernetes. First wiki disclosure that K8s-at-Atlassian runs on the "shared AWS scms Kubernetes cluster" and that per-developer dev shards on this cluster are the canonical integration-test substrate for AI-agent-written code — the K8s primitive the agentic development workflow rides on.