Kubernetes¶
Kubernetes is the dominant open-source container orchestrator (CNCF graduated, originally from Google's Borg lineage). Stub page — expand as sources cite specific subsystems.
Relevant subcomponents already on this wiki:

- systems/kube-proxy — default L4 service load balancer (iptables / IPVS / eBPF).
- systems/coredns — cluster DNS for service name resolution.
- Services / EndpointSlices — API objects that expose a set of pods; watched by control planes like Databricks' systems/databricks-endpoint-discovery-service.
Default service networking (for context)¶
- Client resolves `svc-name.namespace.svc.cluster.local` via CoreDNS → returns a `ClusterIP` (virtual IP).
- Packet hits the node; kernel rules (configured by kube-proxy) rewrite the dst to one of the pod IPs per basic L4 policy (round-robin etc.).
- Pod replies to the client.
This pattern has known limits for L7 protocols with long-lived connections (gRPC, HTTP/2 streaming) — see concepts/layer-7-load-balancing and patterns/proxyless-service-mesh.
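The three-step path above can be sketched as a toy model — illustrative hostnames and IPs only, not real kube-proxy behaviour:

```python
from itertools import cycle

# Toy model of default Kubernetes service networking (names/IPs illustrative).
dns = {"svc-name.namespace.svc.cluster.local": "10.96.0.42"}  # CoreDNS record -> ClusterIP
backends = {"10.96.0.42": cycle(["10.244.1.5", "10.244.2.7", "10.244.3.9"])}  # pod IPs

def connect(hostname: str) -> str:
    cluster_ip = dns[hostname]          # step 1: DNS lookup returns the virtual IP
    return next(backends[cluster_ip])   # step 2: L4 rewrite picks a pod (round-robin)

# Each *connection* lands on a different pod; a single long-lived gRPC or
# HTTP/2 connection stays pinned to whichever pod it first hit -- the L7
# limitation referenced above.
three = [connect("svc-name.namespace.svc.cluster.local") for _ in range(3)]
print(three)  # ['10.244.1.5', '10.244.2.7', '10.244.3.9']
```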
Seen in¶
- sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks runs hundreds of stateless gRPC services per cluster and thousands of clusters across multiple regions; their intelligent LB work explicitly addresses the default model's limitations.
- sources/2025-08-06-allthingsdistributed-removing-friction-sagemaker-ai-development — systems/aws-sagemaker-hyperpod training operator (custom K8s operator) improves on default Kubernetes Job fault-recovery semantics for distributed GPU training: restart only the affected resources instead of the whole job (see patterns/partial-restart-fault-recovery); monitors for stalled batches and non-numeric loss; teams codify recovery policies via YAML.
- sources/2024-08-08-figma-migrated-onto-k8s-in-less-than-12-months — Figma migrated their compute platform from ECS to EKS (managed Kubernetes) in under 12 months. Enumerated motivations for choosing Kubernetes over staying on ECS: StatefulSets for stateful workloads like etcd; Helm charts for OSS like systems/temporal; graceful node cordon-and-drain; CNCF auto-scaling (systems/keda, systems/karpenter); service-mesh ecosystem; reduced vendor lock-in; easier hiring. Operational shape: three active clusters per environment (patterns/multi-cluster-active-active-redundancy) proven in a CoreDNS-destruction incident that cost 1/3 of requests instead of full outage.
- sources/2026-03-18-aws-ai-powered-event-response-for-amazon-eks — Kubernetes as the discovery target for AWS DevOps Agent. The agent queries the Kubernetes API as one of two discovery paths (static resource state + OTel-derived runtime relationships) to build a live dependency graph for incident investigation. Graph nodes are standard K8s API objects — Pods / Deployments / Services / ConfigMaps / Ingress / NetworkPolicies — carrying their full label / annotation / resource-spec / health-check / env-var metadata. See concepts/telemetry-based-resource-discovery for how the static graph fuses with runtime telemetry.
- sources/2026-03-23-aws-generali-malaysia-eks-auto-mode — Kubernetes under the EKS Auto Mode (managed-data-plane) variant at Generali Malaysia. Canonical wiki reference for the compound K8s operating discipline on managed-cluster services: stateless-only pods + pods treated as immutable + Helm as standardised deployment mechanism + HPA traffic-driven auto-scaling. Managed-cluster upgrade churn (Bottlerocket AMI replacement weekly) is mediated by Pod Disruption Budgets + Node Disruption Budgets + off-peak maintenance windows (patterns/disruption-budget-guarded-upgrades). Peer AWS-service integration surface: GuardDuty / Inspector / Network Firewall / Secrets Manager + External Secrets Operator / Managed Grafana.
- sources/2026-04-06-aws-unlock-efficient-model-deployment-simplified-inference-operator-setup-on-amazon-sagemaker-hyperpod — Kubernetes as the compilation target for LLM-inference deployment primitives. The 2026-04-06 HyperPod Inference Operator post names two mechanisms that compile down to Kubernetes-native primitives: (1) multi-instance-type fallback expressed as a prioritised list (`["ml.p4d.24xlarge", "ml.g5.24xlarge", "ml.g5.8xlarge"]`) on a CRD, implemented via `requiredDuringSchedulingIgnoredDuringExecution` to restrict the set + `preferredDuringSchedulingIgnoredDuringExecution` with descending weights to order within it — see concepts/instance-type-fallback; (2) the raw `nodeAffinity` surface is exposed directly in `InferenceEndpointConfig` for custom scheduling (excluding Spot, preferring AZs, custom labels). Two first-class CRDs under `inference.sagemaker.aws.amazon.com/v1`: `InferenceEndpointConfig` (bring-your-own-model) and `JumpStartModel` (managed model catalog). Verification example: `kubectl get pods -n hyperpod-inference-system`.
Go binary-size story¶
Kubernetes is both a victim and a beneficiary of the Go binary-size engineering documented in sources/2026-02-18-datadog-how-we-reduced-agent-go-binaries-up-to-77-percent:
- Victim (transitive imports of Kubernetes). The Datadog Agent trace-agent accidentally pulled 526 `k8s.io/*` packages / ≥30 MiB via one function in one Agent package; Datadog's fix was a package split (#32174) — a surface-area hint that Kubernetes' own dep graph is wide enough to be consequential downstream.
- Beneficiary (adopts method-DCE). Datadog's kubernetes/kubernetes PR #132177 removed a `reflect.MethodByName` offender (concepts/reflect-methodbyname-linker-pessimism); Kubernetes contributors picked up the same method-DCE enablement and report 16-37 % binary-size reductions across Kubernetes binaries. Canonical downstream beneficiary of the patterns/upstream-the-fix pattern in a Go-toolchain setting.
Separately, systems/containerd (the default K8s container runtime) was the root cause of the Agent's 245-MiB plugin-import regression — Datadog's upstream containerd build-tag fix propagates to every Go program using containerd, directly or transitively.
Nodeless variants via Virtual Kubelet¶
Not every managed-K8s product uses real Nodes. Fly.io's Fly Kubernetes (FKS) runs K3s as the API plane and plugs in a Virtual Kubelet provider that forwards every Pod-create request into Fly Machines (Firecracker micro-VMs). There is no Node object in the cluster at all — see concepts/nodeless-kubernetes and concepts/micro-vm-as-pod. Fly maps the rest of the K8s primitives to existing Fly.io primitives 1:1 (patterns/primitive-mapping-k8s-to-cloud): containerd/CRI → flyd + Firecracker + Fly init; CNI → internal IPv6 WireGuard mesh; Services → the Fly Proxy; CoreDNS retained at beta. Source: sources/2024-03-07-flyio-fly-kubernetes-does-more-now. Same pattern underlies AWS Fargate-on-EKS and Azure AKS Virtual Nodes; the wiki's canonical K8s + Virtual-Kubelet example is FKS.
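The forwarding idea at the heart of the nodeless pattern can be sketched in a few lines — class and method names here are illustrative stand-ins, not the real Virtual Kubelet Go interface or the Fly Machines API:

```python
# Toy sketch of a virtual-kubelet-style provider: every Pod-create is
# forwarded to an external machines API (a stub standing in for Fly
# Machines) instead of being scheduled onto a real Node.
class MachinesAPI:
    def __init__(self) -> None:
        self.launched: list[str] = []

    def launch(self, image: str) -> str:
        self.launched.append(image)        # would boot a Firecracker micro-VM
        return f"machine-{len(self.launched)}"

class VirtualKubeletProvider:
    def __init__(self, api: MachinesAPI) -> None:
        self.api = api

    def create_pod(self, pod_spec: dict) -> str:
        # One pod -> one machine; no Node object ever exists in the cluster.
        return self.api.launch(image=pod_spec["image"])

api = MachinesAPI()
provider = VirtualKubeletProvider(api)
print(provider.create_pod({"image": "nginx:1.27"}))  # machine-1
```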