
NETFLIX 2026-06-22


How Netflix Simplified Batch Compute with Kueue

Summary

Netflix replaced the custom queuing and scheduling logic in their homegrown managed batch solution, Compute Managed Batch (CMB), with Kueue — a cloud-native Kubernetes job queueing system. CMB had been built in 2018 before mature open-source batch offerings existed. As the Kubernetes ecosystem matured, features CMB provided (or aspired to) — fair sharing, hierarchical tenants, capacity management, priority queuing, preemption — became available in open-source projects. The team chose Kueue over YuniKorn and Volcano because Kueue does not replace the kube-scheduler, integrates with existing Titus scheduling profiles, and supports multi-tenant quota management over heterogeneous hardware. The migration (dubbed "Netflix Batch") handled millions of production batch workloads with zero user-facing changes, completing the production rollout in only 4 weeks.

Key Takeaways

  1. CMB has existed since 2018: Netflix built a custom managed batch solution before Kubernetes-native alternatives matured. It handled workload submission, priority queuing, and capacity management atop Titus (Source: raw file, "Brief Overview of CMB and Titus" section).

  2. Tenant hierarchy is the core abstraction — CMB uses internal tenants (tree organizers, no queues) and leaf tenants (accept work, have queues). Capacity is configured per-tenant with weight-based fair sharing across the tree (Source: raw file, "CMB Tenant Hierarchy" section).

  3. Two capacity types: reserved and shared — Reserved capacity partitions resources exclusively; shared capacity is a global pool that any tenant can burst into. Under CMB, fair sharing only applied at admission (no preemption post-admission) (Source: raw file, "Reserved Capacity" / "Shared Capacity" sections).

  4. Kueue chosen over YuniKorn and Volcano for key architectural reasons: (a) doesn't replace pod scheduling by kube-scheduler, preserving Titus scheduling profiles; (b) supports multi-tenant quota over heterogeneous hardware; (c) operates on native primitives (v1.Pod, batch/v1.Job) and higher-level abstractions (RayJob); (d) native preemption and all-or-nothing scheduling (Source: raw file, "Why Kueue?" section).

  5. Transparent migration with zero user lift — The migration maintained API parity with CMB's existing interface. Under the hood, internal tenants map to Kueue Cohorts and leaf tenants to ClusterQueue + LocalQueue. Capacity configuration converts to resource flavors and nominal quotas (Source: raw file, "Migrating to Kueue" section).

  6. Migrate the hardest customer first — Netflix deliberately enrolled their largest and most complex customer first, building confidence early and reducing the production migration to only 4 weeks (Source: raw file, "Lessons Learned" point 2).

  7. QPS/burst tuning required — Kueue's default QPS, Burst, and groupKindConcurrency settings were insufficient for Netflix's throughput. This was derisked via load tests in a dev environment mimicking Titus (Source: raw file, "Lessons Learned" point 3).

  8. Preemption-based fair sharing unlocks better utilization: With Kueue, idle reserved resources can be lent to other tenants and reclaimed via preemption when the owning tenant needs them back (reclaimWithinCohort: Any). Within a queue, lower-priority workloads are preempted for higher-priority ones (withinClusterQueue: LowerPriority). Netflix reports a significant increase in average resource utilization as a result (Source: raw file, "Fair Sharing and Preemption" section).

  9. Titus federation abstracts cell topology — CMB (and now Netflix Batch) talks to a single Titus endpoint for workload submission and capacity reservation; federation routes to the correct underlying Kubernetes cluster. The new flow uses a custom "Kueue router" in Titus federation (Source: raw file, "Brief Overview of CMB and Titus" + "Netflix Batch User/Application Workload Submission Flow").

  10. Future work: broader enrollment + training infra — Netflix plans to enroll more Titus batch workloads into the managed experience, and internal training teams are using learnings for Kubernetes-native training job scheduling (Source: raw file, "Current State of Kueue at Netflix").
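
Takeaway 2 mentions weight-based fair sharing across the tenant tree. The source does not describe CMB's actual algorithm, so the following is only a minimal sketch of one common interpretation: each node's share of a pool is divided among its children in proportion to their weights. The `Tenant` class, the `fair_shares` helper, and all tenant names, weights, and capacities are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """Node in a hypothetical tenant tree.

    Internal tenants only organize the tree; leaf tenants accept work.
    """
    name: str
    weight: float = 1.0
    children: list["Tenant"] = field(default_factory=list)


def fair_shares(node: Tenant, capacity: float) -> dict[str, float]:
    """Split `capacity` down the tree in proportion to sibling weights.

    Returns a mapping of leaf tenant name -> share of the pool.
    Assumes sibling weights are positive.
    """
    if not node.children:
        return {node.name: capacity}
    total_weight = sum(child.weight for child in node.children)
    shares: dict[str, float] = {}
    for child in node.children:
        child_capacity = capacity * child.weight / total_weight
        shares.update(fair_shares(child, child_capacity))
    return shares


# Hypothetical hierarchy: two internal tenants and three leaf tenants.
root = Tenant("root", children=[
    Tenant("encoding", weight=3, children=[
        Tenant("encoding-prod", weight=2),
        Tenant("encoding-adhoc", weight=1),
    ]),
    Tenant("research", weight=1, children=[Tenant("research-batch")]),
])

shares = fair_shares(root, capacity=1200.0)
for leaf, share in shares.items():
    print(f"{leaf}: {share}")
```

In this sketch the `encoding` subtree receives three quarters of the pool because its weight is 3 out of a sibling total of 4, and that amount is then split 2:1 between its two leaves.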

Architectural Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Replace CMB internals vs. new API | Replace internals, keep API | Derisks by unstacking bets; doesn't disrupt customers |
| Kueue vs. YuniKorn vs. Volcano | Kueue | Doesn't replace kube-scheduler; multi-tenant heterogeneous quota; native preemption |
| Migration order | Largest/most complex customer first | Builds confidence early; compresses overall timeline |
| Capacity semantics | Cohorts (internal) + ClusterQueue/LocalQueue (leaf) | Maps 1:1 to existing CMB tenant hierarchy |
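
The capacity-semantics mapping can be sketched as a small tree walk that emits Kueue-style manifests as plain dictionaries. This is an illustration, not Netflix's implementation: the `Tenant` class, the `to_kueue_objects` helper, the tenant names, the quota values, and the `default-flavor` ResourceFlavor name are all hypothetical. The API version and field names (`parentName`, `cohort`) are assumptions based on Kueue's v1beta1 API; they have changed between releases and should be checked against the installed version.

```python
import json
from dataclasses import dataclass, field

# Assumption: API group/version as in Kueue v1beta1; verify for your release.
API_VERSION = "kueue.x-k8s.io/v1beta1"


@dataclass
class Tenant:
    """Hypothetical CMB-style tenant: internal if it has children, else leaf."""
    name: str
    namespace: str = "default"   # only used for leaf tenants
    cpu_quota: str = "0"         # reserved CPU for leaf tenants
    children: list["Tenant"] = field(default_factory=list)


def to_kueue_objects(tenant, parent=None, flavor="default-flavor"):
    """Walk the tenant tree and return a list of manifest dicts."""
    if tenant.children:
        # Internal tenant -> Cohort (no queue of its own).
        spec = {"parentName": parent} if parent else {}
        objs = [{"apiVersion": API_VERSION, "kind": "Cohort",
                 "metadata": {"name": tenant.name}, "spec": spec}]
        for child in tenant.children:
            objs += to_kueue_objects(child, parent=tenant.name, flavor=flavor)
        return objs

    # Leaf tenant -> ClusterQueue (quota) + LocalQueue (submission point).
    cq_spec = {
        "namespaceSelector": {},
        "resourceGroups": [{
            "coveredResources": ["cpu"],
            "flavors": [{
                "name": flavor,  # the ResourceFlavor must be created separately
                "resources": [{"name": "cpu", "nominalQuota": tenant.cpu_quota}],
            }],
        }],
        # Preemption settings named in the takeaways above.
        "preemption": {"reclaimWithinCohort": "Any",
                       "withinClusterQueue": "LowerPriority"},
    }
    if parent:
        cq_spec["cohort"] = parent
    cluster_queue = {"apiVersion": API_VERSION, "kind": "ClusterQueue",
                     "metadata": {"name": tenant.name}, "spec": cq_spec}
    local_queue = {"apiVersion": API_VERSION, "kind": "LocalQueue",
                   "metadata": {"name": tenant.name,
                                "namespace": tenant.namespace},
                   "spec": {"clusterQueue": tenant.name}}
    return [cluster_queue, local_queue]


# Hypothetical tree: root -> studio -> studio-encoding, and root -> ml-batch.
tree = Tenant("root", children=[
    Tenant("studio", children=[
        Tenant("studio-encoding", namespace="studio", cpu_quota="100"),
    ]),
    Tenant("ml-batch", namespace="ml", cpu_quota="50"),
])

objects = to_kueue_objects(tree)
kinds = [obj["kind"] for obj in objects]
print(json.dumps(objects, indent=2))
```

Generating the objects from the existing tenant tree, rather than writing them by hand, is one way such a migration could keep the user-facing tenant configuration unchanged while the underlying representation moves to Kueue.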

Operational Numbers

  • Millions of batch workloads managed by Kueue in production
  • Production migration completed in 4 weeks
  • Significant increase in average resource utilization after preemption-based fair sharing deployed

Caveats

  • The post does not disclose exact cluster sizes, QPS numbers, or preemption latency SLAs.
  • Fair-sharing semantics changed from admission-only (CMB) to continuous with preemption (Kueue) — a semantic shift users should understand.
  • The Kueue router in Titus federation is custom Netflix code, not upstream Kueue.
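
The semantic shift in the second caveat can be illustrated with a toy model. This is not Kueue's or CMB's admission algorithm: it ignores priorities, workload completion, and requeueing, and the `simulate` function, tenant names, and numbers are hypothetical. It only contrasts what happens to a tenant whose reserved capacity has been borrowed when reclaim by preemption is off versus on.

```python
def simulate(arrivals, nominal, reclaim):
    """Toy admission loop over (tenant, units) arrivals.

    nominal: reserved units per tenant; total capacity is their sum.
    reclaim: if True, a tenant still within its nominal quota may evict
             workloads of tenants that are using more than their nominal.
    Returns (running, evicted, waiting) lists of (tenant, units).
    """
    capacity = sum(nominal.values())
    running, evicted, waiting = [], [], []

    def usage(tenant):
        return sum(units for t, units in running if t == tenant)

    for tenant, units in arrivals:
        free = capacity - sum(u for _, u in running)
        within_nominal = usage(tenant) + units <= nominal[tenant]
        if free < units and reclaim and within_nominal:
            # Evict the most recently admitted borrowed workloads first.
            for i in range(len(running) - 1, -1, -1):
                if free >= units:
                    break
                victim_tenant, victim_units = running[i]
                over_nominal = usage(victim_tenant) > nominal[victim_tenant]
                if victim_tenant != tenant and over_nominal:
                    evicted.append(running.pop(i))
                    free += victim_units
        if free >= units:
            running.append((tenant, units))
        else:
            waiting.append((tenant, units))
    return running, evicted, waiting


# Tenant B bursts into A's idle reservation, then A submits work.
nominal = {"A": 10, "B": 10}
arrivals = [("B", 8), ("B", 8), ("A", 6)]

admission_only = simulate(arrivals, nominal, reclaim=False)
with_preemption = simulate(arrivals, nominal, reclaim=True)
print("admission-only:", admission_only)
print("with preemption:", with_preemption)
```

With reclaim off, tenant A's workload waits even though it fits inside A's reservation. With reclaim on, one of B's borrowed workloads is evicted so A can run, which is the behavior change users of the migrated system would need to expect.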

