Netflix Batch
Netflix Batch is the successor to Netflix's Compute Managed Batch (CMB) system. It replaces CMB's custom queuing and scheduling logic with Kueue while maintaining the same user-facing API. Netflix Batch manages millions of batch workloads in production.
Architecture
The key difference from the old CMB flow: queuing and scheduling are deferred to Kueue, which runs in each Kueue-enabled Titus cell. Titus federation routes jobs to Kueue cells via a custom Kueue router.
Mapping CMB concepts to Kueue
| CMB concept | Kueue equivalent |
|---|---|
| Internal tenant | Cohort |
| Leaf tenant | ClusterQueue + LocalQueue |
| Reserved capacity | Resource flavors + nominal quotas |
| Shared capacity | Cohort-level borrowing with reclaimWithinCohort |
| Priority ordering | WorkloadPriorityClass + withinClusterQueue: LowerPriority |
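
The mapping above can be sketched as a set of Kueue objects. The following Python snippet builds them as plain dictionaries and prints them as JSON, which `kubectl apply -f -` accepts. This is a minimal sketch, not Netflix's configuration: the names (`example-cohort`, `team-a`, `default-flavor`), the quota numbers, the priority value, and the choice of `reclaimWithinCohort: Any` are all assumptions made for illustration. Only the field names come from the Kueue `v1beta1` API.

```python
import json

API_VERSION = "kueue.x-k8s.io/v1beta1"


def resource_flavor(name):
    # Reserved capacity is expressed against a ResourceFlavor.
    return {"apiVersion": API_VERSION, "kind": "ResourceFlavor",
            "metadata": {"name": name}}


def cluster_queue(name, cohort, flavor, cpu, memory):
    # A leaf tenant maps to a ClusterQueue that belongs to a cohort
    # (the internal tenant) and holds nominal quotas per flavor.
    return {
        "apiVersion": API_VERSION,
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "cohort": cohort,
            "namespaceSelector": {},  # match all namespaces
            "resourceGroups": [{
                "coveredResources": ["cpu", "memory"],
                "flavors": [{
                    "name": flavor,
                    "resources": [
                        {"name": "cpu", "nominalQuota": cpu},
                        {"name": "memory", "nominalQuota": memory},
                    ],
                }],
            }],
            "preemption": {
                # Assumed value; the source names the field only.
                "reclaimWithinCohort": "Any",
                "withinClusterQueue": "LowerPriority",
            },
        },
    }


def local_queue(name, namespace, cluster_queue_name):
    # The namespaced entry point that jobs reference.
    return {"apiVersion": API_VERSION, "kind": "LocalQueue",
            "metadata": {"name": name, "namespace": namespace},
            "spec": {"clusterQueue": cluster_queue_name}}


def workload_priority_class(name, value):
    return {"apiVersion": API_VERSION, "kind": "WorkloadPriorityClass",
            "metadata": {"name": name}, "value": value,
            "description": "Hypothetical priority tier"}


def build_manifests():
    # All names and numbers below are hypothetical.
    return [
        resource_flavor("default-flavor"),
        cluster_queue("team-a", "example-cohort", "default-flavor", 100, "400Gi"),
        local_queue("team-a-queue", "team-a", "team-a"),
        workload_priority_class("high", 10000),
    ]


if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": build_manifests()}, indent=2))
```

Generating the objects from one function per concept keeps the CMB-to-Kueue correspondence visible: each row of the table becomes one helper.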
Improvements over CMB
- Preemption — CMB had no preemption; once admitted, jobs ran to completion. Kueue enables preemption-based fair sharing, reclaiming idle reserved capacity and preempting lower-priority work.
- Better utilization — reserved resources are lent when idle, significantly increasing average resource utilization.
- Closer to the cluster — CMB was "far removed from the underlying Kubernetes cluster," making features like preemption cumbersome to implement.
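
To make the lending and reclaiming behavior described above concrete, here is a toy simulation of one cohort with two queues. It is a deliberately simplified model written for this page, not Kueue's actual admission algorithm: it tracks a single CPU dimension, the class and method names (`Cohort`, `Workload`, `submit`) are hypothetical, and victims are picked greedily by lowest priority.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    queue: str
    cpu: int
    priority: int = 0


@dataclass
class Cohort:
    nominal: dict                      # queue name -> reserved CPU quota
    admitted: list = field(default_factory=list)

    def usage(self, queue):
        return sum(w.cpu for w in self.admitted if w.queue == queue)

    def total_usage(self):
        return sum(w.cpu for w in self.admitted)

    def _fits(self, wl):
        # Idle reserved capacity is lendable, so only the cohort total matters.
        return self.total_usage() + wl.cpu <= sum(self.nominal.values())

    def _next_victim(self, wl):
        within_nominal = self.usage(wl.queue) + wl.cpu <= self.nominal[wl.queue]
        candidates = []
        for w in self.admitted:
            if w.queue == wl.queue:
                # Same queue: only lower-priority work may be preempted.
                if w.priority < wl.priority:
                    candidates.append(w)
            elif within_nominal and self.usage(w.queue) > self.nominal[w.queue]:
                # Other queue is borrowing: its work may be reclaimed.
                candidates.append(w)
        return min(candidates, key=lambda w: w.priority, default=None)

    def submit(self, wl):
        """Admit wl; return names of preempted workloads, or None if pending."""
        snapshot = list(self.admitted)
        preempted = []
        while not self._fits(wl):
            victim = self._next_victim(wl)
            if victim is None:
                self.admitted = snapshot   # roll back, wl stays pending
                return None
            self.admitted.remove(victim)
            preempted.append(victim.name)
        self.admitted.append(wl)
        return preempted


if __name__ == "__main__":
    cohort = Cohort(nominal={"a": 10, "b": 10})
    print(cohort.submit(Workload("b1", "b", 10, priority=1)))  # within quota
    print(cohort.submit(Workload("b2", "b", 8)))               # borrows from a
    print(cohort.submit(Workload("a1", "a", 6, priority=1)))   # reclaims from b
```

In this run, queue `b` borrows idle capacity reserved for `a`, and the later arrival of `a1` reclaims it by preempting the borrowing workload. Under CMB, which had no preemption, the borrowed capacity would have stayed occupied until the job finished.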
Migration Strategy
- Maintained API parity — zero lift for end users
- Migrated the largest, most complex customer first (4-week production rollout)
- Load-tested with higher QPS, Burst, and groupKindConcurrency settings in a dev environment
(Source: sources/2026-06-22-netflix-how-netflix-simplified-batch-compute-with-kueue)
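
The QPS, Burst, and groupKindConcurrency settings mentioned in the load-testing step correspond to fields in Kueue's controller manager `Configuration` object. The sketch below builds such a configuration in Python. The helper name and every number are placeholders chosen for illustration; the source does not state which values Netflix tested.

```python
import json


def kueue_manager_config(qps, burst, concurrency):
    # clientConnection throttles the controller's API server client;
    # groupKindConcurrency sets reconciler workers per "Kind.group".
    return {
        "apiVersion": "config.kueue.x-k8s.io/v1beta1",
        "kind": "Configuration",
        "clientConnection": {"qps": qps, "burst": burst},
        "controller": {"groupKindConcurrency": dict(concurrency)},
    }


if __name__ == "__main__":
    # Hypothetical load-test values, not taken from the source.
    config = kueue_manager_config(
        qps=200,
        burst=400,
        concurrency={"Job.batch": 20, "Workload.kueue.x-k8s.io": 20},
    )
    print(json.dumps(config, indent=2))
```

Raising these values in a dev environment first makes it possible to observe API server load before changing the production configuration.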
Seen in
- sources/2026-06-22-netflix-how-netflix-simplified-batch-compute-with-kueue — full architecture, migration strategy, and preemption config.