SYSTEM Cited by 1 source

Netflix Batch

Netflix Batch is the successor to Netflix's Compute Managed Batch (CMB) system. It replaces CMB's custom queuing and scheduling logic with Kueue while maintaining the same user-facing API. Netflix Batch manages millions of batch workloads in production.

Architecture

The key difference from the old CMB flow: queuing and scheduling are deferred to Kueue, which runs in each Kueue-enabled Titus cell. Titus federation routes jobs to Kueue cells via a custom Kueue router.

Mapping CMB concepts to Kueue

CMB concept        | Kueue equivalent
Internal tenant    | Cohort
Leaf tenant        | ClusterQueue + LocalQueue
Reserved capacity  | Resource flavors + nominal quotas
Shared capacity    | Cohort-level borrowing with reclaimWithinCohort
Priority ordering  | WorkloadPriority + withinClusterQueue: LowerPriority
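
As a rough illustration of this mapping, the sketch below builds the ClusterQueue and LocalQueue manifests for one leaf tenant as plain Python dictionaries. The field names follow Kueue's kueue.x-k8s.io/v1beta1 API. The tenant, cohort, flavor, and namespace names, the quota figure, and the choice of reclaimWithinCohort: Any are assumptions made for illustration and are not taken from Netflix's configuration.

```python
import json


def leaf_tenant_manifests(tenant, cohort, flavor, cpu_quota, namespace):
    """Build ClusterQueue + LocalQueue manifests for one leaf tenant.

    All argument values are hypothetical; only the field names come from Kueue.
    """
    cluster_queue = {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": tenant},
        "spec": {
            # Internal tenant -> cohort that groups its leaf tenants.
            "cohort": cohort,
            "namespaceSelector": {},
            # Reserved capacity -> resource flavor + nominal quota.
            "resourceGroups": [{
                "coveredResources": ["cpu"],
                "flavors": [{
                    "name": flavor,
                    "resources": [{"name": "cpu", "nominalQuota": cpu_quota}],
                }],
            }],
            "preemption": {
                # Shared capacity: take back quota lent to the cohort (assumed value).
                "reclaimWithinCohort": "Any",
                # Priority ordering inside the tenant's own queue.
                "withinClusterQueue": "LowerPriority",
            },
        },
    }
    local_queue = {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "LocalQueue",
        "metadata": {"name": tenant, "namespace": namespace},
        "spec": {"clusterQueue": tenant},
    }
    return cluster_queue, local_queue


if __name__ == "__main__":
    cq, lq = leaf_tenant_manifests("team-a", "org-x", "default-flavor", 100, "team-a-ns")
    print(json.dumps([cq, lq], indent=2))
```

The dictionaries can be serialized to JSON or YAML and applied to a cluster; a real setup would also need the referenced ResourceFlavor object and, for priorities, a WorkloadPriorityClass.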

Improvements over CMB

  • Preemption — CMB had no preemption; once admitted, jobs ran to completion. Kueue enables preemption-based fair sharing, reclaiming idle reserved capacity and preempting lower-priority work.
  • Better utilization — reserved resources are lent when idle, significantly increasing average resource utilization.
  • Closer to the cluster — Kueue runs inside each cell, whereas CMB was "far removed from the underlying Kubernetes cluster," which made features like preemption cumbersome to implement.
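
To make the lending and reclaiming behaviour concrete, here is a toy model of a two-queue cohort. It is a deliberately simplified sketch and not Kueue's actual admission algorithm: the Job and Queue classes, the admit function, and the victim-selection order (lowest priority first, newest first on ties) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cpus: int
    priority: int


@dataclass
class Queue:
    name: str
    nominal: int  # reserved (nominal) CPU quota
    running: list = field(default_factory=list)

    def used(self):
        return sum(j.cpus for j in self.running)


def by_priority(jobs):
    # Lowest priority first; among equal priorities, newest first.
    return sorted(reversed(jobs), key=lambda j: j.priority)


def admit(job, queue, cohort):
    """Admit `job`; return names of preempted jobs, or None if it cannot fit."""
    free = sum(q.nominal for q in cohort) - sum(q.used() for q in cohort)
    victims = []  # (queue, job) pairs to evict, applied only on success

    # Reclaim: the job fits in this queue's own quota, but others borrowed it.
    if free < job.cpus and queue.used() + job.cpus <= queue.nominal:
        for other in cohort:
            if other is queue:
                continue
            over = other.used() - other.nominal  # amount currently borrowed
            for cand in by_priority(other.running):
                if free >= job.cpus or over <= 0:
                    break
                victims.append((other, cand))
                free += cand.cpus
                over -= cand.cpus

    # Within-queue preemption: evict strictly lower-priority jobs of this queue.
    if free < job.cpus:
        for cand in by_priority(queue.running):
            if free >= job.cpus or cand.priority >= job.priority:
                break
            victims.append((queue, cand))
            free += cand.cpus

    if free < job.cpus:
        return None  # leave everything running
    for q, victim in victims:
        q.running.remove(victim)
    queue.running.append(job)
    return [victim.name for _, victim in victims]


a, b = Queue("a", nominal=4), Queue("b", nominal=4)
cohort = [a, b]
r1 = admit(Job("b1", 4, 1), b, cohort)  # fits in b's own quota
r2 = admit(Job("b2", 4, 1), b, cohort)  # borrows a's idle quota
r3 = admit(Job("a1", 2, 1), a, cohort)  # a reclaims, preempting a borrower
print(r1, r2, r3)
```

In this run queue b uses all 8 CPUs while a is idle, which is the utilization gain; when a submits work within its own quota, the borrowed job is preempted to make room.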

Migration Strategy

  1. Maintained API parity — zero lift for end users
  2. Migrated the largest, most complex customer first (4-week production rollout)
  3. Load-tested in a dev environment with raised Kueue controller settings (QPS, Burst, groupKindConcurrency)
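
The three load-test knobs correspond to fields in Kueue's controller Configuration file. The sketch below assembles such a configuration as a Python dictionary. The field names come from Kueue's config.kueue.x-k8s.io/v1beta1 Configuration API; the numeric values and the set of group kinds are illustrative assumptions, since the source does not state the values Netflix used.

```python
import json


def load_test_config(qps, burst, concurrency):
    """Return a Kueue Configuration with raised client and reconcile limits.

    qps / burst bound the controller's requests to the Kubernetes API server;
    concurrency maps "Kind.group" to the number of parallel reconcilers.
    """
    return {
        "apiVersion": "config.kueue.x-k8s.io/v1beta1",
        "kind": "Configuration",
        "clientConnection": {"qps": qps, "burst": burst},
        "controller": {"groupKindConcurrency": dict(concurrency)},
    }


# Hypothetical values for a dev-environment load test.
config = load_test_config(
    qps=200,
    burst=400,
    concurrency={"Job.batch": 20, "Workload.kueue.x-k8s.io": 20},
)

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

Raising these limits lets the controller issue more API calls and reconcile more objects in parallel, so a load test can show whether the defaults or the API server become the bottleneck first.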

(Source: sources/2026-06-22-netflix-how-netflix-simplified-batch-compute-with-kueue)
