Kueue

Kueue is a cloud-native job queueing system for batch workloads on Kubernetes, developed under Kubernetes SIG Scheduling. Unlike YuniKorn or Volcano, Kueue does not replace the kube-scheduler: it decides when a job is admitted and which quota it consumes, and leaves pod placement to the existing scheduler.

Core Abstractions

  • ClusterQueue: cluster-scoped quota pool that defines nominal quotas per resource flavor. Maps to a "leaf tenant" in Netflix's hierarchy.
  • LocalQueue: namespace-scoped queue that references a ClusterQueue. Jobs are submitted to LocalQueues.
  • Cohort: a group of ClusterQueues that can borrow and lend resources among themselves. Maps to an "internal tenant" in Netflix's hierarchy.
  • ResourceFlavor: represents a type of resource (e.g., GPU type, instance family) so that quota can be managed across heterogeneous hardware.
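
To make the relationship between these objects concrete, here is a minimal sketch that builds one manifest of each kind as a plain Python dictionary and prints them as JSON. It assumes the kueue.x-k8s.io/v1beta1 API version; every object name, the node label, the quota numbers, and the container image are hypothetical placeholders, and field names should be checked against the Kueue release actually installed.

```python
import json

# Assumption: v1beta1 API group/version; adjust to the installed Kueue release.
API_VERSION = "kueue.x-k8s.io/v1beta1"


def resource_flavor(name, node_labels):
    """A flavor ties quota to nodes carrying the given labels."""
    return {
        "apiVersion": API_VERSION,
        "kind": "ResourceFlavor",
        "metadata": {"name": name},
        "spec": {"nodeLabels": node_labels},
    }


def cluster_queue(name, cohort, flavor, quotas):
    """A cluster-scoped quota pool that joins a cohort for borrowing/lending."""
    return {
        "apiVersion": API_VERSION,
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "cohort": cohort,
            "namespaceSelector": {},  # empty selector: accept all namespaces
            "resourceGroups": [{
                "coveredResources": list(quotas),
                "flavors": [{
                    "name": flavor,
                    "resources": [
                        {"name": res, "nominalQuota": qty}
                        for res, qty in quotas.items()
                    ],
                }],
            }],
        },
    }


def local_queue(name, namespace, cluster_queue_name):
    """A namespace-scoped entry point that points at one ClusterQueue."""
    return {
        "apiVersion": API_VERSION,
        "kind": "LocalQueue",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"clusterQueue": cluster_queue_name},
    }


def queued_job(name, namespace, queue_name):
    """A batch/v1 Job submitted to a LocalQueue via the queue-name label."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"kueue.x-k8s.io/queue-name": queue_name},
        },
        "spec": {
            "suspend": True,  # created suspended; Kueue unsuspends on admission
            "template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "main",
                    "image": "busybox",  # hypothetical image
                    "command": ["sh", "-c", "echo hello"],
                    "resources": {"requests": {"cpu": "1"}},
                }],
            }},
        },
    }


if __name__ == "__main__":
    manifests = [
        resource_flavor("gpu-a100", {"example.com/gpu-type": "a100"}),
        cluster_queue("team-a-cq", "research", "gpu-a100",
                      {"cpu": 64, "nvidia.com/gpu": 8}),
        local_queue("team-a", "team-a-ns", "team-a-cq"),
        queued_job("demo", "team-a-ns", "team-a"),
    ]
    print(json.dumps(manifests, indent=2))
```

The chain to notice is Job label → LocalQueue → ClusterQueue → ResourceFlavor: a job names only its LocalQueue, and the quota and hardware type follow from the objects that queue references.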

Key Capabilities

  • Multi-tenant quota management over heterogeneous hardware
  • Preemption-based fair sharing — idle reserved resources are lent to other queues and reclaimed on demand (reclaimWithinCohort: Any)
  • Priority-based preemption within a queue (withinClusterQueue: LowerPriority)
  • All-or-nothing (gang) scheduling — ensures all pods for a job are admitted together
  • Topology-aware scheduling — respects hardware topology constraints
  • Operates on native primitives (v1.Pod, batch/v1.Job) and on higher-level abstractions such as RayJob and RayCluster
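
The two preemption settings named in the list above live in the preemption stanza of a ClusterQueue spec. The following is a minimal sketch, assuming the v1beta1 field names; the helper name with_preemption and the example queue are hypothetical, and the sets of allowed values reflect my reading of the Kueue documentation rather than anything stated in the source.

```python
import copy

# Assumption: allowed values as documented for the v1beta1 ClusterQueue API.
RECLAIM_WITHIN_COHORT = {"Never", "LowerPriority", "Any"}
WITHIN_CLUSTER_QUEUE = {"Never", "LowerPriority", "LowerOrNewerEqualPriority"}


def with_preemption(cluster_queue,
                    reclaim_within_cohort="Any",
                    within_cluster_queue="LowerPriority"):
    """Return a copy of a ClusterQueue manifest with a preemption policy set.

    reclaimWithinCohort controls whether this queue may preempt workloads in
    other queues of the cohort to take back quota it has lent out.
    withinClusterQueue controls preemption among workloads in this same queue.
    """
    if reclaim_within_cohort not in RECLAIM_WITHIN_COHORT:
        raise ValueError(f"unknown reclaimWithinCohort: {reclaim_within_cohort}")
    if within_cluster_queue not in WITHIN_CLUSTER_QUEUE:
        raise ValueError(f"unknown withinClusterQueue: {within_cluster_queue}")

    updated = copy.deepcopy(cluster_queue)  # leave the caller's manifest intact
    updated.setdefault("spec", {})["preemption"] = {
        "reclaimWithinCohort": reclaim_within_cohort,
        "withinClusterQueue": within_cluster_queue,
    }
    return updated


if __name__ == "__main__":
    # Hypothetical queue that belongs to a cohort named "research".
    queue = {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": "team-a-cq"},
        "spec": {"cohort": "research"},
    }
    print(with_preemption(queue)["spec"]["preemption"])
```

With reclaimWithinCohort set to Any, a queue can reclaim its lent quota regardless of the borrower's priority, which is what makes lending idle reserved capacity safe for the lender.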

At Netflix

Netflix adopted Kueue to replace the custom queueing and scheduling logic in its Compute Managed Batch (CMB) system. The migration (dubbed "Netflix Batch") required running Kueue with much higher QPS, Burst, and groupKindConcurrency settings than the defaults to meet throughput requirements. Kueue now manages millions of batch workloads in production. See systems/netflix-batch for the full integration architecture.

(Source: sources/2026-06-22-netflix-how-netflix-simplified-batch-compute-with-kueue)
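
For orientation, the three tuning knobs mentioned above correspond, as far as I can tell, to clientConnection.qps, clientConnection.burst, and controller.groupKindConcurrency in Kueue's controller Configuration file. The sketch below builds such a configuration as a Python dictionary. The numbers are arbitrary placeholders and not Netflix's actual settings, which the source does not give; the helper name kueue_config is hypothetical, and the config.kueue.x-k8s.io/v1beta1 API version is an assumption.

```python
import json


def kueue_config(qps, burst, group_kind_concurrency):
    """Build a Kueue controller Configuration with raised throughput limits.

    qps / burst bound the controller's request rate to the API server;
    group_kind_concurrency maps "Kind.group" to the number of concurrent
    reconciles for that kind.
    """
    return {
        # Assumption: v1beta1 config API; check the installed Kueue release.
        "apiVersion": "config.kueue.x-k8s.io/v1beta1",
        "kind": "Configuration",
        "clientConnection": {"qps": qps, "burst": burst},
        "controller": {"groupKindConcurrency": dict(group_kind_concurrency)},
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    config = kueue_config(
        qps=200,
        burst=400,
        group_kind_concurrency={
            "Job.batch": 20,
            "Workload.kueue.x-k8s.io": 20,
        },
    )
    print(json.dumps(config, indent=2))
```

Raising these values trades API server load for admission throughput, so they are best increased gradually while watching API server latency.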
