CONCEPT Cited by 1 source

CPU-only media processing

Definition

CPU-only media processing is the deliberate choice to run image / video / audio processing workloads on CPU instances in the cloud, rather than on GPU instances — even when the underlying engine would run faster per-frame on a GPU. The optimisation target is fleet throughput and cost/performance, not single-instance speed.

The shape: tool supports both CPU and GPU rendering; the operator picks CPU; rendering is parallelised across many small CPU workers rather than concentrated on a few big GPU workers.

Canonical instance — Netflix MPS / FLAPI

Netflix's Media Production Suite runs FLAPI inside Cosmos on CPU-only instances for inspection, rendering, and trimming jobs — even though "FLAPI also supports GPU rendering." The explicit rationale (Source: sources/2026-04-24-netflix-scaling-camera-file-processing-at-netflix):

"CPU instances give us access to a much wider segment of Netflix's vast encoding compute pool and free up GPU instances for other workloads."

Two wins in one decision:

  1. Pool size: CPU capacity inside Netflix's AWS footprint is many times larger than GPU capacity, so CPU-only workloads can scale out faster.
  2. Workload isolation: GPU instances are preserved for workloads that truly need them (ML training / inference), avoiding contention.

Why the traditional model went the other way

Traditional on-prem film-processing facilities invested in "beefy computers with large GPUs and high-performance storage arrays to rip through debayering and encoding at breakneck speed." That model's constraint was fixed-hardware investment: once you own the GPU box, you want to saturate it.

Cloud flips the constraint: you don't own the hardware, you're paying per-instance-hour, and the cheaper-and-wider CPU pool dominates economically when the work can be parallelised across many small instances. Netflix's framing:

"Operating within these constraints lets us focus on increasing throughput via parallel encoding rather than focusing on single-instance processing power. We can then target the sweet spot of the cost/performance efficiency curve while still hitting our target turnaround times."

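The throughput argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the `Fleet` type, the pool sizes, the per-clip render times, and the prices are all invented for the example and are not Netflix figures. It assumes clips render in waves across whatever workers are available and that instances are released (and stop billing) once idle.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    name: str
    workers: int             # instances obtainable at once (hypothetical)
    seconds_per_clip: float  # single-instance render time (hypothetical)
    price_per_hour: float    # per-instance price, arbitrary units (hypothetical)


def turnaround_seconds(fleet: Fleet, clips: int) -> float:
    """Wall-clock time when clips are rendered in waves of `fleet.workers`."""
    waves = math.ceil(clips / fleet.workers)
    return waves * fleet.seconds_per_clip


def total_cost(fleet: Fleet, clips: int) -> float:
    """Billed instance time, assuming workers are released when idle."""
    return clips * fleet.seconds_per_clip / 3600 * fleet.price_per_hour


# Invented numbers: the GPU worker is 5x faster per clip,
# but the CPU pool is 100x wider and cheaper per instance-hour.
gpu = Fleet("gpu", workers=20, seconds_per_clip=60, price_per_hour=4.0)
cpu = Fleet("cpu", workers=2000, seconds_per_clip=300, price_per_hour=0.5)

clips = 10_000
for fleet in (gpu, cpu):
    print(fleet.name, turnaround_seconds(fleet, clips), round(total_cost(fleet, clips), 2))
```

With these made-up inputs the GPU fleet needs 500 waves (30,000 s of wall-clock time) while the CPU fleet needs 5 waves (1,500 s), and the CPU run also bills less in total. Different inputs can reverse the result, which is why this is a per-workload placement decision rather than a rule.
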
Enabling preconditions

CPU-only media processing works when:

  • The tool supports CPU rendering at acceptable fidelity (FLAPI does; so do most mainstream encoders).
  • Work can be parallelised per clip or per sub-segment (Netflix dispatches one Stratum Function per clip or sub-segment — see patterns/serverless-function-for-media-processing).
  • Turnaround is measured in fleet throughput, not single-job latency — which is the case when thousands of parallel renders swarm a VFX turnover and then yield back.
  • The CPU pool is larger / cheaper / less contested than the GPU pool — mostly true in large cloud footprints, where GPU capacity is a premium tier.
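
The per-clip / per-sub-segment precondition can be sketched as a simple fan-out. This is a minimal illustration, not the Stratum or FLAPI API: `render_segment` is a hypothetical stand-in for one remote CPU render, and a local thread pool stands in for dispatching to remote workers (the caller only waits on results, so threads are adequate for the sketch).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(frame_count: int, segment_frames: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) frame ranges that cover the clip exactly."""
    if frame_count < 0 or segment_frames <= 0:
        raise ValueError("frame_count must be >= 0 and segment_frames > 0")
    return [
        (start, min(start + segment_frames, frame_count))
        for start in range(0, frame_count, segment_frames)
    ]


def render_segment(clip_id: str, start: int, end: int) -> str:
    """Hypothetical stand-in for one remote CPU render of frames [start, end)."""
    return f"{clip_id}[{start}:{end}]"


def render_clip(clip_id: str, frame_count: int, segment_frames: int,
                max_workers: int = 8) -> list[str]:
    """Fan a clip out across workers and return results in frame order."""
    segments = split_into_segments(frame_count, segment_frames)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so segments
        # reassemble in sequence even if they finish out of order.
        return list(pool.map(lambda seg: render_segment(clip_id, *seg), segments))
```

For example, `render_clip("A001", 10, 4)` splits a 10-frame clip into the ranges (0, 4), (4, 8), and (8, 10) and renders each independently. The segments share no state, so the same shape scales from a handful of workers to a large CPU fleet.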

Contrast with GPU-targeted workloads

CPU-only isn't a universal ideal; it's a placement decision per workload. On the same Netflix infrastructure, GPU instances stay reserved for workloads that truly need them, such as ML training and inference.

The discipline is: choose the placement that gives the widest available pool compatible with the workload's fidelity + SLA requirements, not the placement with the highest single-instance peak speed.
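
One way to read that discipline as code, under stated assumptions: the `Pool` type, its fields, and the wave-based turnaround estimate below are hypothetical, chosen only to illustrate "filter by fidelity and SLA, then prefer the widest pool". A real scheduler would weigh more factors.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pool:
    name: str
    available_instances: int  # hypothetical capacity figure
    seconds_per_job: float    # hypothetical single-instance job time
    meets_fidelity: bool      # can this pool render at the required fidelity?


def estimated_turnaround(pool: Pool, jobs: int) -> float:
    """Rough wall-clock estimate: jobs run in waves across the pool."""
    if pool.available_instances <= 0:
        return math.inf
    return math.ceil(jobs / pool.available_instances) * pool.seconds_per_job


def choose_pool(pools: list[Pool], jobs: int, sla_seconds: float) -> Optional[Pool]:
    """Pick the widest pool that satisfies both fidelity and the SLA."""
    compatible = [
        p for p in pools
        if p.meets_fidelity and estimated_turnaround(p, jobs) <= sla_seconds
    ]
    # Widest pool wins, not the one with the fastest single instance.
    return max(compatible, key=lambda p: p.available_instances, default=None)
```

Note that `choose_pool` never compares `seconds_per_job` between pools directly. Per-instance speed matters only through the SLA check, and the function returns `None` when no pool is compatible.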

Seen in

  • sources/2026-04-24-netflix-scaling-camera-file-processing-at-netflix