

Netflix Cosmos

Cosmos is Netflix's internal compute and storage platform for media processing: the substrate on which Netflix's encoding pipelines and other per-asset or per-clip transformations run at scale. Cosmos is canonically documented in the Netflix TechBlog post The Netflix Cosmos Platform (not yet ingested on this wiki).

Within this wiki, Cosmos is first introduced in the context of MPS camera-file processing — Netflix runs FLAPI as Cosmos Stratum Functions that accept an input clip, an output location, and parameters (frame ranges, AMF, FDL) and shut down when the unit of work completes (Source: sources/2026-04-24-netflix-scaling-camera-file-processing-at-netflix).

What Cosmos gives the caller

Per the MPS article, Cosmos exposes the following properties to workloads like FLAPI-driven media processing:

  • Serverless Function packaging of tools shipped as Linux Docker images — "quickly invoked to process a single unit of work and shut down on completion". Canonical wiki instance of patterns/serverless-function-for-media-processing.
  • Elastic CPU compute — Netflix targets CPU instances for these workloads to take advantage of the wide encoding pool rather than scarce GPU capacity (concepts/cpu-only-media-processing).
  • Observability + operational reliability for these stateless workers (the MPS post calls out Netflix's "paved-path encoding infrastructure, enabling us to take advantage of proven compute and storage scalability with robust observability").
  • Elastic pool sharing across many types of encoding workloads — allocate on demand, yield back when the work queue dies down.
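The "allocate on demand, yield back" behaviour in the last bullet can be illustrated with a toy model. This is only a sketch: the `ElasticPool` class, its `rebalance` method, and the one-worker-per-queued-item sizing rule are assumptions made for illustration, not Cosmos's actual scheduler, which is not described in the source.

```python
from dataclasses import dataclass, field


@dataclass
class ElasticPool:
    """Toy shared worker pool (hypothetical; not Cosmos's real scheduler)."""

    capacity: int
    # Workload name -> number of workers it currently holds.
    allocated: dict = field(default_factory=dict)

    def free(self) -> int:
        """Workers not currently held by any workload."""
        return self.capacity - sum(self.allocated.values())

    def rebalance(self, workload: str, queue_depth: int) -> int:
        """Resize a workload's share to track its queue depth.

        Assumption: one worker per queued unit of work.
        Returns the number of workers the workload holds afterwards.
        """
        held = self.allocated.get(workload, 0)
        if queue_depth > held:
            # Allocate on demand, limited by what the pool has free.
            held += min(queue_depth - held, self.free())
        else:
            # Yield workers back when the queue dies down.
            held = queue_depth
        if held:
            self.allocated[workload] = held
        else:
            self.allocated.pop(workload, None)
        return held
```

In this model, a second workload that arrives while the pool is nearly full gets only the remaining workers, and picks up more once the first workload's queue shrinks and its workers are returned.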

Stratum Functions

The post names the unit of work as a Cosmos Stratum Function: a parameterised invocation of a Docker-packaged tool against a single clip or sub-segment of a clip. Stratum Functions are what the MPS pipeline dispatches when it needs to inspect OCF, render a VFX plate, generate a deliverable, or debayer a frame range.
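As a rough illustration of "a parameterised invocation ... against a single clip or sub-segment of a clip", the sketch below splits a frame range into per-segment units of work. The `Invocation` type, its field names, the `split_into_invocations` helper, and the output-path layout are all hypothetical; the source only says an invocation carries an input clip, an output location, and parameters such as frame ranges, AMF, and FDL.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Invocation:
    """Hypothetical shape of one unit of work; field names are assumptions."""

    input_clip: str
    output_location: str
    first_frame: int
    last_frame: int  # inclusive
    amf: Optional[str] = None  # optional AMF reference, passed through unchanged
    fdl: Optional[str] = None  # optional FDL reference, passed through unchanged


def split_into_invocations(
    input_clip: str,
    output_location: str,
    first_frame: int,
    last_frame: int,
    frames_per_unit: int,
    amf: Optional[str] = None,
    fdl: Optional[str] = None,
) -> Iterator[Invocation]:
    """Yield one invocation per contiguous sub-segment of the frame range."""
    if frames_per_unit < 1:
        raise ValueError("frames_per_unit must be at least 1")
    start = first_frame
    while start <= last_frame:
        end = min(start + frames_per_unit - 1, last_frame)
        # Illustrative layout: each segment writes under its own sub-path.
        segment_output = f"{output_location}/{start:06d}-{end:06d}"
        yield Invocation(input_clip, segment_output, start, end, amf, fdl)
        start = end + 1
```

Each yielded `Invocation` is independent of the others, which is what lets a stateless worker process it and shut down without coordinating with the workers handling neighbouring segments.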

The term "Stratum" is the internal name for this FaaS-like layer inside Cosmos. Its wire protocol / scheduler / isolation story is undocumented on this wiki pending ingestion of the Cosmos deep-dive post.

Operational posture from the MPS article

"When tools are API-driven, easily packaged in Linux containers, and don't require a lot of external state management, Netflix can quickly integrate and deploy them with operational reliability."

In practice: Java + Python as primary integration languages; Ubuntu-based Docker images; CPU instances in AWS and in Netflix's local compute centres; GPU instances reserved for workloads that actually need them.

Stub note

This wiki page is a stub — Cosmos's architecture (workflow primitives, artifact tracking, sandbox model, scheduling, distinction from the Titus container platform, evolution from the older Reloaded media pipeline) is out of scope until the Cosmos deep-dive post is ingested.

