NETFLIX Tier 1

Netflix — Scaling Camera File Processing at Netflix

Summary

Netflix TechBlog (Eric Reinecke + Bhanu Srikanth, 2026-04-24) documents how FilmLight's FLAPI is integrated as the core studio media-processing engine inside Netflix's Media Production Suite (MPS). The post argues an explicit build-vs-partner posture: Netflix chose to integrate with an industry-standard engine (the same core that powers FilmLight's Baselight / Daylight desktop applications) instead of building a world-class image-processing engine in-house, because doing so would require "deep, continuous collaboration with camera manufacturers and the wider industry."

FLAPI is bundled in Ubuntu-based Docker images with Java / Python glue and dispatched as Cosmos Stratum Functions that process one clip or sub-segment per invocation. Netflix runs these functions on CPU-only instances to tap the wider cloud encoding pool, and the same Docker image deploys to both AWS and Netflix's on-prem production compute centres, giving "consistent assessment of footage wherever it may exist." The elastic, stateless, function-based shape lets Netflix "swarm pull requests to get them through quickly, then immediately yield resources back to lower priority workloads," avoiding fixed render-farm capacity and manual queue management during VFX turnover spikes.

The FilmLight relationship is explicitly framed as an ongoing technology partnership with roadmap alignment + joint validation + co-evolution on open standards (worked example: ACES 2 support — FilmLight provided a roadmap; Netflix collaborated on integration and fed feedback to the ACES technical leadership).

Key takeaways

  1. Build-vs-partner posture canonicalised at the media-engine tier. Netflix's explicit reasoning:

    "building a world-class image processing engine in-house is a significant, long-term commitment: one that would require deep, continuous collaboration with camera manufacturers and the wider industry … Rather than duplicating that work, we chose to integrate. FilmLight became a trusted technology partner, and FLAPI is now a foundational part of how MPS processes media." Canonical instance of patterns/industry-api-partner-as-media-engine.

  2. Two load-bearing FLAPI roles inside MPS — inspection at ingest and deliverables generation downstream. On inspection: "Use FLAPI to gather camera metadata from the original camera files. Conform the workflow critical fields to Netflix's normalized schema. Make it searchable and reusable for downstream processes." On VFX plates: "Debayer original camera files with the correct format-specific decoding parameters. Crop and de-squeeze images using Framing Decision Lists (ASC FDL) to ensure spatial creative decisions are preserved. Apply ACES Metadata Files (AMF), providing repeatable color pipelines from dailies through finishing. Generate an array of media deliverables in varied formats." See concepts/camera-metadata-normalization.
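
The conform step in takeaway 2 amounts to mapping per-manufacturer metadata keys onto one workflow-critical schema. A minimal sketch in Python (every field name, vendor key, and sample value below is hypothetical; the post discloses neither Netflix's normalized schema nor FLAPI's SDK surface):

```python
# Hypothetical sketch of ingest-time camera-metadata normalization.
# Field names and vendor key aliases are illustrative, not Netflix's schema.

NORMALIZED_FIELDS = ("clip_name", "frame_rate", "white_balance_kelvin", "iso")

# Per-manufacturer aliases for the same workflow-critical fields.
VENDOR_KEY_MAP = {
    "arri": {"Clip Name": "clip_name", "Sensor FPS": "frame_rate",
             "White Balance": "white_balance_kelvin", "Exposure Index": "iso"},
    "red":  {"clip.name": "clip_name", "record_fps": "frame_rate",
             "color_temp": "white_balance_kelvin", "iso": "iso"},
}

def normalize(vendor: str, raw: dict) -> dict:
    """Conform raw per-vendor metadata to the normalized schema.

    Missing fields come back as None so downstream consumers can treat
    absence uniformly across camera formats.
    """
    aliases = VENDOR_KEY_MAP[vendor]
    out = {field: None for field in NORMALIZED_FIELDS}
    for raw_key, field in aliases.items():
        if raw_key in raw:
            out[field] = raw[raw_key]
    return out

print(normalize("red", {"clip.name": "A001_C002", "record_fps": 23.976, "iso": 800}))
```

Once conformed, the same four fields are searchable regardless of which camera produced the clip, which is the "searchable and reusable for downstream processes" half of the quote.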

  3. Four-part cloud-runtime contract for any tool running inside Cosmos. Verbatim:

    "Factors that are essential for tools in our runtime environment include that they: * Are packageable as Serverless Functions in Linux Docker images that can be quickly invoked to run a single unit of work and shut down on completion * Can run on CPU-only instances to allow us to take advantage of a wide array of available compute * Support headless invocation via Java, Python, or CLI * Operate statelessly, so when things do go wrong, we can simply terminate and re-launch the worker" Collectively these define patterns/serverless-function-for-media-processing.

  4. CPU-only chosen over GPU despite FLAPI's GPU capability, because the CPU pool is wider and GPUs are reserved for other workloads. Verbatim:

    "While FLAPI also supports GPU rendering, CPU instances give us access to a much wider segment of Netflix's vast encoding compute pool and free up GPU instances for other workloads." Canonical instance of concepts/cpu-only-media-processing.

  5. Elastic shared-pool scaling for spiky production workloads, canonicalised as patterns/elastic-scaling-for-production-spikes:

    "A quiet day on set may mean minimal new footage to inspect. A full VFX turnover or pulling trimmed OCF for finishing might require thousands of parallel renders in a short time window … This elasticity lets us swarm pull requests to get them through quickly, then immediately yield resources back to lower priority workloads."

  6. Same Docker image deploys to AWS and on-prem, giving uniform processing semantics across Netflix's cloud + production compute centres:

    "since we're able to package FLAPI in a Docker image, we can deploy almost identical code to both cloud and our production compute and storage centers around the world, ensuring a consistent assessment of footage wherever it may exist." The hybrid-cloud shape (AWS + Open Connect backhaul + regional ingest centres) is already canonicalised under concepts/hybrid-cloud-media-ingest.

  7. Partnership as ongoing co-evolution, not vendor-consumer relationship. Surface:

    "FilmLight worked closely with Netflix teams to: * Align on feature roadmaps, particularly around new camera formats and open standards * Validate the accuracy and performance of key operations * Debug edge cases discovered in large-scale, real-world workloads * Evolve the API in ways that serve both Netflix and the wider industry * Create a positive feedback cycle with open standards like ACES and ASC FDL to solve for gaps when the rubber hits the road" Worked example: ACES 2"FilmLight's developers quickly provided a roadmap for support. As our engineering teams collaborated on integration, we also provided feedback to the ACES technical leadership to quickly address integration challenges and test drive updates in our pipeline."

  8. Desktop ↔ backend engine coherence as a pre-production validation gate. Because the FLAPI backend engine and the Baselight workstation engine are the same core:

    "Because we use FilmLight's tools on the backend, our workflow specialists can use Baselight on their workstations to manually validate pipeline decisions for productions before the first day of principal photography." Meaningful pre-flight validation requires this — a separate-reimplementation engine would make desktop results advisory at best.

Systems extracted

  • systems/filmlight-flapi — new wiki system. FilmLight's backend-callable API to the Baselight / Daylight image-processing + colour-science engine. Core studio media-processing engine inside Netflix MPS. First canonical wiki reference.
  • systems/filmlight-baselight — new wiki system (stub). Desktop sibling of FLAPI's engine — same core — used by Netflix workflow specialists for manual pre-production validation of pipeline decisions.
  • systems/netflix-cosmos — new wiki system (stub). Netflix's internal compute + storage platform for media processing. Hosts FLAPI-packaged workers as Stratum Functions. First canonical wiki reference.
  • systems/netflix-media-production-suite — expanded with the FLAPI-as-engine framing and the Cosmos Stratum Function dispatch model underneath inspection + VFX plates.
  • systems/netflix-footage-ingest — FLAPI attributed as the engine behind the "inspection" pipeline stage (metadata extraction + workflow-critical-field normalisation).
  • systems/netflix-content-hub — parent portal where ASC-MHL-validated ingests land before FLAPI-driven inspection.
  • systems/netflix-open-connect — production-backbone role carrying media between ingest centres + AWS, already canonicalised; this post reinforces the hybrid-cloud shape.

Concepts extracted

  • concepts/cpu-only-media-processing — new. Deliberate CPU-over-GPU placement choice for media-processing workloads to tap a wider cloud compute pool; canonical instance is Netflix running FLAPI on CPU despite FLAPI's GPU capability.
  • concepts/headless-api-invocation — new. Property that a tool can be driven without a GUI via programmatic SDK / CLI. Cloud-deployment prerequisite for anything running inside Cosmos.
  • concepts/camera-metadata-normalization — new. Ingest-time normalisation of per-format / per-manufacturer metadata to a single workflow-critical-field schema. Canonical instance at Netflix MPS inspection stage, driven by FLAPI.
  • concepts/stateless-compute — reinforced by FLAPI Cosmos workers' "terminate and re-launch" contract on failure.
  • concepts/elasticity — reinforced by the shared-cloud-pool elastic-scaling story for VFX-turnover spikes.
  • concepts/open-media-standards — reinforced with a second canonical instance (ACES / AMF / ASC FDL / ASC MHL) and the worked ACES 2 collaboration example.
  • concepts/hybrid-cloud-media-ingest — reinforced by the "same Docker image runs in AWS and on-prem" framing.
  • concepts/spiky-traffic — reinforced with the VFX-turnover-as-canonical-spike framing.

Patterns extracted

  • patterns/industry-api-partner-as-media-engine — canonical instance: integrating FilmLight's FLAPI as MPS's media engine rather than building one in-house (takeaway 1).
  • patterns/serverless-function-for-media-processing — defined collectively by the four-part Cosmos runtime contract (takeaway 3).
  • patterns/elastic-scaling-for-production-spikes — canonicalised by the swarm-then-yield scaling story for VFX-turnover spikes (takeaway 5).

Operational numbers

The post is architecture-narrative voice and publishes no operational scale numbers — no render counts, no function-invocation rates, no turnaround percentiles, no cost deltas, no pool sizes. The load-bearing quantifiable claims are:

  • "Thousands of parallel renders in a short time window" for a full VFX turnover or finishing pull. (Unitless order-of-magnitude; no duration disclosed.)
  • Cosmos CPU pool is "vast" relative to the GPU pool; no multiplier disclosed.

Prior MPS ingest numbers (already captured on systems/netflix-media-production-suite) remain the scale context: >350 titles across UCAN / EMEA / SEA / LATAM / APAC; ~200 TB / title average / up to ~700 TB outliers.

Caveats / what's not disclosed

  • No FLAPI internals — wire protocol, SDK surface, version, licensing + commercial terms all opaque.
  • No per-stage latency / throughput figures — inspection, debayer, plate generation all presented at architectural altitude only.
  • No Cosmos deep-dive — Stratum Function scheduler, artifact tracking, isolation model, Cosmos-vs-Titus distinction, evolution from Reloaded are all deferred to the separate Cosmos post (not yet ingested).
  • No cost economics — CPU-only choice is framed as a cost/performance sweet spot but no dollar numbers are shared.
  • No failure-mode disclosure — what happens when a Stratum Function crashes mid-render, how partial-output cleanup works, how the orchestrator handles OOM / timeout / quota issues, all opaque.
  • ACES 2 integration status undisclosed — the post narrates the collaboration shape but doesn't say whether production workloads are on ACES 2 or still on ACES 1.
  • No concurrent-render ceilings — how large a swarm Cosmos can absorb for a single VFX turnover is not bounded.
  • On-prem compute centres undocumented — the "local compute and storage centers around the world" are acknowledged but not enumerated; footprint, ISP arrangements, and routing story all deferred.
  • Joint-validation mechanism not described — "Validate the accuracy and performance of key operations" is named but not described as a process (which test suites, which thresholds, which cadence).
