Netflix FPD + CEA (Cloud Efficiency data platform)

FPD (Foundational Platform Data) + CEA (Cloud Efficiency Analytics) is the two-layer internal data platform that the Netflix Platform DSE (Data Science Engineering) team uses to attribute cost-and-ownership across Netflix's AWS footprint to specific teams, services, and organisations. FPD is the normalised substrate (inventory + ownership + usage); CEA is the business-logic layer that turns that substrate into attributed-cost time-series consumable by engineering organisations.

Architecture

Two distinct layers with a deliberate separation of concerns:

  • FPD: Foundational Platform Data. Ingests three normalised streams from each Netflix platform (e.g. Spark):
    • Inventory: what resources exist.
    • Ownership: which team / user / org owns them.
    • Usage: how those resources were exercised over time.

    FPD establishes data contracts with each platform's owners to guarantee data quality and reliability, and transforms heterogeneous platform emissions into a consistent data model for ownership. This standardised model is what makes the downstream analytics layer scalable.
  • CEA: Cloud Efficiency Analytics. Consumes FPD, applies per-platform business logic ("cost heuristics are unique to each platform"), and produces time-series efficiency metrics at multiple aggregation granularities. CEA is described as "compartmentalized and transparent": downstream consumers can trace why a given dollar shows up under their org and how it was calculated.
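The layering can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Netflix's implementation. The following are all hypothetical: the record shapes (`Inventory`, `Ownership`, `Usage`), the `HEURISTICS` table with its made-up per-unit rates, and `contract_violations`, which stands in for an FPD data contract. The sketch shows the split of responsibilities. FPD guarantees three uniformly shaped streams, and CEA owns the platform-specific cost logic applied on top of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical FPD record shapes, one per normalised stream.
@dataclass(frozen=True)
class Inventory:
    resource_id: str
    platform: str          # e.g. "spark"

@dataclass(frozen=True)
class Ownership:
    resource_id: str
    owner: str             # team / user / org identifier

@dataclass(frozen=True)
class Usage:
    resource_id: str
    day: str               # ISO date, e.g. "2024-01-01"
    units: float           # platform-specific unit (vCPU-hours, GB-days, ...)

# Hypothetical per-platform cost heuristics with made-up rates. CEA owns this
# logic; FPD only guarantees the shape of the inputs.
HEURISTICS: Dict[str, Callable[[Usage], float]] = {
    "spark": lambda u: u.units * 0.5,
    "storage": lambda u: u.units * 0.25,
}

def contract_violations(inventory: List[Inventory], ownership: List[Ownership],
                        usage: List[Usage]) -> List[Usage]:
    """Stand-in data contract: every usage row must name an inventoried, owned resource."""
    known = {i.resource_id for i in inventory}
    owned = {o.resource_id for o in ownership}
    return [u for u in usage if u.resource_id not in known or u.resource_id not in owned]

def attribute_costs(inventory: List[Inventory], ownership: List[Ownership],
                    usage: List[Usage]) -> Dict[Tuple[str, str], float]:
    """CEA-style pass: join the three FPD streams and apply each platform's heuristic.

    Returns {(owner, day): dollars}.
    """
    bad = contract_violations(inventory, ownership, usage)
    if bad:
        raise ValueError(f"{len(bad)} usage rows break the data contract")
    platform_of = {i.resource_id: i.platform for i in inventory}
    owner_of = {o.resource_id: o.owner for o in ownership}
    costs: Dict[Tuple[str, str], float] = {}
    for u in usage:
        dollars = HEURISTICS[platform_of[u.resource_id]](u)
        key = (owner_of[u.resource_id], u.day)
        costs[key] = costs.get(key, 0.0) + dollars
    return costs

if __name__ == "__main__":
    inv = [Inventory("job-1", "spark"), Inventory("bucket-a", "storage")]
    own = [Ownership("job-1", "team-recs"), Ownership("bucket-a", "team-recs")]
    use = [Usage("job-1", "2024-01-01", 10.0), Usage("bucket-a", "2024-01-01", 4.0)]
    print(attribute_costs(inv, own, use))  # {('team-recs', '2024-01-01'): 6.0}
```

Adding a platform in this sketch means adding one heuristic entry. The record shapes and the contract check stay unchanged, which mirrors the scalability argument above.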

Design principles

From the post:

  • Accuracy, reliability, accessibility. These are the team's stated tenets, on the view that efficiency data is only useful if it is trusted.
  • Documented. "Comprehensive documentation to navigate the complexity of the efficiency space" — Netflix treats documentation as a first-class deliverable because the underlying model (owners, cost heuristics, multi-tenancy) is inherently complex.
  • SLAs published. "Well-defined Service Level Agreements (SLAs) to set expectations with downstream consumers during delays, outages or changes" — cost data is treated as a production data product, not a spreadsheet.
  • Single-owner resolution + multi-tenant distribution. "For cost accounting purposes, we resolve assets to a single owner, or distribute costs when assets are multi-tenant."
  • Multi-aggregation output. "We do also provide usage and cost at different aggregations for different consumers." — same substrate, multiple consumer-shaped views.
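The post does not say how multi-tenant costs are split. The sketch below assumes a split proportional to usage, which is a common choice. It then builds two consumer-shaped views (team and org) from the same attributed substrate. The functions `distribute`, `attribute` and `roll_up`, the `TEAM_TO_ORG` mapping and all team and org names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical team -> org hierarchy used for the coarser aggregation.
TEAM_TO_ORG = {
    "team-recs": "org-personalization",
    "team-search": "org-personalization",
    "team-encode": "org-content",
}

def distribute(cost: float, tenant_usage: Dict[str, float]) -> Dict[str, float]:
    """Split one asset's cost across its tenants in proportion to usage.

    A single-tenant asset (one key) resolves entirely to that owner.
    """
    total = sum(tenant_usage.values())
    if total <= 0:
        raise ValueError("cannot distribute cost without positive usage")
    return {tenant: cost * used / total for tenant, used in tenant_usage.items()}

def attribute(assets: List[Tuple[float, Dict[str, float]]]) -> Dict[str, float]:
    """Team-level view: sum every asset's distributed cost per team."""
    per_team: Dict[str, float] = defaultdict(float)
    for cost, tenants in assets:
        for team, share in distribute(cost, tenants).items():
            per_team[team] += share
    return dict(per_team)

def roll_up(per_team: Dict[str, float]) -> Dict[str, float]:
    """Org-level view derived from the same team-level numbers."""
    per_org: Dict[str, float] = defaultdict(float)
    for team, cost in per_team.items():
        per_org[TEAM_TO_ORG[team]] += cost
    return dict(per_org)

assets = [
    (100.0, {"team-recs": 3.0, "team-search": 1.0}),  # shared, multi-tenant cluster
    (40.0, {"team-encode": 1.0}),                     # dedicated, single owner
]
teams = attribute(assets)  # {'team-recs': 75.0, 'team-search': 25.0, 'team-encode': 40.0}
orgs = roll_up(teams)      # {'org-personalization': 100.0, 'org-content': 40.0}
```

Every view here is derived from the same per-team numbers, so org totals always reconcile with team totals.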

Three named program tensions

(from the post)

  1. "A Few Sizes to Fit the Majority" — every platform has per-platform customisation that doesn't fit one data-model mould. Netflix's answer is ongoing explicit negotiation with producers + consumers rather than a single rigid schema.
  2. "Data Guarantees" — audits + per-layer health visibility are load-bearing for trust; "maintaining data completeness while ensuring correctness becomes challenging due to upstream latency and required transformations."
  3. "Abstraction Layers" — when an internal platform team builds a SaaS on top of another internal platform, cost attribution has to chase the abstraction chain. FPD's clean inventory/ownership/usage separation is the insulator that lets CEA produce sensible numbers regardless of whether a given user builds on AWS directly or on a Netflix-internal SaaS layered above it.
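One way to picture "chasing the abstraction chain", paired with a data-guarantee audit, works as follows. Cost first lands on whichever owner appears in the AWS-level attribution. If that owner is itself an internal platform, its cost is pushed down to that platform's users, and this repeats until only end owners hold cost. A conservation check then confirms that no dollars were created or lost. This is a hypothetical sketch. `PLATFORM_USERS`, `resolve_chain`, `audit_conserved`, the platform and team names, and the proportional split are assumptions, not details from the post.

```python
import math
from typing import Dict

# Hypothetical abstraction chain: "platform-x" is an internal SaaS on AWS whose
# users include "platform-y", itself an internal platform with its own users.
# Values are each user's usage of that platform (any consistent unit).
PLATFORM_USERS: Dict[str, Dict[str, float]] = {
    "platform-x": {"team-a": 2.0, "platform-y": 2.0},
    "platform-y": {"team-b": 1.0, "team-c": 3.0},
}

def resolve_chain(costs: Dict[str, float],
                  platform_users: Dict[str, Dict[str, float]] = PLATFORM_USERS,
                  max_depth: int = 10) -> Dict[str, float]:
    """Push cost charged to an internal platform down to that platform's users,
    repeating until only end owners (non-platforms) hold cost."""
    result = dict(costs)
    for _ in range(max_depth):
        platforms = [owner for owner in result if owner in platform_users]
        if not platforms:
            return result
        for platform in platforms:
            cost = result.pop(platform)
            users = platform_users[platform]
            total = sum(users.values())
            if total <= 0:
                raise ValueError(f"{platform} has cost but no recorded usage")
            for user, used in users.items():
                result[user] = result.get(user, 0.0) + cost * used / total
    # Cycles (or platforms that partly consume themselves) never fully drain.
    raise ValueError("abstraction chain deeper than max_depth (cycle?)")

def audit_conserved(before: Dict[str, float], after: Dict[str, float]) -> bool:
    """Completeness check: redistribution must neither create nor lose dollars."""
    return math.isclose(sum(before.values()), sum(after.values()), rel_tol=1e-9)

aws_bill = {"platform-x": 100.0, "team-d": 20.0}  # AWS-level attribution
resolved = resolve_chain(aws_bill)
print(resolved)  # {'team-d': 20.0, 'team-a': 50.0, 'team-b': 12.5, 'team-c': 37.5}
print(audit_conserved(aws_bill, resolved))  # True
```

Because the inventory/ownership/usage model is uniform, a user on `platform-y` and a user building directly on AWS end up in the same output shape.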

Forward direction

The post states: "Longer term, we plan to extend FPD to other areas of the business such as security and availability." It adds: "We aim to move towards proactive approaches via predictive analytics and ML for optimizing usage and detecting anomalies in cost."

In other words: (1) generalise FPD's substrate discipline beyond cost into other cross-platform metrics; (2) move CEA from descriptive to prescriptive — anomaly detection and optimisation recommendations rather than just dashboards.
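The post names "predictive analytics and ML" without specifics. A trailing-window z-score over a daily attributed-cost series is a much simpler stand-in, not the method Netflix describes. It shows what detecting cost anomalies means mechanically. `cost_anomalies`, its `window` and `threshold` defaults, and the sample series are hypothetical.

```python
import statistics
from typing import List

def cost_anomalies(daily_cost: List[float], window: int = 7,
                   threshold: float = 3.0) -> List[int]:
    """Return indices of days whose cost is more than `threshold` population
    standard deviations away from the mean of the preceding `window` days."""
    flagged: List[int] = []
    for i in range(window, len(daily_cost)):
        history = daily_cost[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Perfectly flat history: any change at all is flagged.
            if daily_cost[i] != mean:
                flagged.append(i)
            continue
        if abs(daily_cost[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

daily = [100, 102, 98, 101, 99, 100, 101, 100, 250, 101]
print(cost_anomalies(daily))  # [8]
```

A single spike inflates the next window's standard deviation and can mask a follow-on anomaly. A production detector would typically exclude flagged points from the baseline or use a robust statistic such as the median absolute deviation.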

  • patterns/chargeback-cost-attribution — FPD/CEA is the pre-chargeback substrate: the attributed-cost time-series that a Netflix chargeback mechanism would consume, rather than the chargeback mechanism itself.
  • concepts/capacity-efficiency — the Meta framing of the same problem space. Netflix's post focuses on upstream data correctness and transparent attribution as the substrate for capacity-efficiency work; Meta's focuses on the offense/defense/AI-agent optimisation loop that sits above such a substrate.


Last updated · 319 distilled / 1,201 read