Data Projects: Managing Data Assets at Netflix Scale
Summary
Netflix introduced Data Projects — a platform primitive that groups related data assets (tables, workflows, secrets) under a single logical container with a durable, synthetic identity decoupled from any individual human. This solves two persistent problems at Netflix's scale: (1) permissions that break when people change teams or leave, and (2) scheduled workloads tied to human identities that become fragile over time. A Data Project provides both a management container (coarse-grained access control for hundreds of assets at once) and an assumable application identity that workflow orchestrators like Maestro can use to execute jobs durably.
Key takeaways
- Individual-asset ACLs don't scale with organizational fluidity. Netflix had millions of data warehouse tables, each with its own ACL. Org restructures caused mass permission-update storms: a significant share of support threads were just requests to update table permissions en masse (Source: raw file, "Problem 1").
- Human-tied workload identity is inherently fragile. Scheduled workloads running as a specific engineer's identity break whenever that person changes teams or leaves. Fixing them requires "permissions whack-a-mole": swapping identities surfaces cascading missing grants (Source: raw file, "Problem 2").
- Data Projects provide a durable synthetic identity. Each project is provisioned with a Netflix application identity (and optionally an AWS IAM role). This identity doesn't change teams, go on vacation, or leave the company. The IAM role can be exchanged for the Netflix identity in a cryptographically secure way (Source: raw file, "The Identity Umbrella").
- Gravity: automatic asset association. When a workload running under a project's identity creates a new asset (e.g., a Maestro workflow creates tables), that asset is automatically added to the project. The project becomes the "center of mass" for everything produced under its identity, so organization comes for free as a side effect of execution (Source: raw file, "Gravity").
- Maestro as Trusted Workload Manager (TWM). Maestro is formally authorized to mint fresh identity tokens on behalf of the workloads it manages. A single workflow execution may be checked against table ACLs, Netflix authorization policies, and AWS IAM policies, all in one run (Source: raw file, "Securing Data Workflows").
- Coarse-grained role-based grants replace per-asset ACLs. Projects use roles (Contributor = read/write, Viewer = read-only) assigned to users, groups, applications, and CI jobs. Instead of rewriting hundreds of ACLs on a reorg, you update a single project grant (Source: raw file, "Grants and Roles").
- Rightsizing and permission-creep prevention. Netflix is investing in an authorization-policy layer that automatically rightsizes permissions based on actual usage patterns, proactively eliminating unnecessary access (Source: raw file, "What's Next").
- Projects as a general platform primitive. The concept is being generalized beyond data: the future vision includes software assets (GitHub repos, Spinnaker apps, Docker images) and studio assets under the same project abstraction (Source: raw file, "What's Next").
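To make the grant and gravity takeaways concrete, here is a minimal in-memory sketch. It is not Netflix's implementation: the `DataProject` class, `create_asset` helper, principal strings, and asset names are all hypothetical, and only the role names (Contributor, Viewer) come from the article.

```python
from dataclasses import dataclass, field

# Role names are from the article; this action mapping is an assumption.
ROLE_ACTIONS = {"Contributor": {"read", "write"}, "Viewer": {"read"}}


@dataclass
class DataProject:
    """Hypothetical model of a project: one identity, one grant table, many assets."""
    name: str
    identity: str  # durable synthetic identity, represented here as a plain string
    grants: dict[str, str] = field(default_factory=dict)  # principal -> role
    assets: set[str] = field(default_factory=set)

    def grant(self, principal: str, role: str) -> None:
        if role not in ROLE_ACTIONS:
            raise ValueError(f"unknown role: {role}")
        self.grants[principal] = role

    def revoke(self, principal: str) -> None:
        self.grants.pop(principal, None)

    def is_allowed(self, principal: str, action: str, asset: str) -> bool:
        # One project-level grant decides access to every asset in the project.
        if asset not in self.assets:
            return False
        role = self.grants.get(principal)
        return role is not None and action in ROLE_ACTIONS[role]


def create_asset(project: DataProject, acting_identity: str, asset: str) -> None:
    """Gravity: an asset created under the project's identity joins the project."""
    if acting_identity != project.identity:
        raise PermissionError("caller is not running as the project identity")
    project.assets.add(asset)


# A workload running as the project identity creates 300 tables.
project = DataProject("growth-metrics", "app:growth-metrics")
project.grant("group:growth-eng", "Contributor")
for i in range(300):
    create_asset(project, project.identity, f"table:daily_metrics_{i}")

# Reorg: ownership moves to another team with two grant changes, not 300 ACL edits.
project.revoke("group:growth-eng")
project.grant("group:member-analytics", "Contributor")
```

The point of the sketch is the shape of the data: access is decided by a lookup in one project-level grant table, so the cost of a reorg is independent of how many assets the project holds.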
Operational numbers
- Millions of tables in the data warehouse
- Tens of thousands of scheduled workloads, managed by Maestro (the designated TWM)
- Some Data Projects contain tens of thousands of assets in production
Architectural elements extracted
Systems
- Netflix Data Projects — the primary system described
- Netflix Maestro — workflow orchestrator, designated Trusted Workload Manager
- Amazon EMR — runs Spark jobs under project IAM roles
Concepts
- Durable synthetic identity — non-human identity that outlives individual engineers
- Asset gravity — automatic association of created assets to the identity that produced them
- Project-level access control — coarse-grained container replacing per-asset ACLs
- Identity umbrella — combined Netflix app identity + AWS IAM role per project
- Permission creep — gradual accumulation of unnecessary access grants over time
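As an illustration of how permission creep might be detected, the sketch below compares project grants against a usage log and recommends removing or downgrading grants that were not exercised. The article says only that Netflix is investing in usage-based rightsizing and does not describe an algorithm, so the `rightsize` function, the log format, and the 90-day window are all assumptions.

```python
from datetime import datetime, timedelta


def rightsize(grants, access_log, now, window=timedelta(days=90)):
    """Recommend a role per principal based on recent usage.

    grants:     {principal: role}, role is "Contributor" or "Viewer"
    access_log: iterable of (principal, action, timestamp), action is "read" or "write"
    Returns     {principal: recommended role, or None to suggest removing the grant}
    """
    cutoff = now - window
    recent = {}  # principal -> set of actions observed inside the window
    for principal, action, when in access_log:
        if when >= cutoff:
            recent.setdefault(principal, set()).add(action)

    recommendations = {}
    for principal, role in grants.items():
        actions = recent.get(principal, set())
        if not actions:
            recommendations[principal] = None        # unused grant
        elif role == "Contributor" and "write" not in actions:
            recommendations[principal] = "Viewer"    # only reads were observed
        else:
            recommendations[principal] = role        # grant matches usage
    return recommendations


now = datetime(2026, 6, 19)
grants = {"group:etl": "Contributor", "user:analyst": "Contributor", "user:former": "Viewer"}
log = [
    ("group:etl", "write", now - timedelta(days=1)),
    ("user:analyst", "read", now - timedelta(days=10)),
    ("user:former", "read", now - timedelta(days=400)),  # outside the window
]
recommendations = rightsize(grants, log, now)
```

A real policy layer would need more than this (for example, seasonal workloads that run once a year), which is one reason to treat the output as a recommendation rather than an automatic revocation.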
Patterns
- Project-scoped identity for workloads: replace user on-behalf-of (OBO) tokens with a durable project identity
- Automatic asset association by identity: gravity, where assets inherit their creator's project
- Coarse-grained access container: manage permissions on one project rather than N assets
- Identity exchange across identity domains: cryptographic exchange between Netflix identity and AWS IAM
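The project-scoped identity pattern can be sketched as a trusted workload manager minting short-lived tokens whose subject is the project, not a person. The article does not describe Netflix's token format or cryptography, so everything here is an assumption: the function names, the claim names, and the shared-secret HMAC scheme, which stands in for whatever signing mechanism the real system uses.

```python
import base64
import hashlib
import hmac
import json


def mint_token(secret: bytes, project_identity: str, workflow_id: str,
               now: int, ttl: int = 900) -> str:
    """Mint a short-lived token whose subject is the project, not a human user."""
    claims = {"sub": project_identity, "wf": workflow_id, "exp": now + ttl}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_token(secret: bytes, token: str, now: int) -> dict:
    """Check signature and expiry; return the claims if the token is valid."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["exp"] <= now:
        raise PermissionError("token expired")
    return claims


# Hypothetical values: the orchestrator mints a fresh token for each run.
SECRET = b"demo-only-secret"
token = mint_token(SECRET, "app:growth-metrics", "wf-daily-rollup", now=1_000)
claims = verify_token(SECRET, token, now=1_001)
```

Because the subject is the project identity and each run gets a fresh, expiring token, nothing in the workload's credentials depends on any engineer's employment or team membership.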
Caveats
- The article focuses on the data platform; generalization to software/studio assets is future work.
- Migration of existing workflows to project identity requires setting up correct permissions beforehand — tooling for automated permission recommendations is still being built.
- No quantitative latency/throughput numbers for the identity-exchange or ACL-check paths.
Source
- Original: https://medium.com/netflix-techblog/data-projects-managing-data-assets-at-netflix-scale-7ca25888591e?source=rss----2615bd06b42e---4
- Raw markdown: raw/netflix/2026-06-19-data-projects-managing-data-assets-at-netflix-scale-118a2da4.md
Related
- systems/netflix-data-projects
- systems/netflix-maestro
- concepts/workload-identity
- concepts/durable-synthetic-identity
- concepts/asset-gravity
- concepts/project-level-access-control
- concepts/identity-umbrella
- concepts/permission-creep
- patterns/project-scoped-identity-for-workloads
- patterns/automatic-asset-association-by-identity
- patterns/coarse-grained-access-container
- patterns/identity-exchange-across-identity-domains
- companies/netflix