Netflix Data Projects¶
Data Projects is a Netflix platform primitive that provides two capabilities in one abstraction:
- A container for managing related data assets in aggregate — tables, workflows, secrets, and other data assets grouped under a single logical umbrella with coarse-grained role-based access control.
- A durable, synthetic, assumable identity — a Netflix application identity (and optionally an AWS IAM role) that scheduled and asynchronous workloads execute under, independent of any human's lifecycle.
Motivation¶
At Netflix's scale (millions of tables, tens of thousands of scheduled workloads), two problems persisted:
- Permissions couldn't keep up with organizational change. Each table had its own ACL. Reorgs caused mass permission-update storms or led teams to open access too broadly, defeating ACLs entirely.
- Workloads tied to human identities were fragile. When the owning engineer changed teams or left, workflows broke. Swapping in a colleague's identity kicked off cascading permission failures ("permissions whack-a-mole").
Architecture¶
Grants and roles¶
Each Data Project defines grants for multiple identity types: users, groups, applications, and CI jobs. Roles include:
| Role | Access level |
|---|---|
| Contributor | Read/write to project assets |
| Viewer | Read-only access |
Instead of rewriting hundreds of ACLs when someone joins or leaves, you update a single project grant.
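The grant model above can be illustrated with a minimal sketch. It is not Netflix's implementation: the `DataProject` class, the `Role` enum, the principal naming scheme (`user:`, `group:`), and the asset names are all hypothetical, chosen only to show how one project-level grant stands in for many per-asset ACL entries.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"            # read-only access
    CONTRIBUTOR = "contributor"  # read/write access to project assets


@dataclass
class DataProject:
    # Hypothetical model: one grant table shared by every asset in the project.
    name: str
    assets: set[str] = field(default_factory=set)
    # principal -> role; a principal may be a user, group, application, or CI job
    grants: dict[str, Role] = field(default_factory=dict)

    def can_read(self, principal: str, asset: str) -> bool:
        # Any role on the project implies read access to its assets.
        return asset in self.assets and principal in self.grants

    def can_write(self, principal: str, asset: str) -> bool:
        # Only contributors may write.
        return asset in self.assets and self.grants.get(principal) is Role.CONTRIBUTOR


project = DataProject("playback_metrics", assets={"tbl_a", "tbl_b", "wf_daily"})
project.grants["user:alice"] = Role.CONTRIBUTOR  # one grant covers every asset
project.grants["group:analysts"] = Role.VIEWER
```

In this sketch, removing `user:alice` from `project.grants` revokes her access to all three assets at once, which is the property the section describes.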
Identity umbrella¶
Every project is provisioned with:
- A Netflix application identity — used by Maestro and other orchestrators to execute async workloads.
- An optional AWS IAM role — for specialized compute like Spark on Amazon EMR. The IAM role can be exchanged for the Netflix identity in a cryptographically secure manner.
Privileged members can assume the project identity for testing/troubleshooting — running commands exactly as the scheduled workload would.
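As a rough sketch of the identity umbrella, the following models a project that carries an application identity, an optional IAM role, and a check that only privileged members may assume the identity. The class names, the `privileged_members` set, the identity name `app-qoe`, and the ARN are all placeholders; the source does not say which roles count as privileged or how assumption is implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ProjectIdentity:
    app_identity: str                   # hypothetical application identity name
    iam_role_arn: Optional[str] = None  # optional AWS IAM role; None if not provisioned


@dataclass
class Project:
    name: str
    identity: ProjectIdentity
    # Assumption: privileged members are tracked as an explicit set of principals.
    privileged_members: set[str] = field(default_factory=set)

    def assume_identity(self, caller: str) -> str:
        """Return the identity the caller may act as, or raise if not privileged."""
        if caller not in self.privileged_members:
            raise PermissionError(
                f"{caller} may not assume {self.identity.app_identity}"
            )
        return self.identity.app_identity


qoe = Project(
    name="streaming_qoe",
    identity=ProjectIdentity(
        app_identity="app-qoe",
        iam_role_arn="arn:aws:iam::123456789012:role/example",  # placeholder ARN
    ),
    privileged_members={"alice"},
)
```

Calling `qoe.assume_identity("alice")` returns the project identity, so a troubleshooting session runs with the same permissions as the scheduled workload; any other caller gets a `PermissionError`.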
Gravity¶
When a workload running under a project's identity creates a new asset, that asset is automatically added to the project. The project becomes the center of mass for everything produced under its identity — providing organization as a side effect of how the platform already works (concepts/asset-gravity).
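The gravity behaviour can be sketched as a catalog that looks up the creating identity and attaches the new asset to the matching project. The `Catalog` class, its methods, and the identity strings are hypothetical; the source describes only the outcome, not the mechanism.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Project:
    name: str
    identity: str  # the project's application identity (hypothetical naming)
    assets: set[str] = field(default_factory=set)


class Catalog:
    """Hypothetical asset catalog that applies gravity on asset creation."""

    def __init__(self) -> None:
        self._by_identity: dict[str, Project] = {}

    def register(self, project: Project) -> None:
        self._by_identity[project.identity] = project

    def create_asset(self, asset: str, created_by: str) -> Optional[Project]:
        # If the creator is a project identity, the asset joins that project.
        project = self._by_identity.get(created_by)
        if project is not None:
            project.assets.add(asset)
        return project


catalog = Catalog()
qoe = Project("streaming_qoe", identity="app-qoe")
catalog.register(qoe)

owner = catalog.create_asset("tbl_rebuffer_daily", created_by="app-qoe")
unowned = catalog.create_asset("tbl_scratch", created_by="user:alice")
```

Here `tbl_rebuffer_daily` lands in `qoe.assets` without any explicit registration step, while an asset created under a personal identity is not attached to any project.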
Integration with Maestro¶
Maestro (systems/netflix-maestro) is designated a Trusted Workload Manager (TWM): it is formally authorized to mint fresh identity tokens on behalf of the workloads it manages. Before executing any workflow under a project's identity, Maestro validates the caller's access to that project (Source: sources/2026-06-19-netflix-data-projects-managing-data-assets-at-netflix-scale).
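The validate-then-mint flow might look roughly like the sketch below. It is not Maestro's API: `TrustedWorkloadManager`, `IdentityToken`, the access map, the token format, and the 15-minute lifetime are all assumptions made for illustration.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityToken:
    identity: str      # project identity the workload runs as
    value: str         # opaque random value standing in for a signed token
    expires_at: float  # Unix timestamp


class TrustedWorkloadManager:
    """Hypothetical orchestrator that mints tokens only after an access check."""

    def __init__(self, project_access: dict[str, set[str]], ttl_seconds: int = 900):
        # project identity -> callers allowed to run workflows under it
        self._project_access = project_access
        self._ttl_seconds = ttl_seconds  # assumed token lifetime

    def run_workflow(self, caller: str, project_identity: str, workflow: str) -> IdentityToken:
        # Step 1: validate the caller's access to the project.
        allowed = self._project_access.get(project_identity, set())
        if caller not in allowed:
            raise PermissionError(
                f"{caller} cannot run {workflow} as {project_identity}"
            )
        # Step 2: mint a fresh, short-lived token for this execution.
        return IdentityToken(
            identity=project_identity,
            value=secrets.token_urlsafe(32),
            expires_at=time.time() + self._ttl_seconds,
        )


twm = TrustedWorkloadManager({"app-qoe": {"alice"}})
token = twm.run_workflow("alice", "app-qoe", "daily_rollup")
```

Minting a new token per execution, rather than storing a long-lived credential with the workflow, is what lets the workload outlive any individual engineer's tenure in this sketch.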
Operational scale¶
- Some Data Projects contain tens of thousands of assets in production.
- Maestro manages tens of thousands of scheduled workloads that benefit from project identity.
Success stories¶
- Streaming Quality of Experience — core observability pipeline now runs under project identity, stable regardless of team membership changes.
- Member Analytics — business-critical analytics with project-level access management rather than per-table grants.
Future direction¶
- Generalization beyond data — Projects (not just Data Projects) spanning software assets (repos, Spinnaker apps, Docker images) and studio assets.
- Automatic permission rightsizing — proactive elimination of unnecessary access based on actual usage patterns.
- Activity logs and audits — unified timeline of grant and asset changes at the project level.