CONCEPT Cited by 1 source

Foundational ML platform¶

A foundational ML platform is a deliberately minimal, reusable core that bolts ML tooling onto a company's production data / compute / orchestration / serving substrate, and then lets each application team build domain-specific libraries on top. The platform provides the human-friendly API, the integrations to production systems, and the path from prototype to production — but it does not try to encode every team's workflow.

Why "foundational", not "full-stack"¶

Netflix's MLP team frames the choice this way:

"Given the very diverse set of ML and AI use cases we support — today we have hundreds of Metaflow projects deployed internally — we don't expect all projects to follow the same path from prototype to production. Instead, we provide a robust foundational layer with integrations to our company-wide data, compute, and orchestration platform, as well as various paths to deploy applications to production smoothly. On top of this, teams have built their own domain-specific libraries to support their specific use cases and needs." (Source: sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix)

A full-stack, one-size-fits-all platform fails against real use-case diversity; the "outliers outside the systems maintained by our engineering teams" incur "unsustainable operational overhead." A foundational platform avoids that trap by being narrow enough to be universally adopted, and extensible enough for teams to build on.

What belongs in the foundation¶

From the canonical Metaflow instance:

API ergonomics — a human-friendly way to declare flows, steps, parameters, dependencies.
Integrations to prod — data warehouse, compute, orchestrator, deployment substrates wired in via a published extension mechanism (concepts/metaflow-extension-mechanism).
Dependency management — reproducible execution environments (@conda, @pypi, portable envs; see concepts/portable-execution-environment).
Canonical deployment paths — e.g. precomputed-cache-backed API (patterns/precompute-then-api-serve) and real-time decorator-driven REST (systems/netflix-metaflow-hosting).
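
To make the first two items more concrete, here is a minimal sketch in plain Python of a core that offers a small step-declaration API and a registry through which production integrations are plugged in. This is not Metaflow's API: `Foundation`, `register_integration`, `integration`, and `step` are hypothetical names invented for illustration, and the "warehouse" is an in-memory stand-in.

```python
from typing import Any, Callable, Dict, List


class Foundation:
    """Hypothetical minimal core: step declaration plus an integration registry."""

    def __init__(self) -> None:
        self._integrations: Dict[str, Callable[[], Any]] = {}
        self._steps: List[Callable[[Dict[str, Any]], None]] = []

    def register_integration(self, name: str, factory: Callable[[], Any]) -> None:
        # Extension point: production systems plug in here, outside the core.
        self._integrations[name] = factory

    def integration(self, name: str) -> Any:
        # Build the named integration on demand.
        return self._integrations[name]()

    def step(self, func: Callable[[Dict[str, Any]], None]):
        # Steps run in declaration order and share one state dict.
        self._steps.append(func)
        return func

    def run(self) -> Dict[str, Any]:
        state: Dict[str, Any] = {}
        for func in self._steps:
            func(state)
        return state


platform = Foundation()

# Stand-in for a data warehouse client (an assumption for this example).
platform.register_integration("warehouse", lambda: {"titles": ["a", "b", "c"]})


@platform.step
def load(state: Dict[str, Any]) -> None:
    state["rows"] = platform.integration("warehouse")["titles"]


@platform.step
def count(state: Dict[str, Any]) -> None:
    state["n_rows"] = len(state["rows"])


result = platform.run()
print(result["n_rows"])
```

The core never names a specific warehouse or orchestrator; it only exposes the hook. That is the property that lets the same foundation be adopted across teams with different production substrates.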

What stays out¶

Team-specific domain logic and configuration-management tools.
Feature-specific glue (e.g. explainer flows, KYC orchestration, ranking-specific retrievers).
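
The boundary can be sketched the same way. In the hedged example below, the foundation side is a generic step runner that knows nothing about any domain, while a team-owned ranking library supplies the domain logic on top of it. All names (`run_pipeline`, `retrieve_candidates`, `rank_by_score`, `ranking_flow`) and the catalog data are hypothetical, not taken from Netflix's code.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


# --- Foundation side (hypothetical): generic, domain-agnostic ---
def run_pipeline(steps: List[Callable[[State], State]], state: State) -> State:
    # Apply each step in order, threading the state through.
    for step in steps:
        state = step(state)
    return state


# --- Team side (hypothetical ranking library built on the foundation) ---
def retrieve_candidates(state: State) -> State:
    # Domain logic: keep only titles available in the requested region.
    candidates = [t for t in state["catalog"] if t["region"] == state["region"]]
    return {**state, "candidates": candidates}


def rank_by_score(state: State) -> State:
    # Domain logic: order candidates by descending score.
    ranked = sorted(state["candidates"], key=lambda t: t["score"], reverse=True)
    return {**state, "ranked": [t["title"] for t in ranked]}


def ranking_flow(catalog: List[Dict[str, Any]], region: str) -> List[str]:
    # The team composes its own steps; the foundation only runs them.
    final = run_pipeline(
        [retrieve_candidates, rank_by_score],
        {"catalog": catalog, "region": region},
    )
    return final["ranked"]


catalog = [
    {"title": "A", "region": "US", "score": 0.2},
    {"title": "B", "region": "US", "score": 0.9},
    {"title": "C", "region": "FR", "score": 0.5},
]
us_ranking = ranking_flow(catalog, "US")
print(us_ranking)
```

If the retriever or the ranking rule changes, only the team's library changes; `run_pipeline` stays untouched, which is the point of keeping such logic out of the foundation.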

Seen in¶

sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix — Netflix Metaflow is the canonical industrial instance.
