
Metaflow Hosting (Netflix)

Metaflow Hosting is Netflix's integrated model-serving service for artifacts and models produced by Metaflow (Source: sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix). It is positioned as the real-time-serving tier for Metaflow, paired with the precomputed-predictions-via-Metaflow-Cache pattern as the batch-serving tier (see systems/netflix-metaflow-cache).

What it provides

From the post:

"Metaflow Hosting is specifically geared towards hosting artifacts or models produced in Metaflow. This provides an easy to use interface on top of Netflix's existing microservice infrastructure, allowing data scientists to quickly move their work from experimentation to a production grade web service that can be consumed over a HTTP REST API with minimal overhead."

Enumerated key benefits (verbatim from the post):

  • "Simple decorator syntax to create RESTFull endpoints."
  • "The back-end auto-scales the number of instances used to back your service based on traffic."
  • "The back-end will scale-to-zero if no requests are made to it after a specified amount of time thereby saving cost particularly if your service requires GPUs to effectively produce a response." See concepts/scale-to-zero.
  • "Request logging, alerts, monitoring and tracing hooks to Netflix infrastructure."

Comparison to AWS SageMaker Model Hosting

"Consider the service similar to managed model hosting services like AWS Sagemaker Model Hosting, but tightly integrated with our microservice infrastructure."

The Netflix differentiator is that tight integration with its microservice infrastructure: the request-logging, alert, monitoring, and tracing hooks are the same ones Netflix engineers already use elsewhere, so moving a model from experimentation into production doesn't create a new observability island. See systems/aws-sagemaker-endpoint for the closest AWS analog.

Deployment mode: sync and async

Metaflow Hosting supports both synchronous and asynchronous queries. The asynchronous mode is the mechanism behind Amber's on-demand feature-compute pattern: services send async feature-computation requests that Metaflow Hosting queues, and "Metaflow Hosting caches the response, so Amber can fetch it after a while." This is the canonical wiki instance of patterns/async-queue-feature-on-demand.
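The post names the pattern but not its interface. A minimal sketch of the submit-then-fetch-from-cache flow, assuming hypothetical `submit_async`/`fetch` names and compressing "compute later, fetch after a while" into an immediate in-process computation:

```python
# Hypothetical sketch of the async request/cache pattern described in the
# post: a caller submits a feature-computation request, the hosting side
# computes it (here, immediately rather than on a queue), caches the
# response by request id, and the caller fetches the cached result later.
# All names are illustrative, not Metaflow's API.
import uuid

_CACHE = {}  # request_id -> response ("Metaflow Hosting caches the response")


def submit_async(features):
    """Amber-style caller: enqueue a feature-computation request, get an id."""
    request_id = str(uuid.uuid4())
    _CACHE[request_id] = {"status": "pending"}
    _compute(request_id, features)  # in reality this runs later, server-side
    return request_id


def _compute(request_id, features):
    # Stand-in for the hosted model/artifact doing the real work.
    _CACHE[request_id] = {"status": "done", "value": sum(features)}


def fetch(request_id):
    """Caller polls 'after a while' and reads the cached response."""
    return _CACHE.get(request_id, {"status": "unknown"})
```

The cache is what decouples the two sides: the requester never blocks on the (possibly GPU-backed, possibly scaled-to-zero) service, it just retries `fetch` until the status flips to done.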

Disclosure limits

"Although details have evolved a lot, this old talk still gives a good overview of the service" — the post defers to a pre-2024 Netflix conference talk (YouTube link inline in the original) for architectural detail. No fleet/QPS/latency numbers, no scale-to-zero thresholds, no auto-scale time-constants, no cold-start GPU warm-up figures are given here.

Future work named

"Metaflow Hosting models are currently not well integrated into model logging facilities — we plan on working on improving this to make models developed with Metaflow more integrated with the feedback loop critical in training new models. We hope to do this in a pluggable manner that would allow other users to integrate with their own logging systems."

Plus: "more ways Metaflow artifacts and models can be integrated into non-Metaflow environments and applications, e.g. JVM based edge service, so that Python-based data scientists can contribute to non-Python engineering systems easily."
