LyftLearn Serving
LyftLearn Serving is the serving half of LyftLearn 2.0 — a distributed real-time model-inference architecture running on EKS / Kubernetes. Dozens of Lyft ML teams each deploy their own model-serving service on this substrate; each service contains that team's models with custom prediction handlers and configuration. Use cases include pricing, fraud, dispatch, ETA, and other business-critical production predictions.
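The per-team handler model can be sketched as follows. This is an illustrative interface only — the class name, `load`/`predict` signatures, and toy linear scoring are assumptions for the sake of example, not Lyft's actual API:

```python
# Hypothetical sketch of a per-team prediction handler: each team
# implements model loading and a predict entry point behind a uniform
# interface. All names here are illustrative, not Lyft's API.
from typing import Dict, List


class PricingModelHandler:
    """Illustrative handler wrapping one team's model."""

    def __init__(self) -> None:
        self.model: Dict[str, float] = {}

    def load(self, artifact: Dict[str, float]) -> None:
        # In production the artifact would be deserialized from S3;
        # here a plain dict of weights stands in for the model binary.
        self.model = artifact

    def predict(self, features: List[Dict[str, float]]) -> List[float]:
        # Toy linear scoring, just to show the request/response shape.
        return [
            sum(self.model.get(name, 0.0) * value for name, value in row.items())
            for row in features
        ]


handler = PricingModelHandler()
handler.load({"distance_km": 1.5, "surge": 2.0})
print(handler.predict([{"distance_km": 10.0, "surge": 1.0}]))  # → [17.0]
```

The uniform load/predict shape is what lets dozens of team-owned services share one serving substrate while keeping model logic team-specific.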
A Model Registry Service coordinates model deployments across these per-team serving services, and is the primary integration point with LyftLearn Compute (SageMaker): training jobs on the compute side write model binaries to S3, the Model Registry tracks artifact lineage, and serving services pull from S3 for deployment.
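The train-to-serve flow above can be sketched in miniature. Everything here is a hedged stand-in — the class, method names, S3 keys, and job IDs are invented for illustration; only the three-step shape (write binary, record lineage, resolve and pull) comes from the description:

```python
# Hedged sketch of the deployment flow: a training job writes a model
# binary to object storage, a registry records the artifact and its
# lineage, and a serving service resolves the key and pulls the binary.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModelRegistry:
    """Maps a model ID to its storage key and training-job lineage."""
    entries: Dict[str, dict] = field(default_factory=dict)

    def register(self, model_id: str, s3_key: str, training_job: str) -> None:
        self.entries[model_id] = {"s3_key": s3_key, "training_job": training_job}

    def resolve(self, model_id: str) -> str:
        return self.entries[model_id]["s3_key"]


# In-memory stand-in for S3.
object_store: Dict[str, bytes] = {}

# 1. Compute side: a training job writes the model binary.
object_store["models/eta/v3.bin"] = b"model-bytes"

# 2. Registry records the artifact and its lineage.
registry = ModelRegistry()
registry.register("eta-v3", "models/eta/v3.bin", training_job="training-job-123")

# 3. Serving side: resolve the key and pull the binary for deployment.
binary = object_store[registry.resolve("eta-v3")]
print(binary)  # → b'model-bytes'
```

The registry being the only shared coordination point is what decouples the SageMaker compute side from the K8s serving side.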
Why it stayed on Kubernetes
In LyftLearn 2.0, the compute half moved to SageMaker (serverless, on-demand, paid per use) but serving stayed on K8s. Real-time serving is latency-sensitive, long-lived, and already well-solved by the K8s stack — the cost/benefit of migrating serving to a managed platform was not there, and the team-owned, per-service deployment model maps naturally onto Kubernetes primitives. See hybrid ML platform architecture for why "compute to serverless, serving to K8s" is a recurring shape.
Cross-reference
This page is the LyftLearn-2.0 namesake for the serving stack; the architecture is covered in depth in Lyft's 2023 post "Powering Millions of Real-Time Decisions with LyftLearn Serving" — referenced inline by the 2025-11-18 evolution post but not re-derived there.
Seen in
- sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture — cross-links to LyftLearn Serving as the EKS half of the 2.0 hybrid architecture.