PATTERN
Decoupled compute and serving stacks¶
Problem¶
An ML platform runs two fundamentally different workload classes — compute (training, batch inference, hyperparameter optimisation, interactive notebooks) and serving (real-time inference behind latency-sensitive endpoints). Running them on one substrate forces a compromise: either pay for idle capacity to keep compute fast, or suffer serving cold-starts that blow p99 budgets.
Pattern¶
Split the platform into two purpose-built stacks, each optimised for its own workload shape:
- Compute stack on a managed serverless substrate (e.g. SageMaker) — on-demand provisioning, scale-to-zero economics, per-job billing, no idle cost. Cold start is the cost; mitigate with warm pools and lazy image loading for the most latency-sensitive jobs.
- Serving stack on a long-lived container platform (e.g. EKS / Kubernetes) — warm pods, HPAs, PDBs, service meshes, fine-grained ops control, per-team prediction services.
The stacks are fully decoupled at the runtime level. They integrate only through a narrow set of cross-stack seams — see patterns/model-registry-and-object-store-as-hybrid-glue for the glue.
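Because the stacks never call each other's runtimes, the only shared contract is the artifact location plus its registry record. A minimal sketch of that seam, with all names (`ModelRecord`, `ModelRegistry`, the URIs) hypothetical and the registry reduced to an in-memory stand-in:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRecord:
    """The only data the two stacks share (hypothetical shape)."""
    model_id: str
    version: int
    artifact_uri: str  # e.g. an s3:// path written by the compute stack
    image_uri: str     # e.g. an ECR image consumed by both stacks

class ModelRegistry:
    """In-memory stand-in for a registry service backed by a database."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, int], ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        # Called from the compute stack when a training job finishes.
        self._records[(record.model_id, record.version)] = record

    def resolve(self, model_id: str, version: int) -> ModelRecord:
        # Called from the serving stack when rolling out a deployment.
        return self._records[(model_id, version)]
```

The point of the sketch is what is *absent*: no RPC from serving to compute, no shared cluster, no shared scheduler. Either stack can be replaced as long as it keeps writing (or reading) records of this shape.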
When it's the right shape¶
This pattern is appropriate when:
- The compute workloads are bursty (idle most of the time, with occasional large bursts of training / batch jobs) so that serverless on-demand economics meaningfully beat always-on clusters.
- The serving workloads are sustained and latency-sensitive so that warm K8s primitives are worth the idle capacity cost.
- The team organisation makes per-team serving services (each with a custom prediction handler) natural — K8s's per-service primitives fit.
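The first condition is quantifiable: with bursty compute demand, the serverless per-hour premium is dwarfed by the always-on cluster's idle hours. A back-of-the-envelope sketch, where every number (instance count, hourly rate, 1.3× serverless premium) is an illustrative assumption, not a figure from the source:

```python
def monthly_cost_always_on(instances: int, hourly_rate: float) -> float:
    # Always-on cluster: pay for every hour, busy or idle (~730 h/month).
    return instances * hourly_rate * 730

def monthly_cost_serverless(job_hours: float, hourly_rate: float,
                            premium: float = 1.3) -> float:
    # Serverless: pay only for actual job hours, at a per-hour premium.
    return job_hours * hourly_rate * premium

# Assumed workload: a 10-instance pool at $5/h, but only 800 job-hours
# of real training demand per month (~11% utilisation of 7,300 hours).
always_on = monthly_cost_always_on(10, 5.0)     # 36,500
serverless = monthly_cost_serverless(800, 5.0)  # ~5,200
```

At low utilisation the premium is irrelevant; as job-hours approach the pool's capacity the comparison flips, which is exactly why the *serving* stack, with sustained load, stays on long-lived containers.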
Lyft / LyftLearn 2.0¶
Canonical wiki instance. LyftLearn 2.0 split into LyftLearn Compute on SageMaker and LyftLearn Serving on EKS:
- Compute — a SageMaker Manager Service orchestrates training, batch, HPO, and JupyterLab notebooks via the AWS SDK. EventBridge + SQS carry job-state events (replacing background watchers). On-demand provisioning eliminates idle K8s capacity cost.
- Serving — EKS clusters host dozens of team-owned model-serving services (pricing, fraud, dispatch, ETA, etc.), each with custom prediction handlers. A Model Registry Service coordinates deployments across services.
Integration happens only through artifact flow: S3 for model binaries, Model Registry for lineage, ECR for Docker images that flow to both stacks, and the LyftLearn database for job + model metadata. "Each LyftLearn product operates independently while maintaining seamless end-to-end ML workflows." (Source: sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture)
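The EventBridge → SQS path replaces polling watchers with push-based state updates: the consumer just folds each job-state event into the metadata store. A sketch of such a consumer; the event field names are modelled loosely on SageMaker's training-job state-change events but are an assumption here, as is the dict standing in for the LyftLearn database:

```python
import json

def handle_job_event(message_body: str, metadata_db: dict) -> str:
    """SQS consumer: record a job-state change in the job-metadata store.

    Replaces a background watcher that would otherwise poll the
    SageMaker API for every in-flight job.
    """
    event = json.loads(message_body)
    detail = event["detail"]                 # field names assumed
    job_name = detail["TrainingJobName"]
    status = detail["TrainingJobStatus"]     # e.g. InProgress / Completed / Failed
    metadata_db[job_name] = status
    return status
```

The payoff is operational: the manager service stays stateless between events, and adding a new job type means subscribing to another event pattern rather than writing another watcher loop.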
Trade-offs¶
- + Each stack optimised for its workload; no compromise on idle cost vs. warm-pod latency.
- + Independent scaling, deployment, and operation of the two stacks.
- − Two operational surfaces to maintain instead of one.
- − Model-registry + object-store glue is load-bearing; if that seam breaks, end-to-end workflows stop.
- − Cross-stack networking can surface issues (e.g. cross-cluster Spark for Lyft) that single-stack architectures never hit.
Seen in¶
- sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture — Lyft LyftLearn 2.0; SageMaker (compute) + EKS (serving).