

Hybrid ML platform architecture

Hybrid ML platform architecture is the shape in which a single ML platform splits its compute tier from its serving tier, placing each on a substrate chosen to match its access pattern. Training / batch / HPO / notebook compute runs on a managed-serverless substrate (e.g. SageMaker) that can scale from zero and bill per execution, while real-time inference runs on a latency-sensitive, long-lived container platform (e.g. EKS / Kubernetes), where warm pods and fine-grained ops control matter more than on-demand elasticity.

Why the split is worth the seams

The two halves of an ML platform have fundamentally different access patterns:

  • Compute (training, batch, HPO, notebooks): bursty, often idle, each job can be sized independently; cold-start is annoying but tolerable. Managed serverless collapses idle cost to near zero at the price of provisioning latency (see the launch sketch after this list).
  • Serving (real-time inference): sustained traffic, strict p99 latency budgets, stateful / long-lived processes, per-team customisation of prediction handlers. K8s primitives (pods, HPA, PDBs, service meshes) fit naturally; there's no idle-cost story to unlock because the pods are always warm.
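
As a concrete illustration of the compute side, here is a minimal sketch of launching an ephemeral training job with boto3's SageMaker client. Every identifier (job name, image URI, role ARN, bucket) is a placeholder, not anything from the LyftLearn stack; the point is that capacity is provisioned on demand, billed for the run, and released at exit.

```python
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-east-1")

# Placeholder identifiers -- substitute your own account's resources.
sagemaker.create_training_job(
    TrainingJobName="demand-forecast-2024-06-01",
    AlgorithmSpecification={
        # Training image pulled from ECR (the shared container registry).
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/trainer:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/TrainingRole",
    # Model artifacts land in S3, the glue between the two stacks.
    OutputDataConfig={"S3OutputPath": "s3://ml-artifacts/models/"},
    # Capacity exists only for the lifetime of this job: scale-from-zero,
    # bill-per-execution, at the price of provisioning latency (cold start).
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```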

Running both on one substrate forces a compromise: either pay for idle K8s nodes to keep compute fast (the LyftLearn 1.0 story), or eat cold-start on serving (unacceptable for pricing / fraud / ETA endpoints). Splitting lets each half optimise for its own distribution.
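
The serving side is the mirror image: an always-warm, long-lived process. Here is a minimal sketch of a team-owned prediction handler, assuming a Flask app whose pod loads its model once at startup so no request pays model-load latency; framework choice and all names are illustrative, not Lyft's.

```python
import pickle

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the artifact once, at pod startup -- the pod stays warm, so the
# request path never pays model-load cost. Bucket/key are placeholders.
s3 = boto3.client("s3")
body = s3.get_object(
    Bucket="ml-artifacts", Key="models/demand-forecast/model.pkl"
)["Body"]
model = pickle.loads(body.read())

@app.route("/predict", methods=["POST"])
def predict():
    # Per-team customisation of the prediction handler lives here:
    # feature parsing, validation, post-processing.
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict([features]).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```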

Integration surface

Hybrid stacks stay decoupled via a narrow, artifact-only integration surface:

  • A model registry tracks artifacts produced by compute-side training jobs.
  • Object storage (S3) is the substrate for the binaries themselves.
  • A container registry (ECR) feeds images to both stacks.
  • An event bus (EventBridge + SQS) carries job state between the stacks so the compute side doesn't need to poll or be polled (sketched after this list).
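
A minimal sketch of that handoff with boto3, assuming placeholder bus/queue names and an illustrative event shape (not Lyft's schema): the compute side publishes a completion event with no knowledge of its consumers, and the serving side long-polls a queue subscribed to that event pattern.

```python
import json

import boto3

events = boto3.client("events")
sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/model-events"

# Compute side: announce a finished training job onto the bus.
events.put_events(
    Entries=[{
        "Source": "mlplatform.compute",        # placeholder source
        "DetailType": "TrainingJobCompleted",  # placeholder detail type
        "Detail": json.dumps({
            "model": "demand-forecast",
            "artifact": "s3://ml-artifacts/models/demand-forecast/model.pkl",
        }),
    }]
)

# Serving side: long-poll an SQS queue that an EventBridge rule targets.
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    WaitTimeSeconds=20,
)
for msg in resp.get("Messages", []):
    event = json.loads(msg["Body"])  # EventBridge delivers the full envelope
    print("new artifact:", event["detail"]["artifact"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```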

See patterns/model-registry-and-object-store-as-hybrid-glue and patterns/decoupled-compute-and-serving-stacks for the implementation patterns.

Lyft / LyftLearn 2.0 as canonical case study

LyftLearn 2.0 is the wiki's canonical instance: LyftLearn Compute on SageMaker (training / batch / HPO / JupyterLab), and LyftLearn Serving on EKS (dozens of team-owned model-serving services for pricing / fraud / dispatch / ETA). The stacks are fully decoupled and integrate only through the Model Registry + S3 + ECR + EventBridge/SQS. The migration was accomplished under a zero-code-change constraint on user ML code (Source: sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture).
