PATTERN Cited by 1 source
Cross-platform base image¶
Problem¶
A workload needs to run in multiple execution contexts — e.g. managed training jobs, interactive notebooks, and K8s serving pods — and you want one Docker image to run correctly across all of them. Shipping a separate image per context triples each team's build, test, and deploy surface; invites training-vs-serving dependency drift; and makes training-to-serving reproducibility hard to guarantee.
Pattern¶
Build one base image that detects its execution environment at runtime and adapts env vars, users, permissions, networking, and framework config accordingly (concepts/runtime-environment-detection). The entrypoint compatibility layer is where the detection happens and the adaptation is applied.
Structure¶
FROM base-runtime
# Core runtime: framework, libraries, OS packages — identical
# across execution contexts.
...
# Entrypoint script:
# 1. Detect context (SageMaker Job? Studio notebook? K8s pod?)
# 2. Fetch credentials / env / hyperparams / config appropriate
# to the context.
# 3. Configure framework (Spark, metrics client, etc.) for the
# context.
# 4. exec user command.
ENTRYPOINT ["/opt/entrypoint.sh"]
CMD []
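Step 1 of the entrypoint can be sketched as below. This is a minimal, hypothetical sketch, not Lyft's implementation: SageMaker training containers do inject `TRAINING_JOB_NAME`, and Kubernetes injects `KUBERNETES_SERVICE_HOST` into every pod, but the Studio check (`SAGEMAKER_INTERNAL_IMAGE_URI`) and the context labels are assumptions for illustration.

```shell
#!/usr/bin/env sh
# Detect the execution context from environment variables the
# platform injects. Checks and labels are illustrative.
detect_context() {
  if [ -n "${TRAINING_JOB_NAME:-}" ]; then
    echo "sagemaker-job"        # SageMaker training job
  elif [ -n "${SAGEMAKER_INTERNAL_IMAGE_URI:-}" ]; then
    echo "studio-notebook"      # SageMaker Studio notebook (assumed marker)
  elif [ -n "${KUBERNETES_SERVICE_HOST:-}" ]; then
    echo "k8s-serving"          # Kubernetes pod
  else
    echo "unknown"
  fi
}

CONTEXT="$(detect_context)"
echo "context: ${CONTEXT}"
# Steps 2-3 (credentials, hyperparams, framework config) would branch
# on ${CONTEXT} here; step 4 hands off to the user command:
# exec "$@"
```

Ordering matters: a SageMaker job may itself run on Kubernetes-like infrastructure, so the most specific markers are checked first and the generic K8s marker last.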
Variant images per workload class, not per context¶
You may still maintain separate base images per workload class (traditional ML, distributed ML with Spark, GPU deep-learning), because those differ in heavyweight dependencies. But each of those images is itself cross-platform across execution contexts.
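Layered this way, a workload-class variant adds only its heavyweight dependencies on top of the shared cross-platform base and inherits the entrypoint unchanged. A hypothetical sketch (image names and versions are assumptions):

```dockerfile
# Spark variant: extends the shared cross-platform base, which already
# contains the entrypoint compatibility layer.
FROM ml-base:latest
# Workload-class dependencies only; context adaptation stays in the base.
RUN pip install --no-cache-dir pyspark==3.5.1
# ENTRYPOINT /opt/entrypoint.sh is inherited from the base image.
```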
Lyft / LyftLearn 2.0¶
Canonical wiki instance. LyftLearn ships three cross-platform base images:
- LyftLearn image — traditional ML workloads.
- LyftLearn Distributed image — adds Spark ecosystem integration (custom wrappers, executor configs, JAR deps).
- LyftLearn DL image — adds GPU + deep-learning libraries.
Each runs correctly in three distinct execution contexts — SageMaker Jobs, SageMaker Studio notebooks, and K8s model-serving — by detecting context at runtime and adapting. The same image trains a model on SageMaker and serves it on K8s; this guarantees model-dependency parity end-to-end (Source: sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture).
The Spark-compatible image was the hardest to build: it had to preserve compatibility with Lyft's existing Spark infrastructure (custom wrappers, executor configs, JAR dependencies) while still running correctly in all three contexts.
Trade-offs¶
- + Single source of truth for the runtime; eliminates training-vs-serving drift.
- + One test surface — image correctness validated once across all contexts.
- − Entrypoint complexity grows with every context the image supports.
- − Image size can bloat when contexts need different dependencies; this is in tension with concepts/lazy-container-image-loading, SOCI, and other image-size optimisations.
Seen in¶
- sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture — canonical wiki instance; three LyftLearn base images across three execution contexts.