CONCEPT
Single pipeline train-and-infer¶
Definition¶
Single pipeline train-and-infer is an ML-pipeline architectural pattern where a single compute job owns both model training and batch inference in one pass, instead of splitting them across separate training and inference infrastructure. Applicable when:
- The trained model is lightweight enough that hosting a separate inference service (SageMaker endpoint, model server, etc.) is not cost-effective.
- Inference is a batch workload, not a real-time request stream.
- The retraining cadence matches the inference cadence (or is strictly more frequent).
- Training and inference do not need to scale independently.
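The shape of the pattern is a single batch job that fits and then scores in the same process. A minimal sketch, using a trivial per-SKU mean model as a stand-in for a real learner such as LightGBM (all names illustrative):

```python
def train_and_infer(history, horizon):
    """Single job: fit on history, then batch-score in the same process.

    `history` maps SKU -> list of past demand values. The "model" is a
    per-SKU mean, standing in for a real learner (e.g. LightGBM).
    """
    # --- training phase: build model state in memory ---
    model = {sku: sum(vals) / len(vals) for sku, vals in history.items()}

    # --- inference phase: score every SKU for the horizon, in-process ---
    # No checkpoint, no endpoint: the in-memory model is handed straight
    # to scoring, and only the forecasts leave the job.
    return {sku: [mean] * horizon for sku, mean in model.items()}

forecasts = train_and_infer({"sku-1": [2, 4], "sku-2": [10, 10, 10]}, horizon=3)
```

The handoff from training to inference is a plain in-memory reference, which is exactly what makes checkpointing and a serving layer unnecessary here.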
What it saves¶
- No checkpointing. Model state doesn't need to be persisted to an external store and re-loaded by an inference service — the same process trains, then scores.
- No separate inference infrastructure. No endpoints to provision, monitor, scale, or roll over.
- No model-artifact staging pipeline. Training output → inference input is an in-process handoff, not an S3 round-trip.
- Operational simplicity. Fewer moving parts = fewer failure modes.
When it's the wrong choice¶
- Real-time inference. If you need low-latency single-query scoring against the trained model, a hosted endpoint is mandatory.
- Heavy model. For a GPU-trained deep-learning model, retraining inside every scoring run is wasteful.
- Retraining cadence ≠ scoring cadence. If you retrain weekly but score daily, you need an artifact hand-off between separate jobs.
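For the cadence-mismatch case, the alternative is the explicit artifact hand-off the pattern avoids: a training job persists the model, and a separate scoring job re-loads it. A hedged sketch, with pickle and a local temp path standing in for S3 (paths and names illustrative):

```python
import pickle
import tempfile
from pathlib import Path

ARTIFACT = Path(tempfile.gettempdir()) / "model.pkl"  # stand-in for an S3 URI

def weekly_train_job(history):
    """Runs on the retraining cadence; persists the model artifact."""
    model = {sku: sum(v) / len(v) for sku, v in history.items()}
    ARTIFACT.write_bytes(pickle.dumps(model))

def daily_score_job(skus, horizon):
    """Runs on the scoring cadence; re-loads the last persisted artifact."""
    model = pickle.loads(ARTIFACT.read_bytes())
    return {sku: [model[sku]] * horizon for sku in skus}

weekly_train_job({"sku-1": [2, 4]})
daily = daily_score_job(["sku-1"], horizon=2)
```

The staging pipeline, artifact versioning, and load path shown here are precisely the moving parts the single-pipeline pattern eliminates when cadences match.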
Canonical instance (Zalando ZEOS)¶
ZEOS Demand Forecaster uses a single SageMaker Training Job to both train the LightGBM model (via Nixtla MLForecast) and run batch inference over 5M SKUs × 12 weeks. Rationale:
"Due to the ML model's lightweight training footprint, we bypass complexity, like for example not needing checkpointing, or separate infrastructure for inference. Instead, model training as well as model inference are executed in a single pipeline using AWS SageMaker Training Jobs. This approach reduces complexity, lowers infrastructure costs, and accelerates the pipeline."
LightGBM's lightweight footprint is the load-bearing enabler — a deep-learning model (TFT etc., which Zalando tried and rejected) would not fit this pattern.
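In SageMaker terms, the entry-point script of such a Training Job both trains and writes forecasts as job output. A hedged sketch of that script shape, assuming SageMaker script-mode environment variables (`SM_CHANNEL_TRAIN`, `SM_OUTPUT_DATA_DIR`) with local fallbacks, and a trivial per-SKU mean standing in for MLForecast + LightGBM:

```python
import csv
import os
import tempfile
from pathlib import Path

# SageMaker script-mode containers set these; fall back to temp dirs locally.
TRAIN_DIR = Path(os.environ.get("SM_CHANNEL_TRAIN", tempfile.mkdtemp()))
OUTPUT_DIR = Path(os.environ.get("SM_OUTPUT_DATA_DIR", tempfile.mkdtemp()))

def main(horizon=12):
    # --- read training data: one demand series per SKU file (illustrative) ---
    history = {}
    for f in TRAIN_DIR.glob("*.csv"):
        with f.open() as fh:
            history[f.stem] = [float(row[0]) for row in csv.reader(fh)]

    # --- train: per-SKU mean as a stand-in for MLForecast + LightGBM ---
    model = {sku: sum(v) / len(v) for sku, v in history.items()}

    # --- infer in the same job: the job's output is forecasts, not a model ---
    out = OUTPUT_DIR / "forecast.csv"
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh)
        for sku, mean in model.items():
            writer.writerow([sku] + [mean] * horizon)
    return out
```

Note the design consequence: the job's artifact of record is the forecast file, so there is no model to version, stage, or deploy downstream.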
Seen in¶
Related¶
- systems/sagemaker-training-job — typical execution host.
- systems/lightgbm — typical lightweight model choice.
- systems/zeos-demand-forecaster — canonical Zalando consumer.
- patterns/single-sagemaker-training-job-train-and-infer — pattern page.
- companies/zalando