

Zalando ML Portal (Backstage)

Definition

The Zalando ML Portal is the observability surface for systems/zflow-authored ML pipelines: a section of Zalando's internal developer portal, which runs on systems/backstage. It layers pipeline execution state, per-run metrics, and model cards over the AWS Step Functions state machines that zflow pipelines compile to.
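
To make the overlay idea concrete, here is a minimal sketch of how a portal plugin could turn one Step Functions execution into a per-step status view. It assumes the plugin reads the event list returned by the Step Functions `GetExecutionHistory` API; the function name `step_states` and the step names used in testing are hypothetical, and Zalando's actual plugin internals are not disclosed.

```python
from typing import Dict, List

# Hypothetical mapping from terminal execution events to the status shown
# for any step that was still running when the execution ended.
TERMINAL_EVENTS = {
    "ExecutionFailed": "FAILED",
    "ExecutionAborted": "ABORTED",
    "ExecutionTimedOut": "TIMED_OUT",
}


def step_states(events: List[dict]) -> Dict[str, str]:
    """Reduce a Step Functions execution history to {state name: status}.

    Hypothetical helper (not Zalando's code): entered states are RUNNING,
    exited states are SUCCEEDED, and states still running when the
    execution fails/aborts/times out take that terminal status.
    """
    states: Dict[str, str] = {}
    for event in events:
        event_type = event["type"]
        if event_type.endswith("StateEntered"):
            # e.g. TaskStateEntered, ChoiceStateEntered, ParallelStateEntered
            states[event["stateEnteredEventDetails"]["name"]] = "RUNNING"
        elif event_type.endswith("StateExited"):
            states[event["stateExitedEventDetails"]["name"]] = "SUCCEEDED"
        elif event_type in TERMINAL_EVENTS:
            for name in list(states):
                if states[name] == "RUNNING":
                    states[name] = TERMINAL_EVENTS[event_type]
    return states
```

In a live deployment the events would come from boto3's `client("stepfunctions").get_execution_history(executionArn=...)`, which returns them in pages linked by `nextToken`. The sketch ignores the nested iterations of Parallel and Map states, which a real view would need to group separately.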

Canonical disclosure

From the 2022-04-18 ML Platform overview (sources/2022-04-18-zalando-zalandos-machine-learning-platform):

"Pipeline tracking is a part of the internal Zalando developer portal running on top of Backstage, an open-source platform for building such portals."

"This ML web interface provides a detailed, real-time view of pipeline execution. Pipeline authors can monitor how metrics evolve across multiple runs of training pipelines and can view these changes on a graph. They can also view model cards for models created by the pipelines. These are just a few features of the ML portal, and the tool is actively developed to improve the process of experimenting with notebooks and deploying the pipelines in production."

Named capabilities

  1. Real-time pipeline execution view — see running pipelines and their per-step state live.
  2. Per-run metric evolution graphs — how metrics (PR-AUC, ROC, custom business metrics, etc.) change across successive training runs of the same pipeline. The graph is the canonical cross-run-diff surface.
  3. Model cards — per-model card documenting the model's training run, evaluation metrics, and the pipeline that created it. (Specific schema not disclosed.)
  4. "Actively developed to improve the process of experimenting with notebooks and deploying the pipelines in production" — named as a living product, not a frozen artefact.
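
Capability 2 boils down to regrouping per-run metrics into one time-ordered series per metric, plus a run-over-run difference. The sketch below assumes a hypothetical run record of `(run_id, start_time, metrics)`; Zalando's metric storage is not disclosed, so all names and shapes here are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical run record: (run_id, run start time in epoch seconds, {metric: value}).
Run = Tuple[str, float, Dict[str, float]]
Series = List[Tuple[str, float]]


def metric_series(runs: List[Run]) -> Dict[str, Series]:
    """Group per-run metrics into one chronologically ordered series per metric,
    i.e. the data behind a 'metric across runs' graph."""
    series: Dict[str, Series] = defaultdict(list)
    # Order by start time so the x-axis is chronological even if IDs are not.
    for run_id, _started, metrics in sorted(runs, key=lambda r: r[1]):
        for name, value in metrics.items():
            series[name].append((run_id, value))
    return dict(series)


def run_over_run_delta(series: Series) -> Series:
    """Change of a metric relative to the previous run (the cross-run diff)."""
    return [(curr_id, curr - prev) for (_, prev), (curr_id, curr) in zip(series, series[1:])]
```

A metric that is only logged by some runs simply yields a shorter series, so the graph shows gaps rather than invented values.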

Why Backstage (and not the AWS Console)

The AWS Console for Step Functions and SageMaker shows state-machine executions and SageMaker training-job metrics, but it offers neither cross-run metric plotting nor ML-domain primitives such as model cards. Backstage lets Zalando add these specialised views as plugins while keeping every other engineering tool in the same portal.

Wiki positioning

  • Canonical instance of the developer-portal-as-ML-control-plane pattern on the wiki.
  • The ML Portal is the second internal-tool surface that the Zalando ML Platform operates (the first being systems/zflow itself). The post states that "two teams actively develop zflow and monitoring tools for pipelines"; the "monitoring tools" correspond roughly to the ML Portal.
  • Internals (Backstage plugin structure, metric storage, model-card schema) are not disclosed. Stub page — expand when Zalando publishes more.

