PATTERN Cited by 1 source

Developer Portal as ML Pipeline Control Plane

Intent

Build the ML-pipeline observability surface (pipeline execution state, per-run metric evolution, model cards) as a plugin inside the organisation's existing internal developer portal, rather than as a standalone ML-only UI. The plugin exposes ML-domain primitives while reusing the shared portal's auth, theming, search, and service catalog, and it keeps ML tooling in the same surface engineers already use for their other engineering tasks.

Context

Works when:

  • The org already has a developer portal (Spotify Backstage is the canonical substrate) in wide internal use.
  • There is a shared vocabulary for components / services that ML pipelines can slot into as a new entity type.
  • The ML platform team would otherwise need to build authentication, RBAC, UI shell, theming, and search from scratch for a bespoke ML console — a non-trivial distraction from ML-specific work.

Solution

Add ML-domain plugins to the existing developer portal:

  1. Pipeline execution plugin: real-time view of running pipelines and their per-step state. The underlying data source is the pipeline orchestrator's API (e.g. AWS Step Functions at Zalando).
  2. Metric evolution plugin: per-pipeline graph of how evaluation metrics (PR-AUC, ROC-AUC, accuracy, custom business metrics) change across successive training runs. Lets ML authors compare runs visually.
  3. Model cards plugin — per-model card documenting training run lineage, evaluation metrics, dataset pointers, and the pipeline that produced the model.
  4. Existing portal features reused — auth, RBAC, software catalog, TechDocs, search, theming, notifications.
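
As a rough illustration of item 1, the sketch below folds a list of Step Functions execution-history events into a per-step status map, the kind of data a pipeline execution view might render. The event shapes follow the Step Functions `GetExecutionHistory` response (`type`, `stateEnteredEventDetails.name`, `stateExitedEventDetails.name`). The function name `step_states`, the status labels, and the rule that still-open steps are marked failed when the execution ends abnormally are assumptions made for illustration, not Zalando's implementation.

```python
from typing import Dict, Iterable, Mapping

# Execution-level event types that end a run abnormally.
ABNORMAL_END = {"ExecutionFailed", "ExecutionAborted", "ExecutionTimedOut"}


def step_states(events: Iterable[Mapping]) -> Dict[str, str]:
    """Derive a {step name: status} map from execution-history events.

    Simplifying assumptions (hypothetical, for illustration only):
    - a step that was entered but not exited is RUNNING;
    - a step that was exited is SUCCEEDED;
    - if the execution ends abnormally, every step still RUNNING is FAILED.
    """
    states: Dict[str, str] = {}
    for event in events:
        kind = event.get("type", "")
        if kind.endswith("StateEntered"):
            name = event["stateEnteredEventDetails"]["name"]
            states[name] = "RUNNING"
        elif kind.endswith("StateExited"):
            name = event["stateExitedEventDetails"]["name"]
            states[name] = "SUCCEEDED"
        elif kind in ABNORMAL_END:
            # Collect names first so the dict is not mutated while iterating.
            open_steps = [n for n, s in states.items() if s == "RUNNING"]
            for name in open_steps:
                states[name] = "FAILED"
    return states
```

In a real plugin backend the events would presumably come from the orchestrator's API (for Step Functions, for example, the `events` list returned by boto3's `get_execution_history`), and the resulting map would be served to the portal frontend. That wiring is omitted here.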

Canonical instance

Zalando ML Portal (2022): the Zalando ML Platform team built ML-pipeline tracking as a plugin section of its internal developer portal, which runs on Backstage. The portal exposes pipeline execution, metric evolution graphs across runs, and model cards. Verbatim from sources/2022-04-18-zalando-zalandos-machine-learning-platform:

"Pipeline tracking is a part of the internal Zalando developer portal running on top of Backstage, an open-source platform for building such portals."

"This ML web interface provides a detailed, real-time view of pipeline execution. Pipeline authors can monitor how metrics evolve across multiple runs of training pipelines and can view these changes on a graph. They can also view model cards for models created by the pipelines."

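The "metrics evolve across multiple runs" view in the quote above can be approximated with a small data transformation: extract one metric as an ordered series over runs, then compute the change between successive runs. The sketch below is a minimal version of that idea. The `TrainingRun` type, the function names, and the metric key `pr_auc` used in the usage note are hypothetical and do not come from the Zalando source.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingRun:
    """One training-pipeline run and its evaluation metrics (hypothetical shape)."""
    run_id: str
    metrics: Dict[str, float]


def metric_series(runs: Sequence[TrainingRun], metric: str) -> List[Tuple[str, float]]:
    """Return (run_id, value) points for one metric, in the order the runs are given.

    Runs that did not report the metric are skipped rather than plotted as zero.
    """
    return [(r.run_id, r.metrics[metric]) for r in runs if metric in r.metrics]


def run_deltas(series: Sequence[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (run_id, change versus the previous point) for each point after the first."""
    return [
        (cur_id, cur_value - prev_value)
        for (_, prev_value), (cur_id, cur_value) in zip(series, series[1:])
    ]
```

Calling `metric_series(runs, "pr_auc")` yields the points a frontend chart would plot, and `run_deltas` on that series flags regressions between consecutive runs. The runs are assumed to arrive already sorted by time.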
Consequences

Pros:

  • Reuse of portal infrastructure. Auth, RBAC, theming, search, and the software catalog all come for free. This is a canonical application of reusing existing infrastructure over building a purpose-built service.
  • Consistent engineer experience. ML pipelines live in the same surface as services, runbooks, and docs — ML engineers don't context-switch between tools.
  • Model cards and metric graphs add ML-domain primitives the cloud consoles lack. The AWS Console shows Step Functions execution state and SageMaker training-job metrics, but not cross-run metric plotting or model-card-style documentation.
  • Low marginal cost per new ML plugin. Once the portal is set up, each new ML-domain view is a Backstage plugin, not a new web app.

Cons:

  • Presumes Backstage (or equivalent) is already in place. Orgs without an existing developer portal pay a much higher upfront cost.
  • Plugin boundaries can become blurry. ML pipelines share a portal with unrelated tools; navigation / search quality becomes a shared concern.
  • Backstage plugin API changes tie the ML portal's upgrade cadence to Backstage releases. The cost is not unique to this pattern, but it is real.
  • Lacks the battle-tested "ML-first" UX that standalone tools (Weights & Biases, MLflow, Kubeflow UI) offer. The pattern accepts that trade-off in exchange for reuse.

Comparison to standalone ML UIs

  • MLflow UI — ML-first, standalone. Strong model registry + experiment tracking; weak on cross-team shared context.
  • Weights & Biases — hosted ML-first. Rich, but locked outside the org's own portal; another login.
  • Kubeflow Pipelines UI — ML-first + K8s-native; requires a separate ingress and identity story.
  • Backstage-based ML plugin — ML-weaker, reuse-stronger; fits the central-platform shape particularly well because the platform team already owns the portal.