PATTERN
Developer Portal as ML Pipeline Control Plane¶
Intent¶
Build the ML-pipeline observability surface (pipeline execution state, per-run metric evolution, model cards) as a plugin inside the organisation's existing internal developer portal, rather than as a standalone ML-only UI. The plugin exposes ML-domain primitives while reusing the portal's shared auth, theming, search, and service catalog, keeping ML tooling in the same surface engineers already use for every other engineering task.
Context¶
Works when:
- The org already has a developer portal (Spotify Backstage is the canonical substrate) in wide internal use.
- There is a shared vocabulary for components / services that ML pipelines can slot into as a new entity type.
- The ML platform team would otherwise need to build authentication, RBAC, UI shell, theming, and search from scratch for a bespoke ML console — a non-trivial distraction from ML-specific work.
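Where the shared component vocabulary exists, slotting ML pipelines in as a new entity type can look like an ordinary Backstage catalog registration. A hypothetical sketch — the entity kind, annotation key, and names below are illustrative, not Zalando's actual schema:

```yaml
# catalog-info.yaml — registering a training pipeline as a catalog entity.
# The type "ml-pipeline" and the annotation key are assumptions for illustration.
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: sales-forecast-training
  description: Training pipeline for the sales forecast model
  annotations:
    # a pipeline plugin could key off an annotation like this to find runs
    ml.example.org/pipeline-id: sales-forecast-v3
spec:
  type: ml-pipeline
  owner: team-forecasting
  dependsOn:
    - resource:sales-events-dataset
```

Once registered, the pipeline shows up in the same catalog, search, and ownership model as every other component, which is the point of the pattern.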
Solution¶
Add ML-domain plugins to the existing developer portal:
- Pipeline execution plugin — real-time view of running pipelines and their per-step state. The underlying data source is the pipeline orchestrator's API (AWS Step Functions in Zalando's case).
- Metric evolution plugin — per-pipeline graph of how evaluation metrics (PR-AUC, ROC, accuracy, custom business metrics) change across successive training runs. Lets ML authors diff runs visually.
- Model cards plugin — per-model card documenting training run lineage, evaluation metrics, dataset pointers, and the pipeline that produced the model.
- Existing portal features reused — auth, RBAC, software catalog, TechDocs, search, theming, notifications.
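The metric-evolution plugin is, at its core, a pivot: per-run metric records come back from the tracking store, and the view needs one series per metric, ordered by run, to draw the cross-run graph. A minimal sketch of that data shaping in TypeScript — the types and function names are illustrative, not a real Backstage or Zalando API:

```typescript
// Pivot per-run metric records into per-metric series for charting.
// TrainingRun and MetricSeries are hypothetical shapes for illustration.

interface TrainingRun {
  runId: string;
  startedAt: string; // ISO timestamp, used to order runs on the x-axis
  metrics: Record<string, number>; // e.g. { pr_auc: 0.81, accuracy: 0.93 }
}

interface MetricSeries {
  metric: string;
  points: { runId: string; value: number }[];
}

function metricEvolution(runs: TrainingRun[]): MetricSeries[] {
  // Order runs chronologically so each series reads left-to-right in time.
  const ordered = [...runs].sort((a, b) => a.startedAt.localeCompare(b.startedAt));
  const byMetric = new Map<string, { runId: string; value: number }[]>();
  for (const run of ordered) {
    for (const [metric, value] of Object.entries(run.metrics)) {
      if (!byMetric.has(metric)) byMetric.set(metric, []);
      byMetric.get(metric)!.push({ runId: run.runId, value });
    }
  }
  return [...byMetric.entries()].map(([metric, points]) => ({ metric, points }));
}
```

A chart component then plots each `MetricSeries` as one line; visually diffing two runs is just comparing adjacent points on the same series.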
Canonical instance¶
Zalando ML Portal (2022) — Zalando's ML Platform team built ML-pipeline tracking as a plugin section of their internal developer portal, which runs on systems/backstage. The portal exposes pipeline execution, metric evolution graphs across runs, and model cards. Verbatim from sources/2022-04-18-zalando-zalandos-machine-learning-platform:
"Pipeline tracking is a part of the internal Zalando developer portal running on top of Backstage, an open-source platform for building such portals."
"This ML web interface provides a detailed, real-time view of pipeline execution. Pipeline authors can monitor how metrics evolve across multiple runs of training pipelines and can view these changes on a graph. They can also view model cards for models created by the pipelines."
Consequences¶
Pros:
- Reuse of portal infrastructure. Auth, RBAC, theming, search, and the software catalog all come for free. Canonical application of concepts/reuse-existing-infrastructure-over-purpose-built-service.
- Consistent engineer experience. ML pipelines live in the same surface as services, runbooks, and docs — ML engineers don't context-switch between tools.
- Model cards + metric graphs add ML-domain primitives the cloud consoles lack. The AWS Console shows Step Functions execution state + SageMaker training-job metrics, but not cross-run metric plotting or model-card-style documentation.
- Low marginal cost per new ML plugin. Once the portal is set up, each new ML-domain view is a Backstage plugin, not a new web app.
Cons:
- Presumes Backstage (or equivalent) is already in place. Orgs without an existing developer portal pay a much higher upfront cost.
- Plugin boundaries can become blurry. ML pipelines share a portal with unrelated tools; navigation / search quality becomes a shared concern.
- Plugin API changes couple the ML portal to Backstage releases — not a unique cost but a real one.
- Lacks the battle-tested "ML-first" UX that standalone tools (Weights & Biases, MLflow, Kubeflow UI) offer. The Backstage pattern accepts that trade-off in exchange for reuse.
Comparison to standalone ML UIs¶
- MLflow UI — ML-first, standalone. Strong model registry + experiment tracking; weak on cross-team shared context.
- Weights & Biases — hosted ML-first. Rich, but locked outside the org's own portal; another login.
- Kubeflow Pipelines UI — ML-first + K8s-native; requires a separate ingress and identity story.
- Backstage-based ML plugin — weaker on ML-specific features, stronger on reuse; fits the central-platform shape particularly well because the platform team already owns the portal.
Related¶
- systems/backstage · systems/zalando-ml-portal-backstage
- systems/zflow — the pipeline authoring tool whose outputs the portal observes.
- systems/aws-step-functions — the orchestrator underneath.
- concepts/reuse-existing-infrastructure-over-purpose-built-service — the architectural design principle this pattern embodies.
- patterns/ml-platform-internal-consulting-team — the org shape that typically produces this kind of portal.