Atlassian ML Studio¶
Definition¶
Atlassian ML Studio is Atlassian's unified enterprise-scale ML development platform that standardizes modular development, centralizes workflow orchestration, and embeds data governance directly into the execution layer. It is the mission-critical backbone for AI systems including Rovo Search and Chat, the Teamwork Graph, and Confluence AI, enabling ~120k monthly workflow runs across 100+ ML teams and serving models to 5M+ monthly active Rovo users.
Architecture¶
Three architectural pillars:
Pillar 1 — Composable, Reusable ML Modules¶
Each module is a self-contained, executable unit (data processing, model training, packaging, evaluation, deployment). Modules are versioned, shareable, and composable into end-to-end pipelines via the module-as-versioned-artifact pattern:
- Every code push produces a module artifact used in workflows
- Artifact tags (latestAlpha, latest) enable rollback without redeploying infrastructure
- Dynamic CI/CD pipelines provide independent automated tests per module
- Teams own module namespaces autonomously; shared libraries form a reusable catalog
- 2,000+ reusable modules with 200k+ monthly iterations
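The module-as-versioned-artifact pattern can be illustrated with a minimal sketch. The source does not describe ML Studio's registry API, so `ModuleRegistry`, its method names, the module name `train-ranker`, and the version strings are all hypothetical. The sketch also assumes that `latestAlpha` tracks the newest push while `latest` marks the promoted build; the source names the tags but not their exact semantics.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRegistry:
    """Hypothetical in-memory registry; not ML Studio's actual API."""

    # module name -> versions in publish order
    artifacts: dict[str, list[str]] = field(default_factory=dict)
    # (module name, tag) -> version the tag currently points at
    tags: dict[tuple[str, str], str] = field(default_factory=dict)

    def publish(self, module: str, version: str) -> None:
        """Record the artifact built from a push and move latestAlpha to it."""
        self.artifacts.setdefault(module, []).append(version)
        self.tags[(module, "latestAlpha")] = version

    def promote(self, module: str, version: str) -> None:
        """Point the stable `latest` tag at an already published version."""
        if version not in self.artifacts.get(module, []):
            raise KeyError(f"{module}@{version} was never published")
        self.tags[(module, "latest")] = version

    def resolve(self, module: str, tag: str) -> str:
        """Return the version a workflow would pick up for this tag."""
        return self.tags[(module, tag)]


registry = ModuleRegistry()
registry.publish("train-ranker", "1.0.0")
registry.promote("train-ranker", "1.0.0")
registry.publish("train-ranker", "1.1.0")
registry.promote("train-ranker", "1.1.0")

# Rollback is a tag move: no artifact is rebuilt and nothing is redeployed.
registry.promote("train-ranker", "1.0.0")
```

Because workflows resolve a tag rather than a fixed build, moving the tag back is enough to restore the earlier behavior on the next run.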
The local dev loop builds Python modules in under 30 seconds (down from minutes), with RDEs mirroring production. The repository is reserved for peer review and promotion, not iteration.
Pillar 2 — Workflow Orchestrator¶
Central service that manages, schedules, and automates ML workflows via CLI, portal, and API:
- Cloning/rerun — reproduce or adapt previous runs trivially
- Flexible triggering — portal, CLI, or programmatic API
- Composable workflows — nested and joined sub-workflows
- Hot clusters — pre-provisioned clusters stay active between runs (patterns/hot-cluster-for-iterative-ml)
- CRON scheduling — automated recurring tasks (retraining, data refreshes, monitoring)
- Automatic caching — used by ~80% of workflows daily, saving 1,000+ hours/month (patterns/deterministic-task-caching)
Orchestrates jobs across platforms including Databricks, with access to relevant data sources.
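Deterministic task caching can be sketched as follows. This is an illustration of the general technique, not the orchestrator's implementation: the source does not say how cache keys are derived, so hashing the module name, module version, and parameters is an assumption, and `cache_key`, `run_task`, and `featurize` are hypothetical names.

```python
import hashlib
import json

# Hypothetical in-process cache; a real orchestrator would persist results.
_cache: dict[str, object] = {}
calls = {"count": 0}  # counts real executions, for demonstration only


def cache_key(module: str, version: str, params: dict) -> str:
    """Derive a stable key from everything assumed to determine the output."""
    payload = json.dumps(
        {"module": module, "version": version, "params": params},
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(module: str, version: str, params: dict, fn):
    """Run fn(**params) once per distinct key; reuse the stored result after."""
    key = cache_key(module, version, params)
    if key not in _cache:
        _cache[key] = fn(**params)
    return _cache[key]


def featurize(rows, scale):
    calls["count"] += 1
    return [r * scale for r in rows]


first = run_task("featurize", "1.0.0", {"rows": [1, 2, 3], "scale": 2}, featurize)
# Same module version and inputs (different key order): served from cache.
second = run_task("featurize", "1.0.0", {"scale": 2, "rows": [1, 2, 3]}, featurize)
# A new module version changes the key, so the task runs again.
third = run_task("featurize", "1.1.0", {"rows": [1, 2, 3], "scale": 2}, featurize)
```

The approach is only safe for tasks whose output depends solely on the keyed inputs; a task that reads mutable external data would need that data's version in the key as well.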
Pillar 3 — Embedded Compliance Controls¶
Multi-layer governance framework:
- User identity-based access control — only authorized users can initiate/manage workflows
- Domain-level access control — clear data boundaries per workflow type (experimentation vs. production vs. ad-hoc)
- Column-level classification — data classification tags at column granularity with automatic tag propagation through pipeline stages; unclassified columns treated conservatively by default
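Column-level tag propagation with a conservative default might look like the sketch below. The source does not specify the classification levels or the merge rule, so the `Sensitivity` levels, the "most restrictive source wins" rule, the `propagate` helper, and the column names are all assumptions made for illustration.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification levels; higher means more restricted."""

    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Unclassified columns are treated conservatively by default.
DEFAULT = Sensitivity.RESTRICTED


def propagate(
    input_tags: dict[str, Sensitivity],
    lineage: dict[str, list[str]],
) -> dict[str, Sensitivity]:
    """Tag each output column with the most restrictive tag among its sources.

    `lineage` maps an output column to the input columns it is derived from.
    """
    output: dict[str, Sensitivity] = {}
    for out_col, sources in lineage.items():
        levels = [input_tags.get(src, DEFAULT) for src in sources]
        # An output with no known sources also falls back to the default.
        output[out_col] = max(levels, default=DEFAULT)
    return output


input_tags = {
    "page_id": Sensitivity.PUBLIC,
    "team": Sensitivity.INTERNAL,
    "user_email": Sensitivity.RESTRICTED,
}
lineage = {
    "page_id": ["page_id"],
    "team_page": ["team", "page_id"],
    "score": ["clicks"],  # "clicks" has no tag, so the default applies
}
stage_output_tags = propagate(input_tags, lineage)
```

Applying the same function at every pipeline stage carries tags forward automatically, and an untagged input can only make a downstream column more restricted, never less.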
Integration Layer¶
ML Studio connects to Atlassian's broader ML ecosystem:
- Experiment tracking (compare runs and iterate)
- Central feature store (reuse trusted features)
- Model registry (version, approve, reuse models)
- Monitoring — ML Lens (multi-dimensional metrics for training/production regressions)
- Deployment & serving (internal serving platform)
- Service integrations (APIs, microservice triggers)
Workloads Served¶
- GenAI — LLM fine-tuning, automated prompt optimization, batch inference with cost/priority-aware workload management
- Predictive AI — end-to-end pipelines from labeling through feature engineering with data-residency-compatible storage
- Training & evaluation — complex evaluation pipelines, automated retraining triggers, safe rollout via versioning/tagging
Powers: Rovo Search (ranking, Q&A, chat), Dev-AI (code review), AIOps (alert grouping), Knowledge Discovery (topic extraction), Confluence AI (summaries, snippets, labeling).
Operational Scale¶
- 5M+ monthly active Rovo users served by models built on ML Studio
- 900k+ datasets generated with built-in access control
- ~120k monthly workflow runs by 100+ ML teams
- ~20k monthly model iterations
- 2,000+ reusable modules, 200k+ monthly iterations
Seen In¶
(Source: sources/2026-06-10-atlassian-architecting-scalable-ml-platforms)