Atlassian ML Studio¶

Definition¶

Atlassian ML Studio is Atlassian's unified enterprise-scale ML development platform that standardizes modular development, centralizes workflow orchestration, and embeds data governance directly into the execution layer. It is the mission-critical backbone for AI systems including Rovo Search and Chat, the Teamwork Graph, and Confluence AI, enabling ~120k monthly workflow runs across 100+ ML teams and serving models to 5M+ monthly active Rovo users.

Architecture¶

ML Studio rests on three architectural pillars:

Pillar 1 — Composable, Reusable ML Modules¶

Each module is a self-contained, executable unit (data processing, model training, packaging, evaluation, deployment). Modules are versioned, shareable, and composable into end-to-end pipelines via the module-as-versioned-artifact pattern:

Every code push produces a module artifact used in workflows
Artifact tags (latestAlpha, latest) enable rollback without redeploying infrastructure
Dynamic CI/CD pipelines provide independent automated tests per module
Teams own module namespaces autonomously; shared libraries form a reusable catalog
2,000+ reusable modules with 200k+ monthly iterations

The local dev loop builds Python modules in under 30 seconds (down from minutes), with RDEs mirroring production. The repository is reserved for peer review and promotion, not iteration.
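
The tag-based rollback described above can be illustrated with a minimal sketch. It assumes a toy in-memory registry; `ModuleRegistry`, its methods, and the artifact paths are hypothetical names chosen for illustration and are not ML Studio's API. Only the tag names `latestAlpha` and `latest` come from the source. The idea shown is that versions are immutable while tags are movable pointers, so a rollback is a pointer move rather than a redeploy.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRegistry:
    """Toy registry (hypothetical): immutable versions plus movable tags."""
    versions: dict = field(default_factory=dict)  # (module, version) -> artifact path
    tags: dict = field(default_factory=dict)      # (module, tag) -> version

    def publish(self, module: str, version: str, path: str) -> None:
        # Every push yields a new immutable artifact and moves latestAlpha to it.
        key = (module, version)
        if key in self.versions:
            raise ValueError(f"{module}@{version} is already published")
        self.versions[key] = path
        self.tags[(module, "latestAlpha")] = version

    def promote(self, module: str, version: str) -> None:
        # Promotion and rollback are the same operation: repoint 'latest'.
        if (module, version) not in self.versions:
            raise KeyError(f"unknown version {module}@{version}")
        self.tags[(module, "latest")] = version

    def resolve(self, module: str, ref: str) -> str:
        # A ref is either a tag name or an explicit version string.
        version = self.tags.get((module, ref), ref)
        return self.versions[(module, version)]


registry = ModuleRegistry()
registry.publish("train_ranker", "1.0.0", "artifacts/train_ranker/1.0.0")
registry.publish("train_ranker", "1.1.0", "artifacts/train_ranker/1.1.0")
registry.promote("train_ranker", "1.1.0")  # release the new version
registry.promote("train_ranker", "1.0.0")  # roll back: only the tag moves
print(registry.resolve("train_ranker", "latest"))  # artifacts/train_ranker/1.0.0
```

Workflows that reference `latest` pick up the rollback on their next run, while workflows pinned to an explicit version are unaffected.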

Pillar 2 — Workflow Orchestrator¶

Central service that manages, schedules, and automates ML workflows via CLI, portal, and API:

Cloning/rerun — reproduce or adapt previous runs trivially
Flexible triggering — portal, CLI, or programmatic API
Composable workflows — nested and joined sub-workflows
Hot clusters — pre-provisioned clusters stay active between runs (patterns/hot-cluster-for-iterative-ml)
CRON scheduling — automated recurring tasks (retraining, data refreshes, monitoring)
Automatic caching — used daily by ~80% of workflows, saving 1,000+ hours/month (patterns/deterministic-task-caching)
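
As a rough sketch of how deterministic task caching can work in general, the example below derives a cache key from the module name, a pinned module version, and the task parameters, and skips execution when the key has been seen before. The source does not describe ML Studio's actual caching mechanism; `cache_key`, `TaskCache`, and the in-memory store are assumptions made for illustration.

```python
import hashlib
import json


def cache_key(module: str, version: str, params: dict) -> str:
    """Hash everything that determines the task's output into a stable key."""
    payload = json.dumps(
        {"module": module, "version": version, "params": params},
        sort_keys=True,  # key must not depend on parameter ordering
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TaskCache:
    """In-memory stand-in (hypothetical) for a shared result store."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def run(self, module: str, version: str, params: dict, task):
        key = cache_key(module, version, params)
        if key in self._store:
            self.hits += 1          # identical inputs: reuse the stored result
            return self._store[key]
        result = task(**params)     # cache miss: execute and remember
        self._store[key] = result
        return result


def add(x, y):
    return x + y


cache = TaskCache()
cache.run("add", "1.0.0", {"x": 1, "y": 2}, add)  # executes the task
cache.run("add", "1.0.0", {"y": 2, "x": 1}, add)  # served from cache
cache.run("add", "1.1.0", {"x": 1, "y": 2}, add)  # new version: executes again
```

This only holds for tasks whose output depends solely on the hashed inputs. The key should use a pinned version rather than a moving tag, since a tag can point to different code over time.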

The orchestrator runs jobs across compute platforms, including Databricks, and gives them access to the relevant data sources.

Pillar 3 — Embedded Compliance Controls¶

Multi-layer governance framework:

User identity-based access control — only authorized users can initiate/manage workflows
Domain-level access control — clear data boundaries per workflow type (experimentation vs. production vs. ad-hoc)
Column-level classification — data classification tags at column granularity with automatic tag propagation through pipeline stages; unclassified columns treated conservatively by default
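
Column-level tag propagation can be sketched as follows: each output column inherits the strictest classification among the input columns it is derived from, and any column without a tag falls back to the most restrictive level. The classification levels, the `propagate` function, and the lineage format are hypothetical; the source only states that tags propagate automatically and that unclassified columns are treated conservatively.

```python
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical levels; a higher value means more restrictive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Assumption: "treated conservatively" is modeled as the strictest level.
DEFAULT = Classification.RESTRICTED


def propagate(input_tags: dict, lineage: dict) -> dict:
    """Tag each output column with the strictest level among its sources.

    input_tags: input column name -> Classification
    lineage:    output column name -> list of input columns it derives from
    """
    output_tags = {}
    for out_col, sources in lineage.items():
        # Untagged sources fall back to DEFAULT rather than being ignored.
        levels = [input_tags.get(src, DEFAULT) for src in sources]
        # Columns with no known lineage are also treated conservatively.
        output_tags[out_col] = max(levels, default=DEFAULT)
    return output_tags


tags = {
    "user_id": Classification.CONFIDENTIAL,
    "page_views": Classification.INTERNAL,
}
lineage = {
    "views_per_user": ["user_id", "page_views"],  # mixes two levels
    "total_views": ["page_views"],
    "text_score": ["page_views", "raw_text"],     # raw_text has no tag
}
for column, level in propagate(tags, lineage).items():
    print(column, level.name)
```

Applying this at every pipeline stage means a downstream dataset can never carry a weaker classification than the data it was built from.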

Integration Layer¶

ML Studio connects to Atlassian's broader ML ecosystem:

Experiment tracking (compare runs and iterate)
Central feature store (reuse trusted features)
Model registry (version, approve, reuse models)
Monitoring — ML Lens (multi-dimensional metrics for training/production regressions)
Deployment & serving (internal serving platform)
Service integrations (APIs, microservice triggers)

Workloads Served¶

GenAI — LLM fine-tuning, automated prompt optimization, batch inference with cost/priority-aware workload management
Predictive AI — end-to-end pipelines from labeling through feature engineering with data-residency-compatible storage
Training & evaluation — complex evaluation pipelines, automated retraining triggers, safe rollout via versioning/tagging

Powers: Rovo Search (ranking, Q&A, chat), Dev-AI (code review), AIOps (alert grouping), Knowledge Discovery (topic extraction), Confluence AI (summaries, snippets, labeling).

Operational Scale¶

5M+ monthly active Rovo users served by models built on ML Studio
900k+ datasets generated with built-in access control
~120k monthly workflow runs by 100+ ML teams
~20k monthly model iterations
2,000+ reusable modules, with 200k+ monthly module iterations

Seen In¶

sources/2026-06-10-atlassian-architecting-scalable-ml-platforms
