---
title: Architecting Scalable ML Platforms: The Integrated Infrastructure and Acceleration Behind Rovo
source: Atlassian Engineering
source_slug: atlassian
url: https://www.atlassian.com/blog/how-we-build/architecting-scalable-ml-platforms
published: 2026-06-10
fetched: 2026-06-11T14:01:28+00:00
ingested: true
---

* * *

**A novel, enterprise-scale architecture for modular ML development, high-velocity experimentation, and embedded governance – powering thousands of production workflows that underpin AI systems serving millions of Rovo users globally.**

* * *

## Introduction: Engineering for Speed, Scale, and Governance in Enterprise AI

As enterprise adoption of machine learning accelerates, organizations must operate ML systems across large, distributed teams and complex, interdependent workflows – while maintaining strict data governance and high development velocity. Traditional ML infrastructure, often fragmented and tightly coupled, fails to scale under these demands, creating bottlenecks in experimentation, compliance, and production reliability.

In response to these systemic challenges, we architected ML Studio at Atlassian – a unified, enterprise-scale ML platform that standardizes modular development, centralizes workflow orchestration, and embeds governance directly into the execution layer. This architecture serves as the mission-critical backbone for AI systems including **Rovo Search and Chat, Teamwork graph and Confluence**, enabling thousands of production workflow runs daily and serving millions of users globally.

In this technical blog, we examine the architectural evolution of ML Studio and the design principles underlying scalable, high-velocity, and governed ML systems. These patterns extend beyond a single platform, offering a reusable foundation for building reliable and compliant machine learning infrastructure at enterprise scale.

* * *

## What is ML Studio?

ML Studio is Atlassian’s ML development platform, designed to streamline the end-to-end ML lifecycle.

![](https://atlassianblog.wpengine.com/wp-content/uploads/2026/06/why-we-built-ml-studio-6.jpg)

### **Key capabilities**

  * **Reusable ML modules:** Modular components for data processing, model training, packaging, evaluation, and deployment, enabling rapid experimentation.
  * **Granular data access controls:** Automated, column-level enforcement to ensure teams access only what they are approved for.
  * **Unified workflow orchestration:** A portal and CLI to run and manage workflows, orchestrating jobs across platforms like Databricks, with access to relevant data sources.
  * **GPU infra integration:** Training workloads run on GPU clusters and are accelerated by distributed training.
  * **Automated MLOps:** Connects with feature stores, experiment trackers, monitoring, and deployment platforms, streamlining the path from prototype to production.


### Key metrics

  * **Over 5 million** monthly active Rovo users served by models built on ML Studio
  * **> 900k** datasets generated with built-in access control
  * **~120k** monthly workflow runs by **100+** ML teams
  * **~20k** monthly ML model iterations and experiments
  * **> 2k** reusable ML modules with **> 200k** monthly iterations
  * Overall, over **75% of Fortune 500 companies** and over **90% of enterprise cloud customers globally** are using Rovo


* * *

## Pillars of ML Studio

![](https://atlassianblog.wpengine.com/wp-content/uploads/2026/06/screenshot-2026-04-20-at-12.05.53-pm-scaled.png)

### **Pillar 1 – Composable, Reusable ML Modules**

ML Studio makes it easy to reuse modular components. Each module is a self-contained, executable unit that can be combined with others to create powerful, end-to-end pipelines. Modules are versioned, sharable across teams, and turn one‑off jobs into a catalog of reusable building blocks.

#### Earlier pain points

Before modules, teams owned one-off jobs that were hard to evolve or reuse. Releases were brittle, and parallel work created conflicts and delays.

Developers waited for slow branch deployments and manual infrastructure steps. Single-version jobs made rollbacks and comparisons hard. Contributors could overwrite each other’s changes, and teams often rebuilt jobs instead of sharing stable code. All changes required centralized review, which created bottlenecks.

#### Solution – Module Management Layer in ML Studio

ML Studio’s module management layer connects modules to Git and treats every push as a buildable artifact. It creates a repeatable way to package and promote code without manually coordinating infrastructure.

Every code push produces a module artifact that workflows can use directly, which reduces deployment waits and conflicts between contributors. Artifact versioning and tags such as `latestAlpha` and `latest` let teams move between versions or roll back without redeploying infrastructure. Dynamic CI and CD pipelines run independent automated tests for each module. Teams own their module namespaces and manage their code autonomously, while shared libraries and generic modules form a growing catalog for reuse.
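
To make the tagging idea concrete, here is a minimal sketch of how movable tags over immutable artifacts allow promotion and rollback without a redeploy. Only the tag names `latestAlpha` and `latest` come from the text above. The `ModuleRegistry` class, its methods, the rule that each push moves `latestAlpha`, and the `artifact://` URIs are hypothetical and do not describe ML Studio's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ModuleRegistry:
    """Hypothetical in-memory registry: immutable artifacts plus movable tags."""

    # (module, version) -> artifact URI
    artifacts: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # (module, tag) -> version
    tags: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def publish(self, module: str, version: str, uri: str) -> None:
        """Record the artifact built from a push (assumed to move `latestAlpha`)."""
        key = (module, version)
        if key in self.artifacts:
            raise ValueError(f"{module}@{version} already exists; artifacts are immutable")
        self.artifacts[key] = uri
        self.tags[(module, "latestAlpha")] = version

    def promote(self, module: str, version: str) -> None:
        """Promotion and rollback are the same operation: move the `latest` pointer."""
        if (module, version) not in self.artifacts:
            raise KeyError(f"unknown artifact {module}@{version}")
        self.tags[(module, "latest")] = version

    def resolve(self, module: str, ref: str) -> str:
        """Resolve a tag or an explicit version to an artifact URI."""
        version = self.tags.get((module, ref), ref)
        return self.artifacts[(module, version)]


registry = ModuleRegistry()
registry.publish("team-a/train", "1.0.0", "artifact://train-1.0.0")
registry.promote("team-a/train", "1.0.0")
registry.publish("team-a/train", "1.1.0", "artifact://train-1.1.0")
registry.promote("team-a/train", "1.1.0")
registry.promote("team-a/train", "1.0.0")  # roll back by moving the tag
```

Because a workflow in this sketch refers to a module by tag, a rollback changes which artifact the tag points to and leaves the artifacts themselves untouched.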

#### Local Dev Loop – a Leap Forward

![](https://atlassianblog.wpengine.com/wp-content/uploads/2026/06/local-dev-loop-2.jpg)

ML Studio further streamlines development with local builds and remote developer environments (RDEs). Developers can quickly build and test modules in a dedicated environment with a single command, reducing iteration time and context switching. Repositories are reserved for peer review and promotion instead of every micro-iteration.

Python module builds that once took minutes now complete in under 30 seconds, with similar gains for Docker builds. Local builds now represent a large share of daily builds, saving thousands of developer minutes each week.

### **Pillar 2 – Workflow Orchestrator**

At the heart of ML Studio is the Workflow Orchestrator service that manages, schedules, and automates ML workflows. Its CLI and portal replace manual and infrastructure-heavy deployment processes, enabling teams to move from idea to production quickly and reliably.

**Key orchestrator capabilities:**

  * **Effortless cloning and reusability:**  
Users can easily clone, run, or rerun their own or teammates’ previous workflow runs. This reduces the time required to reproduce results or adapt existing solutions to new problems.
  * **Flexible workflow triggering:**  
Workflows can be launched from the Portal, CLI or programmatically via APIs. This flexibility enables integration with other Atlassian services and supports custom automation.
  * **Composable workflows:**  
Nested and joined workflows enable teams to build complex pipelines from smaller, reusable sub-workflows, making it easy to streamline the most sophisticated ML processes.
  * **Hot clusters for rapid iteration:**  
ML Studio workflows can run on hot clusters that remain active between runs, eliminating the wait time for cluster provisioning.
  * **Automated scheduling:**  
Built-in CRON scheduling allows users to automate recurring tasks, ensuring workflows run exactly when needed, whether for regular retraining, data refreshes, or monitoring.
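
The composable-workflow capability can be illustrated with a small sketch of nesting: a workflow whose steps are either plain tasks or other workflows. The `Workflow` class, its `plan` and `run` methods, and the example tasks are hypothetical names chosen for illustration. They are not ML Studio's API, and joined workflows are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Workflow:
    """Hypothetical workflow: an ordered list of tasks and nested sub-workflows."""

    name: str
    steps: List[Union["Workflow", Callable[[dict], dict]]] = field(default_factory=list)

    def plan(self, prefix: str = "") -> List[str]:
        """Flatten nested sub-workflows into an ordered list of task paths."""
        path = f"{prefix}{self.name}"
        tasks: List[str] = []
        for step in self.steps:
            if isinstance(step, Workflow):
                tasks.extend(step.plan(prefix=f"{path}/"))
            else:
                tasks.append(f"{path}/{step.__name__}")
        return tasks

    def run(self, context: dict) -> dict:
        """Run steps in order, threading a context dict through each one."""
        for step in self.steps:
            context = step.run(context) if isinstance(step, Workflow) else step(context)
        return context


def load(ctx: dict) -> dict:
    return {**ctx, "rows": [1, 2, 3]}


def featurize(ctx: dict) -> dict:
    return {**ctx, "features": [r * 2 for r in ctx["rows"]]}


def train(ctx: dict) -> dict:
    return {**ctx, "model": sum(ctx["features"])}


# A reusable sub-workflow embedded in a larger pipeline.
prep = Workflow("prep", [load, featurize])
pipeline = Workflow("pipeline", [prep, train])
print(pipeline.plan())
```

In this sketch the `prep` sub-workflow could be reused unchanged inside other pipelines, which is the property the nesting is meant to provide.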


### **Pillar 3 – Embedded Compliance Controls**

As ML workflows grew in complexity, Atlassian needed a scalable framework to enforce data governance without slowing down innovation.

**Key elements of our multi-layer compliance framework:**

  1. **User Identity-Based Access Control –** Only authorized users can initiate or manage workflows. Teams can control which environments and datasets each user can access.
  2. **Domain-Level Access Control –** ML Studio defines clear data boundaries for each workflow type (experimentation, ad hoc analysis, production, etc.), controlling which data a workflow can read or write based on its purpose and environment.
  3. **Column-Level Data Classification and Enforcement –** ML Studio leverages data classification tags at the column level to enforce controls. 
     * **Tag propagation:** As data flows through modules, ML Studio provides utilities to automatically propagate data classification from input to output tables, reducing manual effort and error.
     * **Default safety:** Columns without an explicit classification are treated conservatively by default, so only appropriate data can be published to interactive catalogs.
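
Tag propagation and default safety can be sketched as follows. The classification levels, the lineage mapping, the "most restrictive source wins" rule, and the function names are all assumptions made for illustration. The text above states only that tags propagate from input to output tables and that unclassified columns are treated conservatively.

```python
from enum import IntEnum
from typing import Dict, Iterable, List, Optional


class Classification(IntEnum):
    """Hypothetical classification levels, ordered from least to most restrictive."""

    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


def effective(tag: Optional[Classification]) -> Classification:
    """Default safety: a column with no explicit tag is treated as most restrictive."""
    return Classification.RESTRICTED if tag is None else tag


def propagate(
    input_tags: Dict[str, Optional[Classification]],
    lineage: Dict[str, Iterable[str]],
) -> Dict[str, Classification]:
    """Tag each output column with the most restrictive tag among its source columns."""
    output_tags: Dict[str, Classification] = {}
    for out_col, sources in lineage.items():
        levels = [effective(input_tags.get(src)) for src in sources]
        # An output with no known sources is also treated conservatively.
        output_tags[out_col] = max(levels, default=Classification.RESTRICTED)
    return output_tags


def publishable(
    tags: Dict[str, Classification],
    ceiling: Classification = Classification.INTERNAL,
) -> List[str]:
    """Columns that may be published to a catalog with the given ceiling."""
    return [col for col, tag in tags.items() if tag <= ceiling]


input_tags = {
    "user_id": Classification.RESTRICTED,
    "page_views": Classification.INTERNAL,
    "locale": Classification.PUBLIC,
    "notes": None,  # no explicit classification
}
lineage = {
    "views_per_locale": ["page_views", "locale"],
    "user_key": ["user_id"],
    "note_length": ["notes"],
    "locale": ["locale"],
}
output_tags = propagate(input_tags, lineage)
print(publishable(output_tags))
```

In this sketch `note_length` is withheld from publication even though nobody tagged `notes`, which is the conservative default the list above describes.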


* * *

## Improving Dev Productivity with ML Studio

### **1\. Automatic caching**

ML Studio’s automatic caching accelerates ML development. When iterating on complex workflows, users often modify only a few components, but without caching, every run would re-execute all steps, wasting time and resources.

Caching detects when a task has already run with the same parameters and inputs, and reuses stored results, so developers only re-run what’s changed.
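
One common way to implement this kind of reuse is to derive a deterministic key from the task's parameters and input fingerprints, then look that key up before executing. The sketch below assumes this approach. `cache_key`, `TaskCache`, the inclusion of a module version in the key, and the in-memory store are hypothetical and are not a description of ML Studio's internals.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(
    task_name: str,
    module_version: str,
    params: Dict[str, Any],
    input_digests: Dict[str, str],
) -> str:
    """Hash everything that determines a task's output into a stable key."""
    payload = json.dumps(
        {
            "task": task_name,
            "version": module_version,  # assumption: code changes invalidate the cache
            "params": params,           # must be JSON-serializable in this sketch
            "inputs": input_digests,    # e.g. content hashes of input tables
        },
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TaskCache:
    """Hypothetical in-memory result cache keyed by `cache_key`."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def run(
        self,
        task_name: str,
        module_version: str,
        params: Dict[str, Any],
        input_digests: Dict[str, str],
        fn: Callable[..., Any],
    ) -> Any:
        key = cache_key(task_name, module_version, params, input_digests)
        if key in self._results:
            self.hits += 1
            return self._results[key]  # reuse the stored result
        self.misses += 1
        result = fn(**params)  # only executed on a cache miss
        self._results[key] = result
        return result
```

With a key like this, changing one task's parameters or inputs leaves every other task's key intact, so only the affected task is re-run in this sketch. A real system would also need to propagate changed output digests to downstream tasks.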

This approach has led to significant productivity and cost gains:

  * **~80% of ML Studio workflows** leverage caching daily
  * **1000+ hours of workflow execution time** saved per month


### **2\. Experimental Workflows**

ML Studio’s experimental workflows enable rapid, PR-free experimentation, removing traditional bottlenecks around dataset access and manual approvals.

  * **Faster experimentation:** Direct access to centralized, approved datasets with fewer manual or redundant reviews.
  * **Significant productivity gains:** Experimental workflows account for over half of all ML Studio runs, saving an average of 100+ hours per day across Atlassian’s Central AI org.


### **3\. Easy Cross-Functional Integrations**

ML Studio plugs into Atlassian’s ML ecosystem so teams can move from experiment → deployment without stitching together tools or losing momentum.

  * **Experiment tracking:** compare runs and iterate faster.
  * **Central feature store:** reuse trusted features instead of rebuilding pipelines.
  * **Model registry:** version, approve, and reuse models safely across teams.
  * **Monitoring (ML Lens):** catch training/production regressions early with multi-dimensional metrics.
  * **Deployment & serving:** push models to production via Atlassian’s internal serving platform.
  * **Service integrations:** trigger workflows via APIs and integrate with other teams’ microservices.


This integration layer reduces context switching and operational overhead, preserving fast iteration from idea to impact.

* * *

## Common Workloads powered by ML Studio

**GenAI** — Out-of-the-box LLM fine-tuning, automated prompt optimization with built-in evaluation, and scalable batch inference with cost- and priority-aware workload management.

**Predictive AI** — End-to-end pipelines from labeling (LLM-assisted, human, and programmatic) through feature engineering, with unified data access, column-level classification, and storage compatible with the data residency offering. LLM-assisted labeling helps overcome scaling constraints.

**Training & Evaluation** — Complex evaluation pipelines leveraging production signals and reproducible offline tests, with automated retraining triggers, integrated monitoring, and safe rollout patterns (versioning, tagging).

### AI Features Empowered

  * **Rovo Search:** ranking, Q&A, and chat experiences
  * **Dev-AI:** code review and build automation
  * **AIOps:** automatic alert grouping and clustering
  * **Knowledge discovery:** automated topic extraction across content
  * **Confluence AI:** auto-generated snippets, summaries, related-item suggestions, and content labeling
  * **Loom AI / CoreML:** forecasting and growth-oriented ML for marketing and product