YARN Distributed Shell¶
YARN Distributed Shell (org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster)
is a YARN ApplicationMaster shipped as part of YARN
that allows any shell script to run in a proper YARN container
with full resource allocation and lifecycle management — without
custom packaging, framework wrappers, or new YARN job types.
It is, in the words of Slack's data platform team, "a little-known feature […] already part of YARN, used the same REST APIs, and required no custom security layer" (Source: sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines). For Slack's SSH-deprecation initiative, it was the breakthrough enabler that made a single REST gateway (Quarry) viable for all job types — not just Hadoop workloads.
What it does¶
DistShell takes a shell script and runs it inside a YARN container, applying the YARN substrate's standard guarantees:
- Proper resource limits: memory and vCores as requested in the ApplicationMaster spec, enforced by the YARN NodeManager.
- Container isolation: the script runs in a YARN-managed process tree, not on the cluster's master node.
- Retry and fault tolerance: the same retry/restart semantics as any other YARN job.
- Clean cancellation: a kill request for the job ID through YARN's REST API terminates the container cleanly. (The stock ResourceManager API models this as a PUT that sets the application state to KILLED rather than an HTTP DELETE; a gateway in front of it may still expose DELETE to its callers.)
- Logging through the YARN UI: stdout/stderr are captured by YARN and accessible through standard tooling.
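The status and cancellation guarantees above map onto two calls against the ResourceManager's application state resource. The sketch below only builds the requests and does not send them. It assumes the stock ResourceManager REST API path /ws/v1/cluster/apps/{appid}/state; the host name and application ID are hypothetical placeholders, and nothing here is taken from Slack's Quarry code.

```python
import json
import urllib.request


def app_state_url(rm_base: str, app_id: str) -> str:
    """URL of the application state resource on the ResourceManager."""
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps/{app_id}/state"


def build_status_request(rm_base: str, app_id: str) -> urllib.request.Request:
    """GET request that reads the current application state."""
    return urllib.request.Request(
        app_state_url(rm_base, app_id),
        method="GET",
        headers={"Accept": "application/json"},
    )


def build_kill_request(rm_base: str, app_id: str) -> urllib.request.Request:
    """PUT request asking the ResourceManager to move the app to KILLED."""
    body = json.dumps({"state": "KILLED"}).encode("utf-8")
    return urllib.request.Request(
        app_state_url(rm_base, app_id),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical ResourceManager address and application ID.
    req = build_kill_request(
        "http://rm.example.internal:8088", "application_1700000000000_0001"
    )
    print(req.get_method(), req.full_url)
```

Sending either request is a matter of passing it to urllib.request.urlopen, plus whatever authentication the cluster requires, which this sketch leaves out.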
Submission shape (Slack's example)¶
The Slack post documents the actual REST submission shape:
- Upload the script to S3, e.g. s3://bucket/command.sh containing the command aws s3 sync /tmp/data/ s3://bucket/output/.
- Submit to YARN with application-type: MAPREDUCE and an am-container-spec whose commands.command invokes the DistShell ApplicationMaster Java class. The script location is passed as environment variables:
{
  "application-type": "MAPREDUCE",
  "am-container-spec": {
    "commands": {
      "command": "{{JAVA_HOME}}/bin/java org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster ..."
    },
    "environment": {
      "DISTRIBUTEDSHELLSCRIPTLOCATION": "s3://bucket/command.sh",
      "DISTRIBUTEDSHELLSCRIPTLEN": "548",
      "DISTRIBUTEDSHELLSCRIPTTIMESTAMP": "1768529627000"
    }
  }
}
- YARN allocates a container, downloads the script, and executes it — and a job ID flows back to the caller for subsequent status / cancel operations through the standard YARN REST API.
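As a minimal sketch of the submission steps, the body can be assembled programmatically. The function names below are hypothetical, the elided "..." in the command is kept exactly as the source gives it, and the body omits fields the source does not show (for example the application ID and resource request). The helper to_entry_list is included on the assumption that the target is the stock ResourceManager endpoint, whose documentation writes the environment as a list of key/value entries rather than the flat map shown in the post.

```python
import json

# Fully qualified DistShell ApplicationMaster class, as named in the source.
AM_CLASS = "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster"


def build_distshell_submission(
    script_uri: str, script_len: int, script_mtime_ms: int
) -> dict:
    """Build a submission body in the shape shown in Slack's example."""
    return {
        "application-type": "MAPREDUCE",
        "am-container-spec": {
            "commands": {
                # The trailing "..." is elided in the source and stays elided here.
                "command": "{{JAVA_HOME}}/bin/java " + AM_CLASS + " ...",
            },
            "environment": {
                "DISTRIBUTEDSHELLSCRIPTLOCATION": script_uri,
                # Values are sent as strings, matching the example payload.
                "DISTRIBUTEDSHELLSCRIPTLEN": str(script_len),
                "DISTRIBUTEDSHELLSCRIPTTIMESTAMP": str(script_mtime_ms),
            },
        },
    }


def to_entry_list(env: dict) -> dict:
    """Convert a flat environment map into key/value entry-list form."""
    return {"entry": [{"key": k, "value": v} for k, v in env.items()]}


if __name__ == "__main__":
    body = build_distshell_submission("s3://bucket/command.sh", 548, 1768529627000)
    print(json.dumps(body, indent=2))
```

Keeping the payload builder a pure function makes it easy to unit-test the job definition separately from the HTTP call that submits it.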
Why it's load-bearing for the SSH-to-REST migration¶
Slack had three job categories to migrate off SSH:
- Spark: already has the Livy REST API.
- Hive: already has HiveServer2.
- MapReduce plus 300+ arbitrary shell-command jobs (aws s3 sync, hadoop distcp, custom Python scripts): no native REST option.
The third category was the hard part. Slack considered three alternatives and rejected all three before discovering DistShell:
"We brainstormed multiple approaches. […] Some ideas we considered: Building a custom wrapper service to execute commands remotely; Using remote execution frameworks like Ansible or Salt; Creating a new job type in YARN from scratch. All of these felt too complex, required custom security implementations, or introduced new dependencies we'd have to maintain. Not great options."
DistShell was already part of YARN, used the same REST APIs as every other job type, and required no custom security layer. The post describes it as follows:
"It's a little-known feature […] that allows any shell script to run in a proper YARN container with resource allocation and lifecycle management."
After the discovery, "this architectural decision unlocked the migration of all SSH-based jobs."
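The resulting split between job categories can be pictured as a small routing table. This is an illustrative sketch only: the Backend enum, the job-type strings, and the route function are hypothetical names, and nothing here reflects Quarry's actual implementation.

```python
from enum import Enum


class Backend(Enum):
    LIVY = "Livy REST API"
    HIVESERVER2 = "HiveServer2"
    DISTSHELL = "YARN Distributed Shell"


# Job categories from the list above, mapped to the path each one takes.
ROUTES = {
    "spark": Backend.LIVY,
    "hive": Backend.HIVESERVER2,
    "mapreduce": Backend.DISTSHELL,
    "shell": Backend.DISTSHELL,
}


def route(job_type: str) -> Backend:
    """Pick the submission backend for a job type; unknown types are rejected."""
    try:
        return ROUTES[job_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported job type: {job_type!r}") from None


if __name__ == "__main__":
    for kind in ("spark", "hive", "shell"):
        print(kind, "->", route(kind).value)
```

Rejecting unknown job types, instead of silently defaulting to the shell runner, is a design choice made for this sketch so that a typo in a job definition fails loudly.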
The general pattern¶
DistShell is the canonical instance of the patterns/yarn-distributed-shell-as-universal-shell-executor pattern: when a heterogeneous workload mix has REST submission paths for the framework-typed jobs (Spark, Hive) but lacks one for arbitrary shell commands, an existing-but-overlooked feature of the resource manager often closes the gap without custom infrastructure. The lesson generalises beyond YARN — check whether your existing substrate already exposes a generic shell-runner before building one.
Stub-level seen-in (single source)¶
- sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines — canonical wiki source. DistShell is positioned as "the breakthrough that made this whole migration possible"; used via Quarry to run all 300+ CLI-based jobs (formerly SSH'd to the EMR master node) inside YARN containers with proper resource enforcement.
Related¶
- systems/apache-yarn — the resource manager that ships DistShell.
- systems/hadoop — the broader ecosystem.
- systems/slack-quarry — the canonical caller in this wiki.
- systems/amazon-emr — the substrate Slack runs YARN on.
- patterns/yarn-distributed-shell-as-universal-shell-executor — the named pattern.
- patterns/rest-gateway-for-compute-engine-job-submission — the gateway pattern DistShell unblocks.
- concepts/rest-based-job-submission — the paradigm.