YARN Distributed Shell

YARN Distributed Shell (DistShell) is an application shipped as part of YARN whose ApplicationMaster, org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster, allows any shell script to run in a proper YARN container with full resource allocation and lifecycle management, without custom packaging, framework wrappers, or new YARN job types.

It is, in the words of Slack's data platform team, "a little-known feature […] already part of YARN, used the same REST APIs, and required no custom security layer" (Source: sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines). For Slack's SSH-deprecation initiative, it was the breakthrough enabler that made a single REST gateway (Quarry) viable for all job types — not just Hadoop workloads.

What it does

DistShell takes a shell script and runs it inside a YARN container, applying the YARN substrate's standard guarantees:

  1. Proper resource limits: memory and vCores per the ApplicationMaster spec, enforced by the YARN NodeManager.
  2. Container isolation: the script runs in a YARN-managed process tree, not on the cluster's master node.
  3. Retry and fault tolerance: same retry/restart semantics as any YARN job.
  4. Clean cancellation: cancelling the job ID through YARN's REST API terminates the container cleanly. In the ResourceManager API the kill is a PUT that sets the application state to KILLED; a gateway in front of YARN may expose it as a DELETE.
  5. Logging through YARN UI: stdout/stderr captured by YARN and accessible through standard tooling.
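
The status and cancellation items above map onto plain HTTP calls. The following is a minimal sketch, assuming the endpoint paths documented for the Hadoop ResourceManager REST API (GET on /ws/v1/cluster/apps/{appid} for status, PUT on the same path plus /state for a kill). The helper names, the base URL, and the application ID are hypothetical, and nothing is sent over the network; the functions only build requests and interpret a response.

```python
import json

# Application states after which no further polling is needed
# (terminal states of a YARN application).
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def app_url(rm_base, app_id):
    """URL of the ResourceManager resource for a single application."""
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def kill_request(rm_base, app_id):
    """Return (method, url, body) for asking the ResourceManager to kill an app."""
    return ("PUT", app_url(rm_base, app_id) + "/state", json.dumps({"state": "KILLED"}))


def summarize(status_json):
    """Reduce a GET .../apps/{appid} response to the fields a caller polls on."""
    app = json.loads(status_json)["app"]
    return {
        "state": app["state"],
        "final_status": app.get("finalStatus"),
        "done": app["state"] in TERMINAL_STATES,
    }


# Hypothetical host and application ID, for illustration only.
method, url, body = kill_request("http://resourcemanager:8088", "application_0000000000000_0001")
print(method, url, body)
```

Keeping request construction separate from transport lets the same helpers sit behind whatever HTTP client and authentication the cluster requires.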

Submission shape (Slack's example)

The Slack post documents the actual REST submission shape:

  1. Upload script to S3 — e.g. s3://bucket/command.sh containing aws s3 sync /tmp/data/ s3://bucket/output/.

  2. Submit to YARN with application-type: MAPREDUCE and an am-container-spec whose commands.command invokes the DistShell ApplicationMaster Java class. The script's location, length, and timestamp are passed as environment variables:

{
  "application-type": "MAPREDUCE",
  "am-container-spec": {
    "commands": {
      "command": "{{JAVA_HOME}}/bin/java org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster ..."
    },
    "environment": {
      "DISTRIBUTEDSHELLSCRIPTLOCATION": "s3://bucket/command.sh",
      "DISTRIBUTEDSHELLSCRIPTLEN": "548",
      "DISTRIBUTEDSHELLSCRIPTTIMESTAMP": "1768529627000"
    }
  }
}
  3. YARN allocates a container, downloads the script, and executes it. A job ID flows back to the caller for subsequent status and cancel operations through the standard YARN REST API.
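
The submission steps above can be sketched as a small payload builder. This is an illustration under stated assumptions, not Slack's implementation: the function name and its parameters are hypothetical, the application-id field is assumed from the ResourceManager submit API (it does not appear in the source's example), and the ApplicationMaster command is taken as an argument because the source elides it. The environment block follows the flat shape shown in the source; the Hadoop documentation's own example nests these variables as a list of key/value entries, so the exact shape should be checked against the target cluster.

```python
import json


def build_distshell_submission(app_id, script_uri, script_bytes, mtime_ms, am_command):
    """Build a submission body in the shape shown in the source's example.

    app_id       -- application ID previously obtained from the ResourceManager
    script_uri   -- where the uploaded script lives, e.g. an s3:// URI
    script_bytes -- the script contents, used only to compute its length
    mtime_ms     -- the uploaded script's modification time, epoch milliseconds
    am_command   -- full ApplicationMaster command line (elided in the source)
    """
    return {
        "application-id": app_id,  # assumption: not shown in the source's example
        "application-type": "MAPREDUCE",
        "am-container-spec": {
            "commands": {"command": am_command},
            "environment": {
                "DISTRIBUTEDSHELLSCRIPTLOCATION": script_uri,
                "DISTRIBUTEDSHELLSCRIPTLEN": str(len(script_bytes)),
                "DISTRIBUTEDSHELLSCRIPTTIMESTAMP": str(mtime_ms),
            },
        },
    }


# Illustrative values only; the command is kept elided exactly as in the source.
script = b"aws s3 sync /tmp/data/ s3://bucket/output/\n"
body = build_distshell_submission(
    app_id="application_0000000000000_0001",
    script_uri="s3://bucket/command.sh",
    script_bytes=script,
    mtime_ms=1768529627000,
    am_command="{{JAVA_HOME}}/bin/java org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster ...",
)
print(json.dumps(body, indent=2))
```

Deriving the length and timestamp from the uploaded file, rather than hard-coding them, keeps the three environment variables consistent with the script they describe.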

Why it's load-bearing for the SSH-to-REST migration

Slack had three job categories to migrate off SSH:

  • Spark — already has Livy REST API.
  • Hive — already has HiveServer2.
  • MapReduce + 300+ arbitrary shell-command jobs (aws s3 sync, hadoop distcp, custom Python scripts) — no native REST option.
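
The three categories above amount to a routing decision inside a gateway. A minimal sketch of that decision, assuming a hypothetical table keyed by job type; the key names, backend labels, and the route function are illustrative and are not taken from Slack's gateway. MapReduce work launched through commands such as hadoop distcp is assumed to fall under the shell category here.

```python
# Hypothetical routing table: which submission path a gateway picks per job type.
BACKENDS = {
    "spark": "livy",                    # Spark already has a REST path via Livy
    "hive": "hiveserver2",              # Hive already has HiveServer2
    "shell": "yarn-distributed-shell",  # arbitrary commands go through DistShell
}


def route(job_type):
    """Return the backend label for a job type, or raise for unknown types."""
    try:
        return BACKENDS[job_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported job type: {job_type!r}") from None


print(route("shell"))
```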

The third category was the hard part. Slack considered three alternatives and rejected all three before discovering DistShell:

"We brainstormed multiple approaches. […] Some ideas we considered: Building a custom wrapper service to execute commands remotely; Using remote execution frameworks like Ansible or Salt; Creating a new job type in YARN from scratch. All of these felt too complex, required custom security implementations, or introduced new dependencies we'd have to maintain. Not great options."

DistShell was already in YARN, used the same REST APIs as everything else, and required no custom security layer. Slack describes the discovery in these words:

"It's a little-known feature […] that allows any shell script to run in a proper YARN container with resource allocation and lifecycle management."

After the discovery, "this architectural decision unlocked the migration of all SSH-based jobs."

The general pattern

DistShell is the canonical instance of the patterns/yarn-distributed-shell-as-universal-shell-executor pattern: when a heterogeneous workload mix has REST submission paths for the framework-typed jobs (Spark, Hive) but lacks one for arbitrary shell commands, an existing-but-overlooked feature of the resource manager often closes the gap without custom infrastructure. The lesson generalises beyond YARN — check whether your existing substrate already exposes a generic shell-runner before building one.

Stub-level seen-in (single source)
