Apache YARN¶
Apache YARN (Yet Another Resource Negotiator) is the resource manager and job scheduler that sits at the heart of Apache Hadoop. It allocates cluster resources (memory, vCores) into containers, schedules ApplicationMasters that run the job-specific orchestration logic, enforces resource limits at the node level, and exposes a REST API for job submission and lifecycle management.
This wiki has dedicated pages for the major YARN-hosted job frameworks (Spark, Hive, YARN Distributed Shell). This page covers YARN itself, in its role as the resource manager.
What YARN provides as a substrate¶
- Container isolation — jobs run in process trees managed by NodeManagers, not on shared cluster login nodes.
- Resource enforcement — memory and vCore caps per container, applied at run time.
- Lifecycle management — submit, status-poll, cancel via ResourceManager REST endpoints.
- Retry / fault tolerance — application-master restart on node failure; configurable container retry.
- Logging — stdout/stderr captured per container; aggregated to HDFS or remote stores in production deployments.
- Multi-framework support — the same ResourceManager hosts MapReduce, Spark, Hive, and arbitrary shell jobs (via Distributed Shell).
REST API as the canonical submission surface¶
The lesson Slack canonicalised in their 2026-05-05 retrospective is that YARN's REST API is the right submission surface for heterogeneous workloads, not a per-job-type interface:
"Modern compute engines (YARN, Trino, Snowflake) expose HTTP APIs for job submission. Instead of maintaining a connection, you POST a job request → receive a job ID; GET job status using the ID; DELETE the job." (Source: sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines)
See concepts/rest-based-job-submission for the generalised async-job-lifecycle paradigm.
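The submit / poll / cancel lifecycle in the quote can be sketched against YARN's ResourceManager REST API. This is a minimal illustration, not Slack's or Quarry's implementation: the host name, application name, and command are placeholders, and the helpers only build request descriptions instead of sending them. Note one difference from the generic paradigm: the ResourceManager expresses cancellation as a PUT on the application's state resource, not as an HTTP DELETE.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumption: a placeholder ResourceManager address (8088 is the usual web port).
RM_BASE = "http://resourcemanager.example.internal:8088"

# Application states after which polling can stop.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


@dataclass(frozen=True)
class HttpCall:
    """A description of one HTTP request; nothing is sent over the network."""
    method: str
    url: str
    body: Optional[str] = None


def new_application(rm: str = RM_BASE) -> HttpCall:
    # Step 1: ask the ResourceManager for a fresh application ID.
    return HttpCall("POST", f"{rm}/ws/v1/cluster/apps/new-application")


def submit(app_id: str, name: str, command: str,
           memory_mb: int, vcores: int, rm: str = RM_BASE) -> HttpCall:
    # Step 2: submit the application under the ID obtained in step 1.
    payload = {
        "application-id": app_id,
        "application-name": name,
        "application-type": "YARN",
        "am-container-spec": {"commands": {"command": command}},
        "resource": {"memory": memory_mb, "vCores": vcores},
    }
    return HttpCall("POST", f"{rm}/ws/v1/cluster/apps", json.dumps(payload))


def status(app_id: str, rm: str = RM_BASE) -> HttpCall:
    # Step 3: poll the application state by ID.
    return HttpCall("GET", f"{rm}/ws/v1/cluster/apps/{app_id}/state")


def kill(app_id: str, rm: str = RM_BASE) -> HttpCall:
    # Step 4: cancel by setting the desired state to KILLED.
    return HttpCall("PUT", f"{rm}/ws/v1/cluster/apps/{app_id}/state",
                    json.dumps({"state": "KILLED"}))


def is_terminal(state: str) -> bool:
    return state in TERMINAL_STATES
```

A caller would send each `HttpCall` with whatever HTTP client and authentication the cluster requires, polling `status` until `is_terminal` returns true.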
YARN's resource enforcement is real (vmem-check story)¶
A subtle property of YARN often surfaces during migrations away from bypass paths such as SSH to the master node:
"SSH commands ran directly on the master node, bypassing YARN's resource enforcement entirely. Quarry submits jobs properly to YARN, which actually enforces resource limits. The vmem check was rejecting containers that exceeded virtual memory limits (which SSH had been quietly ignoring)."
The article documents the AWS-recommended workaround: set
yarn.nodemanager.vmem-check-enabled to false. The rationale is that
Linux virtual-memory accounting is unreliable for the kinds of JVMs
and Python processes typical of Hadoop workloads, and physical
memory limits are sufficient for resource governance.
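To make the vmem check concrete, the sketch below shows how a container's virtual-memory ceiling follows from its physical allocation, and what an EMR-style `yarn-site` override disabling the check could look like. It assumes Hadoop's default `yarn.nodemanager.vmem-pmem-ratio` of 2.1 and the EMR configuration-classification JSON shape; neither is stated in the source article, and the function names are illustrative.

```python
import json

# Assumption: Hadoop's default for yarn.nodemanager.vmem-pmem-ratio.
DEFAULT_VMEM_PMEM_RATIO = 2.1


def vmem_limit_mb(container_mb: int,
                  ratio: float = DEFAULT_VMEM_PMEM_RATIO) -> float:
    """Virtual-memory ceiling for a container with the given physical allocation."""
    return container_mb * ratio


def would_fail_vmem_check(container_mb: int, observed_vmem_mb: float,
                          ratio: float = DEFAULT_VMEM_PMEM_RATIO) -> bool:
    """True if the process tree's virtual memory exceeds the ceiling."""
    return observed_vmem_mb > vmem_limit_mb(container_mb, ratio)


def emr_yarn_site_override() -> list:
    """EMR-style configuration classification that turns the vmem check off."""
    return [{
        "Classification": "yarn-site",
        "Properties": {"yarn.nodemanager.vmem-check-enabled": "false"},
    }]


if __name__ == "__main__":
    print(json.dumps(emr_yarn_site_override(), indent=2))
```

Under these assumptions a 4096 MB container is allowed roughly 8.6 GB of virtual memory, which a JVM that reserves a large address space can exceed while staying well within its physical limit.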
This is canonicalised as concepts/resource-enforcement-bypass-via-ssh: migrating workloads onto YARN's enforced container runtime exposes latent resource-violating jobs that previously ran unnoticed.
Master-node resource contention¶
Closely related: jobs submitted to YARN are distributed across worker NodeManagers, whereas jobs launched over SSH on the master node all run on the same shared host and compete for its resources. Slack's 700+ SSH-based jobs were "running directly on EMR master nodes instead of being distributed, causing resource contention", a problem that submitting through Quarry to YARN eliminated. See concepts/master-node-resource-contention.
YARN Distributed Shell — the universal shell-runner¶
YARN ships an ApplicationMaster
(org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster)
that runs an arbitrary shell script in a YARN container with all
the substrate guarantees above. This is the breakthrough enabler
in Slack's SSH-elimination story — it lets a single REST gateway
(Quarry) serve MapReduce, Spark, Hive,
and arbitrary CLI commands through one protocol. See
systems/yarn-distributed-shell.
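As a rough illustration of what "an arbitrary shell script in a YARN container" looks like from the submitting side, the sketch below assembles a command line for the Distributed Shell client that ships alongside the ApplicationMaster named above. The jar path is a placeholder, the helper name is hypothetical, and this shows the stock Hadoop client invocation, not how Quarry submits jobs.

```python
import shlex
from typing import List

# Client class in the same package as the ApplicationMaster named above.
CLIENT_CLASS = "org.apache.hadoop.yarn.applications.distributedshell.Client"

# Assumption: placeholder path; the real jar location depends on the distribution.
DSHELL_JAR = "/path/to/hadoop-yarn-applications-distributedshell.jar"


def distributed_shell_argv(shell_command: str,
                           num_containers: int = 1,
                           container_memory_mb: int = 1024,
                           jar: str = DSHELL_JAR) -> List[str]:
    """Build the argv for a Distributed Shell submission (does not run it)."""
    if num_containers < 1:
        raise ValueError("num_containers must be at least 1")
    return [
        "yarn", "jar", jar, CLIENT_CLASS,
        "-jar", jar,                          # jar containing the ApplicationMaster
        "-shell_command", shell_command,      # command each container runs
        "-num_containers", str(num_containers),
        "-container_memory", str(container_memory_mb),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line for inspection.
    print(shlex.join(distributed_shell_argv("hostname", num_containers=2)))
```

Because the shell command is passed as a single argv element, quoting is handled once at the process boundary instead of being re-parsed by an intermediate shell.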
Common managed substrates¶
YARN is most commonly deployed inside:
- Amazon EMR — managed Hadoop, with EMR managing the ResourceManager + NodeManager fleet.
- EMR-on-EKS — same, on Kubernetes-substrate workers.
- On-prem Hortonworks / Cloudera distributions (legacy).
Seen in¶
- sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines — canonical wiki source. YARN's REST API is the submission substrate Slack's Quarry gateway forwards to; YARN's container resource enforcement is what surfaced the latent vmem-check failures hidden by years of master-node SSH execution; YARN Distributed Shell is the breakthrough that made one gateway viable for all job types.