SYSTEM Cited by 1 source

Apache YARN

Apache YARN (Yet Another Resource Negotiator) is the resource manager and job scheduler that sits at the heart of Apache Hadoop. It allocates cluster resources (memory, vCores) into containers, schedules ApplicationMasters that run the job-specific orchestration logic, enforces resource limits at the node level, and exposes a REST API for job submission and lifecycle management.

This wiki has dedicated pages for the major YARN-hosted job frameworks (Spark, Hive, YARN Distributed Shell). This page covers YARN itself, as the resource manager those frameworks run on.

What YARN provides as a substrate

  • Container isolation — jobs run in process trees managed by NodeManagers, not on shared cluster login nodes.
  • Resource enforcement — memory and vCore caps per container, applied at run time.
  • Lifecycle management — submit, status-poll, cancel via ResourceManager REST endpoints.
  • Retry / fault tolerance — application-master restart on node failure; configurable container retry.
  • Logging — stdout/stderr captured per container; aggregated to HDFS or remote stores in production deployments.
  • Multi-framework support — the same ResourceManager hosts MapReduce, Spark, Hive, and arbitrary shell jobs (via Distributed Shell).

REST API as the canonical submission surface

The lesson Slack canonicalised in their 2026-05-05 retrospective is that YARN's REST API is the right submission surface for heterogeneous workloads, not a per-job-type interface:

"Modern compute engines (YARN, Trino, Snowflake) expose HTTP APIs for job submission. Instead of maintaining a connection, you POST a job request → receive a job ID; GET job status using the ID; DELETE the job." (Source: sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines)

See concepts/rest-based-job-submission for the generalised async-job-lifecycle paradigm.
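The submit / poll / cancel lifecycle quoted above can be sketched against the ResourceManager REST API. This is a minimal illustration rather than Slack's or Quarry's implementation: the host name, application name, and helper function names are hypothetical, and the functions only build requests without sending them. One detail worth noting is that in YARN's own API the "delete the job" step is expressed as a PUT that sets the application state to KILLED, not as a literal HTTP DELETE.

```python
import json
from urllib.request import Request

# Assumption: hypothetical ResourceManager address (8088 is the usual web port).
RM_BASE = "http://resourcemanager.example.internal:8088/ws/v1/cluster"

# Application states after which polling can stop.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def new_application_request(base=RM_BASE):
    """Step 1: ask the ResourceManager to allocate an application ID."""
    return Request(f"{base}/apps/new-application", data=b"", method="POST")


def submit_request(app_id, command, memory_mb=1024, vcores=1, base=RM_BASE):
    """Step 2: submit the application using the ID obtained in step 1."""
    body = {
        "application-id": app_id,
        "application-name": "example-job",  # hypothetical name
        "application-type": "YARN",
        "am-container-spec": {"commands": {"command": command}},
        "resource": {"memory": memory_mb, "vCores": vcores},
        "max-app-attempts": 2,
    }
    return Request(
        f"{base}/apps",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(app_id, base=RM_BASE):
    """Step 3: poll the application resource for its current state."""
    return Request(f"{base}/apps/{app_id}", method="GET")


def kill_request(app_id, base=RM_BASE):
    """Step 4 (optional): cancel by setting the application state to KILLED."""
    return Request(
        f"{base}/apps/{app_id}/state",
        data=json.dumps({"state": "KILLED"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def is_terminal(status_json):
    """True once the parsed status response reports a terminal state."""
    return status_json["app"]["state"] in TERMINAL_STATES
```

A caller would pass each request to `urllib.request.urlopen` (adding whatever authentication the cluster requires) and loop on `status_request` until `is_terminal` returns true.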

YARN's resource enforcement is real (vmem-check story)

YARN actually enforces the limits it is configured with, a property that often surfaces during migrations away from bypass paths such as SSH-to-master-node:

"SSH commands ran directly on the master node, bypassing YARN's resource enforcement entirely. Quarry submits jobs properly to YARN, which actually enforces resource limits. The vmem check was rejecting containers that exceeded virtual memory limits (which SSH had been quietly ignoring)."

The article documents the AWS-recommended workaround: set yarn.nodemanager.vmem-check-enabled to false in the yarn-site configuration. The rationale is that Linux virtual-memory accounting is unreliable for the kinds of JVMs and Python processes typical of Hadoop workloads, and physical memory limits are sufficient for resource governance.

This is canonicalised as concepts/resource-enforcement-bypass-via-ssh: migrations that move workloads onto YARN's enforced container runtime surface latent resource-violating jobs that previously ran silently.
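To make the vmem check concrete, the sketch below models it as simple arithmetic: a container's virtual-memory ceiling is its physical-memory allocation multiplied by yarn.nodemanager.vmem-pmem-ratio (2.1 by default in Hadoop). This is a deliberately simplified model of what the NodeManager does, not its actual monitoring code, and the function names are hypothetical. The last helper shows how the workaround might be expressed as an EMR configuration entry using the yarn-site classification.

```python
import json

# Hadoop's documented default for yarn.nodemanager.vmem-pmem-ratio.
DEFAULT_VMEM_PMEM_RATIO = 2.1


def vmem_limit_mb(container_pmem_mb, ratio=DEFAULT_VMEM_PMEM_RATIO):
    """Virtual-memory ceiling implied by a container's physical allocation."""
    return container_pmem_mb * ratio


def exceeds_vmem(container_pmem_mb, observed_vmem_mb,
                 ratio=DEFAULT_VMEM_PMEM_RATIO, check_enabled=True):
    """Simplified model: would the vmem check flag this container?"""
    if not check_enabled:
        # Mirrors yarn.nodemanager.vmem-check-enabled=false: only physical
        # memory limits remain in force.
        return False
    return observed_vmem_mb > vmem_limit_mb(container_pmem_mb, ratio)


def emr_yarn_site_override():
    """EMR-style configuration entry that disables the vmem check."""
    return [
        {
            "Classification": "yarn-site",
            "Properties": {"yarn.nodemanager.vmem-check-enabled": "false"},
        }
    ]


if __name__ == "__main__":
    # A 1024 MB container whose process tree maps 3000 MB of virtual memory.
    print(exceeds_vmem(1024, 3000))
    print(json.dumps(emr_yarn_site_override(), indent=2))
```

The arithmetic shows why jobs that ran unnoticed over SSH can start failing: a JVM or Python process can reserve far more virtual address space than it ever touches, so it can cross the ratio-derived ceiling while staying well within its physical allocation.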

Master-node resource contention

Closely related: when jobs are submitted to YARN, they get distributed across worker NodeManagers. When jobs are SSH'd onto the master node, they all run on the same shared host and compete for resources. Slack's 700+ SSH-based jobs were "running directly on EMR master nodes instead of being distributed, causing resource contention" — eliminated by Quarry/YARN. See concepts/master-node-resource-contention.

YARN Distributed Shell — the universal shell-runner

YARN ships an ApplicationMaster (org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster) that runs an arbitrary shell script in a YARN container with all the substrate guarantees above. This is the breakthrough enabler in Slack's SSH-elimination story — it lets a single REST gateway (Quarry) serve MapReduce, Spark, Hive, and arbitrary CLI commands through one protocol. See systems/yarn-distributed-shell.
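As a rough sketch of what a REST submission for a Distributed Shell job might look like, the function below assembles a request body modeled on the submit-application example in the Hadoop ResourceManager REST documentation. It is not Quarry's payload. The function and parameter names, the application name, the heap size, and all paths are hypothetical, and a real submission would also need a CLASSPATH entry in the environment and accurate size and timestamp values for the localized files.

```python
import json

AM_CLASS = "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster"


def env_entries(pairs):
    """Render a dict in the key/value 'entry' list shape the REST API uses."""
    return {"entry": [{"key": k, "value": str(v)} for k, v in pairs.items()]}


def distributed_shell_body(app_id, am_jar, script,
                           container_mb=256, container_vcores=1,
                           num_containers=1):
    """Build a submit body that launches the Distributed Shell AM.

    am_jar: dict with 'uri', 'size', 'timestamp' for the AM jar on HDFS.
    script: dict with 'uri', 'length', 'timestamp' for the shell script.
    Both dict shapes are illustrative assumptions.
    """
    # {{JAVA_HOME}} and <LOG_DIR> are placeholders expanded on the node.
    am_command = (
        "{{JAVA_HOME}}/bin/java -Xmx256m " + AM_CLASS
        + f" --container_memory {container_mb}"
        + f" --container_vcores {container_vcores}"
        + f" --num_containers {num_containers}"
        + " --priority 0"
        + " 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr"
    )
    return {
        "application-id": app_id,
        "application-name": "shell-job",  # hypothetical name
        "application-type": "YARN",
        "max-app-attempts": 2,
        "resource": {"memory": 1024, "vCores": 1},  # AM container size
        "am-container-spec": {
            "local-resources": {"entry": [{
                "key": "AppMaster.jar",
                "value": {
                    "resource": am_jar["uri"],
                    "type": "FILE",
                    "visibility": "APPLICATION",
                    "size": am_jar["size"],
                    "timestamp": am_jar["timestamp"],
                },
            }]},
            "commands": {"command": am_command},
            # Environment variables the AM reads to locate the script.
            "environment": env_entries({
                "DISTRIBUTEDSHELLSCRIPTLOCATION": script["uri"],
                "DISTRIBUTEDSHELLSCRIPTLEN": script["length"],
                "DISTRIBUTEDSHELLSCRIPTTIMESTAMP": script["timestamp"],
            }),
        },
    }
```

The resulting dict would be serialized with `json.dumps` and POSTed to the ResourceManager's application-submission endpoint. Because only the script location and container sizing change from job to job, one gateway can wrap arbitrary CLI commands in the same envelope.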

Common managed substrates

YARN is most commonly deployed inside:

  • Amazon EMR — managed Hadoop, with EMR managing the ResourceManager + NodeManager fleet.
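  • EMR-on-EKS — a related managed offering, listed for contrast: it schedules Spark on Kubernetes rather than on YARN, so the YARN-specific behaviour described on this page does not carry over.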
  • On-prem Hortonworks / Cloudera distributions (legacy).

Seen in
