
Slack Quarry

Quarry is Slack's internal REST-based job-submission gateway. It sits between callers (most prominently Airflow) and multiple compute engines (YARN on EMR, Trino, Snowflake), and it is the wiki's canonical instance of the "REST gateway for compute-engine job submission" pattern.

Origin

Quarry was "originally built to provide a unified interface for submitting jobs across multiple compute engines (EMR/YARN, Trino, Snowflake)" (Source: sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines). It pre-dated the SSH-deprecation initiative — the gateway already solved authentication, reliability, and observability — and became "exactly what we needed" when Slack decided to eliminate SSH-based job execution entirely.

What Quarry does

Per the 2026-05-05 retrospective, Quarry handles five concerns on behalf of its callers:

Authentication — service-to-service tokens replace SSH keys on individual orchestration workers.
Job submission — REST APIs to YARN, Trino, and Snowflake.
State tracking — server-side monitoring of job status, so client crashes don't lose job state.
Lifecycle management — clean cancellation and cleanup through REST APIs (DELETE on a job ID).
Observability — structured logs, metrics, and tracing for every job submission.
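
The five concerns can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class `ToyGateway`, its method names, the token set, and the `cancelled` state are hypothetical and are not taken from the post, which does not describe Quarry's API surface or internals.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyGateway:
    """Hypothetical in-memory stand-in for a job-submission gateway."""

    valid_tokens: set
    jobs: dict = field(default_factory=dict)       # job_id -> record (state tracking)
    audit_log: list = field(default_factory=list)  # one entry per call (observability)

    def _authenticate(self, token: str, action: str) -> None:
        # Authentication: every call carries a service token, never an SSH key.
        authorized = token in self.valid_tokens
        self.audit_log.append({"action": action, "authorized": authorized})
        if not authorized:
            raise PermissionError("unknown service token")

    def submit(self, token: str, engine: str, payload: str) -> str:
        # Job submission: the caller gets back an ID, not an open connection.
        self._authenticate(token, "submit")
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"engine": engine, "payload": payload, "status": "running"}
        return job_id

    def status(self, token: str, job_id: str) -> str:
        # State tracking: status lives on the server, keyed by job ID.
        self._authenticate(token, "status")
        return self.jobs[job_id]["status"]

    def cancel(self, token: str, job_id: str) -> None:
        # Lifecycle management: cancellation is an explicit call on the job ID.
        self._authenticate(token, "cancel")
        self.jobs[job_id]["status"] = "cancelled"  # assumed state name
```

Because the job record is held by the gateway object rather than by the caller, a caller that forgets everything except the job ID can still ask for status or cancel the job.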

The architecture shift it enabled

Verbatim from the post:

Before: Airflow → SSH Connection → EMR Master Node → Execute Command

After: Airflow → Quarry REST API → YARN ResourceManager → EMR Container

Three things change at once: the transport (SSH → HTTP), the state model (stateful connection → stateless RESTful resource with job ID), and the execution location (master node → YARN container with proper resource isolation).
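
From the caller's side, the shift from a stateful connection to a stateless resource might look like the sketch below. The base URL, the `/jobs` path, the JSON field names, and the bearer-token header are all assumptions made for illustration; the post does not publish Quarry's endpoints. The functions only build requests and never send them.

```python
import json
import urllib.request

# Hypothetical endpoint; Quarry's real URL scheme is not public.
BASE_URL = "https://quarry.example.internal/v1"


def _auth_headers(token: str) -> dict:
    # Assumed scheme: a service-to-service token sent as a bearer token.
    return {"Authorization": f"Bearer {token}"}


def build_submit_request(token: str, engine: str, command: str) -> urllib.request.Request:
    """POST a job description; the response would carry a job ID."""
    body = json.dumps({"engine": engine, "command": command}).encode("utf-8")
    headers = {**_auth_headers(token), "Content-Type": "application/json"}
    return urllib.request.Request(f"{BASE_URL}/jobs", data=body, headers=headers, method="POST")


def build_status_request(token: str, job_id: str) -> urllib.request.Request:
    """GET the job resource to read its current status."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}", headers=_auth_headers(token), method="GET"
    )


def build_cancel_request(token: str, job_id: str) -> urllib.request.Request:
    """DELETE the job resource to cancel it."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}", headers=_auth_headers(token), method="DELETE"
    )
```

Each request is self-describing: nothing about the job depends on a connection staying open, which is the property the SSH path lacked. Passing any of these objects to `urllib.request.urlopen` would send it.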

YARN Distributed Shell as universal executor

The architectural breakthrough that made Quarry-via-YARN viable for all of Slack's job types — not just Hadoop workloads — was YARN Distributed Shell. Spark and Hive already had REST APIs (Livy and HiveServer2). MapReduce and the 300+ CLI-based jobs running arbitrary shell commands (aws s3 sync, hadoop distcp, custom Python scripts) had no native REST option until Slack discovered DistShell — the YARN ApplicationMaster that runs an arbitrary shell script in a YARN container with proper resource limits, isolation, retry, cancellation, and logging. With DistShell, "whether you're running a Spark job, a Hive query, or a simple shell script, it all goes through the same REST API."

See patterns/yarn-distributed-shell-as-universal-shell-executor for the named pattern.
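
To make the resource-limit point concrete, the sketch below assembles a command line for the DistributedShell client that ships with Hadoop. It illustrates the knobs DistShell exposes (shell command, container memory, vCores); it is not a description of how Quarry invokes DistShell, which the post does not disclose. The jar path, default sizes, and application name are hypothetical, and quoting of the shell command may behave differently across Hadoop versions.

```python
import shlex

# Client class of the DistributedShell example application bundled with Hadoop.
DISTSHELL_CLIENT = "org.apache.hadoop.yarn.applications.distributedshell.Client"


def distshell_argv(shell_command: str, jar_path: str, memory_mb: int = 1024,
                   vcores: int = 1, app_name: str = "shell-job") -> list:
    """Build an argv list that runs one shell command in one YARN container."""
    return [
        "yarn", "jar", jar_path, DISTSHELL_CLIENT,
        "-jar", jar_path,                       # jar containing the ApplicationMaster
        "-appname", app_name,
        "-shell_command", shell_command,
        "-num_containers", "1",
        "-container_memory", str(memory_mb),    # MB requested for the container
        "-container_vcores", str(vcores),
    ]


if __name__ == "__main__":
    argv = distshell_argv(
        "aws s3 sync s3://src s3://dst",
        "/path/to/hadoop-yarn-applications-distributedshell.jar",  # hypothetical path
        memory_mb=2048,
    )
    print(shlex.join(argv))
```

The memory and vCore arguments are what turn an unbounded process on the master node into a container that YARN can schedule, limit, and kill.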

Migration footprint

Quarry was the universal point through which Slack migrated:

700+ production jobs
7 operator types (named: CrunchExecOperator, S3SyncOperator, plus 5 others)
8 independent data regions with separate Quarry configurations, cluster endpoints, and network routing rules
5 teams (Search Infrastructure, Data Engineering & Analytics, ML Services, plus marketing-domain teams)
3 quarters end-to-end, zero downtime for business-critical services

What Quarry replaced (operationally)

Each of the operational improvements below maps to a property of the gateway architecture:

Operational property	Quarry's mechanism
No more SSH keys on Airflow workers	Service-to-service tokens at the Quarry edge
No more zombie jobs after pod restart	Server-side state in Quarry; client crash ≠ job failure
No more master-node resource contention	All non-Hadoop jobs run in YARN containers via DistShell
Audit trail per job submission	Structured logs at Quarry's REST surface
Clean cancellation	DELETE on the job ID; Quarry forwards to YARN
Job status distinct from whether the process terminated successfully	GET on job ID returns running / completed / failed
Observability across multiple engines	Quarry's logs cover YARN + Trino + Snowflake uniformly
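
The "client crash ≠ job failure" row depends on the caller keeping only a job ID and asking the gateway for everything else. A minimal sketch of that client-side habit follows. The function name, the `state` dictionary (standing in for whatever durable store the orchestrator offers), and the two callables are hypothetical.

```python
from typing import Callable


def submit_or_resume(state: dict,
                     submit: Callable[[], str],
                     get_status: Callable[[str], str]) -> str:
    """Return the job's current status, submitting only if no job ID was saved.

    `state` must survive a worker restart; here it is a plain dict for brevity.
    """
    if "job_id" not in state:
        # First run: submit once and persist the returned ID immediately.
        state["job_id"] = submit()
    # Any later run, including one after a crash, only re-reads server state.
    return get_status(state["job_id"])
```

A restarted worker that calls `submit_or_resume` with the same persisted `state` reattaches to the running job instead of launching a duplicate or leaving the first one orphaned.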

What's not publicly disclosed

The 2026-05-05 post is a retrospective on the SSH-elimination initiative, not a Quarry architecture deep-dive. Not disclosed:

Internal architecture of Quarry itself — process model, storage substrate for job state, how it handles HA, etc.
Details of Trino + Snowflake adapters — only the YARN/DistShell path is explained.
Token rotation cadence and surface — the article notes that service-to-service tokens replace SSH keys, but does not disclose how token issuance, rotation, and scoping work at the scale of 700+ jobs × 8 regions.
No public API / open-source release — Quarry remains a Slack-internal system. The architectural shape is the generalisable artefact.
Container resource sizing for arbitrary shell commands — how Slack picks per-shell-job memory / vCores when YARN now enforces limits that SSH had been silently bypassing.

Seen in

sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines — canonical wiki source. Quarry is positioned as the universal REST job-submission gateway that enabled 100% SSH elimination across 8 data regions, the unblocker for Spark-on-Kubernetes and Whitecastle child-account migration, and the substrate for service-token authentication + per-job audit trails. Sole source for Quarry as of 2026-05-21.

companies/slack
systems/yarn-distributed-shell — the YARN feature that let Quarry serve arbitrary shell jobs through a single protocol.
systems/apache-yarn — the resource manager Quarry submits to.
systems/amazon-emr — the cluster substrate where Quarry's YARN backend runs.
systems/apache-airflow — Quarry's biggest caller; SSH operators were replaced by Quarry operators in Airflow DAGs.
systems/trino, systems/snowflake — additional engines Quarry fronts.
patterns/rest-gateway-for-compute-engine-job-submission — the named architectural pattern Quarry canonicalises.
patterns/yarn-distributed-shell-as-universal-shell-executor — the breakthrough enabler.
concepts/rest-based-job-submission — the paradigm shift.
concepts/ssh-job-execution-anti-pattern — what Quarry replaced.
concepts/audit-trail — Quarry's per-submission logs are the new audit substrate.
concepts/attack-surface-minimization — eliminating SSH keys across 8 regions × 700+ jobs is a textbook attack-surface-shrink project.