CONCEPT Cited by 1 source
REST-based job submission¶
Definition¶
REST-based job submission is the architectural model in
which a long-running job's lifecycle is managed as a
resource addressable by a job ID over HTTP, rather than as
a process attached to a stateful interactive session. The
client submits work via POST, polls status via GET, and
cancels via DELETE — and crucially, the client can crash
and restart between any two of those operations without
losing the job's state.
This is the paradigm shift that Slack's post, the wiki's canonical source for the concept, sets against the older SSH job execution anti-pattern.
The lifecycle, verbatim¶
"Modern compute engines (YARN, Trino, Snowflake) expose HTTP APIs for job submission. Instead of maintaining a connection, you:
- POST a job request → receive a job ID
- GET job status using the ID → check if it's running, completed, or failed
- DELETE the job → cleanly cancel, if needed"
Three primitives — submit, status, cancel — and a server-managed resource addressable by ID.
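The three primitives can be sketched with a toy in-memory model. This is an illustration only, not Quarry's or any engine's actual API: the `JobServer` class, the `/jobs` paths, the state names, and the status codes are all assumptions chosen for the example.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobServer:
    """Hypothetical in-memory stand-in for an engine's HTTP job API."""
    jobs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def handle(self, method: str, path: str, body: Optional[dict] = None):
        """Route one request and return (http_status, json_body)."""
        parts = path.strip("/").split("/")

        # Submit: POST /jobs creates a server-side resource and returns its ID.
        if method == "POST" and parts == ["jobs"]:
            job_id = f"job-{next(self._ids)}"
            self.jobs[job_id] = {"id": job_id, "state": "RUNNING", "spec": body}
            return 201, {"id": job_id}

        if len(parts) == 2 and parts[0] == "jobs":
            job = self.jobs.get(parts[1])
            if job is None:
                return 404, {"error": "unknown job"}
            # Status: GET /jobs/<id> reads state held by the server.
            if method == "GET":
                return 200, {"id": job["id"], "state": job["state"]}
            # Cancel: DELETE /jobs/<id> only changes jobs that are still running.
            if method == "DELETE":
                if job["state"] == "RUNNING":
                    job["state"] = "CANCELLED"
                return 200, {"id": job["id"], "state": job["state"]}

        return 405, {"error": "unsupported"}


server = JobServer()
_, created = server.handle("POST", "/jobs", {"command": "hadoop distcp"})
print(server.handle("GET", f"/jobs/{created['id']}"))
print(server.handle("DELETE", f"/jobs/{created['id']}"))
```

Every call after the submit needs only the job ID, which is what makes the job a resource rather than a session.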
Contrast with SSH execution¶
SSH job execution is the natural starting point for job-orchestration systems built on simple primitives, but it embeds assumptions that fail at scale. The table compares the two models property by property:
| Property | SSH execution | REST-based submission |
|---|---|---|
| State location | In the SSH session on the client | On the server, addressable by job ID |
| Survives client crash? | No — connection drops mean orphan or failure | Yes — job continues; client reconnects to query state |
| Auth model | Long-lived SSH keys distributed to every orchestration worker | Short-lived service-to-service tokens at the gateway edge |
| Audit trail | Logs scattered across compute hosts; "who ran that command?" mysteries | Structured logs at one REST surface per submission |
| Resource enforcement | Whatever the target host's policies happen to be (often nothing) | The REST API forwards to a runtime that enforces limits |
| Cancellation | Hope the SIGTERM propagates; risk of zombie processes | DELETE on job ID; runtime terminates the job and cleans up |
| Distribution | Job runs wherever you SSH'd to | Runtime allocates a container; job runs anywhere capacity exists |
Slack's 2026-05-05 post is the wiki's canonical articulation of this contrast. Verbatim summary of what they gained on the cutover:
"Jobs survive client Kubernetes pod restarts because Quarry maintains server-side job tracking. No more zombie processes. Jobs terminate properly when cancelled through REST APIs. […] Structured job status, logs, and metrics are now available through Quarry's API."
The hanging-up-mid-call analogy¶
The post offers a memorable framing of why stateful execution fails:
"When you SSH into a machine and run a command, you're creating a direct, stateful connection. If that connection drops (say your Kubernetes pod restarts), the command might keep running, might fail, or might leave orphaned processes hanging around. You've got no reliable way to reconnect and check status. It's like hanging up mid-phone call and hoping the other person finishes the conversation."
The REST model decouples the conversation about the job from the running of the job. The client can hang up anytime; the job state is the persistent resource.
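A minimal sketch of what "hanging up" looks like from the client side, assuming the client checkpoints nothing but the job ID. The `SERVER_JOBS` dictionary, the `post_job` and `get_job_state` helpers, and the checkpoint file are hypothetical stand-ins for a real HTTP API and a real orchestrator's state store.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for server-side job tracking. In a real system this
# state lives behind the HTTP API, not in the client's process.
SERVER_JOBS = {}


def post_job(spec):
    """Stand-in for POST: register the job server-side and return its ID."""
    job_id = f"job-{len(SERVER_JOBS) + 1}"
    SERVER_JOBS[job_id] = {"state": "RUNNING", "spec": spec}
    return job_id


def get_job_state(job_id):
    """Stand-in for GET: read the state the server holds for this ID."""
    return SERVER_JOBS[job_id]["state"]


class Client:
    """Client whose only durable state is the job ID, checkpointed to disk."""

    def __init__(self, checkpoint: Path):
        self.checkpoint = checkpoint

    def submit_once(self, spec):
        # A restarted client finds the checkpoint and reuses the job ID
        # instead of submitting a duplicate job.
        if self.checkpoint.exists():
            return json.loads(self.checkpoint.read_text())["job_id"]
        job_id = post_job(spec)
        self.checkpoint.write_text(json.dumps({"job_id": job_id}))
        return job_id


def simulate_restart():
    checkpoint = Path(tempfile.mkdtemp()) / "job.json"
    spec = {"command": "aws s3 sync"}
    job_id = Client(checkpoint).submit_once(spec)   # first client, then it "crashes"
    SERVER_JOBS[job_id]["state"] = "COMPLETED"      # job finishes with no client attached
    resumed_id = Client(checkpoint).submit_once(spec)  # fresh client after restart
    return job_id, resumed_id, get_job_state(resumed_id)


print(simulate_restart())
```

The sketch leaves one gap open: a crash between `post_job` and the checkpoint write would still lose the ID, so a real client would likely also need some form of idempotent submission.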
Why "the gateway pattern" is the natural composition¶
A heterogeneous workload mix typically has multiple compute engines (Hadoop / SQL warehouse / Snowflake-class warehouse / arbitrary shell). REST-based submission generalises to a single gateway that fronts all of them with one auth model, one audit substrate, one observability stack — see patterns/rest-gateway-for-compute-engine-job-submission and its canonical instance Quarry.
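The composition can be sketched as a single entry point that authenticates, dispatches to a per-engine adapter, and records one audit entry per submission. Everything named here is hypothetical (the `Gateway` class, the adapter functions, the token table, the returned handles) and is not a description of Quarry's internals.

```python
import time


# Hypothetical engine adapters. Real ones would call each engine's own
# HTTP API and return the handle that engine assigns.
def submit_to_yarn(spec: dict) -> str:
    return "yarn-app-0001"


def submit_to_trino(spec: dict) -> str:
    return "trino-query-0001"


class Gateway:
    """One auth check, one audit log, many engines."""

    def __init__(self, adapters: dict, valid_tokens: dict):
        self.adapters = adapters          # engine name -> submit function
        self.valid_tokens = valid_tokens  # token -> caller identity
        self.audit_log = []               # one structured entry per submission
        self.jobs = {}                    # gateway job ID -> engine details

    def submit(self, token: str, engine: str, spec: dict) -> str:
        # Single auth model: every engine sits behind the same token check.
        caller = self.valid_tokens.get(token)
        if caller is None:
            raise PermissionError("invalid or expired token")
        if engine not in self.adapters:
            raise ValueError(f"unknown engine: {engine}")

        engine_handle = self.adapters[engine](spec)
        job_id = f"gw-{len(self.jobs) + 1}"
        self.jobs[job_id] = {"engine": engine, "engine_handle": engine_handle}

        # Single audit substrate: who submitted what, to which engine, and when.
        self.audit_log.append({
            "ts": time.time(),
            "caller": caller,
            "engine": engine,
            "job_id": job_id,
            "spec": spec,
        })
        return job_id


gateway = Gateway(
    adapters={"yarn": submit_to_yarn, "trino": submit_to_trino},
    valid_tokens={"tok-a": "orchestration-worker"},
)
print(gateway.submit("tok-a", "yarn", {"command": "hadoop distcp"}))
print(gateway.audit_log[0]["caller"])
```

Adding an engine in this shape means registering one more adapter; the auth check and audit record stay where they are.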
Universality requires a shell runner¶
REST submission is straightforward for jobs that already have
a REST front-end (Spark via Livy, Hive
via HiveServer2, Trino, Snowflake). The hard case is arbitrary
shell commands like aws s3 sync or hadoop distcp that
don't have a framework REST API. Slack's solution came from
recognising that YARN Distributed
Shell was already a REST-submittable shell runner, which eliminated
the need to build a custom remote-execution service. See
patterns/yarn-distributed-shell-as-universal-shell-executor.
Seen in¶
- sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines — canonical wiki source. The post is structured around the paradigm shift from SSH-as-job-substrate to REST-as-job-substrate and is the wiki's first end-to-end retrospective on a large-scale (700+ jobs, 8 regions) migration of this kind.
Related¶
- concepts/ssh-job-execution-anti-pattern — what this concept is the architectural alternative to.
- concepts/master-node-resource-contention, concepts/resource-enforcement-bypass-via-ssh — operational failure modes that disappear under REST submission.
- concepts/audit-trail — REST submission gives you a single structured audit substrate; SSH-execution does not.
- concepts/attack-surface-minimization — eliminating SSH keys is a textbook attack-surface-shrink.
- patterns/rest-gateway-for-compute-engine-job-submission — the gateway-shape pattern that composes from this concept.
- patterns/yarn-distributed-shell-as-universal-shell-executor — what makes the gateway model viable for arbitrary commands.
- systems/apache-yarn, systems/slack-quarry, systems/yarn-distributed-shell — the canonical wiki-cited substrate.