CONCEPT Cited by 1 source

REST-based job submission

Definition

REST-based job submission is the architectural model in which a long-running job's lifecycle is managed as a resource addressable by a job ID over HTTP, rather than as a process attached to a stateful interactive session. The client submits work via POST, polls status via GET, and cancels via DELETE — and crucially, the client can crash and restart between any two of those operations without losing the job's state.

Slack's migration away from the older SSH job execution anti-pattern is the wiki's canonical account of this paradigm shift.

The lifecycle, verbatim

From sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines:

"Modern compute engines (YARN, Trino, Snowflake) expose HTTP APIs for job submission. Instead of maintaining a connection, you:

POST a job request → receive a job ID
GET job status using the ID → check if it's running, completed, or failed
DELETE the job → cleanly cancel, if needed"

Three primitives — submit, status, cancel — and a server-managed resource addressable by ID.

Contrast with SSH execution

SSH job execution is the natural starting point for job-orchestration systems built on simple primitives, but it breaks down at scale on each of the following properties:

Property	SSH execution	REST-based submission
State location	In the SSH session on the client	On the server, addressable by job ID
Survives client crash?	No — connection drops mean orphan or failure	Yes — job continues; client reconnects to query state
Auth model	Long-lived SSH keys distributed to every orchestration worker	Short-lived service-to-service tokens at the gateway edge
Audit trail	Logs scattered across compute hosts; "who ran that command?" mysteries	Structured logs at one REST surface per submission
Resource enforcement	Whatever the target host's policies happen to be (often nothing)	The REST API forwards to a runtime that enforces limits
Cancellation	Hope the SIGTERM propagates; risk of zombie processes	DELETE on job ID; runtime cleans up cleanly
Distribution	Job runs wherever you SSH'd to	Runtime allocates a container; job runs anywhere capacity exists
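The Cancellation row relies on the server owning the job's state. One way to express that is a small transition table in which a DELETE on an already-finished job is a no-op, so a client that crashed mid-request can safely retry the cancel. The state names below (QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED) are assumptions for illustration, not taken from the source.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical state names, chosen for illustration.
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


TERMINAL = {JobState.SUCCEEDED, JobState.FAILED, JobState.CANCELLED}

# Legal transitions; terminal states have none, so anything else is rejected.
TRANSITIONS = {
    JobState.QUEUED: {JobState.RUNNING, JobState.CANCELLED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED, JobState.CANCELLED},
}


def transition(current: JobState, target: JobState) -> JobState:
    """Move a job to a new state, refusing transitions not in the table."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def cancel(current: JobState) -> JobState:
    """DELETE handler logic: cancelling a finished job leaves it as is, so retries are safe."""
    if current in TERMINAL:
        return current
    return transition(current, JobState.CANCELLED)


if __name__ == "__main__":
    print(cancel(JobState.RUNNING).value)    # CANCELLED
    print(cancel(JobState.SUCCEEDED).value)  # SUCCEEDED: no-op on a finished job
```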

Slack's 2026-05-05 post is the wiki's canonical articulation of this contrast. Verbatim summary of what Slack gained from the cutover:

"Jobs survive client Kubernetes pod restarts because Quarry maintains server-side job tracking. No more zombie processes. Jobs terminate properly when cancelled through REST APIs. […] Structured job status, logs, and metrics are now available through Quarry's API."

The hanging-up-mid-call analogy

The post offers a memorable framing of why stateful execution fails:

"When you SSH into a machine and run a command, you're creating a direct, stateful connection. If that connection drops (say your Kubernetes pod restarts), the command might keep running, might fail, or might leave orphaned processes hanging around. You've got no reliable way to reconnect and check status. It's like hanging up mid-phone call and hoping the other person finishes the conversation."

The REST model decouples the conversation about the job from the running of the job. The client can hang up anytime; the job state is the persistent resource.
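Because the job ID is the only thing a client needs, surviving a restart reduces to persisting that ID somewhere durable. The sketch below assumes a hypothetical in-memory JobServer standing in for the remote service and a local JSON checkpoint file standing in for durable client storage; a pod restart is simulated by discarding one Client object and creating another.

```python
import json
import tempfile
import uuid
from pathlib import Path


class JobServer:
    """Hypothetical stand-in for the server-side job table; it outlives any client."""

    def __init__(self) -> None:
        self.jobs: dict[str, str] = {}

    def submit(self, command: str) -> str:
        # A real service would hand `command` to its runtime; here we only record state.
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = "RUNNING"
        return job_id

    def status(self, job_id: str) -> str:
        return self.jobs[job_id]


class Client:
    """Hypothetical client that keeps only the job ID, checkpointed to durable storage."""

    def __init__(self, server: JobServer, checkpoint: Path) -> None:
        self.server = server
        self.checkpoint = checkpoint

    def submit_once(self, command: str) -> str:
        # Reattach if a previous incarnation of this client already submitted.
        if self.checkpoint.exists():
            return json.loads(self.checkpoint.read_text())["job_id"]
        job_id = self.server.submit(command)
        self.checkpoint.write_text(json.dumps({"job_id": job_id}))
        return job_id

    def poll(self) -> str:
        job_id = json.loads(self.checkpoint.read_text())["job_id"]
        return self.server.status(job_id)


if __name__ == "__main__":
    server = JobServer()
    checkpoint = Path(tempfile.mkdtemp()) / "job.json"
    job_id = Client(server, checkpoint).submit_once("echo hello")
    # The first Client is gone, as if its pod had restarted.
    restarted = Client(server, checkpoint)
    restarted.submit_once("echo hello")           # reattaches instead of resubmitting
    print(len(server.jobs), restarted.poll())     # 1 RUNNING
```

One gap remains: a crash between submit() and the checkpoint write leaves a job the client never learns about. A common way to close it is for the client to choose an idempotency key or job name before submitting; that is a general technique, not something the source describes.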

Why "the gateway pattern" is the natural composition

A heterogeneous workload mix typically spans multiple compute engines (Hadoop, a SQL warehouse, a Snowflake-class cloud warehouse, arbitrary shell commands). REST-based submission generalises to a single gateway that fronts all of them with one auth model, one audit substrate and one observability stack; see patterns/rest-gateway-for-compute-engine-job-submission and its canonical instance Quarry.
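A gateway in this sense is a thin routing layer in front of the engines. The sketch below is a hypothetical in-process model, not Quarry's design: the engine names, the lambda backends and the token-to-expiry map are all assumptions. It shows a single place where short-lived credentials are checked, a single audit log that records every attempt, and job IDs prefixed with the engine name so later status and cancel calls can be routed back to the right engine.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical backend signature: takes a job payload, returns an engine-side job ID.
Backend = Callable[[str], str]


@dataclass
class Gateway:
    """One front door for several engines: one token check, one audit log."""

    backends: dict[str, Backend]
    token_expiry: dict[str, float]  # token -> expiry time (epoch seconds)
    audit_log: list[dict] = field(default_factory=list)

    def submit(self, token: str, engine: str, payload: str) -> str:
        now = time.time()
        allowed = self.token_expiry.get(token, 0.0) > now
        entry = {"ts": now, "engine": engine, "allowed": allowed}
        self.audit_log.append(entry)  # every attempt is logged, allowed or not
        if not allowed:
            raise PermissionError("missing or expired token")
        if engine not in self.backends:
            raise KeyError(f"unknown engine {engine!r}")
        # Prefix with the engine so status/cancel can be routed by ID alone.
        job_id = f"{engine}:{self.backends[engine](payload)}"
        entry["job_id"] = job_id
        return job_id


if __name__ == "__main__":
    gateway = Gateway(
        backends={"trino": lambda sql: "q-1", "yarn": lambda cmd: "app-1"},  # hypothetical
        token_expiry={"svc-token": time.time() + 300},  # a 5-minute token
    )
    print(gateway.submit("svc-token", "trino", "SELECT 1"))  # trino:q-1
    try:
        gateway.submit("stale-token", "yarn", "echo hello")
    except PermissionError as err:
        print("rejected:", err)
    print(len(gateway.audit_log))  # 2
```

Keeping the audit write ahead of the permission check is a deliberate choice here: rejected attempts are exactly the ones an auditor most wants to see.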

Universality requires a shell runner

REST submission is straightforward for jobs that already have a remote submission front-end (Spark via Livy, Hive via HiveServer2, Trino, Snowflake). The hard case is arbitrary shell commands such as aws s3 sync or hadoop distcp, which have no framework API of their own. Slack's solution came from recognising that YARN Distributed Shell was already a REST-submittable shell runner, which removed the need to build a custom remote-execution service. See patterns/yarn-distributed-shell-as-universal-shell-executor.

Seen in

sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines — canonical wiki source. The post is structured around the paradigm shift from SSH-as-job-substrate to REST-as-job-substrate and is the wiki's first end-to-end retrospective on a large-scale (700+ jobs, 8 regions) migration of this kind.

concepts/ssh-job-execution-anti-pattern — what this concept is the architectural alternative to.
concepts/master-node-resource-contention, concepts/resource-enforcement-bypass-via-ssh — operational failure modes that disappear under REST submission.
concepts/audit-trail — REST submission gives you a single structured audit substrate; SSH-execution does not.
concepts/attack-surface-minimization — eliminating SSH keys is a textbook attack-surface-shrink.
patterns/rest-gateway-for-compute-engine-job-submission — the gateway-shape pattern that composes from this concept.
patterns/yarn-distributed-shell-as-universal-shell-executor — what makes the gateway model viable for arbitrary commands.
systems/apache-yarn, systems/slack-quarry, systems/yarn-distributed-shell — the canonical wiki-cited substrate.