
CONCEPT Cited by 1 source

REST-based job submission

Definition

REST-based job submission is the architectural model in which a long-running job's lifecycle is managed as a resource addressable by a job ID over HTTP, rather than as a process attached to a stateful interactive session. The client submits work via POST, polls status via GET, and cancels via DELETE — and crucially, the client can crash and restart between any two of those operations without losing the job's state.

It is the paradigm shift that Slack's post articulates, and that this wiki treats as canonical, in contrast to the older SSH job execution anti-pattern.

The lifecycle, verbatim

From sources/2026-05-05-slack-from-ssh-to-rest-a-security-driven-modernization-of-slacks-emr-data-pipelines:

"Modern compute engines (YARN, Trino, Snowflake) expose HTTP APIs for job submission. Instead of maintaining a connection, you:

POST a job request → receive a job ID
GET job status using the ID → check if it's running, completed, or failed
DELETE the job → cleanly cancel, if needed"

Three primitives — submit, status, cancel — and a server-managed resource addressable by ID.
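
As an illustration of the three primitives, here is a minimal sketch built only on the Python standard library. It is not Quarry's API or any engine's real interface: the `/jobs` path, the JSON fields, and the state names `RUNNING` and `CANCELLED` are assumptions made for the example, and the "job" is just a dictionary entry held by a toy server.

```python
import json
import threading
import urllib.request
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Server-side job table: state lives here, not in any client session.
JOBS = {}
JOBS_LOCK = threading.Lock()


class JobHandler(BaseHTTPRequestHandler):
    """Toy job service (hypothetical): POST /jobs, GET /jobs/<id>, DELETE /jobs/<id>."""

    def _reply(self, code, payload):
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _job_id(self):
        return self.path.rsplit("/", 1)[-1]

    def do_POST(self):  # submit: create the job resource and hand back its ID
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # the job spec is ignored in this sketch
        job_id = uuid.uuid4().hex
        with JOBS_LOCK:
            JOBS[job_id] = "RUNNING"
        self._reply(201, {"id": job_id, "state": "RUNNING"})

    def do_GET(self):  # status: look the job up by ID
        with JOBS_LOCK:
            state = JOBS.get(self._job_id())
        if state is None:
            self._reply(404, {"error": "unknown job"})
        else:
            self._reply(200, {"id": self._job_id(), "state": state})

    def do_DELETE(self):  # cancel: mark the job cancelled on the server
        with JOBS_LOCK:
            known = self._job_id() in JOBS
            if known:
                JOBS[self._job_id()] = "CANCELLED"
        if known:
            self._reply(200, {"id": self._job_id(), "state": "CANCELLED"})
        else:
            self._reply(404, {"error": "unknown job"})

    def log_message(self, *args):  # keep the demo quiet
        pass


# Bypass any environment proxy so loopback requests stay local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def call(method, url, payload=None):
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    with _OPENER.open(request) as response:
        return json.loads(response.read())


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), JobHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}/jobs"
    try:
        submitted = call("POST", base, {"command": "example"})
        status = call("GET", f"{base}/{submitted['id']}")
        cancelled = call("DELETE", f"{base}/{submitted['id']}")
        return submitted["state"], status["state"], cancelled["state"]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Each call in `demo()` opens and closes its own connection, so nothing about the job depends on a session staying alive between the three requests.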

Contrast with SSH execution

SSH job execution is the natural starting point for job-orchestration systems built on simple primitives, but it embeds assumptions that fail at scale. The table below compares the two models property by property:

| Property | SSH execution | REST-based submission |
|---|---|---|
| State location | In the SSH session on the client | On the server, addressable by job ID |
| Survives client crash? | No: connection drops mean orphan or failure | Yes: job continues; client reconnects to query state |
| Auth model | Long-lived SSH keys distributed to every orchestration worker | Short-lived service-to-service tokens at the gateway edge |
| Audit trail | Logs scattered across compute hosts; "who ran that command?" mysteries | Structured logs at one REST surface per submission |
| Resource enforcement | Whatever the target host's policies happen to be (often nothing) | The REST API forwards to a runtime that enforces limits |
| Cancellation | Hope the SIGTERM propagates; risk of zombie processes | DELETE on job ID; runtime cleans up cleanly |
| Distribution | Job runs wherever you SSH'd to | Runtime allocates a container; job runs anywhere capacity exists |

Slack's 2026-05-05 post is the wiki's canonical articulation of this contrast. Verbatim summary of what they gained on the cutover:

"Jobs survive client Kubernetes pod restarts because Quarry maintains server-side job tracking. No more zombie processes. Jobs terminate properly when cancelled through REST APIs. […] Structured job status, logs, and metrics are now available through Quarry's API."

The hanging-up-mid-call analogy

The post offers a memorable framing of why stateful execution fails:

"When you SSH into a machine and run a command, you're creating a direct, stateful connection. If that connection drops (say your Kubernetes pod restarts), the command might keep running, might fail, or might leave orphaned processes hanging around. You've got no reliable way to reconnect and check status. It's like hanging up mid-phone call and hoping the other person finishes the conversation."

The REST model decouples the conversation about the job from the running of the job. The client can hang up anytime; the job state is the persistent resource.
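
A small sketch of what "hanging up" can look like from the client side, under stated assumptions: `FakeJobServer` is a hypothetical in-memory stand-in for a REST job service, and the checkpoint file holding the job ID is one possible way for a restarted client to find its job again, not something the source describes.

```python
import json
import tempfile
from pathlib import Path


class FakeJobServer:
    """Hypothetical stand-in for a REST job service; job state lives here."""

    def __init__(self):
        self._jobs = {}
        self._counter = 0

    def submit(self, spec):
        self._counter += 1
        job_id = f"job-{self._counter}"
        self._jobs[job_id] = "RUNNING"
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]

    def finish(self, job_id):
        # Simulates the job completing on the server while no client is attached.
        self._jobs[job_id] = "COMPLETED"


class Client:
    """Client that remembers only the job ID, persisted in a checkpoint file."""

    def __init__(self, server, checkpoint):
        self.server = server
        self.checkpoint = Path(checkpoint)

    def submit_or_resume(self, spec):
        # After a restart, reuse the recorded ID instead of submitting again.
        if self.checkpoint.exists():
            return json.loads(self.checkpoint.read_text())["job_id"]
        job_id = self.server.submit(spec)
        self.checkpoint.write_text(json.dumps({"job_id": job_id}))
        return job_id


def demo():
    server = FakeJobServer()
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "job.json"

        first = Client(server, checkpoint)
        job_id = first.submit_or_resume({"command": "example"})
        del first                      # simulated client crash

        server.finish(job_id)          # the job carries on without the client

        second = Client(server, checkpoint)
        resumed_id = second.submit_or_resume({"command": "example"})
        return job_id, resumed_id, server.status(resumed_id)


if __name__ == "__main__":
    print(demo())
```

The second client never resubmits: it recovers the ID and asks the server for the state, which is the whole of what the client needs to keep.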

Why "the gateway pattern" is the natural composition

A heterogeneous workload mix typically has multiple compute engines (Hadoop / SQL warehouse / Snowflake-class warehouse / arbitrary shell). REST-based submission generalises to a single gateway that fronts all of them with one auth model, one audit substrate, one observability stack — see patterns/rest-gateway-for-compute-engine-job-submission and its canonical instance Quarry.
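
The shape of such a gateway can be sketched as follows. This is an illustrative outline, not Quarry's design: the `Gateway` class, the token set, the engine names, and the audit record fields are all hypothetical, and each backend is reduced to a function that accepts a job spec and returns a job ID.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Gateway:
    """Hypothetical single entry point fronting several compute engines."""

    backends: Dict[str, Callable[[dict], str]]   # engine name -> submit function
    valid_tokens: Set[str]                       # stand-in for real token validation
    audit_log: List[dict] = field(default_factory=list)

    def submit(self, token: str, engine: str, spec: dict) -> str:
        # One auth check for every engine.
        if token not in self.valid_tokens:
            self.audit_log.append({"engine": engine, "outcome": "denied"})
            raise PermissionError("invalid token")
        if engine not in self.backends:
            self.audit_log.append({"engine": engine, "outcome": "unknown-engine"})
            raise ValueError(f"no backend registered for {engine!r}")
        # Dispatch to the engine-specific submitter.
        job_id = self.backends[engine](spec)
        # One audit record per submission, regardless of engine.
        self.audit_log.append(
            {"engine": engine, "outcome": "submitted", "job_id": job_id}
        )
        return job_id


def demo():
    gateway = Gateway(
        backends={
            "sql": lambda spec: "sql-1",      # placeholder submitters
            "shell": lambda spec: "shell-1",
        },
        valid_tokens={"token-abc"},
    )
    first = gateway.submit("token-abc", "sql", {"query": "SELECT 1"})
    second = gateway.submit("token-abc", "shell", {"command": "echo hi"})
    try:
        gateway.submit("bad-token", "sql", {"query": "SELECT 1"})
    except PermissionError:
        pass
    return first, second, [entry["outcome"] for entry in gateway.audit_log]


if __name__ == "__main__":
    print(demo())
```

Adding an engine in this sketch means registering one more submit function; the auth check and the audit record stay in a single place.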

Universality requires a shell runner

REST submission is straightforward for jobs that already have a REST front-end (Spark via Livy, Hive via HiveServer2, Trino, Snowflake). The hard case is arbitrary shell commands, such as aws s3 sync or hadoop distcp, that have no framework REST API. Slack's solution came from recognising that YARN Distributed Shell was already a REST-submittable shell runner, which eliminated the need to build a custom remote-execution service. See patterns/yarn-distributed-shell-as-universal-shell-executor.
