PATTERN Cited by 1 source
Pipeline stage as discrete job¶
Definition¶
Pipeline stage as discrete job is the pattern of decomposing a multi-step backend workflow (ingest, enrichment, transformation, persistence) into N small jobs that queue each other — each stage runs as its own job class, terminates on success, and enqueues the next stage's job — rather than one long-running job that does all steps in sequence.
The tunable this unlocks: each stage has its own batch size and retry policy, which matters when the stages have different cost structures (CPU-bound vs IO-bound vs GPU-bound), different failure modes (transient vs permanent), and different idempotency properties.
(Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)
Intent¶
A multi-step pipeline bundled as one job has three problems:
- One batch size fits none. The natural batch for a file-IO step is different from the natural batch for GPU inference, which is different from the natural batch for a KV-store write. A single job class picks one and everything else sub-optimises.
- One retry policy fits none. A transient OpenSearch write failure might be retryable after 30s; a CPU-heavy thumbnailing failure should be retried after the box is freed; an embedding-inference failure might be an endpoint cold-start and is retryable immediately. One retry_count=3, backoff=exponential policy does poorly for all three.
- Blast radius on failure. A failure in stage 4 kills the stage-1 work already done. Restarting the bundled job repeats the whole sequence unless the job is carefully made idempotent at every step.
Solution: each stage is its own job. Stage completion is a stable, observable boundary. Failures retry only the stage that failed. Batching is per-stage. Back-pressure is natural (stage N's queue depth signals stage N+1's rate ceiling).
Mechanism¶
Concretely:
job_identify_frames(file_id):
    frames = enumerate_indexable_frames(file_id)
    for each frame: write metadata to KV, render thumbnail, upload to S3
    enqueue(job_generate_embeddings, batch=frames)
    return success
job_generate_embeddings(batch_of_frame_ids):
    thumbnails = load_thumbnails(batch)
    embeddings = sagemaker.batch_inference(thumbnails)
    persist_embeddings_to_kv(batch, embeddings)
    enqueue(job_persist_to_index, batch)
    return success
job_persist_to_index(batch_of_frame_ids):
    for each frame: write (embedding + metadata) to vector index
    return success  # terminal
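As a runnable illustration, here is a minimal Python sketch of the three stages chained through queues. Everything here is a stand-in, not Figma's code: `stage_queues` stands in for SQS, `kv_store` for DynamoDB, `vector_index` for the OpenSearch k-NN index, and the enumeration and inference steps are faked.

```python
import queue

# In-memory stand-ins for the real infrastructure (all names illustrative).
stage_queues = {"embed": queue.Queue(), "index": queue.Queue()}
kv_store = {}      # stands in for DynamoDB
vector_index = {}  # stands in for the OpenSearch k-NN index

def enqueue(stage, payload):
    stage_queues[stage].put(payload)

def job_identify_frames(file_id):
    # Fake enumeration; real code walks the file's frame tree.
    frames = [f"{file_id}:frame-{i}" for i in range(3)]
    for frame in frames:
        kv_store[frame] = {"meta": "..."}  # metadata persisted before handoff
    enqueue("embed", frames)               # queue next stage, then terminate

def job_generate_embeddings(batch):
    embeddings = {f: [0.0, 1.0] for f in batch}  # fake batch inference
    for f, e in embeddings.items():
        kv_store[f]["embedding"] = e
    enqueue("index", batch)

def job_persist_to_index(batch):
    for f in batch:
        vector_index[f] = kv_store[f]  # terminal stage: no further enqueue

# Drive the pipeline; in production each call site is a separate worker fleet
# consuming its own queue.
job_identify_frames("file-42")
job_generate_embeddings(stage_queues["embed"].get())
job_persist_to_index(stage_queues["index"].get())
```

The point of the sketch is the handoff shape: each job writes its outputs somewhere durable, enqueues the next stage, and returns, so a retry of any one stage never re-runs the others.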
Each stage is a separate queue consumer (SQS, SNS, Kafka, Temporal activity, etc.). Each has its own:
- Batch size. Stage 2 batches aggressively to saturate SageMaker's parallel inference; stage 3 may batch modestly to keep OpenSearch write latencies bounded.
- Retry policy. Stage 1 retries on file-IO errors with short backoff; stage 2 retries on endpoint 5xx with longer backoff; stage 3 retries on cluster-reject with exponential backoff + jitter.
- Concurrency cap. Stage 2 can be bound by a reserved-concurrency limit on the SageMaker endpoint; stage 3 by OpenSearch write throughput; stage 1 by file-IO parallelism.
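Those per-stage tunables are naturally expressed as per-stage configuration. A minimal sketch, with stage names echoing the pipeline above; the batch sizes, retry counts, and backoff values are illustrative assumptions, not Figma's numbers.

```python
import random
from dataclasses import dataclass

@dataclass
class StageConfig:
    batch_size: int        # how many inputs one job consumes
    max_retries: int
    base_backoff_s: float
    jitter: bool = False   # spread retries to avoid thundering herds

    def backoff(self, attempt):
        # Exponential backoff, optionally jittered into [0.5x, 1.0x].
        delay = self.base_backoff_s * (2 ** attempt)
        if self.jitter:
            delay *= random.uniform(0.5, 1.0)
        return delay

# Each stage gets its own knobs instead of one policy fitting none.
STAGES = {
    "identify": StageConfig(batch_size=1,  max_retries=5, base_backoff_s=1),
    "embed":    StageConfig(batch_size=64, max_retries=3, base_backoff_s=30),
    "index":    StageConfig(batch_size=16, max_retries=4, base_backoff_s=2,
                            jitter=True),
}
```

In a bundled job all three stages would share one such record; splitting them lets each stage's config follow its own failure mode.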
Figma AI Search instance¶
Figma names this pattern directly:
"Once we've identified all of the current indexable designs in a Figma file, the frames' metadata is persisted to DynamoDB. Thumbnails are rendered and uploaded to S3. This identification and thumbnailing job queues the next job and terminates successfully. Separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior."
The full pipeline:
- Identify + thumbnail — headless server-side C++ Figma editor enumerates indexable frames, renders thumbnails (CPU via llvmpipe), writes metadata to DynamoDB, uploads thumbnails to S3. Queues stage 2.
- Generate embeddings — send batch of thumbnail URLs to SageMaker CLIP endpoint, persist embeddings (likely to DynamoDB, alongside metadata). Queues stage 3.
- Persist to index — write embeddings + searchable metadata to OpenSearch k-NN index. Terminal.
Notable tuning point: on the embedding step, Figma found a batch-size sweet spot — "past some threshold we started to see latency growing linearly with batch size, instead of a sublinear batching effect." That tuning is a stage-2-specific concern and wouldn't be first-class in a bundled job.
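The shape of that tuning can be sketched as a batch-size probe: grow the batch while per-item latency keeps falling, and stop at the knee. The latency model below is entirely made up, purely to illustrate the sublinear-then-linear shape the quote describes.

```python
def choose_batch_size(latency_fn, candidates):
    """Pick the candidate batch size with the lowest per-item latency."""
    best, best_per_item = candidates[0], float("inf")
    for b in candidates:
        per_item = latency_fn(b) / b
        if per_item < best_per_item:
            best, best_per_item = b, per_item
    return best

# Toy latency model: fixed overhead amortises (sublinear) up to 64 items,
# then latency grows linearly with batch size.
def toy_latency_ms(b):
    return 100 + 2 * b if b <= 64 else 228 + 10 * (b - 64)
```

In practice `latency_fn` would be measured against the real inference endpoint, and the probe would run offline as part of stage-2 tuning.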
Why queues, not direct calls¶
Within a stage, a direct RPC would be simpler. Between stages, the queue gives:
- Durable handoff. If stage 2 workers are briefly down, stage 1's output waits in the queue instead of being lost.
- Per-stage back-pressure. Queue depth gauges capacity mismatch between stages.
- Independent autoscaling. Each stage's worker fleet scales on its own queue depth.
- Replay-ability. Reprocess a single stage for a given input set without reprocessing earlier stages (useful when fixing a stage-3 bug without re-running the expensive stage-1 enumeration).
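Replay can be as simple as re-enqueueing one stage's inputs. A hypothetical helper (the function name and batching are assumptions, not from the source):

```python
def replay_stage(enqueue, stage, frame_ids, batch_size=16):
    """Re-enqueue one stage's inputs in batches, without rerunning any
    earlier stage. Assumes the earlier stages' outputs (metadata,
    embeddings) are still in the side KV store."""
    for i in range(0, len(frame_ids), batch_size):
        enqueue(stage, frame_ids[i:i + batch_size])

# Usage: after fixing a stage-3 bug, replay only the persist step.
sent = []
replay_stage(lambda s, b: sent.append((s, b)), "persist_to_index",
             [f"frame-{i}" for i in range(40)])
```

With a bundled job, the equivalent fix would mean re-running the expensive stage-1 enumeration for every affected file.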
Trade-offs¶
- Latency tax. End-to-end latency is higher than a bundled job (queue-hop per stage). For online-read-path workflows this can be disqualifying; for batch ingestion it usually doesn't matter.
- Orchestration complexity. More job classes, more queues, more failure modes to instrument.
- Distributed observability required. Tracing a single input through N queues needs correlation IDs + a logging/tracing story.
- Idempotency discipline per stage. Since retries happen per stage, each stage must be safely re-runnable. Write-once keys, conditional writes, stable external IDs.
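A minimal write-once sketch of that discipline, using a plain dict's `setdefault` as a stand-in for a conditional write (DynamoDB's conditional PutItem, say):

```python
# Stand-in for a KV store with a compare-and-set primitive.
store = {}

def persist_once(key, value):
    """Write only if the key is absent, so a retried stage can't
    double-write or clobber an earlier attempt's output."""
    existing = store.setdefault(key, value)
    return existing is value  # True iff this call performed the write

# A retry of the same stage with different (e.g. recomputed) output
# becomes a no-op: the first write wins.
persist_once("frame-1:embedding", [0.1, 0.2])
persist_once("frame-1:embedding", [9.9, 9.9])  # retry: ignored
```

The same shape works for stable external IDs: derive `key` deterministically from the input (file ID + frame ID + stage), never from a per-run random value.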
Relationship to other patterns¶
- patterns/serverless-driver-worker — a specific infrastructure realisation (Lambda driver + SageMaker worker + SQS). The discrete-job discipline is the pipeline shape; driver-worker is the compute shape on top.
- concepts/control-plane-data-plane-separation — related split along a different axis: control/data is about which tier decides vs executes; discrete-job is about linearising an execution pipeline into retryable segments.
- patterns/selective-indexing-heuristics — often combined (the enumeration stage is where selective-indexing heuristics run).
Caveats¶
- Don't over-decompose. Splitting for the sake of splitting adds orchestration cost without benefit. The economics justify it only when the stages really have different batching / retry / scale profiles.
- Stateful mid-pipeline transforms need care. If stage 2 needs context from stage 1 that isn't in the queued message, you need a side KV (DynamoDB/Redis) and read-modify-write disciplines. Figma passes frame metadata through DynamoDB, not the queue body.
- Failed-at-stage-N dead-letters. A disciplined DLQ per stage matters — a global DLQ loses the "which stage did this die in" signal.
See also¶
- systems/figma-ai-search — the canonical four-stage instance.
- systems/aws-sqs — the typical inter-stage queue.
- patterns/serverless-driver-worker — orchestration pattern for the compute side of a discrete-job pipeline.
Seen in¶
- sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma — Figma explicitly names "separating the individual steps of the pipeline into discrete jobs gives us more precise control over batching and retry behavior" as the design rationale; four stages for the indexing pipeline.