SYSTEM Cited by 4 sources
AWS SQS¶
Amazon SQS (Simple Queue Service) is AWS's managed message queue: durable, essentially unlimited in scale, and available in standard and FIFO variants. Standard queues give at-least-once delivery; FIFO queues add ordering and exactly-once processing. In system-design terms, it's the generic durable work-dispatch primitive between producers and consumers that operate at different rates and need the queue to survive either side crashing.
Typical role in data pipelines¶
SQS commonly appears as the durability layer in warehouse-unload bridges: the warehouse exports into S3, an S3 event notification enqueues an SQS message, and the ingester consumes the queue and writes to the serving store. If the ingester crashes or the serving store throttles, the message stays in the queue; when the ingester comes back up, it resumes without data loss.
Canva uses exactly this shape in the counting pipeline: Snowflake unload → S3 → SQS → rate-limited ingester → service RDS. They call out SQS's durability specifically as the reason exported data "doesn't get lost". (Source: sources/2024-04-29-canva-scaling-to-count-billions; patterns/warehouse-unload-bridge)
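A minimal sketch of the ingester side of this shape, assuming the queue receives standard S3 event notifications and that `sqs` behaves like a boto3 SQS client (`receive_message`, `delete_message`). The names `extract_s3_objects`, `drain_once`, and `ingest` are hypothetical and not taken from the Canva source. The point it illustrates is the ordering: a message is deleted only after the write succeeds, so a crash or a throttled store leaves it in the queue.

```python
from __future__ import annotations

import json
from urllib.parse import unquote_plus


def extract_s3_objects(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(body)
    objects = []
    # S3 test events carry no "Records" key, so default to an empty list.
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def drain_once(sqs, queue_url: str, ingest) -> int:
    """Receive one batch, ingest each object, delete only what succeeded.

    `ingest(bucket, key)` is a hypothetical callback that writes to the
    serving store and raises on failure (for example, when throttled).
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    done = 0
    for msg in resp.get("Messages", []):
        try:
            for bucket, key in extract_s3_objects(msg["Body"]):
                ingest(bucket, key)
        except Exception:
            # Leave the message in the queue; SQS makes it visible again
            # once the visibility timeout expires.
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        done += 1
    return done
```

In a real deployment `sqs` would typically be `boto3.client("sqs")` and `drain_once` would run in a loop, with any rate limiting applied inside `ingest`.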
Operational notes¶
- At-least-once means consumers must be idempotent (outer-join upserts are one way to get there — see patterns/end-to-end-recompute).
- Visibility timeout has to be tuned against downstream write latency; too low and messages are redelivered while the first consumer is still processing them, too high and messages from a failed consumer sit invisible until the timeout expires.
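- Dead-letter queues (DLQs) for poison-pill messages are a must in any serious pipeline; without one, a message that always fails keeps being redelivered until the retention period expires.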
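The notes above can be sketched in two small pieces, under stated assumptions. `queue_attributes` builds the `Attributes` map that the SQS `CreateQueue` and `SetQueueAttributes` calls accept, using the real `VisibilityTimeout` and `RedrivePolicy` attribute names; the numeric values in the usage example are placeholders, not recommendations. `apply_count` shows one simple way to make a consumer idempotent: a keyed write of an absolute value rather than an increment. It is a plain keyed upsert against a hypothetical `counts` table, not the outer-join upsert the referenced pattern describes.

```python
import json
import sqlite3


def queue_attributes(dlq_arn: str, visibility_timeout_s: int, max_receive_count: int) -> dict:
    """Build the Attributes map for a source queue with a DLQ attached."""
    return {
        # SQS attribute values are strings, including numeric ones.
        "VisibilityTimeout": str(visibility_timeout_s),
        # RedrivePolicy is itself a JSON-encoded string.
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": str(max_receive_count),
            }
        ),
    }


def apply_count(db: sqlite3.Connection, key: str, count: int) -> None:
    """Idempotent write: replaying the same message leaves the same row.

    Assumes a table `counts(key TEXT PRIMARY KEY, n INTEGER)` (hypothetical).
    """
    db.execute("INSERT OR REPLACE INTO counts (key, n) VALUES (?, ?)", (key, count))
    db.commit()
```

With boto3, the attribute map would be passed as `sqs.set_queue_attributes(QueueUrl=..., Attributes=queue_attributes(dlq_arn, 120, 5))`. Because `apply_count` overwrites by key, a duplicate delivery of the same message does not double-count.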
Seen in¶
- sources/2024-04-29-canva-scaling-to-count-billions — SQS between S3 and the RDS ingester in Canva's warehouse-unload bridge, providing durability for warehouse→OLTP export.
- sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — SQS as one of the primitives in Amazon Retail BDT's 2021 serverless Ray job-management substrate (alongside systems/dynamodb, systems/aws-sns, systems/aws-s3) for durable job-lifecycle tracking across thousands of exabyte-scale Ray compaction jobs per day.
- sources/2026-02-04-aws-amazon-key-eventbridge-event-driven-architecture — Named (with SNS) in the "ad-hoc SNS/SQS pairs" anti-pattern that Amazon Key's EventBridge migration replaced. SQS is still the natural per-subscriber queue under an EventBridge target, just not the shared-bus abstraction.
- — SQS as the attached dead-letter queue of a Lambda outbox relay. Zalando Payments' Order Store publishes events to Nakadi via a Lambda triggered by DynamoDB Streams; when the Lambda's exponential-backoff retries are exhausted, the event lands in the Lambda's built-in SQS DLQ. A Kubernetes CronJob runs the same Python publication code on an interval, draining the DLQ until Nakadi accepts; the same-code-on-two-substrates property is load-bearing. Canonical wiki example of SQS DLQ + external cron re-drain as the fallback tier of an event-publish pipeline. See patterns/sqs-dlq-plus-cron-requeue, patterns/dynamodb-streams-plus-lambda-outbox-relay.
- sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph — SQS (paired with SNS + Kafka) as the per-subscriber durability layer for Netflix MDS's ingestion of thin notification-of-change events from six source systems. SNS fans out source-system events; SQS provides per-consumer durability so MDS can absorb ingestion bursts without dropping events while the enrichment workers are hydrating from source APIs at a rate-limited cadence.
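The DLQ-plus-cron re-drain described in the Zalando Payments entry above can be sketched roughly as follows. This is an illustration of the pattern, not Zalando's code: `redrain_dlq`, `publish`, and `max_batches` are hypothetical names, and `sqs` is assumed to behave like a boto3 SQS client. The function stops at the first failed publish, so anything the downstream still rejects stays in the DLQ for the next scheduled run.

```python
def redrain_dlq(sqs, dlq_url: str, publish, max_batches: int = 100) -> int:
    """Re-publish DLQ messages; stop at the first failure and retry next run.

    `publish(body)` stands in for the same publication code the primary
    path uses; it is expected to raise if the downstream rejects the event.
    """
    published = 0
    for _ in range(max_batches):
        resp = sqs.receive_message(
            QueueUrl=dlq_url, MaxNumberOfMessages=10, WaitTimeSeconds=1
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # nothing left to receive
        for msg in messages:
            try:
                publish(msg["Body"])
            except Exception:
                # Downstream is still rejecting: leave this message and the
                # rest in the DLQ and let the next scheduled run retry.
                return published
            # Delete only after a successful publish.
            sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=msg["ReceiptHandle"])
            published += 1
    return published
```

Run on an interval (a Kubernetes CronJob in the source's case), this keeps retrying until the DLQ is empty, and passing the primary path's publish function as `publish` is what preserves the same-code-on-two-substrates property.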
Related¶
- systems/aws-s3, systems/aws-rds, systems/aws-sns
- systems/amazon-eventbridge — the org-scale event-bus abstraction; SQS often used as an EventBridge target for per-subscriber durability.
- patterns/warehouse-unload-bridge
- patterns/single-bus-multi-account — the deployment topology where SQS still shines as the per-subscriber durability layer.