PATTERN Cited by 1 source
DynamoDB Streams + Lambda outbox relay¶
Definition¶
The DynamoDB Streams + Lambda outbox relay pattern is the DynamoDB-native realisation of transactional outbox: the service writes only to its DynamoDB table in the synchronous path; the table itself is the outbox via DynamoDB Streams; an AWS Lambda function subscribed to the stream acts as the message relay, assembling and publishing each change as a domain event on the target message bus (Zalando publishes to Nakadi).
There is no dedicated outbox table, no polling worker, and no dual-write risk in the service — the table + stream are the entire outbox mechanism.
Shape¶
REST API               DynamoDB            DynamoDB            Lambda
    │                    table              Stream             (relay)
    │── PUT item ──────────▶│                  │                  │
    │                       │── tx commit ────▶│                  │
    │◀────── 200 OK ────────│                  │                  │
    │                       │                  │─ stream record ─▶│
    │                       │                  │ (OLD + NEW image)│
    │                       │                  │                  │── publish event ──▶ Message bus
- Sync path: one downstream (DynamoDB). Availability ceiling is A(DynamoDB), not A(DynamoDB) × A(bus).
- Async path: DynamoDB Stream record → Lambda invocation → event published.
- Failure path: Lambda retry ladder → SQS DLQ → cron re-drain (see patterns/sqs-dlq-plus-cron-requeue).
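The sync-path availability ceiling above can be checked with a line of arithmetic (a sketch; the 99.9% figures are the illustrative SLAs from the source's motivating example):

```python
def chain_availability(*availabilities: float) -> float:
    """Availability of a synchronous call chain: the product of its parts,
    since the request fails if ANY dependency in the path is down."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result

# Prior design: call DynamoDB AND the bus in the request path.
both = chain_availability(0.999, 0.999)    # ≈ 0.998

# Outbox relay: only DynamoDB in the request path.
dynamodb_only = chain_availability(0.999)  # 0.999
```

Moving the bus call off the sync path is what lifts the ceiling from the product back to the single-dependency figure.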
Why the table is the outbox¶
The outbox pattern's historical pain point is the "two writes to the same DB in one transaction" requirement: you update the primary row and insert an outbox row atomically. With a generic RDBMS this is fine (both are writes to the same local DB); the cost is the extra write + the cleanup job.
With DynamoDB Streams, the stream is an inherent byproduct of the item mutation. There's no second write — the commit of the item IS the commit of the outbox entry. Properties:
- Zero dual-write risk. There is no separate outbox table to keep in sync. Whatever you write to the primary table is what the stream emits.
- No cleanup. Stream records are retained for 24 hours (a fixed limit), then discarded; there is no growing outbox table to vacuum.
- No polling. Lambda's stream integration handles the poll loop on AWS's side.
- Ordering per key. DynamoDB Streams delivers records for the same partition key in commit order — relay events per-key are stable.
Event assembly: use both images¶
The stream must be configured with NEW_AND_OLD_IMAGES so the Lambda can produce a rich domain event. Zalando's relay emits events containing:
- The full post-change item (NEW_IMAGE) — so downstream consumers see the complete record without needing to fetch it.
- A JSON patch diff computed from the OLD and NEW images — so consumers interested only in "what changed" don't have to compute it themselves.
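Capturing both images is a one-time stream setting on the table. A sketch with boto3, assuming an existing table without a stream (the table name is hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable the stream with both images so the relay can diff OLD vs NEW.
dynamodb.update_table(
    TableName="order-store",  # hypothetical table name
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```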
"The lambda function will receive the item containing the old and new image. Then it will assemble the data change event, which contains the complete item after its change as well as a patch node containing the diff. As a last step the assembled event is published to Nakadi." (Source: sources/2022-02-02-zalando-utilizing-amazon-dynamodb-and-aws-lambda-for-asynchronous-event-publication)
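The assembly step the quote describes can be sketched as a Lambda handler. The helper names, the naive top-level diff (the real relay emits a JSON patch node, which is richer), and the print-based publisher are illustrative assumptions, not Zalando's code:

```python
import json

def _plain(attr):
    """Convert one DynamoDB-JSON attribute ({"S": "x"}, {"N": "1"}, ...)
    to a plain Python value. Handles common types only; a sketch."""
    ((kind, value),) = attr.items()
    if kind in ("S", "BOOL"):
        return value
    if kind == "N":
        return float(value) if "." in value else int(value)
    if kind == "M":
        return {k: _plain(v) for k, v in value.items()}
    if kind == "L":
        return [_plain(v) for v in value]
    return value

def _patch(old, new):
    """Naive top-level diff, standing in for the JSON patch node."""
    return {
        key: {"old": old.get(key), "new": new.get(key)}
        for key in set(old) | set(new)
        if old.get(key) != new.get(key)
    }

def assemble_event(record):
    """Build the data-change event: full NEW image plus a patch node."""
    images = record["dynamodb"]
    old = {k: _plain(v) for k, v in images.get("OldImage", {}).items()}
    new = {k: _plain(v) for k, v in images.get("NewImage", {}).items()}
    return {"item": new, "patch": _patch(old, new)}

def publish_to_bus(event):
    """Stand-in for the publish-to-Nakadi call."""
    print(json.dumps(event))

def handler(event, context):
    """Lambda entry point: one stream batch in, one event out per record."""
    for record in event["Records"]:
        publish_to_bus(assemble_event(record))
```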
Language / runtime choice: Python¶
Zalando chose Python for the Lambda relay, citing how lightweight it is compared with Java. The implicit drivers:
- Lower cold-start on serverless runtimes than JVM languages.
- Smaller deployment bundle — less to ship when the only logic is "parse record, assemble event, POST".
- Fast iteration for what's effectively an event-translation shim.
The reuse benefit is also real: the same Python module runs in Lambda for the stream-driven path and in the Kubernetes CronJob for the DLQ re-drain path — write once, run on both substrates. See patterns/sqs-dlq-plus-cron-requeue.
Trade-offs¶
- 24-hour stream retention. If Lambda falls behind for >24h, records are lost. Monitoring iterator age is mandatory in production.
- Per-invocation Lambda cost. At high write volumes, relay fees are a continuous cost. For workloads above a certain throughput, a long-running consumer (EC2 / ECS) may be cheaper.
- No cross-key ordering. Records across different partition keys have no global order; consumers needing global order must embed sequence numbers in the payload.
- At-least-once with DLQ requeue. Retries + DLQ requeue produce duplicates and reordering. See concepts/at-least-once-delivery.
- Event schema coupling to DynamoDB item shape. The relay implicitly couples the item schema to the event schema; item schema changes need coordinated relay + consumer changes.
- No batching across items. Each stream record → one event; batching events for efficiency requires a separate aggregation layer.
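The first trade-off makes iterator-age monitoring non-negotiable. A sketch of a CloudWatch alarm on the Lambda's IteratorAge metric (the alarm name, function name, and threshold are illustrative assumptions, not from the source):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert long before the 24h stream retention window is exhausted.
# IteratorAge is reported in milliseconds; threshold here is 1 hour of lag.
cloudwatch.put_metric_alarm(
    AlarmName="outbox-relay-iterator-age",  # hypothetical name
    Namespace="AWS/Lambda",
    MetricName="IteratorAge",
    Dimensions=[{"Name": "FunctionName", "Value": "outbox-relay"}],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=3,
    Threshold=3_600_000,
    ComparisonOperator="GreaterThanThreshold",
)
```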
When to use¶
- Primary data store is DynamoDB.
- Every item mutation should produce a domain event.
- Sync path availability matters — the service is client-facing.
- Downstream consumers tolerate eventual, at-least-once, possibly-reordered delivery.
When NOT to use¶
- Primary store is not DynamoDB. Use the same pattern with whatever CDC primitive the store offers (Postgres logical replication via Debezium, MySQL binlog, MongoDB change streams, etc.). Zalando ships a Postgres-specific variant managed by a Kubernetes operator.
- Ultra-low latency event delivery required (sub-100ms reliably). Lambda cold starts and stream-batch windows add variance.
- High write throughput where Lambda fees dominate cost. Switch to a continuous consumer process.
- Strong global ordering required. The pattern cannot provide it without additional sequencing infrastructure.
Seen in¶
- sources/2022-02-02-zalando-utilizing-amazon-dynamodb-and-aws-lambda-for-asynchronous-event-publication — Canonical worked example. Zalando Payments' Order Store stores payment-related order data in a DynamoDB table and uses DynamoDB Streams + a Python AWS Lambda function as the outbox relay to publish events to Nakadi (Zalando's Kafka-backed event bus). The relay assembles a data-change event containing both the full post-change item and a JSON patch diff between old and new images. Motivated by the 99.9% × 99.9% = 99.8% availability arithmetic of the prior design, which called DynamoDB and Nakadi synchronously in the same request.
Related¶
- patterns/transactional-outbox — the general pattern this is a realisation of.
- patterns/sqs-dlq-plus-cron-requeue — the fallback layer typical when the relay publishes to a best-effort bus.
- concepts/dynamodb-streams — the CDC primitive.
- concepts/change-data-capture — the generic category.
- concepts/availability-multiplication-of-dependencies — the motivation.
- concepts/event-driven-architecture — the aggregate shape.
- systems/dynamodb, systems/aws-lambda, systems/nakadi
- companies/zalando