
CONCEPT

DynamoDB Streams

Definition

DynamoDB Streams is Amazon DynamoDB's built-in change-data-capture (CDC) feed. When enabled on a table, every insert / modify / delete of an item produces a corresponding stream record; the table's ordered change log becomes a first-class, AWS-managed primitive that downstream consumers (most commonly AWS Lambda functions) can subscribe to.

Shape of a stream record

Each stream record carries:

  • The table name and keys of the mutated item.
  • An old image — the item before the change (optional per stream config; omitted on pure inserts).
  • A new image — the item after the change (optional per stream config; omitted on deletes).
  • Sequence number and approximate creation timestamp.
  • The event type: INSERT, MODIFY, or REMOVE.
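
As a concrete illustration, a single record in a Lambda batch looks roughly like the Python literal below. The table and field names are made up, the values are illustrative, and which images are present depends on the stream view type:

```python
# Illustrative shape of one DynamoDB Streams record as delivered to a Lambda
# consumer (hypothetical "orders" table; values are placeholders).
record = {
    "eventID": "c4ca4238a0b923820dcc509a6f75849b",
    "eventName": "MODIFY",  # INSERT | MODIFY | REMOVE
    "eventSource": "aws:dynamodb",
    "eventSourceARN": "arn:aws:dynamodb:eu-central-1:123456789012:table/orders/stream/...",
    "dynamodb": {
        "Keys": {"order_id": {"S": "42"}},
        "OldImage": {"order_id": {"S": "42"}, "status": {"S": "OPEN"}},      # item before the change
        "NewImage": {"order_id": {"S": "42"}, "status": {"S": "SHIPPED"}},   # item after the change
        "SequenceNumber": "111",
        "ApproximateCreationDateTime": 1643760000,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}
```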

The stream is configurable per table: operators choose which combination of images to expose (KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES).
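
A minimal sketch of that configuration using boto3, assuming a hypothetical table named "orders" (the same StreamSpecification can also be passed when the table is created):

```python
import boto3

# Enable the stream with both images on an existing table.
boto3.client("dynamodb").update_table(
    TableName="orders",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",  # or KEYS_ONLY / NEW_IMAGE / OLD_IMAGE
    },
)
```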

"In our case this dataset contains the old image, containing the table item before the change, and the new image, containing the table item after the change. It can be configured which images AWS exposes to the DynamoDB stream. With both these images we are now able to assemble a corresponding Nakadi event using AWS Lambda." (Source: sources/2022-02-02-zalando-utilizing-amazon-dynamodb-and-aws-lambda-for-asynchronous-event-publication)

Why both images matter

Capturing NEW_AND_OLD_IMAGES lets the consumer do things that a simple "current state" feed can't:

  • Emit JSON Patch / diff events — the consumer computes a diff between old and new and emits a compact "what changed" record. Zalando's Lambda relay does exactly this: "the lambda function will receive the item containing the old and new image. Then it will assemble the data change event, which contains the complete item after its change as well as a patch node containing the diff."
  • Distinguish null-to-value from value-to-null. Deletes look very different from updates that nulled a field; having both images makes the distinction unambiguous.
  • Security / audit. Emitting both states gives auditors a full before/after trail without needing to cross-reference the table's current row.

The trade-off is record size — richer view types produce larger stream records, which means more data to read per batch and larger Lambda invocation payloads.
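
The post does not show the diff code itself. The sketch below is one naive way to derive a per-field "what changed" structure from the two images using boto3's TypeDeserializer; it is illustrative only, not RFC 6902 JSON Patch and not necessarily what Zalando's Lambda emits:

```python
from boto3.dynamodb.types import TypeDeserializer

deser = TypeDeserializer()

def plain(image):
    """Convert a DynamoDB-typed image ({"field": {"S": "x"}}) into a plain dict."""
    return {k: deser.deserialize(v) for k, v in (image or {}).items()}

def field_diff(old_image, new_image):
    """Naive per-field diff between the old and new images of a stream record."""
    old, new = plain(old_image), plain(new_image)
    changed = {}
    for key in set(old) | set(new):
        if old.get(key) != new.get(key):
            changed[key] = {"old": old.get(key), "new": new.get(key)}
    return changed
```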

Integration with AWS Lambda

The canonical DynamoDB Streams consumer is Lambda: the Lambda service polls the stream (you don't write the polling code) and invokes the function with one or more records per batch. Built-in features:

  • At-least-once delivery with automatic retries on function failure.
  • Batch size and batch window controls to trade latency vs throughput.
  • Attached DLQ (SQS or SNS) for batches that still fail after retries are exhausted — see patterns/sqs-dlq-plus-cron-requeue.
  • Parallelisation by shard — ordering guaranteed per item key, concurrency across keys.
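
Putting it together, a stream-triggered relay can be as small as the sketch below; publish_event is a hypothetical placeholder for whatever downstream publisher (Nakadi, SNS, an HTTP call) the relay targets:

```python
import json

def publish_event(**fields):
    # Placeholder for the real downstream publisher (e.g. a Nakadi client).
    print(json.dumps(fields, default=str))

def handler(event, context):
    """Minimal sketch of a Lambda handler consuming a DynamoDB Streams batch."""
    for record in event["Records"]:
        change = record["dynamodb"]
        publish_event(
            event_type=record["eventName"],   # INSERT | MODIFY | REMOVE
            keys=change["Keys"],
            old=change.get("OldImage"),       # absent on inserts
            new=change.get("NewImage"),       # absent on deletes
        )
    # Raising an exception here would make the Lambda service retry the whole
    # batch (at-least-once delivery) and, once retries are exhausted, send it
    # to the configured failure destination.
```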

Stream as outbox: the same table is both

The subtle architectural insight in Zalando's post is that the DynamoDB table and the outbox are the same storage object. There is no separate outbox table to join against, no dual-write risk, no cleanup job. The stream is an inherent byproduct of the commit — if the item mutation commits, the stream record is produced; if not, nothing is emitted.

This collapses the classic outbox architecture's "write to table + write to outbox in the same transaction" requirement into a single native operation. See patterns/dynamodb-streams-plus-lambda-outbox-relay.
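
In code, the "outbox write" degenerates to an ordinary item write; the sketch below assumes a hypothetical "orders" table with the stream enabled:

```python
import boto3

orders = boto3.resource("dynamodb").Table("orders")

# One write, no separate outbox insert and no transaction: if this put commits,
# the corresponding stream record is emitted; if it fails, nothing reaches the stream.
orders.put_item(Item={"order_id": "42", "status": "SHIPPED"})
```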

Retention and semantics

  • 24-hour retention on the stream (fixed, not configurable) — consumers must keep up, or they lose records. (The Kinesis Data Streams integration for DynamoDB offers longer, configurable retention.)
  • Per-key ordering — records for the same partition key are delivered in commit order; across keys, no global ordering.
  • At-least-once — consumers must be idempotent; retries may replay records.
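
One common way to make the consumer idempotent is to deduplicate on each record's eventID with a conditional write. A minimal sketch, assuming a hypothetical "processed_events" table:

```python
import boto3
from botocore.exceptions import ClientError

processed = boto3.resource("dynamodb").Table("processed_events")

def process_once(record, side_effect):
    """Run side_effect(record) at most once per stream record eventID."""
    try:
        processed.put_item(
            Item={"event_id": record["eventID"]},
            ConditionExpression="attribute_not_exists(event_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # this record was already processed; skip the replay
        raise
    side_effect(record)
```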
