PATTERN

Dead-letter queue for invalid records

Dead-letter queue for invalid records is a validation pattern where a data pipeline's producer-side validator redirects records that fail validation to a separate queue — the dead-letter queue (DLQ) — instead of dropping them or failing the batch. The DLQ becomes an out-of-band stream of rejected records available for offline re-processing, schema-evolution-aware replay, or root-cause investigation.

Source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available canonicalises this as a broker-level feature of Redpanda Iceberg Topics at GA:

"Built-in dead-letter queues to redirect and re-process invalid records, improving data quality, reliability, and end-user trust in data." (Source)

Problem this solves

A streaming pipeline that projects records into a schema-enforced downstream format (typed Iceberg table, relational database, typed Parquet files) must decide what to do with records that violate the schema:

  • Record is malformed — wrong type for a column, unparseable bytes, truncated payload.
  • Schema mismatch — the producer is still writing v1 records but the downstream schema is v2 with incompatible changes.
  • Value constraint violation — record's values violate a check constraint (negative ID, out-of-range timestamp).

Three naive strategies all have structural problems:

  1. Drop the bad record silently. Loses data; no signal to operators; schema bug propagates unnoticed.
  2. Fail the entire batch. Blocks the good records behind the bad one; production incident for a transient data-quality issue.
  3. Auto-fix (coerce types, null out unknown fields). Corrupts the downstream data model in a way that's impossible to recover from (you can't distinguish "value was null in source" from "value was coerced from garbage").

The DLQ answer

Route the bad record to a separate queue. The good records continue flowing; the bad records accumulate in a dedicated place where operators can:

  • Inspect them to understand the data-quality issue (is it one buggy producer, a schema-version skew, or a semantic bug?).
  • Replay them after fixing the schema or the producer (many DLQ messages are recoverable once a downstream schema update lands).
  • Quantify the problem via DLQ-depth / DLQ-rate metrics — a feedback signal that the upstream schema contract is being violated.

Where the validator lives

There are three structural places to put the validator:

  1. Producer-side — the producing application validates before writing to the topic. Pros: catches bad data before it touches the broker. Cons: duplicated across every producer; no broker-level data-quality guarantee.
  2. Consumer-side — each consumer validates as it reads. Pros: no broker changes. Cons: every consumer reimplements validation; bad data lives in the topic forever.
  3. Broker-side / platform-managed — the broker validates on write using a schema registry or schema definition (e.g. the Iceberg table schema). Pros: single point of enforcement; DLQ is a platform primitive. Cons: broker does more work; schema must be known to the broker.

The Redpanda 25.1 Iceberg Topics case is the third shape — the broker validates against the Iceberg table schema during the row-to-Parquet projection, and the DLQ is a standard Kafka topic the operator can configure.

DLQ as a first-class operational surface

A well-designed DLQ pattern includes:

  • Replay tooling — one-click re-publish from DLQ back to primary topic after root cause is fixed.
  • DLQ-depth monitoring — alert when DLQ grows unexpectedly fast (signal of upstream breakage).
  • DLQ TTL / retention — bounded retention so DLQ doesn't grow unboundedly; retention must be longer than the schema-evolution / upstream-fix SLA.
  • Error metadata — each DLQ record carries the rejection reason (field name, schema version, specific constraint) so operators can triage without re-validating.
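A minimal sketch of the envelope and replay pieces from the list above. The field names and the schema-version replay filter are assumptions — the Redpanda post does not specify the metadata shape (see below):

```python
from dataclasses import dataclass

@dataclass
class DlqEnvelope:
    payload: bytes        # original record bytes, untouched
    reason: str           # rejection reason, e.g. "'id': expected int"
    source_topic: str
    source_offset: int
    schema_version: int   # schema version the record was validated against

def replay(dlq_records, produce, current_schema_version: int):
    """Re-publish records rejected under an older schema after a fix lands.

    `produce(topic, payload)` is an assumed callback; replayed records
    re-enter the source topic at a later offset than the originals.
    """
    replayed = []
    for env in dlq_records:
        if env.schema_version < current_schema_version:
            produce(env.source_topic, env.payload)
            replayed.append(env)
    return replayed
```

Carrying `reason` and `schema_version` in the envelope is what lets replay tooling be selective: only records rejected under the old schema are re-published, without re-validating each one.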

The Redpanda post asserts the DLQ primitive but does not specify the metadata shape, retention default, or replay tooling — operational specifics deferred to the product documentation.

Trade-offs

  • DLQ itself has a schema problem. DLQ records need some schema for tooling to read them — typically an envelope schema (original payload bytes + rejection reason + source topic + original offset). Envelope schemas evolve too, at which point the DLQ's DLQ becomes a real question.
  • DLQ depth is a lagging indicator. By the time operators notice a bad producer via DLQ depth alerts, many bad records may already be rejected. For contract-breaking schema changes upstream, stage-gated deploys + schema registry enforcement (prevention) is stronger than DLQ (detection).
  • Replay ordering hazard. Records replayed from DLQ after a schema fix re-enter the topic at a later offset than the original — downstream consumers that care about record ordering within a partition will see the replayed records as "arrived late", which may violate application-level invariants.
