# Connector transformations + enrichment API

## Summary
Expose two distinct per-tenant customisation surfaces on a shared managed replication platform — one at the transport layer (Kafka Connect single-message transforms + custom connector forks) and one at the storage / enrichment layer (a standardised enrichment API). Teams reshape and enrich their own data without forking the underlying pipeline or the sink.
## Problem
A one-size-fits-all pipeline doesn't scale well across many internal tenants with diverging needs:
"Some teams needed filtered or denormalized data, others required custom enrichment logic, and several needed to manipulate the shape or structure of their data prior to storage." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
Naïve responses fail:
- Fork the pipeline per tenant → combinatorial operational cost, no shared improvements.
- Push all customisation to the source → forces source teams to carry consumer-specific concerns; breaks async decoupling.
- Push all customisation to the sink destination → each destination system re-implements the same enrichment logic (timestamp normalisation, derived IDs, metadata tags).
## Solution
Split customisation across two surfaces, aligned to where the concern naturally lives:
### Surface 1 — Kafka Connect single-message transforms (SMTs)
At the transport layer, between the Debezium source and the sink destination, the connector runs a chain of single-message transforms that operate on individual records. Datadog's named transforms:
| Transform | Scope |
|---|---|
| Dynamic topic renaming | Rewrite destination topic per record |
| Column type conversion | Change field types inline |
| Composite primary key generation | Concatenate fields into a single key |
| Add column | Inject derived / static columns |
| Drop column | Remove unwanted columns before sink |
When OSS SMTs don't cover a need, the platform owns custom forks of the connector to introduce specialised logic:
"Where out-of-the-box connector customizations fell short, we maintained custom forks to introduce Datadog-specific logic and optimizations." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
### Surface 2 — standardised enrichment API
At the storage / post-ingestion layer, a centralised enrichment API sits on top of the destination system (in Datadog's case, the search platform), providing a uniform way to request derived fields or metadata during or after ingestion.
"For teams that needed additional derived fields or metadata appended to their records, we introduced a custom enrichment API. This service sits as a layer on top of our search platform, providing a standardized way to request enrichments during or after ingestion. By centralizing enrichment logic, we avoided duplicating it across individual pipelines." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
## Why two surfaces, not one
The transport layer and the storage layer have different operational shapes:
| Surface | Works on | Latency budget | State | Code ownership |
|---|---|---|---|---|
| SMT (transport) | Per record in flight | Streaming (ms) | Stateless per record | Connector config + occasional fork |
| Enrichment API (storage) | After ingestion or on top of storage | Higher (service RPC during/after ingest) | Shared services, databases, derived state | Platform-owned API with per-team endpoints |
Record-level shape tweaks (rename, retype, drop columns) belong in SMTs because they need no outside state and must not bottleneck throughput. Enrichment that requires a lookup or a shared derivation (tenant-id join, organisation-level metadata, shared categorisation) belongs in the enrichment API because a shared service amortises the derivation.
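The decision rule can be illustrated with two Python stand-ins (not real Kafka Connect SMT code; the function names and lookup table are invented): the first needs nothing beyond the record itself and can safely run per record in the stream, while the second needs shared state and therefore belongs behind the centralised API.

```python
# Stateless shape tweak: only the record itself is needed, so it can run
# per-record at the transport layer without bottlenecking throughput.
def drop_columns(record: dict, columns: set) -> dict:
    return {k: v for k, v in record.items() if k not in columns}

# Hypothetical shared state (e.g. organisation-level metadata). A single
# connector instance should not own this, so the derivation belongs in the
# shared enrichment service, where one lookup serves every tenant.
ORG_METADATA = {"org-1": {"region": "eu", "tier": "enterprise"}}

def add_org_metadata(record: dict) -> dict:
    return {**record, **ORG_METADATA.get(record["org_id"], {})}
```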
## Benefits
- Per-tenant customisation without forking the pipeline — both surfaces are configured, not branched.
- Shared derivation logic stays shared — the enrichment API is the anti-duplication control point.
- Source stays clean — no consumer-specific logic leaks into the producer.
- Sink stays uniform — a single sink destination sees canonicalised records across tenants.
## Caveats
- Custom connector forks are technical debt. Datadog accepts this as the "where OSS falls short" escape hatch; the general pattern doesn't prescribe when to fork vs. when to adapt OSS upstream.
- Enrichment API is a shared service — its availability budget must match the downstream ingestion budget, and its latency is on the ingestion path.
- Two surfaces means two places to look when debugging a record's final shape. Observability across both is required.
- SMTs can chain arbitrarily — a long chain hurts throughput and readability; teams should keep chains short.
## Relation to other platform patterns

This pattern is the per-tenant customisation member of the five-pattern bundle that composes patterns/managed-replication-platform:
- patterns/debezium-kafka-connect-cdc-pipeline — transport backbone (provides the SMT hook point).
- patterns/workflow-orchestrated-pipeline-provisioning — provisioning automation.
- patterns/schema-validation-before-deploy — offline schema gate.
- patterns/schema-registry-backward-compat — runtime schema gate.
- This pattern — per-tenant customisation.
The enrichment API's "centralise to avoid duplication" discipline is shape-analogous to central-proxy-choke-point patterns in other platforms (e.g. AI gateways): share the logic once, serve many clients.
## Seen in
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Datadog exposes Kafka Connect single-message transforms at the transport layer (with custom forks as escape hatch) and a standardised enrichment API at the storage layer, together as the per-tenant customisation answer on their managed CDC replication platform.
## Related
- systems/kafka-connect — home of the SMT surface.
- systems/debezium — the source connector half of the backbone.
- patterns/managed-replication-platform — the umbrella pattern this is a component of.
- patterns/debezium-kafka-connect-cdc-pipeline — the transport backbone.
- patterns/central-proxy-choke-point — shape-analogous "centralise to avoid duplication" pattern in the AI-gateway space.