# Connector transformations + enrichment API

## Summary
Expose two distinct per-tenant customisation surfaces on a shared managed replication platform — one at the transport layer (Kafka Connect single-message transforms + custom connector forks) and one at the storage / enrichment layer (a standardised enrichment API). Teams reshape and enrich their own data without forking the underlying pipeline or the sink.
## Problem
A one-size-fits-all pipeline doesn't scale well across many internal tenants with diverging needs:
"Some teams needed filtered or denormalized data, others required custom enrichment logic, and several needed to manipulate the shape or structure of their data prior to storage." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
Naïve responses fail:
- Fork the pipeline per tenant → combinatorial operational cost, no shared improvements.
- Push all customisation to the source → forces source teams to carry consumer-specific concerns; breaks async decoupling.
- Push all customisation to the sink destination → each destination system re-implements the same enrichment logic (timestamp normalisation, derived IDs, metadata tags).
## Solution
Split customisation across two surfaces, aligned to where the concern naturally lives:
### Surface 1 — Kafka Connect single-message transforms (SMTs)
At the transport layer, between the Debezium source and the sink destination, the connector runs a chain of single-message transforms that operate on individual records. Datadog's named transforms:
| Transform | Scope |
|---|---|
| Dynamic topic renaming | Rewrite destination topic per record |
| Column type conversion | Change field types inline |
| Composite primary key generation | Concatenate fields into a single key |
| Add column | Inject derived / static columns |
| Drop column | Remove unwanted columns before sink |
When OSS SMTs don't cover a need, the platform owns custom forks of the connector to introduce specialised logic:
"Where out-of-the-box connector customizations fell short, we maintained custom forks to introduce Datadog-specific logic and optimizations." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
### Surface 2 — standardised enrichment API
At the storage / post-ingestion layer, a centralised enrichment API sits on top of the destination system (in Datadog's case, the search platform), providing a uniform way to request derived fields or metadata during or after ingestion.
"For teams that needed additional derived fields or metadata appended to their records, we introduced a custom enrichment API. This service sits as a layer on top of our search platform, providing a standardized way to request enrichments during or after ingestion. By centralizing enrichment logic, we avoided duplicating it across individual pipelines." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)
## Why two surfaces, not one
The transport layer and the storage layer have different operational shapes:
| Surface | Works on | Latency budget | State | Code ownership |
|---|---|---|---|---|
| SMT (transport) | Per record in flight | Streaming (ms) | Stateless per record | Connector config + occasional fork |
| Enrichment API (storage) | After ingestion or on top of storage | Higher (service RPC during/after ingest) | Shared services, databases, derived state | Platform-owned API with per-team endpoints |
Record-level shape tweaks (rename, retype, drop columns) belong in SMTs because they need no outside state and must not bottleneck throughput. Enrichment that requires a lookup or a shared derivation (tenant-id join, organisation-level metadata, shared categorisation) belongs in the enrichment API because a shared service amortises the derivation.
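The decision rule can be illustrated with two Python stand-ins (not real Kafka Connect SMT code; the function names and lookup table are invented): the first needs nothing beyond the record itself and can safely run per record in the stream, while the second needs shared state and therefore belongs behind the centralised API.

```python
# Stateless shape tweak: only the record itself is needed, so it can run
# per-record at the transport layer without bottlenecking throughput.
def drop_columns(record: dict, columns: set) -> dict:
    return {k: v for k, v in record.items() if k not in columns}

# Hypothetical shared state (e.g. organisation-level metadata). A single
# connector instance should not own this, so the derivation belongs in the
# shared enrichment service, where one lookup serves every tenant.
ORG_METADATA = {"org-1": {"region": "eu", "tier": "enterprise"}}

def add_org_metadata(record: dict) -> dict:
    return {**record, **ORG_METADATA.get(record["org_id"], {})}
```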
## Benefits
- Per-tenant customisation without forking the pipeline — both surfaces are configured, not branched.
- Shared derivation logic stays shared — the enrichment API is the anti-duplication control point.
- Source stays clean — no consumer-specific logic leaks into the producer.
- Sink stays uniform — a single sink destination sees canonicalised records across tenants.
## Caveats
- Custom connector forks are technical debt. Datadog accepts this as the "where OSS falls short" escape hatch; the general pattern doesn't prescribe when to fork vs. when to adapt OSS upstream.
- Enrichment API is a shared service — its availability budget must match the downstream ingestion budget, and its latency is on the ingestion path.
- Two surfaces means two places to look when debugging a record's final shape. Observability across both is required.
- SMTs can chain arbitrarily — a long chain hurts throughput and readability; teams should keep chains short.
## Relation to other platform patterns

This pattern is the per-tenant customisation member of the five-pattern bundle that composes patterns/managed-replication-platform:
- patterns/debezium-kafka-connect-cdc-pipeline — transport backbone (provides the SMT hook point).
- patterns/workflow-orchestrated-pipeline-provisioning — provisioning automation.
- patterns/schema-validation-before-deploy — offline schema gate.
- patterns/schema-registry-backward-compat — runtime schema gate.
- This pattern — per-tenant customisation.
The enrichment API's "centralise to avoid duplication" discipline is shape-analogous to central-proxy-choke-point patterns in other platforms (e.g. AI gateways): share the logic once, serve many clients.
## Seen in
- sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Datadog exposes Kafka Connect single-message transforms at the transport layer (with custom forks as escape hatch) and a standardised enrichment API at the storage layer, together as the per-tenant customisation answer on their managed CDC replication platform.
## Related
- systems/kafka-connect — home of the SMT surface.
- systems/debezium — the source connector half of the backbone.
- patterns/managed-replication-platform — the umbrella pattern this is a component of.
- patterns/debezium-kafka-connect-cdc-pipeline — the transport backbone.
- patterns/central-proxy-choke-point — shape-analogous "centralise to avoid duplication" pattern in the AI-gateway space.