PATTERN Cited by 1 source

Connector transformations + enrichment API

Summary

Expose two distinct per-tenant customisation surfaces on a shared managed replication platform — one at the transport layer (Kafka Connect single-message transforms + custom connector forks) and one at the storage / enrichment layer (a standardised enrichment API). Teams reshape and enrich their own data without forking the underlying pipeline or the sink.

Problem

A one-size-fits-all pipeline doesn't scale well across many internal tenants with diverging needs:

"Some teams needed filtered or denormalized data, others required custom enrichment logic, and several needed to manipulate the shape or structure of their data prior to storage." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)

Naïve responses fail:

  • Fork the pipeline per tenant → combinatorial operational cost, no shared improvements.
  • Push all customisation to the source → forces source teams to carry consumer-specific concerns; breaks async decoupling.
  • Push all customisation to the sink destination → each destination system re-implements the same enrichment logic (timestamp normalisation, derived IDs, metadata tags).

Solution

Split customisation across two surfaces, aligned to where the concern naturally lives:

Surface 1 — Kafka Connect single-message transforms (SMTs)

At the transport layer, between the Debezium source and the sink destination, the connector runs a chain of single-message transforms that operate on individual records. Datadog's named transforms:

| Transform | Scope |
| --- | --- |
| Dynamic topic renaming | Rewrite destination topic per record |
| Column type conversion | Change field types inline |
| Composite primary key generation | Concatenate fields into a single key |
| Add column | Inject derived / static columns |
| Drop column | Remove unwanted columns before sink |
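
As a rough illustration, the five transforms can be modelled as a chain of stateless functions applied to one record at a time. This sketch is plain Python, not Kafka Connect's Java transformation interface, and the record layout, field names and helper names are all hypothetical; it is not Datadog's implementation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical record shape: {"topic": str, "key": Any, "value": dict}
Record = Dict[str, Any]
Transform = Callable[[Record], Record]


def rename_topic(prefix: str) -> Transform:
    """Dynamic topic renaming: derive the topic from the record's table name."""
    def apply(rec: Record) -> Record:
        return {**rec, "topic": f"{prefix}.{rec['value']['table']}"}
    return apply


def cast_column(column: str, to_type: Callable[[Any], Any]) -> Transform:
    """Column type conversion: change one field's type inline."""
    def apply(rec: Record) -> Record:
        value = dict(rec["value"])
        value[column] = to_type(value[column])
        return {**rec, "value": value}
    return apply


def composite_key(columns: List[str], sep: str = "|") -> Transform:
    """Composite primary key generation: concatenate fields into one key."""
    def apply(rec: Record) -> Record:
        key = sep.join(str(rec["value"][c]) for c in columns)
        return {**rec, "key": key}
    return apply


def add_column(column: str, static_value: Any) -> Transform:
    """Add column: inject a static value."""
    def apply(rec: Record) -> Record:
        return {**rec, "value": {**rec["value"], column: static_value}}
    return apply


def drop_column(column: str) -> Transform:
    """Drop column: remove a field before the record reaches the sink."""
    def apply(rec: Record) -> Record:
        value = {k: v for k, v in rec["value"].items() if k != column}
        return {**rec, "value": value}
    return apply


def run_chain(rec: Record, chain: List[Transform]) -> Record:
    """Apply each transform in order; every step returns a new record."""
    for transform in chain:
        rec = transform(rec)
    return rec


chain = [
    rename_topic("search"),
    cast_column("amount", float),
    composite_key(["org_id", "order_id"]),
    add_column("source", "postgres"),
    drop_column("internal_note"),
]

record = {
    "topic": "cdc.raw",
    "key": None,
    "value": {"table": "orders", "org_id": 7, "order_id": 42,
              "amount": "19.90", "internal_note": "x"},
}
result = run_chain(record, chain)
```

Each function only reads the record it is given, which is the property that lets this kind of logic run at the transport layer without external lookups.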

When OSS SMTs don't cover a need, the platform owns custom forks of the connector to introduce specialised logic:

"Where out-of-the-box connector customizations fell short, we maintained custom forks to introduce Datadog-specific logic and optimizations." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)

Surface 2 — standardised enrichment API

At the storage / post-ingestion layer, a centralised enrichment API sits on top of the destination system (in Datadog's case, the search platform), providing a uniform way to request derived fields or metadata during or after ingestion.
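
The centralising idea can be sketched as a small registry: each enrichment is defined once, and any pipeline requests it by name. The source does not describe the real API's interface, so the class, method and enrichment names below are hypothetical, and the in-memory lookup table stands in for a shared service or database.

```python
from typing import Any, Callable, Dict, Iterable

Document = Dict[str, Any]
# An enricher reads a document and returns the extra fields to append.
Enricher = Callable[[Document], Dict[str, Any]]


class EnrichmentService:
    """Hypothetical central registry of named enrichments."""

    def __init__(self) -> None:
        self._enrichers: Dict[str, Enricher] = {}

    def register(self, name: str, enricher: Enricher) -> None:
        if name in self._enrichers:
            raise ValueError(f"enrichment already registered: {name}")
        self._enrichers[name] = enricher

    def enrich(self, doc: Document, requested: Iterable[str]) -> Document:
        """Return a copy of doc with the requested enrichments applied in order."""
        out = dict(doc)
        for name in requested:
            if name not in self._enrichers:
                raise KeyError(f"unknown enrichment: {name}")
            out.update(self._enrichers[name](out))
        return out


# Stand-in for shared state that individual pipelines should not duplicate.
ORG_TIERS = {7: "enterprise", 9: "free"}

service = EnrichmentService()
service.register(
    "org_tier", lambda d: {"org_tier": ORG_TIERS.get(d["org_id"], "unknown")}
)
service.register(
    "doc_id", lambda d: {"doc_id": f"{d['org_id']}:{d['order_id']}"}
)

# Two different tenants reuse the same definition of "org_tier".
orders_doc = service.enrich({"org_id": 7, "order_id": 42}, ["org_tier", "doc_id"])
billing_doc = service.enrich({"org_id": 9, "order_id": 1}, ["org_tier"])
```

Both callers get the same `org_tier` derivation, so a change to that logic is made in one place instead of once per pipeline.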

"For teams that needed additional derived fields or metadata appended to their records, we introduced a custom enrichment API. This service sits as a layer on top of our search platform, providing a standardized way to request enrichments during or after ingestion. By centralizing enrichment logic, we avoided duplicating it across individual pipelines." (Source: sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform)

Why two surfaces, not one

The transport layer and the storage layer have different operational shapes:

| Surface | Works on | Latency budget | State | Code ownership |
| --- | --- | --- | --- | --- |
| SMT (transport) | Per record in flight | Streaming (ms) | Stateless per record | Connector config + occasional fork |
| Enrichment API (storage) | After ingestion or on top of storage | Higher | Shared services, databases, derived state | Platform-owned API with per-team endpoints |

Record-level shape tweaks (rename, retype, drop columns) belong in SMTs because they need no outside state and must not bottleneck throughput. Enrichment that requires a lookup or a shared derivation (tenant-id join, organisation-level metadata, shared categorisation) belongs in the enrichment API because a shared service amortises the derivation.

Benefits

  • Per-tenant customisation without forking the pipeline — both surfaces are configured, not branched.
  • Shared derivation logic stays shared — the enrichment API is the anti-duplication control point.
  • Source stays clean — no consumer-specific logic leaks into the producer.
  • Sink stays uniform — a single sink destination sees canonicalised records across tenants.

Caveats

  • Custom connector forks are technical debt. Datadog accepts this as the "where OSS falls short" escape hatch; the general pattern doesn't prescribe when to fork vs. when to adapt OSS upstream.
  • The enrichment API is a shared service: its availability budget must match that of the downstream ingestion path, and whenever enrichment is requested during ingestion its latency adds to ingestion latency.
  • Two surfaces means two places to look when debugging a record's final shape. Observability across both is required.
  • SMTs can chain arbitrarily — a long chain hurts throughput and readability; teams should keep chains short.
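
One way a platform might enforce the short-chain caveat is a lint step over connector configs before they are deployed. The sketch below relies on Kafka Connect's standard config layout (a comma-separated `transforms` list plus one `transforms.<alias>.type` key per alias); the function name and the length limit are hypothetical, and nothing in the source says Datadog runs such a check.

```python
from typing import Dict, List

MAX_CHAIN_LENGTH = 5  # hypothetical platform limit, not a Kafka Connect setting


def lint_smt_chain(config: Dict[str, str],
                   max_len: int = MAX_CHAIN_LENGTH) -> List[str]:
    """Return human-readable problems found in a connector's SMT chain."""
    aliases = [a.strip() for a in config.get("transforms", "").split(",")
               if a.strip()]
    problems: List[str] = []
    if len(aliases) > max_len:
        problems.append(
            f"chain has {len(aliases)} transforms, limit is {max_len}")
    for alias in aliases:
        # Every alias in the chain needs a class configured for it.
        if f"transforms.{alias}.type" not in config:
            problems.append(f"transform '{alias}' has no type configured")
    return problems


connector_config = {
    "transforms": "route,dropInternal",
    "transforms.route.type":
        "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "cdc\\.(.*)",
    "transforms.route.replacement": "search.$1",
    "transforms.dropInternal.type":
        "org.apache.kafka.connect.transforms.ReplaceField$Value",
    # Recent Kafka versions use "exclude"; older releases named this "blacklist".
    "transforms.dropInternal.exclude": "internal_note",
}

problems = lint_smt_chain(connector_config)
```

Running the same function with a lower `max_len`, or with a missing `type` key, returns one message per violation, which makes it easy to surface in a review or CI step.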

Relation to other platform patterns

This pattern is the per-tenant customisation member of the five-pattern bundle that composes a patterns/managed-replication-platform.

The enrichment API's "centralise to avoid duplication" discipline is structurally analogous to central-proxy choke-point patterns in other platforms (e.g. AI gateways): share the logic once, serve many clients.

Seen in
