CONCEPT Cited by 4 sources

Schema registry

A schema registry is a centralized, versioned store of data contracts — typically event / message / record schemas — used as the single source of truth for the shape, type, and semantics of data flowing across service boundaries. It turns implicit, per-team contracts into explicit, discoverable, and auditable artifacts.

Core responsibilities

  • Single source of truth for event/message definitions across teams.
  • Versioning — schemas evolve; the registry retains previous versions; publishers and subscribers negotiate which version they produce/consume.
  • Validation — publishers can validate outbound events; subscribers can validate inbound events.
  • Compatibility enforcement — backward / forward / full rules that gate schema changes at PR / build time (prevent silent contract breaks).
  • Deprecation paths — structured removal of fields / events, with lead times communicated to consumers.
  • Discovery — a browsable catalog of every event type + its publishers + its subscribers (the "who produces what, who consumes what" map that ad-hoc pub/sub systems famously lack).
  • Audit trails — every schema change attributed, reviewed, and reversible.
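
The versioning and compatibility-enforcement responsibilities above can be sketched as a small in-memory registry. This is an illustrative sketch, not any particular product's API: the `Registry` class, the `backward_compatible` helper, and the `door.unlocked` subject are hypothetical names. The compatibility rule is deliberately simplified (no newly required fields, no type changes on existing fields); real registries define richer, format-specific rules.

```python
from dataclasses import dataclass, field


def backward_compatible(old: dict, new: dict) -> list[str]:
    """Return reasons `new` cannot read events written under `old` (empty list = compatible)."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_required = set(old.get("required", []))
    # A newly required field may be absent from events written under `old`.
    for name in new.get("required", []):
        if name not in old_required:
            problems.append(f"new required field: {name}")
    # A changed type makes previously valid values fail under `new`.
    for name, spec in old_props.items():
        if name in new_props and new_props[name].get("type") != spec.get("type"):
            problems.append(f"type changed: {name}")
    return problems


@dataclass
class Registry:
    # subject -> ordered list of schema versions (index 0 is version 1)
    versions: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, subject: str, schema: dict) -> int:
        """Store a new version, rejecting it if it breaks the latest one."""
        history = self.versions.setdefault(subject, [])
        if history:
            problems = backward_compatible(history[-1], schema)
            if problems:
                raise ValueError("; ".join(problems))
        history.append(schema)
        return len(history)  # 1-based version number


registry = Registry()
v1 = {"properties": {"device_id": {"type": "string"}}, "required": ["device_id"]}
v2 = {
    "properties": {"device_id": {"type": "string"}, "battery_pct": {"type": "integer"}},
    "required": ["device_id"],
}
registry.register("door.unlocked", v1)  # version 1
registry.register("door.unlocked", v2)  # version 2: adds an optional field
```

Running the same `backward_compatible` check in a PR or build step, before `register` is ever called, is one way to gate schema changes early; previous versions stay in `versions` so consumers can keep reading them.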

Registry vs validation — a load-bearing distinction

"EventBridge provides developers with tools to implement validation using external solutions or custom application code, it currently does not include native schema validation capabilities." — Amazon Key team, 2026-02-04 (sources/2026-02-04-aws-amazon-key-eventbridge-event-driven-architecture)

A schema registry that stores schemas and a schema validation layer that enforces them are separable capabilities. Amazon EventBridge provides the former (plus schema discovery from live traffic) but not the latter. Teams with strict validation requirements must therefore build on top, choosing between a centralized validation service and patterns/client-side-schema-validation. The Amazon Key team explicitly chose client-side validation after evaluating both: the centralized option would have added a network hop and a scaling problem of its own.
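
A minimal sketch of the client-side option, assuming the publisher already holds the schema locally (for example, bundled at build time). It is not Amazon Key's implementation: `validate`, `publish`, and the `send` callable are hypothetical names, and the validator covers only the `required` and `type` keywords rather than full JSON Schema Draft-04.

```python
from typing import Callable

# Mapping from JSON Schema type names to Python types (subset, for illustration).
_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(event: dict, schema: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in event
    ]
    for name, spec in schema.get("properties", {}).items():
        if name not in event or "type" not in spec:
            continue
        value = event[name]
        numeric = spec["type"] in ("number", "integer")
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if not isinstance(value, _TYPES[spec["type"]]) or (numeric and isinstance(value, bool)):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    return errors


def publish(event: dict, schema: dict, send: Callable[[dict], None]) -> None:
    """Validate in-process, then hand the event to the transport."""
    errors = validate(event, schema)
    if errors:
        raise ValueError("; ".join(errors))
    send(event)  # `send` stands in for the real bus client call
```

Because validation runs inside the publisher's process, an invalid event is rejected before it reaches the bus and no extra network hop is involved.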

Design axes

  • Format: JSON Schema (Draft-04 used by Amazon Key's registry), Avro (Confluent Schema Registry canonical), Protobuf, OpenAPI.
  • Storage: dedicated microservice vs built into the event-bus control plane.
  • Code generation: runtime lookups vs build-time code bindings. Build-time bindings give developers type-safe event constructors and publish/subscribe interfaces.
  • Governance model: self-service schema PRs vs gatekept by a central team.
  • Integration surface: IDE plugin, CLI, CI hook, runtime library.
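
To make the build-time code-generation axis concrete, here is a hedged sketch of a generator that turns a schema into source code for a typed event class. The `generate_binding` function, the `DoorUnlocked` class, and the field names are hypothetical, and only flat schemas with scalar types are handled.

```python
# JSON Schema scalar types mapped to Python annotations (subset, for illustration).
_PY_TYPES = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def generate_binding(class_name: str, schema: dict) -> str:
    """Return Python source for a frozen dataclass mirroring a flat schema."""
    required = set(schema.get("required", []))
    props = schema.get("properties", {})
    # Dataclass fields without defaults must come first, so emit required fields first.
    ordered = sorted(props, key=lambda name: name not in required)
    lines = [
        "from dataclasses import dataclass",
        "from typing import Optional",
        "",
        "",
        "@dataclass(frozen=True)",
        f"class {class_name}:",
    ]
    if not props:
        lines.append("    pass")
    for name in ordered:
        py_type = _PY_TYPES[props[name]["type"]]
        if name in required:
            lines.append(f"    {name}: {py_type}")
        else:
            lines.append(f"    {name}: Optional[{py_type}] = None")
    return "\n".join(lines) + "\n"


schema = {
    "properties": {"battery_pct": {"type": "integer"}, "device_id": {"type": "string"}},
    "required": ["device_id"],
}
source = generate_binding("DoorUnlocked", schema)

# A real build step would write `source` to a .py file; exec is used here only for the demo.
namespace: dict = {}
exec(source, namespace)
DoorUnlocked = namespace["DoorUnlocked"]
event = DoorUnlocked(device_id="d-1")
```

With bindings like this, a publisher that forgets a required field fails at construction time (or under a type checker) instead of at the bus, which is the ergonomic gain over runtime lookups.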

Why "loose schemas" is an organisational cost

Without a schema registry, event contracts exist only in the consumer code that parses them. Consequences:

  • Breaking changes become "almost impossible to implement" safely, because publishers can't know whether a consumer relies on a field.
  • No collaboration surface for schema modifications across teams.
  • No place for publishers to discover whether an event is valid before it hits the bus.
  • Semantic context (inheritance, composition, required-vs-optional) is lost; every consumer re-infers it.

This is exactly the gap Amazon Key's custom repository was built to close.

Seen in

  • sources/2026-02-04-aws-amazon-key-eventbridge-event-driven-architecture — Amazon Key built a custom schema repository (JSON Schema Draft-04) alongside EventBridge because native validation is absent. Code bindings generated at build time; client library consumes schemas for pre-publish validation + serde. New-event onboarding time dropped 48h → 4h.
  • sources/2025-06-14-netflix-model-once-represent-everywhere-uda — Netflix UDA unifies schema registry and data catalog into a single knowledge-graph substrate. "We needed a data catalog unified with a schema registry, but with a hard requirement for semantic integration." Schemas (GraphQL / Avro / SQL / RDF / Java) are transpiled from upstream domain models (patterns/schema-transpilation-from-domain-model), not hand-authored per surface. Because the upstream domain model is the single source of truth (patterns/model-once-represent-everywhere), the registry gains semantic integration as a property, not just schema-shape agreement.
  • sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms: Schema registry as CI/CD artefact, not runtime afterthought. Redpanda's backbone essay positions the streaming-context registry as the API contract between teams, equivalent to the HTTP API contract for synchronous services. Verbatim: "Hooking up schema changes and publications as part of your CI/CD pipelines and infrastructure-as-code (IaC) can also help catch issues in your engineering teams earlier during development, rather than in staging or production environments." The implication is that schema evolution becomes a PR-reviewable, IaC-owned artefact rather than an ops-coordinated migration. This complements the registry-vs-validation framing on this page by adding a deploy-time layer on top of the registry's storage and the build-time validation hook.
  • sources/2026-03-31-redpanda-261-delivers-the-industrys-first-adaptable-streaming-engine: Redpanda 26.1 launch post. Introduces two schema-registry extensions that turn a version-tracking registry into a governance substrate. (1) Schema Registry contexts (concepts/schema-registry-context): "Contexts allow you to namespace your schemas, making it easy to isolate environments, perform complex migrations, and manage multi-team registries." Three canonical use cases: environment isolation (dev/staging/prod), complex migrations (old-context + new-context co-existing), multi-team registries (one physical registry, many logical namespaces). (2) Custom schema metadata: "You can now attach arbitrary metadata properties to your schemas, turning Redpanda Schema Registry into a first-class citizen in your data governance and observability stack." This is an annotation axis for owner, SLA, sensitivity classification, and lineage tags. Together they extend the registry from pure schema-version-tracking into a queryable data-catalog-adjacent substrate.