CONCEPT Cited by 1 source

Registry-less schema evolution¶

Registry-less schema evolution is the ability of a streaming sink connector to evolve the downstream table's schema directly from the shape of raw incoming records, typically raw JSON, without requiring a Schema Registry as the authoritative schema source. In the Iceberg case this page covers, new fields in the JSON stream become new columns in the Iceberg table, type changes propagate via the connector's inference heuristics, and the operator issues no manual DDL.
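As a rough illustration of the idea, the sketch below grows an in-memory schema from decoded JSON records. It is not the Redpanda Connect implementation: the function names (`infer_type`, `evolve`), the type names, and the rule that a null value does not create a column are all assumptions made for the example.

```python
import json
from typing import Any, Dict, List


def infer_type(value: Any) -> str:
    """Map a decoded JSON value to an illustrative column type name."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "struct"
    return "unknown"


def evolve(schema: Dict[str, str], record: Dict[str, Any]) -> List[str]:
    """Add a column for every unseen field; return the names that were added."""
    added = []
    for name, value in record.items():
        # Assumption: a JSON null carries no type information, so it is skipped
        # until a typed value for that field shows up.
        if name not in schema and value is not None:
            schema[name] = infer_type(value)
            added.append(name)
    return added


schema: Dict[str, str] = {}
first = evolve(schema, json.loads('{"id": 1, "name": "a"}'))
second = evolve(schema, json.loads('{"id": 2, "name": "b", "score": 0.5}'))
# second == ["score"]: the new field became a new "double" column
```

A real sink would translate each added field into a table metadata update rather than a dictionary entry; the point here is only that the schema follows the data, with no registry lookup involved.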

Definition contrast¶

The default design for schema evolution on a Kafka → lakehouse path is registry-driven: producers serialise against an Avro/Protobuf/JSON-Schema contract held in a Schema Registry, and the sink connector reads the schema ID off each record, fetches the canonical schema from the registry, and applies it to the target table. Registry-driven evolution is the shape used by Redpanda Iceberg Topics (Iceberg topic mode) and by Kafka Connect-ecosystem sinks. It is strict (rejects non-compliant records) and contract-first (schema changes pass through registry compatibility checks before anything reaches the broker).
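To make "reads the schema ID off each record" concrete, here is a minimal sketch that assumes the Confluent Schema Registry wire format: one magic byte (`0`) followed by a 4-byte big-endian schema ID, then the serialised body. The source post does not spell out the framing, so treat that layout as an assumption. `split_framed_record` and the in-memory `REGISTRY` dictionary are hypothetical stand-ins for a real registry client.

```python
import struct
from typing import Dict, Tuple

MAGIC_BYTE = 0  # assumed Confluent-style framing marker

# Hypothetical stand-in for a registry lookup (schema ID -> schema definition).
REGISTRY: Dict[int, str] = {42: '{"type": "record", "name": "Order"}'}


def split_framed_record(payload: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, body) or raise if the record is not registry-framed."""
    if len(payload) < 5:
        raise ValueError("payload too short to carry a schema ID header")
    # ">BI" = big-endian, 1 unsigned byte + 4-byte unsigned int, no padding.
    magic, schema_id = struct.unpack(">BI", payload[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("missing magic byte: record was not serialised against the registry")
    return schema_id, payload[5:]


framed = bytes([MAGIC_BYTE]) + (42).to_bytes(4, "big") + b"body"
schema_id, body = split_framed_record(framed)
canonical_schema = REGISTRY[schema_id]
```

Passing raw JSON such as `b'{"id": 1}'` to `split_framed_record` raises `ValueError`, because its first byte is `{` rather than the magic byte. That rejection is the "strict" behaviour in miniature.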

Registry-less evolution is the opposite end of the spectrum: the table schema is a reflection of the data, not a contract applied to it.

The Redpanda Iceberg output canonicalisation¶

The Iceberg output connector for Redpanda Connect (shipped 2026-03-05, v4.80.0) frames registry-less evolution as the "best of both worlds" between two named pathologies. Verbatim (Source: sources/2026-03-05-redpanda-introducing-iceberg-output-for-redpanda-connect):

"while other connectors can technically evolve a schema, doing so without a schema registry usually forces you into 'maintenance toil' (chaining brittle Kafka Connect SMTs) or leaves you with 'dirty data' (where all columns land as string data types). Redpanda Connect gives you the best of both worlds: the flexibility of raw JSON with the precision of a structured lakehouse."

The operational payoff, verbatim:

"The Iceberg output also uses schema evolution to sense new fields in an incoming JSON stream and automatically updates the Iceberg table metadata. No manual DDL, no registry required, and no ticket for the ops team every time an app update adds a column."

Three failure modes the framing inverts¶

SMT-chain brittleness: Kafka Connect Single Message Transforms (SMTs) layered to extract, rename, and retype fields are order-sensitive, fail opaquely, and are a known source of pipeline fragility. Registry-less evolution with type inference replaces the SMT chain.
All-string dirty-data tables: other registry-less connectors coerce every column to STRING rather than inferring types, producing tables where numeric aggregation and datetime filtering require casts at query time. Redpanda's claim is that its connector infers types instead, giving the precision of a "structured lakehouse".
Ops-ticket coupling: every application-side schema change (such as adding a column) traditionally requires a co-ordinated DDL migration on the warehouse or lakehouse side. Registry-less evolution decouples application deploys from DDL cycles.
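The cost of the all-string failure mode can be shown with a few lines of plain Python. The records and field names below are invented for illustration; the "all-string" rows simulate what a sink that stringifies every value would store.

```python
records = [
    {"order_id": "a1", "qty": 9, "amount": 12.5},
    {"order_id": "a2", "qty": 10, "amount": 7.5},
]

# Simulated all-string table: every value lands as text.
all_string_rows = [{k: str(v) for k, v in r.items()} for r in records]
# Simulated typed table: values keep their inferred types.
typed_rows = records

# Without a cast, comparison is lexicographic, so "9" sorts above "10".
max_qty_strings = max(r["qty"] for r in all_string_rows)
# Every query has to remember the cast to get the right answer.
max_qty_cast = max(int(r["qty"]) for r in all_string_rows)
total_cast = sum(float(r["amount"]) for r in all_string_rows)
# With typed columns the aggregation works directly.
max_qty_typed = max(r["qty"] for r in typed_rows)
total_typed = sum(r["amount"] for r in typed_rows)
```

Here `max_qty_strings` is `"9"` while `max_qty_typed` is `10`: the untyped table silently returns a wrong maximum unless each query casts first.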

Open questions¶

The Redpanda launch post asserts the capability but does not disclose mechanism depth:

Type inference strategy: is it first-record-wins, most-common-type, type-widening across a batch, or something else?
Type-conflict handling: what happens when record N claims field x is an integer and record N+1 claims it is a string? Coerce? Quarantine? Error?
Column rename / widening: a JSON field rename is structurally indistinguishable from a delete plus an add from the connector's vantage point. How are column renames detected, if at all? The post is equally silent on whether an existing column's type can be widened.
Deleted fields: JSON absence is ambiguous (absent vs null vs intentionally removed). Are dropped fields left as nullable columns, tombstoned, or removed?
Interaction with downstream readers: query engines (Trino, Spark, Snowflake) cache Iceberg schema snapshots, and an aggressive evolution cadence can invalidate those caches and force re-plans. How is that cadence bounded?
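To show why the inference and conflict questions matter, the sketch below contrasts two of the candidate policies named above. It does not describe what the Redpanda connector does, which the post leaves undisclosed. The policy names, `reconcile`, and `TypeConflict` are hypothetical; the two promotions listed (`int` to `long`, `float` to `double`) are among those the Iceberg table spec allows for existing columns.

```python
# Type promotions that Iceberg permits on an existing column (subset).
ICEBERG_SAFE_PROMOTIONS = {("int", "long"), ("float", "double")}


class TypeConflict(Exception):
    """Raised when two observed types cannot share one column."""


def reconcile(existing: str, incoming: str, policy: str) -> str:
    """Return the column type to keep after seeing a value of type `incoming`."""
    if existing == incoming:
        return existing
    if policy == "first_record_wins":
        # The column never changes; later mismatches must be coerced,
        # quarantined, or rejected somewhere else in the pipeline.
        return existing
    if policy == "widen":
        if (existing, incoming) in ICEBERG_SAFE_PROMOTIONS:
            return incoming  # promote the column to the wider type
        if (incoming, existing) in ICEBERG_SAFE_PROMOTIONS:
            return existing  # column is already the wider type
        raise TypeConflict(f"cannot reconcile {existing} with {incoming}")
    raise ValueError(f"unknown policy: {policy}")


widened = reconcile("int", "long", "widen")
kept = reconcile("long", "string", "first_record_wins")
```

Under `widen`, an integer column that later sees a string has no safe promotion and raises `TypeConflict`; under `first_record_wins` the same input leaves the column untouched and pushes the problem to record handling. Which side of that trade-off a connector picks is what separates "flexible" from "dirty".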

These questions map onto the boundary between "flexible" and "dirty" — the wiki captures them as the mechanism gap to track in follow-up posts.

Relationship to `schema-evolution` concept¶

This is a fifth axis on the wiki's schema-evolution concept, joining:

Async-CDC multi-clock skew (Datadog 2025-11-04).
Protobuf design discipline (Lyft 2024-09-16).
Registry-driven Iceberg Topics (Redpanda 25.1 GA).
Schema-evolution off as time-series performance optimisation on Snowflake (Redpanda 2025-12-09).
Registry-less data-driven evolution (this page).

Seen in¶

Redpanda — Introducing Iceberg output for Redpanda Connect (2026-03-05) — canonical wiki introduction. Iceberg output connector for Redpanda Connect uses registry-less data-driven evolution as its default schema shape, positioned as the "best of both worlds" inversion of SMT-chain brittleness + all-string dirty-data tables.

concepts/schema-evolution — parent concept
concepts/schema-registry — registry-driven counterpart
concepts/iceberg-catalog-rest-sync
systems/redpanda-connect-iceberg-output — canonical instance
systems/redpanda-iceberg-topics — registry-driven counterpart at broker altitude
systems/apache-iceberg
patterns/streaming-broker-as-lakehouse-bronze-sink