CONCEPT Cited by 1 source
Precision-aware type mapping¶
Definition¶
Precision-aware type mapping is the schema-translation
discipline where a CDC or data-ingestion connector inspects the
source database's precision and scale metadata on numeric
columns and emits different downstream types depending on
whether the column is an integer (scale = 0) or a decimal
(scale > 0), rather than collapsing every numeric column to a
single wire-format type (often string or generic decimal)
and losing type information at the downstream consumer.
Canonical verbatim from the Redpanda Connect oracledb_cdc
launch:
"It queries Oracle's ALL_TAB_COLUMNS catalog and attaches a full column schema to each message as metadata, with precision-aware NUMBER mapping (integers as int64, decimals as json.Number)." (Source: sources/2026-04-09-redpanda-oracle-cdc-now-available-in-redpanda-connect)
Why Oracle specifically needs this¶
Oracle's NUMBER type is uniquely ambiguous among mainstream
RDBMS numeric types: a single NUMBER(p, s) declaration can
represent any numeric shape from an 18-digit integer to a high-
precision decimal. Concrete shapes:
- NUMBER(10, 0): a 10-digit integer (fits in int64).
- NUMBER(10, 2): a decimal with 2 digits of scale (e.g. price-column use cases).
- NUMBER without precision: variable-precision arithmetic.
Naive CDC connectors collapse all NUMBER columns to a single
wire type (often string or a generic decimal), forcing the
downstream consumer to guess whether each column is
integer-shaped or decimal-shaped. Precision-aware mapping asks
the data dictionary (ALL_TAB_COLUMNS for Oracle) what the
column actually is, and emits the specific wire type that
preserves the source semantics:
- NUMBER(p, 0) → int64 (integer, typed, fits numeric ops).
- NUMBER(p, s) with s > 0 → json.Number (decimal, preserves precision without float rounding).
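The mapping rule above can be sketched in a few lines. This is a minimal illustration, not the connector's implementation: the function name `map_number_type` is hypothetical, the inputs are assumed to mirror the DATA_PRECISION and DATA_SCALE columns of ALL_TAB_COLUMNS, and the handling of a NUMBER declared without precision (treated here as a decimal) is an assumption rather than documented behaviour.

```python
from typing import Optional


def map_number_type(precision: Optional[int], scale: Optional[int]) -> str:
    """Pick a wire type for an Oracle NUMBER column from dictionary metadata.

    `precision` and `scale` stand in for DATA_PRECISION and DATA_SCALE;
    both are None when the column is declared as a bare NUMBER.
    """
    if scale == 0:
        # Integer-shaped column: NUMBER(p, 0).
        # Assumption: this sketch does not check whether p digits fit in 64 bits.
        return "int64"
    # scale > 0 is a true decimal. A missing (or negative) scale is also
    # routed here as a conservative default, which is an assumption.
    return "json.Number"


# Hypothetical columns: (name, precision, scale)
columns = [("ORDER_ID", 10, 0), ("PRICE", 10, 2), ("RATIO", None, None)]
wire_types = {name: map_number_type(p, s) for name, p, s in columns}
```

The point of the sketch is that the decision uses only per-column metadata, so it can be made once per table rather than guessed per record.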
Why this matters to downstream consumers¶
Without precision-aware mapping, every downstream consumer of a CDC topic carries the burden:
- Type mismatches surface days later in production. A column the application treats as an integer was emitted as a string by the CDC connector; the Avro schema pinned it as string; the downstream analytics table is now VARCHAR instead of BIGINT.
- Schema Registry evolution gets harder. Changing a column from an imprecise type to a precise one is an incompatible change under the Schema Registry's backward-compatibility rules, so it requires a coordinated downstream redeploy.
Precision-aware mapping pushes the type-fidelity problem
upstream to the CDC connector, which has direct access to the
source schema via the data dictionary. This is one of the
architectural benefits of the 2026-04-09 Redpanda oracledb_cdc
launch: "Your consumers get typed, schema-tracked events from
day one."
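To illustrate what "typed" buys a consumer, here is a small sketch of decoding a JSON event without passing decimals through binary floats. The payload, column names, and the `decode_event` helper are hypothetical, and Python's `int` and `decimal.Decimal` stand in for the connector's int64 and json.Number; the actual event layout is not specified in the source.

```python
import json
from decimal import Decimal


def decode_event(payload: str) -> dict:
    """Decode a JSON change event, keeping decimals exact.

    parse_float receives the raw numeric text, so a value such as 19.99
    becomes Decimal("19.99") instead of a rounded binary float.
    """
    return json.loads(payload, parse_float=Decimal)


# Hypothetical change event; column names and values are made up.
event = decode_event('{"ORDER_ID": 1234567890, "PRICE": 19.99, "QTY": 3}')

# Integer columns arrive as int, decimal columns as Decimal,
# so arithmetic on them stays exact.
total = event["PRICE"] * event["QTY"]
```

Because the integer and decimal columns arrive already distinguished, the consumer does not need per-column parsing rules of its own.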
Composition with Schema Registry¶
The 2026-04-09 post's canonical pipeline pairs precision-aware type mapping with Schema Registry for the downstream wire-format encoding:
pipeline:
  processors:
    - schema_registry_encode:
        url: http://schema-registry:8081
        subject: ${! meta("table_name") }
The oracledb_cdc input attaches the precision-aware schema
metadata to each message; the schema_registry_encode processor
reads that schema, registers it in the Schema Registry, and
encodes the payload as Avro. The downstream consumer receives
typed Avro records with int64 and decimal types preserved,
not a bag of strings.
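As a rough picture of what "typed Avro records" could look like, the sketch below builds an Avro record schema from column metadata, using Avro's `long` type for integer columns and the `decimal` logical type (on `bytes`) for decimal columns. The helper names, table, and columns are hypothetical, and this is not necessarily the schema that schema_registry_encode registers; it only shows how the integer/decimal distinction can survive into Avro. Columns without a declared precision are out of scope here, since Avro's decimal logical type requires one.

```python
def avro_field(name: str, precision: int, scale: int) -> dict:
    """Build one Avro field from a column's declared precision and scale."""
    if scale == 0:
        # Avro's 64-bit signed integer type.
        return {"name": name, "type": "long"}
    # Avro decimal logical type: exact value with fixed precision and scale.
    return {
        "name": name,
        "type": {
            "type": "bytes",
            "logicalType": "decimal",
            "precision": precision,
            "scale": scale,
        },
    }


def avro_record_schema(table: str, columns: list) -> dict:
    """Assemble a record schema; `columns` holds (name, precision, scale)."""
    return {
        "type": "record",
        "name": table,
        "fields": [avro_field(n, p, s) for n, p, s in columns],
    }


# Hypothetical table with one integer column and one decimal column.
schema = avro_record_schema("ORDERS", [("ORDER_ID", 10, 0), ("PRICE", 10, 2)])
```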
Generalisation beyond Oracle¶
While canonicalised here on the Oracle NUMBER type because
that's where the precision ambiguity is most acute, the pattern
generalises to any source with type ambiguity:
- MySQL: DECIMAL(10, 0) vs DECIMAL(10, 2) has the same shape; most CDC connectors already handle this because MySQL's type system is less ambiguous than Oracle's.
- PostgreSQL: NUMERIC(p, s) is well-specified; fewer precision-aware-mapping pitfalls.
- JSON documents from MongoDB: shape inference from runtime values is a different problem (type per record, not per column).
Anti-pattern¶
The alternative — emit every numeric column as a string — has been the common default in Kafka-Connect-era CDC setups for simplicity. The 2026-04-09 Redpanda post names the cost verbatim:
"Schema drift is the thing that silently corrupts your downstream data until someone notices a null where they expected a number (usually in production, usually days after the column was added, usually not by you). Most CDC setups leave this problem to you."
Seen in¶
- sources/2026-04-09-redpanda-oracle-cdc-now-available-in-redpanda-connect
— canonical wiki introduction of precision-aware type mapping
as a CDC-connector-level discipline for Oracle's
NUMBER type. The oracledb_cdc connector queries ALL_TAB_COLUMNS and emits int64 for integer columns and json.Number for decimals, composed with schema_registry_encode for Avro encoding into a Schema Registry.