CONCEPT Cited by 1 source

Precision-aware type mapping

Definition

Precision-aware type mapping is a schema-translation discipline in which a CDC or data-ingestion connector inspects the source database's precision and scale metadata for each numeric column and emits a different downstream type depending on whether the column is an integer (scale = 0) or a decimal (scale > 0). The alternative is to collapse every numeric column to a single wire-format type (often string or a generic decimal), which loses type information at the downstream consumer.

Verbatim from the Redpanda Connect oracledb_cdc launch post:

"It queries Oracle's ALL_TAB_COLUMNS catalog and attaches a full column schema to each message as metadata, with precision-aware NUMBER mapping (integers as int64, decimals as json.Number)." (Source: sources/2026-04-09-redpanda-oracle-cdc-now-available-in-redpanda-connect)

Why Oracle specifically needs this

Oracle's NUMBER type is uniquely ambiguous among mainstream RDBMS numeric types: a single NUMBER(p, s) declaration can represent any numeric shape from an 18-digit integer to a high-precision decimal. Concrete shapes:

  • NUMBER(10, 0) — a 10-digit integer (fits in int64).
  • NUMBER(10, 2) — a decimal with 2 digits of scale (e.g. price-column use cases).
  • NUMBER without precision — variable-precision arithmetic.

Naive CDC connectors collapse all NUMBER columns to a single wire type (often string or a generic decimal), forcing the downstream consumer to guess whether each column is integer-shaped or decimal-shaped. Precision-aware mapping asks the data dictionary (ALL_TAB_COLUMNS for Oracle) what the column actually is, and emits the specific wire type that preserves the source semantics:

  • NUMBER(p, 0) → int64 (integer, typed, fits numeric ops).
  • NUMBER(p, s) with s > 0 → json.Number (decimal, preserves precision without float rounding).
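
The mapping rule above can be sketched in a few lines. This is a minimal illustration in Python, not the connector's implementation: `map_number` and `COLUMN_QUERY` are hypothetical names, Python's `int` and `decimal.Decimal` stand in for int64 and json.Number, and the query assumes only the standard DATA_PRECISION and DATA_SCALE columns of ALL_TAB_COLUMNS.

```python
from decimal import Decimal
from typing import Optional, Union

# Data-dictionary lookup; :owner and :table_name are placeholder bind names.
COLUMN_QUERY = """
SELECT column_name, data_type, data_precision, data_scale
FROM all_tab_columns
WHERE owner = :owner AND table_name = :table_name
"""


def map_number(raw: str, precision: Optional[int], scale: Optional[int]) -> Union[int, Decimal]:
    """Hypothetical helper: turn the text form of a NUMBER value into a typed value.

    scale == 0      -> integer
    anything else   -> exact decimal (never a binary float)
    Values wider than int64 are out of scope for this sketch.
    """
    if scale == 0:
        return int(raw)
    # Covers scale > 0 and NUMBER declared without precision (scale is None).
    return Decimal(raw)


order_id = map_number("42", 10, 0)       # integer-shaped column
price = map_number("19.99", 10, 2)       # decimal-shaped column
ratio = map_number("1.5", None, None)    # NUMBER without precision
```

The point of the sketch is that the decision is made once per column from metadata, rather than per value by the consumer.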

Why this matters to downstream consumers

Without precision-aware mapping, every downstream consumer of a CDC topic carries the burden:

  • Type mismatches surface days later in production. A column the application treats as an integer was emitted as a string by the CDC connector; the Avro schema pinned it as string; the downstream analytics table is now VARCHAR instead of BIGINT.
  • Schema Registry evolution gets harder. Upgrading a column from an imprecise to a precise type is a breaking change under the Schema Registry's backward-compatibility rules, so it requires a coordinated downstream redeploy.
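
A small illustration of how the wrong wire type changes behaviour at the consumer. The values are made up for the example; it shows plain Python semantics only, not anything specific to a CDC connector.

```python
from decimal import Decimal

# An integer column that arrived as strings sorts lexicographically.
as_strings = ["9", "10", "100"]
as_ints = [9, 10, 100]
lexicographic = sorted(as_strings)   # ['10', '100', '9']
numeric = sorted(as_ints)            # [9, 10, 100]

# A decimal column forced through binary floats picks up rounding error,
# while an exact decimal representation does not.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.1") + Decimal("0.2")

print(lexicographic, numeric)
print(float_total == 0.3, decimal_total == Decimal("0.3"))
```

Neither failure raises an error, which is why such mismatches tend to be noticed late.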

Precision-aware mapping pushes the type-fidelity problem upstream to the CDC connector, which has direct access to the source schema via the data dictionary. This is one of the architectural benefits of the 2026-04-09 Redpanda oracledb_cdc launch: "Your consumers get typed, schema-tracked events from day one."

Composition with Schema Registry

The 2026-04-09 post's canonical pipeline pairs precision-aware type mapping with Schema Registry for the downstream wire-format encoding:

pipeline:
  processors:
    - schema_registry_encode:
        url: http://schema-registry:8081
        subject: ${! meta("table_name") }

The oracledb_cdc input attaches the precision-aware schema metadata to each message; the schema_registry_encode processor reads that schema, registers it in the Schema Registry, and encodes the payload as Avro. The downstream consumer receives typed Avro records with int64 and decimal types preserved, not a bag of strings.
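
To make the encoding step concrete, here is a hedged sketch of how column metadata could be turned into an Avro record schema. It is an assumption for illustration, not the schema the connector actually produces: `avro_field` and `avro_record` are hypothetical helpers, the ORDERS table is invented, integers are mapped to Avro's 64-bit `long`, and decimals to Avro's `decimal` logical type over `bytes`.

```python
import json


def avro_field(name, precision, scale):
    """Hypothetical mapping from NUMBER(precision, scale) to an Avro field."""
    if scale == 0:
        # Avro's 64-bit signed integer type is called "long".
        return {"name": name, "type": "long"}
    return {
        "name": name,
        "type": {
            "type": "bytes",
            "logicalType": "decimal",
            "precision": precision,
            "scale": scale,
        },
    }


def avro_record(table, columns):
    """Build an Avro record schema from (name, precision, scale) tuples."""
    return {
        "type": "record",
        "name": table,
        "fields": [avro_field(n, p, s) for n, p, s in columns],
    }


# Invented example table with one integer and one decimal column.
schema = avro_record("ORDERS", [("ORDER_ID", 10, 0), ("PRICE", 10, 2)])
print(json.dumps(schema, indent=2))
```

Because precision and scale are part of the schema, a consumer reading it knows the exact shape of PRICE without inspecting any values.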

Generalisation beyond Oracle

Although described here in terms of Oracle's NUMBER type, where the precision ambiguity is most acute, the pattern generalises to any source with type ambiguity:

  • MySQL: DECIMAL(10, 0) vs DECIMAL(10, 2) has the same shape; most CDC connectors already handle this because MySQL's type system is less ambiguous than Oracle's.
  • PostgreSQL: NUMERIC(p, s) is well-specified; fewer precision-aware-mapping pitfalls.
  • JSON documents from MongoDB: shape inference from runtime values is a different problem (type per record, not per column).
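
The document-store case differs because there is no data dictionary to consult. A rough sketch of what per-record inference looks like, using invented documents and a hypothetical `observed_types` helper:

```python
from collections import defaultdict


def observed_types(documents):
    """Collect the set of Python type names seen for each field."""
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return dict(seen)


# Invented sample: the same field carries different types across records.
docs = [
    {"qty": 3, "price": 19.99},
    {"qty": "3", "price": 20},
]
types = observed_types(docs)
print(types)
```

With a relational source the answer comes from one metadata lookup per column; here it can only be approximated by scanning values, and a later record can still contradict it.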

Anti-pattern

The alternative — emit every numeric column as a string — has been the common default in Kafka-Connect-era CDC setups for simplicity. The 2026-04-09 Redpanda post names the cost verbatim:

"Schema drift is the thing that silently corrupts your downstream data until someone notices a null where they expected a number (usually in production, usually days after the column was added, usually not by you). Most CDC setups leave this problem to you."

Seen in

  • sources/2026-04-09-redpanda-oracle-cdc-now-available-in-redpanda-connect — canonical wiki introduction of precision-aware type mapping as a CDC-connector-level discipline for Oracle's NUMBER type. The oracledb_cdc connector queries ALL_TAB_COLUMNS and emits int64 for integer columns and json.Number for decimals, composed with schema_registry_encode for Avro encoding into a Schema Registry.