
PATTERN

Schema validation pre-upload via mapping API

Summary

Before every bulk upload to a third-party system whose schema you don't own, fetch that system's current field mapping via a REST introspection API and validate the outbound file against it. Abort the upload if any column name, data type, or date format mismatches. This is the runtime, per-upload cousin of patterns/schema-validation-before-deploy (which runs at deploy time) — necessary because the consumer schema can drift between deploys without your knowledge.

When to use

All three conditions:

  1. You're uploading to a third-party system whose schema you don't own.
  2. That system exposes a mapping / metadata API (REST, GraphQL, SDK call — any introspection endpoint).
  3. The cost of a failed upload is high — partial writes, retry storms, support-ticket escalation, weekend firefighting.

Skip if (a) the downstream is internal and owned by a team you can coordinate schema changes with, (b) the downstream enforces strict schema via a registry you already validate against at deploy time, or (c) upload volume is so low that a single failed upload is trivially recoverable.

Structure

Pipeline produces output file
Pre-upload validator:
  ├─ GET <third-party>/mapping
  │     returns:
  │       date_format: "MM/DD/YYYY"
  │       columns:
  │         - { name: "sample_column_name",
  │             data_type: "string" }
  ├─ Compare against outbound file schema
  │     - date format matches?
  │     - every expected column present?
  │     - every column's type matches?
  └─ Match ─yes→ proceed to upload batch
            no  → abort + alert → investigate
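The comparison step in the diagram can be sketched as a pure function over the fetched mapping and the outbound file's schema. Everything here is illustrative — the function and field names are not a real SDK; the `mapping` dict stands in for the parsed response of the GET call above:

```python
# Sketch: compare a live-fetched third-party mapping against the outbound
# file's schema across the three axes (date format, column names, types).
# The mapping dict would come from the GET <third-party>/mapping call.

def find_mismatches(mapping: dict, file_schema: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means safe to upload."""
    problems = []

    # Axis 1: date format string.
    if mapping["date_format"] != file_schema["date_format"]:
        problems.append(
            f"date format: downstream wants {mapping['date_format']!r}, "
            f"file uses {file_schema['date_format']!r}"
        )

    expected = {c["name"]: c["data_type"] for c in mapping["columns"]}
    actual = file_schema["columns"]  # {column_name: data_type}

    # Axes 2 and 3: column presence, then column type.
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name!r}")
        elif actual[name] != dtype:
            problems.append(f"type drift on {name!r}: {dtype!r} != {actual[name]!r}")

    return problems

# Upload gate: proceed only when nothing mismatched; otherwise abort + alert.
mapping = {"date_format": "MM/DD/YYYY",
           "columns": [{"name": "sample_column_name", "data_type": "string"}]}
file_schema = {"date_format": "MM/DD/YYYY",
               "columns": {"sample_column_name": "number"}}
print(find_mismatches(mapping, file_schema))  # non-empty → abort
```

Keeping the comparison pure (no I/O) makes the "abort + alert → investigate" branch trivially testable, independent of the mapping API's availability.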

Three validation axes (Yelp's canonical list)

  • Date format string — e.g. MM/DD/YYYY. Easy to miss.
  • Column names — renames / deletions on the downstream.
  • Column data types — type drift (string → number, etc.).

Example (from Yelp, verbatim)

GET https://yourHost/api/integration/v1/upload/mapping?templatename=SAMPLE_TEMPLATE
Header: token: RandomBasicToken

Response:
{
    "Dateformat":"MM/DD/YYYY",
    "Mapping":[
        { "sample_column_id":1,
          "sample_column_name":"random",
          "sample_column_data_type":"string" }
    ]
}

The validator parses the response, walks the output file's columns, and aborts on any mismatch on any of the three axes.
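A sketch of that parsing step against the verbatim response shape above. Only the `Dateformat` / `Mapping` keys and their field names come from the example; the normalized shape and the file-walking helper are hypothetical:

```python
import json

# Verbatim response body from the example above.
raw = """{
    "Dateformat":"MM/DD/YYYY",
    "Mapping":[
        { "sample_column_id":1,
          "sample_column_name":"random",
          "sample_column_data_type":"string" }
    ]
}"""

def normalize(response_body: str) -> dict:
    """Parse the mapping response into {date_format, columns: {name: type}}."""
    doc = json.loads(response_body)
    return {
        "date_format": doc["Dateformat"],
        "columns": {m["sample_column_name"]: m["sample_column_data_type"]
                    for m in doc["Mapping"]},
    }

def check_file_columns(normalized: dict, file_columns: dict) -> list[str]:
    """Walk the outbound file's columns; any returned entry means abort."""
    problems = []
    for name, dtype in normalized["columns"].items():
        if name not in file_columns:
            problems.append(f"missing column: {name}")
        elif file_columns[name] != dtype:
            problems.append(f"type mismatch on {name}: {dtype} vs {file_columns[name]}")
    return problems

schema = normalize(raw)
# Hypothetical outbound file schema: {column_name: data_type}.
print(check_file_columns(schema, {"random": "string"}))  # matches → []
```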

Trade-offs

  • One extra API call per upload. Cheap when uploads are daily / per-batch; expensive if you're uploading at sub-second rates.
  • Relies on third-party stability of the mapping API. If the mapping API itself is flaky, the validator becomes a reliability risk.
  • Doesn't auto-remediate. On mismatch you abort and alert; fixing the mismatch (either updating Yelp's output schema or escalating to the third party to revert their change) is human work.
  • False confidence on semantic mismatch. The validator confirms the schema matches; it doesn't confirm the semantics match. If the downstream kept a column's name and data type but changed its meaning (say, repurposed the column or changed its units), syntactic validation passes while the semantic meaning differs.

Relationship to deploy-time variants

| Pattern | Time | Validated against | Schema owner |
|---|---|---|---|
| patterns/schema-validation-before-deploy | Deploy time | Consumer-contract registry | You / your company |
| patterns/client-side-schema-validation | Write time | Registry enforced at producer | You + coordinated consumer |
| This pattern | Upload time | Live-polled third-party schema | Not you |

The defining feature of this pattern: you don't own the downstream schema, so you cannot validate once at deploy time and stay safe. You must re-check on every upload.

Seen in
