
PATTERN

Schema validation pre-upload via mapping API

Summary

Before every bulk upload to a third-party system whose schema you don't own, fetch that system's current field mapping via a REST introspection API and validate the outbound file against it. Abort the upload if any column name, data type, or date format mismatches. This is the runtime, per-upload cousin of patterns/schema-validation-before-deploy (which runs at deploy time) — necessary because the consumer schema can drift between deploys without your knowledge.

When to use

All three conditions:

  1. You're uploading to a third-party system whose schema you don't own.
  2. That system exposes a mapping / metadata API (REST, GraphQL, SDK call — any introspection endpoint).
  3. The cost of a failed upload is high — partial writes, retry storms, support-ticket escalation, weekend firefighting.

Skip if (a) the downstream is internal and owned by a team you can coordinate schema changes with, (b) the downstream enforces strict schema via a registry you already validate against at deploy time, or (c) upload volume is so low that a single failed upload is trivially recoverable.

Structure

Pipeline produces output file
Pre-upload validator:
  ├─ GET <third-party>/mapping
  │     returns:
  │       date_format: "MM/DD/YYYY"
  │       columns:
  │         - { name: "sample_column_name",
  │             data_type: "string" }
  ├─ Compare against outbound file schema
  │     - date format matches?
  │     - every expected column present?
  │     - every column's type matches?
  └─ Match ─yes→ proceed to upload batch
            no  → abort + alert → investigate
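The comparison step in the diagram can be sketched as a pure function over the fetched mapping and the outbound file's schema. Everything here is illustrative — the function and field names are not a real SDK; the `mapping` dict stands in for the parsed response of the GET call above:

```python
# Sketch: compare a live-fetched third-party mapping against the outbound
# file's schema across the three axes (date format, column names, types).
# The mapping dict would come from the GET <third-party>/mapping call.

def find_mismatches(mapping: dict, file_schema: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means safe to upload."""
    problems = []

    # Axis 1: date format string.
    if mapping["date_format"] != file_schema["date_format"]:
        problems.append(
            f"date format: downstream wants {mapping['date_format']!r}, "
            f"file uses {file_schema['date_format']!r}"
        )

    expected = {c["name"]: c["data_type"] for c in mapping["columns"]}
    actual = file_schema["columns"]  # {column_name: data_type}

    # Axes 2 and 3: column presence, then column type.
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name!r}")
        elif actual[name] != dtype:
            problems.append(f"type drift on {name!r}: {dtype!r} != {actual[name]!r}")

    return problems

# Upload gate: proceed only when nothing mismatched; otherwise abort + alert.
mapping = {"date_format": "MM/DD/YYYY",
           "columns": [{"name": "sample_column_name", "data_type": "string"}]}
file_schema = {"date_format": "MM/DD/YYYY",
               "columns": {"sample_column_name": "number"}}
print(find_mismatches(mapping, file_schema))  # non-empty → abort
```

Keeping the comparison pure (no I/O) makes the "abort + alert → investigate" branch trivially testable, independent of the mapping API's availability.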

Three validation axes (Yelp's canonical list)

  • Date format string — e.g. MM/DD/YYYY. Easy to miss.
  • Column names — renames / deletions on the downstream.
  • Column data types — type drift (string → number, etc.).

Example (from Yelp, verbatim)

GET https://yourHost/api/integration/v1/upload/mapping?templatename=SAMPLE_TEMPLATE
Header: token: RandomBasicToken

Response:
{
    "Dateformat":"MM/DD/YYYY",
    "Mapping":[
        { "sample_column_id":1,
          "sample_column_name":"random",
          "sample_column_data_type":"string" }
    ]
}

The validator parses the response, walks the output file's columns, and aborts on any mismatch on any of the three axes.
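A sketch of that parsing step against the verbatim response shape above. Only the `Dateformat` / `Mapping` keys and their field names come from the example; the normalized shape and the file-walking helper are hypothetical:

```python
import json

# Verbatim response body from the example above.
raw = """{
    "Dateformat":"MM/DD/YYYY",
    "Mapping":[
        { "sample_column_id":1,
          "sample_column_name":"random",
          "sample_column_data_type":"string" }
    ]
}"""

def normalize(response_body: str) -> dict:
    """Parse the mapping response into {date_format, columns: {name: type}}."""
    doc = json.loads(response_body)
    return {
        "date_format": doc["Dateformat"],
        "columns": {m["sample_column_name"]: m["sample_column_data_type"]
                    for m in doc["Mapping"]},
    }

def check_file_columns(normalized: dict, file_columns: dict) -> list[str]:
    """Walk the outbound file's columns; any returned entry means abort."""
    problems = []
    for name, dtype in normalized["columns"].items():
        if name not in file_columns:
            problems.append(f"missing column: {name}")
        elif file_columns[name] != dtype:
            problems.append(f"type mismatch on {name}: {dtype} vs {file_columns[name]}")
    return problems

schema = normalize(raw)
# Hypothetical outbound file schema: {column_name: data_type}.
print(check_file_columns(schema, {"random": "string"}))  # matches → []
```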

Trade-offs

  • One extra API call per upload. Cheap when uploads are daily / per-batch; expensive if you're uploading at sub-second rates.
  • Relies on third-party stability of the mapping API. If the mapping API itself is flaky, the validator becomes a reliability risk.
  • Doesn't auto-remediate. On mismatch you abort and alert; fixing the mismatch (either updating Yelp's output schema or escalating to the third party to revert their change) is human work.
  • False confidence on semantic mismatch. The validator confirms the schema matches; it doesn't confirm the semantics match. If the downstream kept a column's name and data type but changed its meaning (say, repurposed the column or changed its units), syntactic validation passes while the semantic meaning differs.

Relationship to deploy-time variants

| Pattern | Time | Validated against | Schema owner |
|---|---|---|---|
| patterns/schema-validation-before-deploy | Deploy time | Consumer-contract registry | You / your company |
| patterns/client-side-schema-validation | Write time | Registry enforced at producer | You + coordinated consumer |
| This pattern | Upload time | Live-polled third-party schema | Not you |

The defining feature of this pattern: you don't own the downstream schema, so you cannot validate once at deploy time and stay safe. You must re-check on every upload.

Seen in
