PATTERN
Schema validation pre-upload via mapping API¶
Summary¶
Before every bulk upload to a third-party system whose schema you don't own, fetch the third party's current field mapping via a REST introspection API and validate the outbound file against it. Abort the upload if any column name, data type, or date format mismatches. This is the runtime, per-upload cousin of patterns/schema-validation-before-deploy (deploy-time) — necessary when the consumer schema can drift between deploys without your knowledge.
When to use¶
All three conditions:
- You're uploading to a third-party system whose schema you don't own.
- That system exposes a mapping / metadata API (REST, GraphQL, SDK call — any introspection endpoint).
- The cost of a failed upload is high — partial writes, retry storms, support-ticket escalation, weekend firefighting.
Skip if (a) the downstream is internal and owned by a team you can coordinate schema changes with, (b) the downstream enforces strict schema via a registry you already validate against at deploy time, or (c) upload volume is so low that a single failed upload is trivially recoverable.
Structure¶
Pipeline produces output file
│
▼
Pre-upload validator:
├─ GET <third-party>/mapping
│ returns:
│ date_format: "MM/DD/YYYY"
│ columns:
│ - { name: "sample_column_name",
│ data_type: "string" }
│
├─ Compare against outbound file schema
│ - date format matches?
│ - every expected column present?
│ - every column's type matches?
│
  └─ Match? ── yes → proceed to upload batch
           └─ no  → abort + alert → investigate
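The comparison step in the diagram can be sketched as a pure function over the polled mapping and the outbound file's schema. This is an illustrative sketch, not Yelp's actual code; the dict shapes and names are assumptions modeled on the structure above:

```python
# Sketch of the pre-upload comparison step. `mapping` mirrors the shape a
# hypothetical GET <third-party>/mapping call might return; `file_schema`
# is derived from the outbound file. All names are illustrative.

def find_mismatches(mapping: dict, file_schema: dict) -> list[str]:
    """Return human-readable mismatches across the three validation axes."""
    problems = []

    # Axis 1: date format string
    if mapping["date_format"] != file_schema["date_format"]:
        problems.append(
            f"date format: expected {mapping['date_format']!r}, "
            f"file uses {file_schema['date_format']!r}"
        )

    expected = {c["name"]: c["data_type"] for c in mapping["columns"]}
    actual = file_schema["columns"]  # {column name: data type}

    # Axis 2: every expected column present (catches renames / deletions)
    for name in expected:
        if name not in actual:
            problems.append(f"missing column: {name!r}")

    # Axis 3: data types match (catches string -> number drift)
    for name, dtype in expected.items():
        if name in actual and actual[name] != dtype:
            problems.append(
                f"type drift on {name!r}: expected {dtype}, got {actual[name]}"
            )
    return problems


mapping = {
    "date_format": "MM/DD/YYYY",
    "columns": [{"name": "sample_column_name", "data_type": "string"}],
}
good = {"date_format": "MM/DD/YYYY",
        "columns": {"sample_column_name": "string"}}
drifted = {"date_format": "MM/DD/YYYY",
           "columns": {"sample_column_name": "number"}}

assert find_mismatches(mapping, good) == []   # proceed to upload batch
assert find_mismatches(mapping, drifted)      # non-empty: abort + alert
```

Any non-empty result means the batch aborts and alerts; per the trade-offs below, remediation stays human work.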
Three validation axes (Yelp's canonical)¶
- Date format string — e.g. MM/DD/YYYY. Easy to miss.
- Column names — renames / deletions on the downstream.
- Column data types — type drift (string → number, etc.).
Example (from Yelp, verbatim)¶
```
GET https://yourHost/api/integration/v1/upload/mapping?templatename=SAMPLE_TEMPLATE
Header: token: RandomBasicToken
```

Response:

```
{
  "Dateformat": "MM/DD/YYYY",
  "Mapping": [
    { "sample_column_id": 1,
      "sample_column_name": "random",
      "sample_column_data_type": "string" }
  ]
}
```
The validator parses the response, walks the output file's columns, and aborts on any mismatch on any of the three axes.
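A minimal sketch of that parsing step, assuming a response in exactly the shape of the verbatim example (the key names `Dateformat`, `Mapping`, `sample_column_name`, and `sample_column_data_type` come from the example; the HTTP call itself is omitted):

```python
import json

# Raw response body in the shape of the verbatim Yelp example above.
raw = '''{
  "Dateformat": "MM/DD/YYYY",
  "Mapping": [
    { "sample_column_id": 1,
      "sample_column_name": "random",
      "sample_column_data_type": "string" }
  ]
}'''

def parse_mapping(body: str) -> dict:
    """Normalize the third party's response keys into a validator-friendly shape."""
    doc = json.loads(body)
    return {
        "date_format": doc["Dateformat"],
        "columns": {
            col["sample_column_name"]: col["sample_column_data_type"]
            for col in doc["Mapping"]
        },
    }

parsed = parse_mapping(raw)
assert parsed["date_format"] == "MM/DD/YYYY"
assert parsed["columns"] == {"random": "string"}
```

Normalizing the vendor's key names up front keeps the three-axis comparison itself independent of any one third party's response format.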
Trade-offs¶
- One extra API call per upload. Cheap when uploads are daily / per-batch; expensive if you're uploading at sub-second rates.
- Relies on third-party stability of the mapping API. If the mapping API itself is flaky, the validator becomes a reliability risk.
- Doesn't auto-remediate. On mismatch you abort and alert; fixing the mismatch (either updating Yelp's output schema or escalating to the third party to revert their change) is human work.
- False confidence on semantic mismatch. The validator confirms the schema matches; it doesn't confirm the semantics match. If the downstream renamed a column but kept the same data type, syntactic validation passes while the semantic meaning differs.
Relationship to deploy-time variants¶
| Pattern | Time | Against | Owner |
|---|---|---|---|
| patterns/schema-validation-before-deploy | Deploy time | Consumer-contract registry | You/your company |
| patterns/client-side-schema-validation | Write time | Registry enforced at producer | You + coordinated consumer |
| This pattern | Upload time | Live-polled third-party schema | Not you |
The defining feature of this pattern is: you don't own the downstream schema, so you cannot validate once-at-deploy and be safe. You must re-check each upload.
Seen in¶
- sources/2025-05-27-yelp-revenue-automation-series-testing-an-integration-with-third-party-system — canonical instance. Yelp's Schema Validation Batch; full request/response example; reliability rationale.
Related¶
- concepts/data-upload-format-validation — the concept form
- concepts/schema-evolution
- patterns/schema-validation-before-deploy — deploy-time variant
- patterns/client-side-schema-validation — producer-enforced variant
- systems/yelp-schema-validation-batch — canonical implementation