Skip to content

SYSTEM Cited by 1 source

Yelp Schema Validation Batch

Definition

Yelp Schema Validation Batch is the named pre-upload guard that Yelp's integration layer runs before every file upload to the third-party REVREC (Revenue Recognition) SaaS. It retrieves the REVREC system's current field mapping via a REST API and compares it against the revenue-data schema Yelp intends to upload. If any mismatch is detected (missing column, type mismatch, date-format mismatch), the upload is aborted before failure.

Disclosed by Yelp Engineering's 2025-05-27 Revenue Automation Series post on integration testing.

Role

The Schema Validation Batch exists because the third-party system's schema can change between deploy and any given upload — a column renamed, a type changed, a date format swapped. Without the batch, such changes would surface as upload failures in production with partial writes, retry storms, and manual investigation. The batch moves detection before the upload.

Mechanism

  Revenue Data Pipeline produces
   REVREC-template output (S3)
  Schema Validation Batch
   ├─→ GET REVREC mapping API (REST)
   │       returns: dateformat + per-column
   │                name + data_type
   ├─→ compare against local schema
   └─→ match?  yes ─→ proceed to Upload Batch
               no  ─→ abort + alert

API shape (from the post, verbatim)

-- Request
curl -X GET --header "token: RandomBasicToken" \
  "https://yourHost/api/integration/v1/upload/mapping?templatename=SAMPLE_TEMPLATE"
-- Response
{
    "Dateformat":"MM/DD/YYYY",
    "Mapping":[
        {
            "sample_column_id":1,
            "sample_column_name":"random",
            "sample_column_data_type":"string"
        }
    ]
}

Three axes returned + checked:

  • Dateformat — the date string format REVREC expects (e.g. MM/DD/YYYY vs YYYY-MM-DD).
  • sample_column_name — column name must match.
  • sample_column_data_type — column type must match (string, number, etc).

Why this matters

From the post: "we faced a few challenges where the internal file data format, column mapping sequence, or differences in the external table schema led to upload failures." The validation layer "minimized failure risks and improved system reliability."

Distinguishes itself from patterns/schema-validation-before-deploy (Datadog's CDC platform) on one axis: it runs per-upload at runtime rather than once at deploy time. This is because Yelp doesn't own the third-party schema — the schema can drift between deploys, and the batch must re-check each time.

Caveats

  • The batch treats REVREC as authoritative. It pulls the field mapping from REVREC and validates Yelp's output against that; it does not validate REVREC's schema against Yelp's expectation. If REVREC adds a new required column, the batch will detect the mismatch but cannot auto-remediate.
  • Sample-only API disclosure. Real column names are redacted behind sample_column_name etc.
  • No comparison-logic code published — the post shows only the request/response shape, not the diff logic.

Seen in

Last updated · 476 distilled / 1,218 read