PATTERN Cited by 1 source

Event-type-by-event-type shadow cutover

Pattern

For migrating a multi-tenant ML data platform from a legacy substrate to a redesigned one, migrate one event type at a time: shadow-run the new pipeline, compare its output against legacy at two tiers (event-level and sequence-level), validate with A/B experiments on the consuming models, perform a controlled cutover for that event type, then move on to the next event type until the legacy path can be deprecated. The unit of migration is the event type, not the pipeline (Source: sources/2026-05-21-pinterest-making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use).

Shape

For each event type E in {click, save, search, ...}:

  1. Run new pipeline in PARALLEL with legacy
        legacy ───► legacy sequences (production)
        new    ───► shadow sequences (validation)
        │
        ▼
  2. TWO-TIER comparison
        a. Event-level: field-by-field on matched events
        b. Sequence-level: shadow output vs legacy output
        │
        ▼
  3. A/B EXPERIMENTS on new-data sequences
        (sequences are model inputs; downstream model
         behaviour is the ultimate validation signal)
        │
        ▼
  4. CONTROLLED CUTOVER
        shift consumers of E to read from new substrate
        keep legacy running so rollback is cheap
        │
        ▼
  5. Iterate: pick next event type, repeat
        deprecate legacy path incrementally as event types complete
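
The five steps can be read as a small driver loop. The sketch below is a minimal Python illustration under assumed names: `StageResult`, `EventTypeMigration`, `migrate` and the three hook callables are hypothetical, not from Pinterest's post, and each hook stands in for a whole stage of real infrastructure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StageResult:
    passed: bool
    detail: str = ""


@dataclass
class EventTypeMigration:
    # Hypothetical hooks; each wraps one step of the per-event-type walk.
    event_type: str
    shadow_compare: Callable[[str], StageResult]  # steps 1-2: shadow run + two-tier comparison
    ab_gate: Callable[[str], StageResult]         # step 3: A/B experiments on consuming models
    cutover: Callable[[str], None]                # step 4: shift consumers to the new substrate


def migrate(plan: list[EventTypeMigration]) -> dict[str, str]:
    """Step 5: walk the plan one event type at a time.

    A failing gate blocks only that event type; its consumers stay on legacy
    and the walk continues with the next event type.
    """
    status: dict[str, str] = {}
    for m in plan:
        shadow = m.shadow_compare(m.event_type)
        if not shadow.passed:
            status[m.event_type] = f"blocked at shadow: {shadow.detail}"
            continue
        ab = m.ab_gate(m.event_type)
        if not ab.passed:
            status[m.event_type] = f"blocked at A/B: {ab.detail}"
            continue
        m.cutover(m.event_type)
        status[m.event_type] = "cut over"
    return status
```

Here a failed gate parks that event type on legacy and the walk continues. A team could equally choose to halt the walk at the first failure and investigate before touching another event type.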

Why per-event-type, not per-pipeline

Pinterest's substrate handles many event types × many tenants × many models. A whole-pipeline cutover would mean simultaneously migrating every tenant + every event type, a near-impossible bar. Per-event-type granularity lets each event type:

Soak in shadow as long as its risk profile demands.
Be cut over independently of others.
Be rolled back without affecting unrelated event types.
Have its own validation criteria (high-volume / high-value event types get stricter thresholds).

The blast radius of a botched cutover is bounded to that one event type's consumers, not the whole platform.
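
One way to make that independence concrete is to keep migration state per event type rather than per pipeline. A minimal sketch, assuming a hypothetical `MigrationRegistry` and `EventTypePolicy`; the threshold fields and their values are illustrative, not Pinterest's.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    SHADOW = "shadow"      # new pipeline runs in parallel; consumers still read legacy
    CUT_OVER = "cut_over"  # consumers read the new substrate; legacy kept running for rollback
    RETIRED = "retired"    # soak finished; legacy path removed for this event type only


@dataclass
class EventTypePolicy:
    # Illustrative per-event-type validation criteria (hypothetical fields and values).
    min_event_match_rate: float
    min_soak_days: int
    phase: Phase = Phase.SHADOW


class MigrationRegistry:
    """Hypothetical per-event-type state: every transition touches exactly one event type."""

    def __init__(self, policies: dict[str, EventTypePolicy]) -> None:
        self._policies = policies

    def read_source(self, event_type: str) -> str:
        return "legacy" if self._policies[event_type].phase is Phase.SHADOW else "new"

    def cut_over(self, event_type: str) -> None:
        self._policies[event_type].phase = Phase.CUT_OVER

    def retire_legacy(self, event_type: str) -> None:
        self._policies[event_type].phase = Phase.RETIRED

    def roll_back(self, event_type: str) -> None:
        policy = self._policies[event_type]
        if policy.phase is Phase.RETIRED:
            raise RuntimeError(f"{event_type}: legacy already retired, nothing to roll back to")
        policy.phase = Phase.SHADOW  # other event types' consumers are untouched


registry = MigrationRegistry({
    # Stricter bar for a high-volume type; numbers are made up for illustration.
    "click": EventTypePolicy(min_event_match_rate=0.995, min_soak_days=28),
    "save": EventTypePolicy(min_event_match_rate=0.99, min_soak_days=14),
})
registry.cut_over("click")
registry.cut_over("save")
registry.roll_back("save")  # a bad "save" cutover does not move "click"
```

Rollback only works while legacy is still running for that type; once legacy is retired, the cheap rollback path is gone, which is why retirement should wait for the soak.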

Two-tier comparison

Pinterest's validation:

"A strategy of using two tiers of comparisons, an event-level comparison, which compared field-by-field of events we matched between our old and new indexing jobs, as well as a sequence-level comparison, comparing the shadow sequence output with the legacy sequence output."

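Tier                        | What it catches
--------------------------- | ---------------------------------------------------------------------------------
Event-level field-by-field  | Per-event enrichment / featurisation regressions; per-field schema differences
Sequence-level              | Assembly logic regressions; ordering / windowing differences; truncation differences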

Both tiers are necessary. Field-level matching can pass while sequence-level assembly differs (different sort key, different window). Sequence-level matching can pass while individual fields differ (compensating errors). Each tier covers the other's blind spot, so together they reduce the chance that a regression passes validation unnoticed.
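
The two tiers can be sketched as two independent functions. This assumes, hypothetically, that matched events share an `event_id` and that an assembled sequence is an ordered list of event ids per user; the source does not describe the real matching keys or sequence representation.

```python
from collections import Counter


def event_level_compare(legacy: list[dict], shadow: list[dict], key: str = "event_id") -> dict:
    """Tier 1: match events on `key`, then diff every field of each matched pair."""
    legacy_by_key = {e[key]: e for e in legacy}
    shadow_by_key = {e[key]: e for e in shadow}
    matched = legacy_by_key.keys() & shadow_by_key.keys()
    field_mismatches: Counter = Counter()
    for k in matched:
        old, new = legacy_by_key[k], shadow_by_key[k]
        for name in old.keys() | new.keys():  # a field absent on one side compares as None
            if old.get(name) != new.get(name):
                field_mismatches[name] += 1
    return {
        "matched": len(matched),
        "legacy_only": len(legacy_by_key.keys() - matched),
        "shadow_only": len(shadow_by_key.keys() - matched),
        "field_mismatches": dict(field_mismatches),
    }


def sequence_level_compare(legacy_seqs: dict[str, list[str]],
                           shadow_seqs: dict[str, list[str]]) -> dict:
    """Tier 2: compare assembled per-user sequences (users on only one side are ignored here)."""
    users = legacy_seqs.keys() & shadow_seqs.keys()
    differ = [u for u in users if legacy_seqs[u] != shadow_seqs[u]]
    return {
        "users": len(users),
        "identical": len(users) - len(differ),
        # same events in a different order: a sort-key / ordering difference
        "reordered": sum(sorted(legacy_seqs[u]) == sorted(shadow_seqs[u]) for u in differ),
        # different length: often a windowing or truncation difference
        "length_differs": sum(len(legacy_seqs[u]) != len(shadow_seqs[u]) for u in differ),
    }


# Same events, different sort key: tier 1 is clean, tier 2 flags the reordering.
events = [{"event_id": "e1", "ts": 1, "pin": "a"}, {"event_id": "e2", "ts": 2, "pin": "b"}]
print(event_level_compare(events, events)["field_mismatches"])      # -> {}
print(sequence_level_compare({"u1": ["e1", "e2"]}, {"u1": ["e2", "e1"]}))
# -> {'users': 1, 'identical': 0, 'reordered': 1, 'length_differs': 0}
```

Exact list equality is the bluntest sequence-level metric; a real comparison would more likely report a similarity score per sequence, which is where the thresholds in the next section come in.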

100% match is not the goal

Pinterest's explicit framing:

"Since we are regenerating the data using completely new jobs, we had to accept that the data won't have a 100% match due to the nature of our online systems. As a result, we had to have thorough validations to prove that our new system was producing approximately the same sequences when compared to the legacy system."

Online ML pipelines have inherent non-determinism: race conditions in event arrival, enrichment-service jitter, distributed-clock skew. The validation bar is approximate equivalence with sufficient evidence, not bit-for-bit identity. The discipline is in defining "approximately" with concrete thresholds + sufficient comparison windows + downstream A/B confirmation.
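
Defining "approximately" can be as simple as a set of thresholds plus a minimum-evidence check. The sketch below is hypothetical: `Thresholds`, `failed_checks`, the threshold names and their default values are placeholders rather than Pinterest's numbers, and an identical-sequence rate is only one possible sequence-level metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical defaults; real values would be set per event type.
    min_matched_events: int = 1_000_000     # evidence required from the comparison window
    min_event_match_rate: float = 0.99      # matched / legacy events
    max_field_mismatch_rate: float = 0.005  # per field, over matched events
    min_identical_sequence_rate: float = 0.95


def failed_checks(matched: int, legacy_total: int, field_mismatches: dict[str, int],
                  users: int, identical_sequences: int, t: Thresholds) -> list[str]:
    """Return every failed check; an empty list means "approximately equivalent"."""
    failures = []
    if matched < t.min_matched_events:
        failures.append(f"insufficient evidence: {matched} matched events")
    match_rate = matched / legacy_total if legacy_total else 0.0
    if match_rate < t.min_event_match_rate:
        failures.append(f"event match rate {match_rate:.4f}")
    for name, count in sorted(field_mismatches.items()):
        rate = count / matched if matched else 1.0
        if rate > t.max_field_mismatch_rate:
            failures.append(f"field {name!r} mismatch rate {rate:.4f}")
    seq_rate = identical_sequences / users if users else 0.0
    if seq_rate < t.min_identical_sequence_rate:
        failures.append(f"identical sequence rate {seq_rate:.4f}")
    return failures
```

Returning every failed check rather than a single boolean makes it easier to see whether a miss is a real regression in one field or simply too short a comparison window.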

A/B experiments as the final validation gate

Sequences are model inputs. The ultimate test of "are the new sequences good enough?" is whether the model behaves the same when fed new versus legacy sequences. Pinterest's gating:

"Alongside performing A/B experiments using our new data, these validations gave us the confidence that we could safely swap our pipelines with no impact."

Shadow comparison catches data-shape regressions; A/B catches behavioural regressions (the data shape is right, but the model is reacting differently for some subtle reason). Both layers are needed before cutover.
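
A subtle point for the A/B gate: a standard A/B test that finds no significant difference does not show that the model behaves the same, because an underpowered test also finds no difference. An equivalence check asks the opposite question. A minimal sketch, assuming one per-user metric summarized by mean, standard deviation and sample size per arm, a normal approximation, and a hypothetical tolerance `margin`; this is a swapped-in technique (two one-sided tests, expressed as a confidence interval), not the gate Pinterest describes.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass(frozen=True)
class ArmSummary:
    mean: float  # e.g. a hypothetical per-user engagement metric of the consuming model
    std: float
    n: int


def equivalent(control: ArmSummary, treatment: ArmSummary, margin: float,
               alpha: float = 0.05) -> bool:
    """Pass only if the (1 - 2*alpha) CI of treatment - control lies inside (-margin, +margin).

    This is the confidence-interval form of two one-sided tests at level alpha.
    """
    diff = treatment.mean - control.mean
    se = sqrt(control.std ** 2 / control.n + treatment.std ** 2 / treatment.n)
    z = NormalDist().inv_cdf(1 - alpha)
    low, high = diff - z * se, diff + z * se
    return -margin < low and high < margin
```

With a large sample the interval is narrow enough to fit inside a tight margin; with a small sample the gate fails even when the observed difference is zero, which is the intended behaviour for a "no impact" claim.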

Controlled cutover

"Once we were confident in the behavior, we performed a controlled cutover by shifting consumers to read from the new architecture. We then iterated the same process across additional event types, steadily deprecating the legacy path."

Legacy keeps running during + after cutover so rollback is cheap. Once an event type's cutover has soaked, legacy can be deprecated for that event type only, freeing legacy infrastructure incrementally.
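
The source does not say how consumers were shifted. One common approach, assumed here purely for illustration, is a deterministic percentage rollout per event type: hash each consumer into a stable bucket and raise the share of buckets that read from the new substrate. `ROLLOUT_PCT`, `bucket` and `read_source` are hypothetical names.

```python
import hashlib

# Hypothetical per-event-type rollout percentages, raised step by step (e.g. 0 -> 5 -> 25 -> 100).
# Setting a value back to 0 is the rollback; legacy keeps producing sequences throughout.
ROLLOUT_PCT = {"click": 25, "save": 0}


def bucket(consumer_id: str, event_type: str) -> int:
    """Stable bucket in [0, 100), so a consumer's assignment does not flap between reads."""
    digest = hashlib.sha256(f"{event_type}:{consumer_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def read_source(consumer_id: str, event_type: str, rollout_pct: dict[str, int]) -> str:
    """Return which substrate this consumer should read for this event type."""
    pct = rollout_pct.get(event_type, 0)  # event types not yet in the table stay on legacy
    return "new" if bucket(consumer_id, event_type) < pct else "legacy"
```

Because buckets are stable, raising the percentage only ever adds consumers to the new path, and dropping it to 0 sends every consumer of that event type back to legacy in one change.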

Where this pattern fits

Multi-tenant ML data platforms.
Substrates with many independent event types where blast radius matters.
Substrates feeding A/B-experimentable downstream consumers (recsys, ads ranking, search ranking).
Substrates where 100%-match validation is intractable due to inherent non-determinism.

Where it doesn't fit

Single-pipeline workloads with no event-type granularity — the per-event-type shaping isn't meaningful.
Substrates without a downstream A/B-able consumer — the final validation gate doesn't exist.
Tight-deadline migrations — incremental walking through every event type takes time.

Sibling patterns

patterns/parallel-run-pattern — the canonical "run two systems and compare" pattern; this pattern adds the per-event-type granularity + two-tier comparison + A/B gate on top.
patterns/shadow-migration — sibling at compute-engine altitude (Spark → Ray); same dual-run discipline.
patterns/side-by-side-runtime-validation — sibling at runtime-comparison altitude.
patterns/shadow-then-reverse-shadow-migration — sibling at CDC pipeline altitude with explicit role-swap; this pattern walks per-event-type rather than swapping roles.
concepts/migration-job-lifecycle — the lifecycle framework for individual migration units.

Caveats

Event-type ordering matters. Lower-risk / lower-traffic event types should go first, so the team can shake out the validation pipeline; high-stakes / high-volume types should go later, once the team is calibrated.
Legacy decommissioning timeline can stretch if some event types resist cutover. Pinterest's iteration was open-ended; the post doesn't disclose how long the migration took or what the residual legacy footprint looks like.
Validation infrastructure cost. Shadow runs mean paying for both substrates at once for as long as each event type is in shadow. Pinterest accepted this; not every team can.
A/B power dilution. Many simultaneous A/B experiments on different event types compete for the same user traffic. Sequencing or stratification may be needed.
Cross-event-type interaction. If sequences include events of multiple types and only some types have been migrated, the "same sequence" comparison gets confused. Either migrate event-type bundles together or version the assembly logic.
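
For the cross-event-type caveat, one way to find which event types must move together is to treat every sequence definition as linking the event types it contains and to take connected components with union-find. The sketch below assumes hypothetical inputs: a map from sequence definition to the event types it includes, and a per-event-type `stakes` score (higher means more traffic or value). It puts the lowest-stakes bundle first, leaving high-stakes / high-volume types for later.

```python
def migration_bundles(sequence_defs: dict[str, set[str]],
                      stakes: dict[str, int]) -> list[list[str]]:
    """Group event types that share any sequence definition; lowest-stakes bundle first."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for event_types in sequence_defs.values():
        ordered = sorted(event_types)
        for t in ordered:
            find(t)  # register single-type definitions too
        for other in ordered[1:]:
            union(ordered[0], other)

    groups: dict[str, list[str]] = {}
    for t in list(parent):
        groups.setdefault(find(t), []).append(t)
    bundles = [sorted(g) for g in groups.values()]
    # A bundle is as high-stakes as its highest-stakes member.
    bundles.sort(key=lambda b: (max(stakes.get(t, 0) for t in b), b))
    return bundles


# Hypothetical sequence definitions: "save" links "click" and "repin" into one bundle.
defs = {"engagement_seq": {"click", "save"}, "search_seq": {"search"}, "repin_seq": {"save", "repin"}}
print(migration_bundles(defs, {"click": 3, "save": 2, "repin": 1, "search": 1}))
# -> [['search'], ['click', 'repin', 'save']]
```

Versioning the assembly logic is the alternative when bundles grow too large to migrate at once; this sketch only covers the bundling option.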

Seen in

sources/2026-05-21-pinterest-making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use — first canonical wiki instance: event-type-by-event-type shadow cutover with two-tier comparison + A/B + controlled cutover for the Pinterest user-sequence platform redesign.