Surface-specific checkpoint export¶
Pattern¶
Train one unified multi-task model jointly on combined multi-surface data, then export N separate checkpoints — one per surface — from the same training run. Each surface deploys its own checkpoint, which inherits the shared representation from joint training but can be further specialised (surface-specific architecture variants, surface-specific head composition, surface-specific post-training fine-tuning) without forking the upstream codebase.
Problem¶
A strictly unified model where every surface deploys the same weights suffers the unified-model rigidity tax:
- No per-surface variation in architecture post-training (e.g. different task-head composition per surface).
- No per-surface continued fine-tuning (can't fine-tune Home Feed on fresh HF data without also updating Search's deployment).
- New surface-specific features must go through the unified trunk's coordination overhead.
- Every change to the deployed model is a coordinated change across all surfaces.
This re-creates the coordination bottleneck the unification was supposed to eliminate — unified at the cost of losing per-surface iteration velocity.
Solution¶
Two decoupled stages:
- Joint training. One unified model, one codebase, multi-surface training data. Shared trunk + multi-task heads + surface-specific tower trees all trained together.
- Per-surface checkpoint export. At export time, each surface gets its own checkpoint — potentially with different activated heads, different tower configurations, or different post-training fine-tuning applied on top.
```
    joint training (multi-surface data)
                    │
                    ▼
    [ unified model training artifact ]
                    │
    ┌───────────────┼───────────────┐
    ▼               ▼               ▼
export surface-A export surface-B export surface-C
   checkpoint       checkpoint       checkpoint
    │               │               │
  surface-A       surface-B       surface-C
 deployment      deployment      deployment
```
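The export stage can be sketched as a key-selection over the joint-training artifact: each surface's checkpoint keeps the shared trunk plus only that surface's heads and towers. This is a minimal illustration under the assumption that the unified model's parameters live in a flat state dict with module-path prefixes; all names (`trunk.`, `heads.home_feed.`, etc.) are illustrative, not Pinterest's actual layout.

```python
# Minimal sketch of per-surface checkpoint export from a unified model.
# Assumes a flat {param_name: value} state dict whose keys are prefixed
# by module path; prefixes below are hypothetical.

def export_surface_checkpoint(unified_state, surface, extra_prefixes=()):
    """Keep the shared trunk plus the given surface's own modules."""
    keep = ("trunk.", f"heads.{surface}.", f"towers.{surface}.") + tuple(extra_prefixes)
    return {k: v for k, v in unified_state.items() if k.startswith(keep)}

# Toy stand-in for the joint-training artifact.
unified = {
    "trunk.embed.weight": "W_embed",
    "trunk.layer1.weight": "W_l1",
    "heads.home_feed.ctr.weight": "W_hf_ctr",
    "heads.search.ctr.weight": "W_sr_ctr",
    "towers.home_feed.mlp.weight": "W_hf_tower",
    "towers.search.mlp.weight": "W_sr_tower",
}

hf_ckpt = export_surface_checkpoint(unified, "home_feed")
sr_ckpt = export_surface_checkpoint(unified, "search")

# Both exports inherit the trunk; neither carries the other's modules.
assert "trunk.embed.weight" in hf_ckpt and "trunk.embed.weight" in sr_ckpt
assert "heads.search.ctr.weight" not in hf_ckpt
```

The same selection function gives head-composition variation for free: passing different `extra_prefixes` per surface activates different optional modules in each export.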
Each exported checkpoint can subsequently follow its own deployment lifecycle — independent canary rollouts, independent post-training fine-tuning, independent version management — while inheriting the shared-representation benefits from the joint training.
Canonical wiki reference¶
Pinterest's unified ads engagement model uses surface-specific checkpoint exports as the extensibility escape valve (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces):
"Using a single shared architecture for HF and SR CTR prediction limited flexibility and made it harder to iterate on surface-specific features and modules. To restore extensibility, we introduced a multi-task learning design within the unified model and enabled surface-specific checkpoint exports. We exported separate surface checkpoints so each surface could adopt the most appropriate architecture while still benefiting from shared representation learning. This enabled more flexible, surface-specific CTR prediction and established a foundation for continued surface-specific iteration."
The framing is explicit: "continued surface-specific iteration" is the product benefit. Each surface team retains its own iteration velocity without being gated on changes to every other surface.
Operational wins¶
- Per-surface iteration velocity preserved alongside shared-trunk generalisation.
- Decoupled deployment. A bad change that affects one surface's checkpoint doesn't block other surfaces' deployments.
- Per-surface fine-tuning. Each checkpoint can be post-trained on fresh surface-specific data without touching the shared training pipeline.
- Head-composition variation. Different surfaces can activate or deactivate specific task heads depending on their deployment needs.
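The per-surface fine-tuning win hinges on updating only the surface's own modules. A common way to do this (an assumption here, not a disclosed Pinterest mechanism) is to freeze the shared trunk and take optimiser steps only on head parameters; the toy SGD step below uses illustrative names and scalar parameters.

```python
# Hedged sketch of post-export, per-surface fine-tuning: the shared trunk
# is frozen so the checkpoint absorbs fresh surface data without drifting
# the inherited representation. Names and values are illustrative.

def fine_tune_step(checkpoint, grads, lr=0.1, frozen_prefix="trunk."):
    """One SGD step that skips every parameter under the frozen prefix."""
    return {
        name: value if name.startswith(frozen_prefix)
        else value - lr * grads.get(name, 0.0)
        for name, value in checkpoint.items()
    }

ckpt = {"trunk.embed.weight": 1.0, "heads.home_feed.ctr.weight": 0.5}
grads = {"trunk.embed.weight": 9.9, "heads.home_feed.ctr.weight": 0.2}

updated = fine_tune_step(ckpt, grads)
assert updated["trunk.embed.weight"] == 1.0               # trunk untouched
assert abs(updated["heads.home_feed.ctr.weight"] - 0.48) < 1e-12  # head moved by lr * grad
```

Freezing the trunk is a design choice, not a requirement: surfaces could also fine-tune the full checkpoint, at the cost of faster drift from the joint baseline (see the caveats below).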
Caveats¶
- Drift between checkpoints. Over time, post-training fine-tuning on exported checkpoints drifts them away from the jointly-trained baseline; eventually they can diverge enough that the shared-representation benefit is lost. This requires periodic re-baselining (re-exporting fresh checkpoints from joint training).
- Storage cost. N checkpoints instead of 1 — typically negligible vs training cost but non-zero.
- Export stage is not standard. Most ML frameworks don't have first-class "export N variants" tooling — requires Pinterest-style custom export logic.
- Pinterest doesn't disclose the concrete export mechanism, what fine-tuning (if any) is applied post-export, or the re-baselining cadence.
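Since Pinterest doesn't disclose a re-baselining cadence, one plausible way to operationalise the drift caveat (an assumption, not the source's mechanism) is to track each exported checkpoint's parameter distance from the joint baseline and trigger a fresh export once any surface crosses a threshold:

```python
# Hypothetical drift monitor for exported checkpoints: relative L2 distance
# from the joint-training baseline, used to decide when to re-baseline.
import math

def drift(baseline, checkpoint):
    """Relative L2 distance over the parameters the two dicts share."""
    shared = baseline.keys() & checkpoint.keys()
    num = math.sqrt(sum((baseline[k] - checkpoint[k]) ** 2 for k in shared))
    den = math.sqrt(sum(baseline[k] ** 2 for k in shared)) or 1.0
    return num / den

def needs_rebaseline(baseline, surface_ckpts, threshold=0.25):
    """Surfaces whose fine-tuned checkpoints have drifted past the threshold."""
    return [s for s, c in surface_ckpts.items() if drift(baseline, c) > threshold]

baseline = {"trunk.w": 1.0, "head.w": 2.0}
ckpts = {
    "home_feed": {"trunk.w": 1.0, "head.w": 2.1},  # mild fine-tuning drift
    "search":    {"trunk.w": 1.0, "head.w": 3.5},  # drifted far from baseline
}
assert needs_rebaseline(baseline, ckpts) == ["search"]
```

The threshold and the choice of distance metric are both tuning knobs; in practice a behavioural metric (prediction divergence on a shared eval set) may be a better drift signal than raw parameter distance.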
Related patterns¶
- patterns/unified-multi-surface-model — the outer pattern.
- patterns/surface-specific-tower-tree — the train-time specialisation that complements this deploy-time specialisation.
- concepts/multi-task-learning — the training architecture that makes per-surface export coherent.
Generalisations¶
- Per-customer checkpoint export. In multi-tenant model-as-a-service, export a tenant-specific checkpoint fine-tuned on that tenant's data; a common pattern in LLM fine-tuning services.
- Per-region checkpoint export. For geo-distributed deployments with different user-behaviour priors, export region-specific checkpoints.
- Per-device-class checkpoint export. Mobile / desktop / web as the specialisation axis.
The common structural shape: joint training for shared representation + per-segment export for deployment-level specialisation.
Seen in¶
- 2026-03-03 Pinterest — Unifying Ads Engagement Modeling (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces) — canonical: per-surface checkpoint export from the unified model to preserve per-surface iteration velocity.