
Surface-specific tower tree

Pattern

Within a unified multi-surface ML model, give each product surface / view type its own tower tree — a surface-routed subnetwork above the shared trunk, with late fusion of surface-specific modules into the tower. At serving time, route each request to its surface-specific tower tree so the request only pays for its surface's specialisation, not N−1 other surfaces' specialisations.

Problem

A unified multi-surface model with a fully shared architecture all the way to the prediction head suffers two costs:

  1. Over-generalisation cost. One set of weights tries to serve all surfaces → none gets optimal specialisation. Some surface-specific patterns (feature interactions specific to Search's query-token signal, feature interactions specific to Related Pins' context-Pin signal) are under-fit.
  2. Serving-cost unfairness. If every surface runs through every architectural module, cheap surfaces pay for expensive surfaces' specialisation even though they don't use it.

Solution

Architecturally: shared trunk → surface-specific tower trees → surface-specific calibration → surface-specific prediction.

         [ shared trunk — features, embeddings, encoder, MMoE ]
                  (shared representation)
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
      surface-A tower  surface-B tower  surface-C tower
           +                +                +
      A-specific       B-specific       C-specific
      modules          modules          modules
       (late fusion)   (late fusion)   (late fusion)
              │             │             │
              ▼             ▼             ▼
      surface-A calib  surface-B calib  surface-C calib
              │             │             │
           A output      B output      C output
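The serving-time shape of the diagram can be sketched in a few lines. Everything here is hypothetical (toy transforms stand in for the undisclosed trunk and tower internals); the point is the dispatch structure:

```python
# Hypothetical sketch: shared trunk + per-surface tower-tree dispatch.
# The trunk and tower bodies are toy stand-ins, not Pinterest's modules.

def shared_trunk(features):
    # Shared representation: here just a toy transform of the features.
    return [x * 2 for x in features]

def make_tower(bias):
    # Each surface's tower tree is its own callable; surface-specific
    # modules would be late-fused inside this function.
    def tower(rep):
        return sum(rep) + bias
    return tower

TOWERS = {
    "homefeed": make_tower(0.1),
    "search": make_tower(0.2),
    "related_pins": make_tower(0.3),
}

def predict(surface, features):
    rep = shared_trunk(features)  # paid by every request
    return TOWERS[surface](rep)   # only this surface's tower runs
```

A request pays for the trunk plus exactly one tower; the other N−1 towers are never invoked.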

Pinterest's load-bearing framing (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces):

"A single unified model that serves three surfaces, while still supporting the development of surface-specific modules (for example, surface-specific tower trees and late fusion with surface-specific modules within those tower trees). During serving, each surface-specific tower tree and its associated modules will handle only that surface's traffic, avoiding unnecessary compute cost from modules that don't benefit other surfaces."

Two routing mechanisms:

  • Training-time. Examples from each surface update only their surface's tower tree + the shared trunk. The trunk receives gradient signal from all surfaces; the tower trees receive only their surface's gradient.
  • Serving-time. A request's surface identity routes it to the correct tower tree. Only the trunk compute + its surface tower tree's compute is paid per request.
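The training-time rule reduces to selective parameter updates. A toy sketch (hand-written gradients and SGD in place of a real autograd framework; all numbers illustrative):

```python
# Hypothetical sketch of per-surface gradient routing: an example from
# surface s updates the shared trunk's params plus only tower[s]'s params.

params = {
    "trunk": [1.0, 1.0],
    "tower": {"homefeed": [0.5], "search": [0.5]},
}

def sgd_step(example_surface, trunk_grad, tower_grad, lr=0.1):
    # The trunk receives gradient signal from every surface's examples.
    params["trunk"] = [w - lr * g for w, g in zip(params["trunk"], trunk_grad)]
    # Only the matching surface's tower tree is updated.
    tower = params["tower"][example_surface]
    params["tower"][example_surface] = [
        w - lr * g for w, g in zip(tower, tower_grad)
    ]

# A Search example moves the trunk and the Search tower;
# the Homefeed tower is untouched.
sgd_step("search", trunk_grad=[0.2, 0.2], tower_grad=[0.4])
```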

Relationship to MMoE

MMoE routes experts per task via gates. Surface-specific tower trees route whole subnetworks per surface. The two can stack: an MMoE inside the shared trunk + surface-specific tower trees on top. MMoE handles fine-grained expert specialisation within the trunk; tower trees handle coarse-grained subnetwork specialisation over the downstream computation.

Structurally, surface-specific tower trees are MMoE at the tower granularity — one gate per surface selecting its own tower-tree subnetwork, with near-binary routing (each surface sees only its tower at serving time).
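Under toy shapes, the "MMoE at tower granularity" reading is the contrast between a soft gate that mixes all experts and a one-hot gate that selects a single tower (assumed scalar outputs for brevity):

```python
# Toy contrast (hypothetical shapes): a soft MMoE gate blends all experts,
# while surface routing is a one-hot gate selecting a single tower.

def soft_gate(weights, expert_outputs):
    # MMoE-style: weighted sum over all experts (weights sum to 1),
    # so every expert must be computed.
    return sum(w * e for w, e in zip(weights, expert_outputs))

def one_hot_gate(surface_idx, tower_outputs):
    # Tower-tree routing: pick exactly one output; at serving time the
    # unselected towers are simply never computed.
    return tower_outputs[surface_idx]

outputs = [1.0, 2.0, 4.0]
mix = soft_gate([0.5, 0.25, 0.25], outputs)  # blends every expert
pick = one_hot_gate(2, outputs)              # uses only tower 2
```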

Relationship to multi-task heads

Multi-task heads typically share the tower and only differ at the prediction head. Surface-specific tower trees go deeper — they diverge earlier (at the tower level) so surface-specific feature interactions can be modelled. A unified model can combine both: multi-task heads for per-task prediction variety, tower trees for per-surface representation variety.
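The combination can be sketched as surface routing followed by per-task heads. All names and transforms below are hypothetical; the task names are illustrative, not Pinterest's:

```python
# Hypothetical sketch: route by surface to a tower, then fan out to
# per-task prediction heads on that tower's output.

def tower(surface, rep):
    # Stand-in per-surface tower: a surface-dependent transform.
    scale = {"homefeed": 1.0, "search": 2.0}[surface]
    return [scale * x for x in rep]

def heads(rep):
    # Multi-task heads share the tower output, differing only at the top.
    return {"click": sum(rep), "save": max(rep)}

def predict(surface, rep):
    # Per-surface representation variety, then per-task prediction variety.
    return heads(tower(surface, rep))

out = predict("search", [1.0, 3.0])
```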

Canonical wiki reference

Pinterest's unified ads engagement model used Homefeed (HF) and Search (SR) tower trees at the time of the post; a Related Pins (RP) tower tree was future work (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces). Late fusion of surface-specific modules into each tower tree is named explicitly, allowing surface-specific modules to join the tower without being forced through the shared trunk.

When to apply

  • Unified multi-surface model where surfaces have structurally different feature sets — surface-specific modules need to attach somewhere, and the tower tree is the natural attachment point.
  • Serving-cost fairness matters — cheap surfaces shouldn't pay for expensive surfaces' specialisation.
  • Surfaces have different user-intent priors that warrant different downstream compute paths.

Caveats

  • Pinterest doesn't disclose tower tree depth, per-surface module count, or the late-fusion mechanism.
  • Training-data imbalance risk. If surface traffic is very unbalanced, the shared trunk is dominated by the high-traffic surface while low-traffic towers see few gradient updates and risk over-fitting their small sample; this calls for per-surface loss weighting or sampling adjustments.
  • Parameter count scales linearly with surfaces. Each new surface adds a whole tower tree; at large N, the total parameter count grows.
  • Distinct from sparse MoE LLMs. The per-surface routing is not "sparsely activate top-k of N experts"; it's "activate exactly your surface's tower, skip the others."
