DCNv2 (Deep & Cross Network v2)¶
Definition¶
DCNv2 (Deep & Cross Network v2) is a neural network architecture for learning explicit feature crosses, introduced by Wang et al. (Google) in 2020 (arXiv:2008.13535) as a successor to the 2017 DCN paper. Its main change over DCN v1 is replacing each cross layer's weight vector with a full weight matrix (with an optional low-rank mixture variant for efficiency). DCNv2 stacks cross layers that learn explicit bounded-degree feature interactions alongside a deep tower, and is a standard building block in modern CTR-prediction / ads-ranking architectures.
Architectural role¶
Canonical DCNv2 has two co-trained towers:
- Cross network — stacked cross layers, each computing x_{l+1} = x_0 ⊙ (W_l x_l + b_l) + x_l: an explicit cross of the original input x_0 with the current representation via a learned weight matrix, so l layers capture polynomial interactions up to degree l + 1 without manually engineering cross features.
- Deep network — standard MLP capturing arbitrary feature interactions.
The two outputs are combined for the final prediction, either in parallel (outputs concatenated) or stacked (the cross network feeding the deep network).
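The cross-layer recurrence above can be sketched in a few lines; dimensions, initialization, and the two-layer depth here are illustrative, not from any particular production model.

```python
import numpy as np

def cross_layer(x0, xl, W, b):
    """One DCNv2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl.

    x0 is the original input embedding and xl the previous layer's
    output; the elementwise product with x0 raises the polynomial
    degree of the learned interactions by one per layer, and the
    residual `+ xl` keeps lower-degree terms.
    """
    return x0 * (W @ xl + b) + xl

rng = np.random.default_rng(0)
d = 8                                   # illustrative embedding width
x0 = rng.normal(size=d)
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b1, b2 = np.zeros(d), np.zeros(d)

x1 = cross_layer(x0, x0, W1, b1)        # up to degree-2 interactions
x2 = cross_layer(x0, x1, W2, b2)        # up to degree-3 interactions
assert x2.shape == (d,)
```

Note that the cross layer is width-preserving: input and output share dimension d, which is why each layer can take a residual connection.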
Use at Pinterest — as a projection layer¶
In Pinterest's unified ads engagement model (Source: sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces), DCNv2 is used as a projection layer — not as the top-of-stack ranker:
"We simplified the expensive compute paths by using DCNv2 to project the Transformer outputs into a smaller representation before downstream crossing and tower tree layers, which reduced serving latency while preserving signal."
Architectural position: Transformer output → DCNv2 projection → downstream crossing + tower tree layers. The projection shrinks representation width for latency while DCNv2's cross-layer structure preserves feature-interaction signal that pure linear projection would discard.
This is a slightly unconventional use of DCNv2 — most published uses treat DCNv2 as the model's final ranking network. Pinterest uses it as a compression bridge between an expensive upstream encoder (long-sequence Transformer) and expensive downstream layers (feature crossing + surface-specific tower trees).
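Since Pinterest does not disclose the topology, the following is only a hypothetical sketch of the projection-bridge pattern. Because cross layers preserve width, the actual down-projection must come from some additional map; here it is a plain linear layer after the crossing. Every name, dimension, and layer count below is an assumption.

```python
import numpy as np

def dcnv2_projection(h, num_cross=2, out_dim=64, rng=None):
    """Hypothetical DCNv2-style projection bridge (illustrative only;
    Pinterest's real topology is undisclosed).

    h: wide upstream Transformer output vector. Cross layers mix
    feature interactions at full width, then a linear map shrinks the
    representation for cheaper downstream crossing / tower layers.
    """
    rng = rng or np.random.default_rng(0)
    d = h.shape[0]
    x0, x = h, h
    for _ in range(num_cross):
        W = rng.normal(scale=d ** -0.5, size=(d, d))
        b = np.zeros(d)
        x = x0 * (W @ x + b) + x        # DCNv2 cross layer
    P = rng.normal(scale=d ** -0.5, size=(out_dim, d))
    return P @ x                        # linear down-projection

z = dcnv2_projection(np.ones(256), out_dim=64)
assert z.shape == (64,)
```

Compared with a purely linear projection of h, this sketch lets the compressed representation carry explicit feature-cross terms, which is the plausible mechanism behind the "preserving signal" claim.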
Caveats¶
- Pinterest doesn't disclose topology — number of cross layers, input/output dims, whether the deep tower is present alongside the cross tower, mixture weights. The only explicit claim is "reduced serving latency while preserving signal."
- Signal-preservation claim is qualitative — no ablation comparing DCNv2 projection vs linear projection vs no projection.
- Stub — canonical DCNv2 architecture paper not ingested; the full technical details live in the arXiv paper.
Seen in¶
- 2026-03-03 Pinterest — Unifying Ads Engagement Modeling (sources/2026-03-03-pinterest-unifying-ads-engagement-modeling-across-pinterest-surfaces) — canonical wiki instance of DCNv2 used as a projection-layer compression bridge between Transformer output and downstream crossing layers in a production ads ranking model.