Pinterest Feature Trimmer¶
Definition¶
Feature Trimmer is the Pinterest online-ML-serving module that trims each per-candidate fan-out request from the root cluster to the exact feature allowlist each destination leaf model needs, eliminating the unused features that were previously shipped across the root→leaf network hop. Introduced in the 2026-05-01 Pinterest Engineering post as the "Send What You Use" counterpart to the earlier lz4 fbthrift compression win. (Source: sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer)
Feature Trimmer is a per-root-host in-process module, initialized on boot, that keeps an in-memory consolidated mapping from model_name → version → feature_allowlist, refreshes it via file watchers on each bundle's module_info.json, and atomically swaps the active consolidated map under a read-write lock.
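The post does not publish the signature schema; as a hedged illustration, the per-version allowlist a bundle's module_info.json might carry could look like this (all field names here are assumptions, not Pinterest's actual format):

```json
{
  "model_name": "ads_ranking_model",
  "version": "7",
  "feature_allowlist": [
    "user_country",
    "pin_embedding_v2",
    "recent_engagement_counts"
  ]
}
```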
Load-bearing properties¶
- Send-what-you-use fan-out — the root sends to each leaf only the features that specific model version actually consumes, not the union of features the root fetched from the feature store. Canonical wiki instance of concepts/send-what-you-use at the ML-RPC-payload altitude.
- Model signature as ground truth — the allowlist comes from archive/extra/module_info.json inside the model's .pt archive, the same artefact the leaf's feature converter uses to convert internal-format features to PyTorch tensors. Signatures are treated as version-stable APIs; a signature change forks a new model.
- Allowlist, not blocklist — "does not carry the burden of tracking all the features that might be in development or deprecated"; the ML feature universe monotonically grows, so a blocklist would grow faster than the allowlist and be harder to keep current.
- Consolidated in-memory map with per-bundle isolation — each bundle has its own independent map; the consolidated map is rebuilt on any bundle refresh and atomically swapped in. A corrupted bundle falls back to its previous in-memory version without affecting other bundles.
- File-watcher-driven refresh — each module_info.json has a file watcher; content changes trigger a single-bundle reload + full consolidated-map rebuild + atomic swap.
- Skip trimming on miss — if no allowlist matches (model unknown, or corrupt map), the request passes through untrimmed rather than silently dropping features. Fail-safe posture.
- Version-aware lookup with latest-version fallback — omitting the version in the score request uses the latest version's allowlist; a version-specific miss also falls back to latest. Works because signatures are version-stable.
- Critical-failure-path hardening — init-time parse failures alert on-call but do not block host launch; Pinterest's explicit rationale: "This decision preserves our ability to respond to capacity-related incidents, especially if a deeper issue is affecting the Feature Trimmer module itself."
The deploy-pipeline integration — patterns/artifact-rides-model-deploy-pipeline¶
Rather than a separate config-distribution system, the per-bundle module_info.json mapping is packaged alongside other root configuration files and ships through the same staged delivery pipeline as model rollout:
1. Deploy root configs to Canary
2. Deploy model configs to Canary
3. Automated Canary Analysis (ACA)
4. Deploy root configs to Production
5. Deploy model configs to Production
Root configs always lead so that when a new leaf model version arrives, a matching allowlist is already present on the root. During rollout, root ships backwards-compatible allowlists for both current and pending versions to avoid the versioned-lookup gap that would otherwise leave pending-version requests untrimmed.
The bundle build step iterates over model versions to be shipped:
- If a model version includes module_info.json, the pipeline parses it and records the signature.
- If the signature is missing, the pipeline logs a warning and skips rather than failing the build — resilient during the incremental rollout of signature publishing.
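The warn-and-skip build step can be sketched as follows (an illustration only; the bundle layout and `feature_allowlist` field are assumptions based on the artefact path named above):

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("bundle_build")

def collect_signatures(model_versions: dict[str, Path]) -> dict[str, list[str]]:
    """For each model version to ship, parse module_info.json if present.

    A missing or unparsable signature logs a warning and is skipped, so
    the build never fails while signature publishing is still rolling out.
    """
    signatures: dict[str, list[str]] = {}
    for version, bundle_dir in model_versions.items():
        info_path = bundle_dir / "archive" / "extra" / "module_info.json"
        if not info_path.exists():
            log.warning("no module_info.json for version %s; skipping", version)
            continue
        try:
            info = json.loads(info_path.read_text())
            signatures[version] = info["feature_allowlist"]
        except (ValueError, KeyError):
            log.warning("unparsable signature for version %s; skipping", version)
    return signatures
```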
On-host architecture¶
Feature Trimmer Module (per root host, in-process)
│
├── GFlags: file paths to each active bundle's module_info.json
│
├── independent_maps[bundle]
│ ▲
│ └── file watcher per module_info.json → reload that bundle's map
│
├── consolidated_map { model_A: { version_N: allowlist, version_M: allowlist }, ... }
│ ▲
│ └── rebuilt from all independent_maps on any reload; atomically swapped in
│
└── RW lock
├── shared lock — reads of consolidated_map + independent_maps
└── unique lock — atomic swap of consolidated_map
Request lookup flow¶
score request (model_name, model_version?)
│
▼
lookup consolidated_map[model_name]
│
├── not found → pass through untrimmed (no allowlist exists for model)
├── found, version unspecified or missing → use latest version's allowlist
└── found, version specified and found → use version-specific allowlist
Production impact¶
From the 2026-05-01 post (Pinterest Internal Data; numbers are Ads / Homefeed / Related Pins / Search / Notification composites):
- Ads root cluster network: 4 Gbps peak → <1.5 Gbps peak; fleet −27% with no performance regression.
- Ads leaf partitions peak network: 1000–1200 Mbps → <200 Mbps across all clusters.
- Ads leaf GPU: cluster-size + batch-size tuning unlocked roughly 5% of total GPU capacity that was being wasted under the pre-trimmer network bottleneck.
- Ads AdMixer client p90: >90 ms → <80 ms peak.
- Homefeed root outbound: ~1.2–2.1 GB/s → ~0.45–1.1 GB/s; fleet −33%.
- Homefeed leaf inbound: −65–75% across GPU leaf clusters; rightsizing ongoing at publication.
- Related Pins p99: ~130–180 ms with >200 ms spikes → ~95–125 ms; −25–30%. (Some models saw no change because their signature allowlist was unavailable — skip-on-miss behaviour visible in the aggregate chart.)
- Search egress: −45%; Notification egress: −65%; both network-bound clusters moved to standard (non-network-optimised) instance types; ≥30% cost reduction on both. $0.98M / year saved on Search + Notification rightsizing alone.
- Aggregate: $4M+ in annual infrastructure savings, plus headroom for larger models and a +0.17% revenue lift from fewer timeout-induced failures on Ads.
- End state: bottleneck shifted from network-bound to CPU-bound on root cluster.
Not covered by Part I¶
- Client → root payload — per-candidate features sent from client services to root before fan-out. Explicitly flagged as the subject of Part II.
- Trimmer's own CPU / latency overhead on root — the post reports net latency wins (SerDe savings exceed trim cost) but does not break out the trimmer's standalone cost.
- Measured trim ratio distribution per model — the "~50% network cut" estimate in the motivation is theoretical; Pinterest reports post-trim bandwidth numbers, not per-model trim ratios.
- Observability for stale-allowlist runtime state — post mentions init-time alert but not runtime monitoring of bundles stuck on stale maps.
Seen in¶
- 2026-05-01 Pinterest — Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer (sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer) — the canonical introduction; full mechanism + production numbers + deployment integration + safeguards.
Related¶
- systems/pinterest-ml-serving-root-leaf — the architecture Feature Trimmer optimises.
- systems/fbthrift — the RPC framework carrying root-leaf traffic; lz4 compression here is the first lever, trimmer is the second.
- systems/pytorch — substrate for the .pt archive + TorchScript + module_info.json signature artefact.
- concepts/send-what-you-use — the principle.
- concepts/model-signature-as-source-of-truth — the design invariant.
- concepts/root-leaf-ml-serving-architecture — the substrate architecture.
- concepts/feature-fanout-network-bottleneck — the problem.
- concepts/network-bound-vs-compute-bound — the framing of the bottleneck.
- patterns/feature-allowlist-over-blocklist — the ML-feature-universe trade-off.
- patterns/artifact-rides-model-deploy-pipeline — the deploy-pipeline integration.
- patterns/file-watcher-atomic-swap-consolidated-map — the on-host refresh mechanism.
- patterns/skip-on-missing-allowlist-for-safety — the fail-safe posture.
- companies/pinterest — the operator.