

Pinterest Feature Trimmer

Definition

Feature Trimmer is Pinterest's online-ML-serving module that trims each per-candidate fan-out request from the root cluster down to the exact feature allowlist each destination leaf model needs, eliminating the unused features previously shipped across the root→leaf network hop. It was introduced in the 2026-05-01 Pinterest Engineering post as the "Send What You Use" counterpart to the earlier lz4 fbthrift compression win. (Source: sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer)

Feature Trimmer is a per-root-host in-process module, initialized on boot, that keeps an in-memory consolidated mapping from model_name → version → feature_allowlist, refreshes it via file watchers on each bundle's module_info.json, and atomically swaps the active consolidated map under a read-write lock.
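
A minimal C++ sketch of the on-host state and the swap discipline described above, assuming std::shared_mutex; every identifier is illustrative rather than Pinterest's actual API:

#include <map>
#include <memory>
#include <shared_mutex>
#include <string>
#include <vector>

using Allowlist       = std::vector<std::string>;          // feature names
using VersionMap      = std::map<std::string, Allowlist>;  // version -> allowlist
using ConsolidatedMap = std::map<std::string, VersionMap>; // model_name -> versions

class FeatureTrimmer {
 public:
  // Writer path: a file-watcher callback rebuilds the full consolidated map
  // off-lock, then swaps it in atomically under the unique lock.
  void swapConsolidated(std::shared_ptr<const ConsolidatedMap> fresh) {
    std::unique_lock lock(mutex_);
    consolidated_ = std::move(fresh);
  }

  // Reader path: request handlers hold the shared lock just long enough to
  // copy the pointer, so a concurrent swap never invalidates an in-flight lookup.
  std::shared_ptr<const ConsolidatedMap> snapshot() const {
    std::shared_lock lock(mutex_);
    return consolidated_;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::shared_ptr<const ConsolidatedMap> consolidated_ =
      std::make_shared<const ConsolidatedMap>();
};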

Load-bearing properties

  • Send-what-you-use fan-out — the root sends to each leaf only the features that specific model version actually consumes, not the union of features the root fetched from the feature store. Canonical wiki instance of concepts/send-what-you-use at the ML-RPC-payload altitude.
  • Model signature as ground truth — the allowlist comes from archive/extra/module_info.json inside the model's .pt archive, the same artefact the leaf's feature converter uses to convert internal-format features to PyTorch tensors (a hypothetical example follows this list). Signatures are treated as version-stable APIs; a signature change forks a new model.
  • Allowlist, not blocklist — an allowlist "does not carry the burden of tracking all the features that might be in development or deprecated"; the ML feature universe monotonically grows, so a blocklist would grow faster than the allowlist and be harder to keep current.
  • Consolidated in-memory map with per-bundle isolation — each bundle has its own independent map; the consolidated map is rebuilt on any bundle refresh and atomically swapped in. A corrupted bundle falls back to its previous in-memory version without affecting other bundles.
  • File-watcher-driven refresh — each module_info.json has a file watcher; content changes trigger a single-bundle reload + full consolidated-map rebuild + atomic swap.
  • Skip trimming on miss — if no allowlist matches (model unknown, or corrupt map), the request passes through untrimmed rather than silently dropping features. Fail-safe posture.
  • Version-aware lookup with latest-version fallback — omitting the version in the score request uses the latest version's allowlist; a version-specific miss also falls back to latest. Works because signatures are version-stable.
  • Critical-failure-path hardening — init-time parse failures alert on-call but do not block host launch; Pinterest's explicit rationale: "This decision preserves our ability to respond to capacity-related incidents, especially if a deeper issue is affecting the Feature Trimmer module itself."
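
To fix intuition for the signature artefact: the post does not publish the module_info.json schema, so the shape below is a guess, with invented field and feature names, at what a per-version allowlist file might look like:

{
  "model_name": "ads_ranking_model",
  "version": "7",
  "features": [
    "user_embedding_v2",
    "pin_fresh_ctr_7d",
    "query_pin_match_score"
  ]
}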

The deploy-pipeline integration — patterns/artifact-rides-model-deploy-pipeline

Rather than a separate config-distribution system, the per-bundle module_info.json mapping is packaged alongside other root configuration files and ships through the same staged delivery pipeline as model rollout:

1. Deploy root configs to Canary
2. Deploy model configs to Canary
3. Automated Canary Analysis (ACA)
4. Deploy root configs to Production
5. Deploy model configs to Production

Root configs always lead so that when a new leaf model version arrives, a matching allowlist is already present on the root. During rollout, the root ships backwards-compatible allowlists for both the current and the pending version, avoiding the versioned-lookup gap that would otherwise leave pending-version requests untrimmed.

The bundle build step iterates over the model versions to be shipped (a sketch of this loop follows the list):

  • If a model version includes module_info.json, the pipeline parses it and records the signature.
  • If the signature is missing, the pipeline logs a warning and skips the version rather than failing the build — resilient during the incremental rollout of signature publishing.
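
A C++ sketch of that loop, assuming the nlohmann/json library and the hypothetical module_info.json shape shown earlier; the real pipeline is internal and will differ:

#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

#include <nlohmann/json.hpp>  // assumed third-party JSON library

namespace fs = std::filesystem;

struct BundleEntry {
  std::string model_name;
  std::string version;
  std::vector<std::string> features;  // the signature allowlist
};

// During rollout, version_dirs holds both the current and the pending
// version, so the root carries backwards-compatible allowlists for both.
std::vector<BundleEntry> buildBundle(const std::vector<fs::path>& version_dirs) {
  std::vector<BundleEntry> entries;
  for (const auto& dir : version_dirs) {
    const fs::path info = dir / "archive" / "extra" / "module_info.json";
    std::ifstream in(info);
    if (!in) {
      // Missing signature: warn and skip, never fail the build, so bundles
      // keep shipping while signature publishing rolls out incrementally.
      std::cerr << "WARN: no module_info.json under " << dir << "\n";
      continue;
    }
    const auto j = nlohmann::json::parse(in);
    entries.push_back({j.at("model_name").get<std::string>(),
                       j.at("version").get<std::string>(),
                       j.at("features").get<std::vector<std::string>>()});
  }
  return entries;
}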

On-host architecture

Feature Trimmer Module (per root host, in-process)
   ├── GFlags: file paths to each active bundle's module_info.json
   ├── independent_maps[bundle]
   │       ▲
   │       └── file watcher per module_info.json → reload that bundle's map
   ├── consolidated_map  { model_A: { version_N: allowlist, version_M: allowlist }, ... }
   │       ▲
   │       └── rebuilt from all independent_maps on any reload; atomically swapped in
   └── RW lock
         ├── shared lock  — reads of consolidated_map + independent_maps
         └── unique lock  — atomic swap of consolidated_map
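
Continuing the sketch from the Definition, the refresh path might look like the following; the keep-previous-on-corruption behaviour mirrors the per-bundle-isolation bullet, while the cross-bundle collision policy is an assumption the post does not address:

#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

using Allowlist       = std::vector<std::string>;
using VersionMap      = std::map<std::string, Allowlist>;
using ConsolidatedMap = std::map<std::string, VersionMap>;
using BundleMap       = std::map<std::string, VersionMap>;  // one bundle's models

// Stub standing in for the real parser; a production version would parse
// module_info.json and return nullopt on corruption.
std::optional<BundleMap> parseModuleInfo(const std::string& /*path*/) {
  return std::nullopt;
}

struct Bundle {
  std::string module_info_path;
  BundleMap map;  // last known-good map for this bundle
};

// Watcher callback: refresh exactly one bundle, then rebuild the whole
// consolidated map from every bundle's independent map. The result is
// handed to the atomic swap under the unique lock.
std::shared_ptr<const ConsolidatedMap> onBundleChanged(
    Bundle& changed, std::vector<Bundle>& all_bundles) {
  if (auto fresh = parseModuleInfo(changed.module_info_path)) {
    changed.map = std::move(*fresh);  // adopt the freshly parsed map
  }                                   // else: keep the previous in-memory version
  auto consolidated = std::make_shared<ConsolidatedMap>();
  for (const auto& bundle : all_bundles) {
    for (const auto& [model, versions] : bundle.map) {
      (*consolidated)[model] = versions;  // collision policy: assumption
    }
  }
  return consolidated;
}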

Request lookup flow

score request (model_name, model_version?)
lookup consolidated_map[model_name]
     ├── not found       →  pass through untrimmed (no allowlist exists for model)
     ├── found, version unspecified or missing → use latest version's allowlist
     └── found, version specified and found    → use version-specific allowlist
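
The same flow as a C++ sketch, covering the fail-safe pass-through and the latest-version fallback; FeatureMap stands in for whatever internal feature representation the root actually carries:

#include <map>
#include <string>
#include <vector>

using Allowlist       = std::vector<std::string>;
using VersionMap      = std::map<std::string, Allowlist>;    // version -> allowlist
using ConsolidatedMap = std::map<std::string, VersionMap>;   // model -> versions
using FeatureMap      = std::map<std::string, std::string>;  // name -> serialized value

// Unknown model (or empty entry) -> nullptr, i.e. pass through untrimmed.
// Unspecified or unmatched version -> latest version's allowlist, which is
// safe because signatures are version-stable. Illustrative only.
const Allowlist* resolveAllowlist(const ConsolidatedMap& map,
                                  const std::string& model,
                                  const std::string& version /* "" = latest */) {
  auto m = map.find(model);
  if (m == map.end() || m->second.empty()) return nullptr;
  if (!version.empty()) {
    if (auto v = m->second.find(version); v != m->second.end()) {
      return &v->second;  // exact version hit
    }
  }
  return &m->second.rbegin()->second;  // latest; assumes sortable version keys
}

// Trim keeps only allowlisted features; with no allowlist, send everything.
FeatureMap trim(const FeatureMap& features, const Allowlist* allow) {
  if (!allow) return features;  // fail-safe: untrimmed pass-through
  FeatureMap out;
  for (const auto& name : *allow) {
    if (auto it = features.find(name); it != features.end()) {
      out.emplace(name, it->second);
    }
  }
  return out;
}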

Production impact

From the 2026-05-01 post (Pinterest Internal Data; numbers are Ads / Homefeed / Related Pins / Search / Notification composites):

  • Ads root cluster network: 4 Gbps peak → <1.5 Gbps peak; fleet size −27% with no performance regression.
  • Ads leaf partitions peak network: 1000–1200 Mbps → <200 Mbps across all clusters.
  • Ads leaf GPU: cluster-size + batch-size tuning unlocked roughly 5% of total GPU capacity that was being wasted under the pre-trimmer network bottleneck.
  • Ads AdMixer client p90: >90 ms → <80 ms peak.
  • Homefeed root outbound: ~1.2–2.1 GB/s → ~0.45–1.1 GB/s; fleet size −33%.
  • Homefeed leaf inbound: −65–75% across GPU leaf clusters; rightsizing ongoing at publication.
  • Related Pins p99: ~130–180 ms with >200 ms spikes → ~95–125 ms; −25–30%. (Some models saw no change because their signature allowlist was unavailable — skip-on-miss behaviour visible in the aggregate chart.)
  • Search egress: −45%; Notification egress: −65%; both network-bound clusters moved to standard (non-network-optimised) instance types; ≥30% cost reduction on both. $0.98M / year saved on Search + Notification rightsizing alone.
  • Aggregate: $4M+ in annual infrastructure savings, plus headroom for larger models and a +0.17% revenue lift from fewer timeout-induced failures on Ads.
  • End state: bottleneck shifted from network-bound to CPU-bound on root cluster.

Not covered by Part I

  • Client → root payload — per-candidate features sent from client services to root before fan-out. Explicitly flagged as the subject of Part II.
  • Trimmer's own CPU / latency overhead on root — the post reports net latency wins (SerDe savings exceed trim cost) but does not break out the trimmer's standalone cost.
  • Measured trim ratio distribution per model — the "~50% network cut" estimate in the motivation is theoretical; Pinterest reports post-trim bandwidth numbers, not per-model trim ratios.
  • Observability for stale-allowlist runtime state — the post mentions the init-time alert but not runtime monitoring of bundles stuck on stale maps.

Seen in

Last updated · 445 distilled / 1,275 read