Pinterest¶

Pinterest Engineering is a Tier-2 source on the sysdesign-wiki. Pinterest operates a visual-discovery + recommendation platform at hyperscale; the engineering blog's most substantive architectural posts cluster around storage (HBase → TiDB / KV stores / Goku), data-processing platforms (Moka on Yunikorn), quota + governance infrastructure (Piqama), graph services (Zen), indexed datastores (Ixia), ads ranking / ML serving (unified multi-surface engagement models, L1 CVR online-offline debugging, MMoE + long-sequence Transformers), Home Feed multi-objective optimization / diversification (DPP → SSD → unified soft-spacing with PinCLIP + Semantic ID signals, hosted on PyTorch on company-wide model serving cluster), production Text-to-SQL / Analytics Agent on top of PinCat governance catalog + unified context-intent embeddings over query history + internal Vector DB service on OpenSearch, and ML platform. Historical operator of one of the largest HBase deployments in the world before the 2021 deprecation decision.

Key systems¶

User-sequence platform — configuration-as-code + shared execution engine + lambda architecture + columnar time-partitioned storage (2026-05-21 user-sequence platform post)¶

systems/pinterest-user-sequence-platform — the multi-tenant data substrate that ingests, filters, enriches, and serves user event sequences for ranking, retrieval, and recommendation models across Pinterest surfaces — including the ~16K-token sequences fed into Pinterest Foundation Model and TransAct. Six-component architecture (stream + batch ingestion → shared enrichment + execution layer → real-time indexer + batch indexer/backfill → columnar time-partitioned storage → online serving API). Four design decisions: configuration-as-code (Python configs → portable JSON in object storage), shared execution engine + pluggable executors, lambda architecture for fresh + complete sequences, columnar time-partitioned storage with table semantics. Migration discipline: event-type-by-event-type shadow cutover with two-tier comparison (event-level field-by-field + sequence-level shadow vs legacy) plus A/B experiments before consumer cutover, iterating across event types. Outcomes (qualitative per company policy): "significant infrastructure cost reductions" on storage / replication / network; onboarding "dropped substantially" (mostly config + small executor changes); "improved engagement metrics" on major recommendation surfaces post-migration. The platform is the structural answer to online-offline discrepancy at the data-substrate layer — instead of debugging features-not-matching after launches, "one definition, many runtimes" makes definition divergence between training and serving architecturally impossible. Upstream of the model-side scaling work documented in the 2026-04-13 request-level deduplication post (the user sequences this platform produces are the same ~16K-token objects deduplicated across the Foundation Model + DCAT scale-up).
concepts/user-event-sequence — first canonical wiki concept for the four-stage data primitive: ingest → filter → enrich → assemble. Reframes user sequences from "the N latest events from a log table" to a multi-step pipeline output with versioned schemas, configurable filtering, configurable enrichments, and assembly semantics. The substrate-level definition consumed by sequence-aware models (Foundation Model, TransAct, Contextual Sequential CG).
concepts/one-definition-many-runtimes — first canonical wiki concept for the platform organising principle: "Define a signal or event type once, then instantiate it consistently across multiple runtimes." Single configuration surface drives streaming + batch + serving. The structural cure for the split-brain failure mode where "training pipelines build sequences one way from batch tables while serving systems assemble sequences a different way from online stores. Over time, those two views naturally drift apart in subtle ways."
concepts/sequence-quality-dimensions — first canonical wiki concept for the four-dimensional quality contract: freshness (how quickly new events show up) + completeness (late arrivals + corrections + backfills eventually reflected) + consistent enrichment (streaming = batch, training = serving) + stable schemas (versioned + predictable). Multi-tenant substrate forces correctness + observability + operability to be first-class alongside throughput / latency.
concepts/enrichment-execution-engine — first canonical wiki concept for the engine + pluggable-executor architectural primitive: framework owns IO substrate / concurrency / retries / backpressure / config validation; executors own per-event-type filtering / featurisation / raw → normalised mapping. The structural piece that makes "one definition, many runtimes" mechanical — same engine + same executors run in both streaming and batch.
concepts/online-offline-discrepancy (extended) — second canonical instance: structural prevention at the data-substrate layer complementing the 2026-02-27 model-layer diagnosis. The substrate-side fix vs the model-side debugger.
patterns/configuration-as-code-feature-pipeline — first canonical wiki pattern for Python configs validated and compiled to portable JSON in managed object storage, consumed by all runtimes. Three configuration types: sequence-feature, event-type, enrichment. Three named benefits — velocity (config not pipelines), safety (review + rollback + audit), separation of concerns (what vs how).
patterns/shared-execution-engine-pluggable-executors — first canonical wiki pattern for the framework / plugin contract: framework wires data sources + sinks, handles concurrency / retries / backpressure / config validation; executors own the per-event-type business logic. Same engine + executors in streaming and batch — collapses the lambda-architecture dual-code-path tax to two scheduling shapes of one code path.
patterns/lambda-architecture-for-fresh-and-complete-sequences — first canonical wiki pattern for streaming "now" + batch "fixes history" cooperating via shared executor logic. Streaming optimises freshness; batch optimises completeness + correctness via late-arrival absorption + corrections + backfills. Cost objection mitigated because the executors are shared across paths via the shared execution engine.
patterns/columnar-time-partitioned-feature-storage — first canonical wiki pattern for column-per-enrichment + time-bucket partitioning + table semantics for ML feature substrates. Replaces "large, consolidated 'enriched event' blobs" where every read pulls the whole payload regardless of need. Two axes of payoff — efficiency (compression + bandwidth + bounded I/O on long histories) and operability (familiar table abstractions for inspecting anomalous days, validating new enrichments, comparing pipelines side-by-side).
patterns/event-type-by-event-type-shadow-cutover — first canonical wiki pattern for per-event-type incremental migration with two-tier comparison + A/B + controlled cutover. Per-event-type granularity (not per-pipeline) bounds blast radius. Two-tier comparison (event-level field-by-field + sequence-level shadow vs legacy) catches different regression classes. A/B on consuming models is the final validation gate. "100% match is not the goal" — "approximately the same sequences" with sufficient validation evidence.
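The configuration-as-code and shared-engine ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Pinterest's implementation: the `EventTypeConfig` fields, the `repin_v1` executor, and the `run_engine` function are all hypothetical names. It shows a Python config compiled to JSON, and one executor producing the same sequence regardless of the order in which the engine receives events (standing in for a streaming path and a batch path).

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List, Optional

# Hypothetical config shape; field names are illustrative only.
@dataclass(frozen=True)
class EventTypeConfig:
    event_type: str
    schema_version: int
    executor: str      # name of the executor that owns filtering / normalisation
    max_events: int    # how many of the latest events to keep in the sequence

def compile_config(cfg: EventTypeConfig) -> str:
    """Validate a Python config and compile it to portable JSON."""
    if cfg.max_events <= 0:
        raise ValueError("max_events must be positive")
    return json.dumps(asdict(cfg), sort_keys=True)

Event = Dict[str, object]
Executor = Callable[[Event], Optional[Event]]  # returning None filters the event out

EXECUTORS: Dict[str, Executor] = {}

def register(name: str):
    """Register an executor under the name that configs refer to."""
    def wrap(fn: Executor) -> Executor:
        EXECUTORS[name] = fn
        return fn
    return wrap

@register("repin_v1")
def repin_executor(event: Event) -> Optional[Event]:
    # Per-event-type business logic: filter, then map raw -> normalised.
    if event.get("is_spam"):
        return None
    return {"pin_id": event["pin_id"], "ts": event["ts"], "action": "repin"}

def run_engine(compiled_json: str, events: List[Event]) -> List[Event]:
    """Shared engine: the same code path serves both scheduling shapes."""
    cfg = json.loads(compiled_json)
    executor = EXECUTORS[cfg["executor"]]
    out = [e for e in (executor(ev) for ev in events) if e is not None]
    out.sort(key=lambda e: e["ts"])
    return out[-cfg["max_events"]:]

compiled = compile_config(EventTypeConfig("repin", 1, "repin_v1", max_events=2))
raw = [
    {"pin_id": "a", "ts": 1},
    {"pin_id": "b", "ts": 3, "is_spam": True},
    {"pin_id": "c", "ts": 2},
    {"pin_id": "d", "ts": 4},
]
streaming_view = run_engine(compiled, raw)                 # events in arrival order
batch_view = run_engine(compiled, list(reversed(raw)))     # same events, different order
```

Because both views go through the same compiled config and the same executor, they agree by construction, which is the property the two-tier shadow comparison is meant to confirm during migration.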

Contextual Sequential Two-Tower CG — context layer + hybrid offline/online inference (2026-05-08 contextual-CG post)¶

systems/pinterest-contextual-sequential-cg — Pinterest's contextual evolution of its prior offsite-conversion-history Transformer-based candidate generator. Adds a context layer to the user tower that consumes the subject Pin (the Pin the user is currently viewing) as interest-category embeddings + user demographics, concatenated with the Transformer encoder's output before the final MLP. Three-pillar design: model architecture (context layer), training (synthetic pseudo-context derived from positive labels + high-dropout regularisation), and serving (hybrid offline/online user tower inference — Transformer encoder cached daily in feature store, context layer + MLP head computed online). Production wins on Related Pins: 3x–10x Recall@K (offline), +275–300% median candidate relevance, +1.08% ads relevance overall, 2x more candidates retrieved → delivered to impression, ~0.7% ROAS lift (~1.4% in top revenue countries). Diagnostic insight: candidate survival rate — "less than 1% of impressions on Related Pins" before the context layer.
systems/pinterest-sequential-cg — the predecessor Sequential Two-Tower CG with offline-only Transformer-encoder user embeddings; the system whose survival-rate collapse on Related Pins motivated the contextual evolution. Stub — the original Ads Candidate Generation using Behavioral Sequence Modeling post is referenced but not separately ingested on the wiki.
systems/transformer (extended) — third distinct Pinterest role: offline-cached encoder in a hybrid offline/online user tower; the encoder runs in a daily batch and its last hidden state is stored in the feature store, with the online portion of the tower fusing it with real-time context. Distinct from the engagement-model Transformer use (online inference, fed into surface-specific tower trees) and the shopping-conversion-CG Transformer use (encoder for user-action sequences within the user tower).
systems/pinterest-related-pins (extended) — primary serving surface for the contextual sequential CG; the surface where the prior offline-only CG had a "less than 1% of impressions" survival-rate collapse, and where 2x candidate delivery is the post-context-layer measured win.
concepts/context-layer-in-two-tower — first canonical wiki concept for a tower-internal component that fuses request-time real-time-context features with offline-encoded historical state. Sibling to the parallel DCN+MLP tower-internal primitive in shape but distinct in purpose (real-time intent fusion vs feature crossing).
concepts/subject-pin — Pinterest's term for the Pin a user is currently viewing; the load-bearing real-time intent signal on Related Pins. First canonical wiki concept page for this Pinterest-specific (but generalisable) request-time intent feature.
concepts/hybrid-tower-inference-split — first canonical wiki concept for splitting the user (query) tower's forward pass into an offline-batched portion (heavy historical encoder, cached in feature store with daily refresh) and an online portion (lightweight context layer + MLP head running at request time). Generalises the precomputed-tower pattern of two-tower retrieval to also precompute part of the user tower.
concepts/pseudo-context-augmentation — first canonical wiki concept for the training-time data-augmentation technique of synthesising a pseudo-version of a request-time-only feature so the model can learn to consume it. Pinterest's instance: project interest-category features from the positive label (the conversion item) so the context layer sees a feature of compatible shape during training and at serving time.
concepts/candidate-survival-rate — first canonical wiki concept for the share of CG-retrieved candidates that survive the downstream ranking funnel and reach impression. The diagnostic that revealed the prior offline-only CG was uncompetitive on Related Pins ("less than 1% of impressions"); distinct from intrinsic recall@K because it captures CG-ranker signal alignment.
concepts/real-time-context-feature — first canonical wiki concept for features whose value is determined at request time (subject Pin, search query, currently-playing track). Forces the consuming model component to run online and creates the structural pressure for hybrid offline/online inference shapes.
patterns/hybrid-offline-online-user-tower-inference — first canonical wiki pattern for the hybrid offline/online user-tower-inference architecture; companion to two-tower's classic precomputed-item-tower pattern.
patterns/synthetic-pseudo-context-from-label — first canonical wiki pattern for the "synthesise pseudo-context from positive labels with the same input shape as serving-time real context" training scheme, paired with high-dropout regularisation.
patterns/high-dropout-on-augmented-feature-layer — first canonical wiki pattern for deliberately high dropout on a layer that consumes synthetic / augmented features to mitigate label-leakage shortcuts — the structural mitigation that makes pseudo-context augmentation tractable.
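The hybrid offline/online split described above can be sketched as follows. This is a toy illustration, assuming a dictionary as the feature store, plain concatenation as the context layer, and a single dense layer as the MLP head; the names `FEATURE_STORE`, `context_layer`, and `user_tower_online` are hypothetical and the numbers are made up.

```python
from typing import Dict, List

Vector = List[float]

# Offline side: stands in for the daily batch job that writes the Transformer
# encoder's output to the feature store. Values are illustrative.
FEATURE_STORE: Dict[str, Vector] = {"user_42": [0.25, -0.5, 0.75]}

def context_layer(subject_pin_interests: Vector, demographics: Vector) -> Vector:
    """Online side: build the request-time context vector (here a plain concat)."""
    return subject_pin_interests + demographics

def mlp_head(x: Vector, weights: List[Vector]) -> Vector:
    """One dense layer with ReLU, standing in for the final MLP."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def user_tower_online(user_id: str, subject_pin_interests: Vector,
                      demographics: Vector, weights: List[Vector]) -> Vector:
    cached = FEATURE_STORE[user_id]  # precomputed offline; no encoder call at request time
    fused = cached + context_layer(subject_pin_interests, demographics)
    return mlp_head(fused, weights)

# Toy weights: output 0 reads only the cached history, output 1 only the context.
WEIGHTS = [
    [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
]
emb_a = user_tower_online("user_42", [1.0, 0.0], [0.5], WEIGHTS)
emb_b = user_tower_online("user_42", [0.0, 2.0], [0.5], WEIGHTS)
```

The two calls share the cached history component and differ only in the part driven by the subject Pin, which is the point of keeping the heavy encoder offline and the context fusion online.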

Feature Trimmer + ML serving root-leaf architecture — network-bandwidth optimisation on the online serving path (2026-05-01 Feature Trimmer post)¶

systems/pinterest-feature-trimmer — the per-root-host module that trims each fan-out RPC payload to the feature allowlist the destination leaf model needs, sourced from the PyTorch model signature (module_info.json in the .pt archive). Consolidated in-memory map keyed by model name + version; per-bundle independent maps + atomic swap under RW lock (patterns/file-watcher-atomic-swap-consolidated-map); three-layer safety posture — init-failure railguard (alert but don't block launch), per-bundle failure isolation, backward-compatible rolling-deploy artefacts + skip-on-miss untrimmed passthrough. Production wins: Ads root cluster −27%, Homefeed root −33%, Ads leaf peak network 1000–1200 MBPS → <200 MBPS, Homefeed leaf inbound −65–75%, Related Pins p99 −25–30%, Search/Notification egress −45/−65% enabling instance-type downgrade with ≥30% cost reduction, $4M+/yr total infrastructure savings + 0.17% revenue lift from fewer timeouts; end state: bottleneck shifted from network to CPU on root cluster.
systems/pinterest-ml-serving-root-leaf — the root-leaf architecture substrate: CPU root (feature fetch + preprocessing + shared in-memory feature cache, on AWS m6in pre-trimmer) + GPU leaf partitions (model inference, one partition per related group of models hosting production + experimental variants). Three named benefits — simplified model onboarding, reduced feature-store QPS via shared cache, CPU/GPU tier right-sizing — paid for by moving feature data onto the network, which turned Pinterest's serving into network-bound not compute-bound. Forcing function for network-optimised instance types on root + for Feature Trimmer.
systems/fbthrift — first wiki canonicalisation of fbthrift as Pinterest's RPC framework for root-leaf traffic; lz4 compression as first-lever −20% bandwidth, +5% CPU, +5 ms (~10%) p90 datum. Canonical compression codec trade-off at RPC altitude; companion to the Kafka-family lz4/zstd data points already on the wiki.
concepts/root-leaf-ml-serving-architecture — architectural primitive canonicalised; three named benefits of the split + feature-fan-out-network-bottleneck as the structural cost; sibling to concepts/scatter-gather-query at query altitude and to Netflix's Lightbulb+Envoy ML-serving platform on the routing axis (this post optimises the payload, Netflix optimises the routing).
concepts/feature-fanout-network-bottleneck — the failure mode the root-leaf split introduces; symptoms (idle GPU SM activity + m6in instance-type pressure + scaling-on-network) + two remedies (RPC compression + Send-What-You-Use); Pinterest's GPU was compute-idle-but-network-bottlenecked before trimmer launched.
concepts/send-what-you-use — the overarching principle; IWYU analogy named explicitly; table of three payload shapes (send-everything / send-what-you-use / per-consumer-tailored); siblings in GraphQL field selection, protobuf FieldMask, columnar projection pushdown, and telemetry cherry-picked instrumentation payload.
concepts/model-signature-as-source-of-truth — the design invariant that makes Send-What-You-Use tractable; module_info.json carries input_names + output_names; version-stability invariant enforced socially ("a new model is forked from the original"); dual-consumer shape (leaf feature converter + root Feature Trimmer) avoids drift.
concepts/network-bound-vs-compute-bound — the scaling-bottleneck framing; Pinterest's canonical datum + bottleneck-relocation post-trimmer; diagnostic signals (utilisation shape, instance-type pressure, fleet-scaling axis, latency-responds-to-compression); sequencing implication — architectural investment should prioritise the currently-binding resource.
patterns/feature-allowlist-over-blocklist — allowlist-over-blocklist in ML-feature-management altitude; Pinterest's reasoning about blocklist-grows-faster-than-allowlist due to experimentation churn + deprecation backlog; fail-closed on new fields as the safe default.
patterns/artifact-rides-model-deploy-pipeline — reuse the model-deploy staged pipeline for trimmer config rather than build a separate control plane; canary → ACA → prod sequencing; root-configs-lead invariant; backward-compatible rolling-deploy artefact carrying both current and pending-version allowlists; contrast with Netflix-style decoupled config via pubsub (Netflix config evolves independently of code cadence; Pinterest trimmer config is strictly derived from model signature so it rides the model pipeline).
patterns/file-watcher-atomic-swap-consolidated-map — on-host refresh mechanism: per-bundle independent maps + file watchers + full consolidated rebuild + atomic swap under RW lock (shared for reads, unique for swap); per-bundle failure isolation so a corrupt bundle doesn't poison others.
patterns/skip-on-missing-allowlist-for-safety — fail-safe posture for Send-What-You-Use: unknown model → untrimmed passthrough; version missing → latest-version fallback; init parse failure → alert but don't block launch. Choice of fail-open over fail-closed because the trimmer is an optimisation on the critical path, not a correctness gate. Companion to Cloudflare's fail-stale preferred-posture ladder.
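A rough sketch of how the allowlist lookup, consolidated-map swap, and skip-on-miss fallbacks above might fit together. Everything here is an assumption for illustration: the `FeatureTrimmer` class, its method names, and the in-memory signature dictionaries are hypothetical, and a plain `threading.Lock` stands in for the RW lock because Python's standard library has none.

```python
import threading
from typing import Dict, Mapping, Optional, Set, Tuple

ModelKey = Tuple[str, int]  # (model name, version)

class FeatureTrimmer:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for the RW lock in the text
        self._bundles: Dict[str, Dict[ModelKey, Set[str]]] = {}
        self._consolidated: Dict[ModelKey, Set[str]] = {}

    def load_bundle(self, bundle: str, signatures: Mapping[ModelKey, Mapping]) -> None:
        """Parse one bundle, rebuild the consolidated map, and swap it in."""
        # Parsing happens before the lock is taken: a corrupt bundle raises here
        # and leaves the currently served map untouched.
        per_bundle = {key: set(sig["input_names"]) for key, sig in signatures.items()}
        with self._lock:
            bundles = dict(self._bundles)
            bundles[bundle] = per_bundle
            merged: Dict[ModelKey, Set[str]] = {}
            for entries in bundles.values():
                merged.update(entries)
            # Replace references rather than mutating maps readers may hold.
            self._bundles, self._consolidated = bundles, merged

    def _allowlist(self, model: str, version: int) -> Optional[Set[str]]:
        with self._lock:
            table = self._consolidated
        if (model, version) in table:
            return table[(model, version)]
        versions = [v for (m, v) in table if m == model]
        if versions:  # version missing: fall back to the latest known version
            return table[(model, max(versions))]
        return None

    def trim(self, model: str, version: int, features: dict) -> dict:
        allow = self._allowlist(model, version)
        if allow is None:
            return features  # unknown model: untrimmed passthrough
        return {k: v for k, v in features.items() if k in allow}

trimmer = FeatureTrimmer()
trimmer.load_bundle("ads", {("ads_ctr", 3): {"input_names": ["user_age", "pin_emb"]}})
payload = {"user_age": 31, "pin_emb": [0.1], "unused_feature": 7}
trimmed = trimmer.trim("ads_ctr", 3, payload)
fallback = trimmer.trim("ads_ctr", 9, payload)
passthrough = trimmer.trim("unknown_model", 1, payload)
```

The failure paths all err toward sending more data rather than less, matching the fail-open posture described above for an optimisation that sits on the critical path.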

Shopping conversion candidate generation — two-tower retrieval with parallel DCN+MLP, advertiser-level loss (2026-04-27 conversion-CG post)¶

systems/pinterest-shopping-conversion-cg — Pinterest's dedicated retrieval-stage two-tower model for shopping ads optimised on offsite conversions (checkout, add-to-cart), deployed across Home Feed + Related Pins + Search to 600+ million MAUs. Two-generation journey: 2023 multi-head launch (separate engagement + conversion heads, only conversion head served) → 2025 unified single-head multi-task refresh (patterns/unified-multi-task-over-multi-head) with advertiser-level loss added, parallel DCNv2 + 3-layer MLP cross layers adopted inside both towers. Training-data design: dual positive signal (conversions + click-duration-reweighted clicks/repins) + served-but-not-engaged ad impressions as hard negatives + engagement as auxiliary task for shared-trunk stabilisation. Production wins: +2.3% shopping conversion volume, +2.7% impression-to-conversion rate, +1.5% CTR, +2.2% CTR over 30s (2023 launch); +42% recall@100 for conversion tasks and +3.1% RoAS for US shopping campaigns (2025 refresh); and — separately — +11% offline recall@1000 from the parallel DCN+MLP cross architecture alone, which was subsequently adopted by all Pinterest production engagement retrieval models.
systems/dcnv2 (extended) — second distinct role at Pinterest: parallel cross-layer primitive inside each tower of the shopping conversion CG, contrasted with its earlier role as a projection layer in the ads engagement model. Same building block, two architectural purposes.
systems/graphsage (extended) — subject-Pin visual-graph embedding used as a real-time context feature on the user side of the shopping conversion CG two-tower model. Third distinct Pinterest role for GraphSage: DPP Pin-similarity → SSD Pin-similarity → user-tower retrieval-context feature.
systems/transformer (extended) — user-action-sequence encoder producing user-history embedding on the user side of the shopping conversion CG. Second wiki-confirmed Pinterest Transformer-as-user-history-encoder use (first was the unified engagement model), confirming the pattern across multiple ads-ML models.

PinCompute + Ray ML platform — ENA-reset zombie memcg incident (2026-04-15 CPU bottlenecks post)¶

systems/pinterest-pincompute — Pinterest's Kubernetes-backed general-purpose compute platform on AWS EC2. Zonal-cluster shape (one K8s cluster per AWS AZ) for blast-radius containment + latency locality; AWS Deep Learning AMI (Ubuntu 20.04) as the GPU base image; taints + tolerations used as the reservation mechanism for controlled debugging hosts. Host platform for the Ray-based ML training fleet; target of the 3-month ENA-reset investigation.
systems/ray — distributed-compute substrate for ML training and inference at Pinterest; >50% of offline ML workload runs on Ray; "tens of thousands of Ray clusters per month". The network-sensitivity profile matters: Ray's Control Plane (gRPC health checks, task submission, actor scheduling, ObjectReference maintenance) + Data Plane (high-volume Object Store transfers) are both heavily gRPC-over-TCP with short liveness windows, so ENA driver resets manifest as ObjectFetchTimedOutError, ActorDiedError, node-health-check failures — not as generic network errors, which delayed root-cause attribution.
systems/aws-ena-driver — AWS Elastic Network Adapter Linux driver; the Tx-paused-5 s hardcoded threshold + self-healing device-reset path is the symptom surface that made the incident observable. First wiki canonicalisation of the ENA driver as a production operational primitive.
systems/aws-ecs-agent — Amazon ECS container agent shipped as a default systemd unit in the AWS Deep Learning AMI; crash-loops forever on non-ECS Kubernetes hosts because it has no ECS cluster credentials. Each crash allocates a fresh memory cgroup; deferred kernel reclamation accumulates zombie memcgs at a rate the kernel can't drain. Canonical example of base-image unused-systemd-unit risk.
systems/flamescope — Netflix's temporal-flamegraph visualiser; the time-travel instrument that localised the kubelet's mem_cgroup_nr_lru_pages CPU spike to the seconds immediately before each ENA reset.

Request-level deduplication — Foundation Model + DCAT + Iceberg sort-order (2026-04-13 dedup post)¶

systems/pinterest-foundation-model — Pinterest's recsys foundation model, ACM RecSys 2025 oral spotlight (arXiv 2507.12704). 100× transformer dense parameter growth + 10× model-dimension growth over prior Pinterest ranking models; the scaling driver that forced request-level deduplication to be canonicalised as a cross-cutting discipline. Consumes ~16K-token user-action sequences alongside TransAct.
systems/pinterest-dcat — Deduplicated Cross-Attention Transformer. Pinterest's ranking-transformer architecture that splits the model into a context pass (user sequence once per deduplicated request, KV cached per layer) + a crossing pass (each candidate cross-attends to cached user-history KV). Implemented with custom Triton kernels for training + serving; displaces FlashAttention as ranking-attention substrate. Achieves 2× training gain + 7× serving-throughput gain over standard self-attention with FlashAttention. Canonical wiki instance of patterns/cached-kv-cross-attention-for-deduplication; structural sibling of the LLM-inference KV cache primitive.
systems/apache-iceberg — Iceberg-as-training-dataset-substrate-with-sort-order-as-optimisation. Third distinct Pinterest wiki instance: table-format (2024-05-14 HBase-deprecation), quota-telemetry substrate (2026-02-24 Piqama), ML training data with (user_id, request_id) sort-key for 10–50× columnar compression on user-heavy feature columns. Downstream wins: bucket joins, efficient backfills, incremental feature engineering, stratified user-level sampling. Canonical wiki instance of patterns/sort-by-request-id-for-columnar-compression.
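To see why a (user_id, request_id) sort order helps columnar compression, the toy experiment below counts how many runs a run-length encoder would emit for a user-level column before and after sorting. It is an illustration only: the row layout and the `run_count` helper are hypothetical, run-length encoding is just one of the encodings a columnar format may apply, and the ratio it produces is not a reproduction of the 10–50× figure.

```python
import random
from itertools import groupby

def run_count(column) -> int:
    """Number of runs a run-length encoder would emit for this column."""
    return sum(1 for _ in groupby(column))

def user_feature(row) -> str:
    # A user-level feature repeats for every candidate of every request from that user.
    return f"user-seq-{row[0]}"

random.seed(0)
# Hypothetical training rows: (user_id, request_id, candidate_id).
rows = [(u, f"req-{u}-{r}", c)
        for u in range(20) for r in range(5) for c in range(10)]

shuffled = rows[:]
random.shuffle(shuffled)                       # arrival order with no useful locality
sorted_rows = sorted(shuffled, key=lambda row: (row[0], row[1]))

runs_shuffled = run_count(user_feature(r) for r in shuffled)
runs_sorted = run_count(user_feature(r) for r in sorted_rows)
```

After sorting, each user's rows are contiguous, so the user-level column collapses to one run per user, while the shuffled layout emits close to one run per row.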

URL normalisation — MIQPS + long-tail parameter learning (2026-04-20 MIQPS post)¶

systems/pinterest-miqps — Minimal Important Query Param Set. Per-domain, per-query-parameter-pattern algorithm + offline job + published config artefact classifying each URL parameter as neutral (safe to strip) or non-neutral (preserve). Uses a visual-content-ID removal test: sample up to S URLs with distinct values for the parameter, render with + without the parameter, classify non-neutral if content IDs differ in ≥T% of samples. Early-exit optimisation stops testing once non-neutral is clear; conservative default flags under-sampled parameters as non-neutral. Anomaly-gated publish with asymmetric rules (non-neutral → neutral flips are anomalies; new non-neutral entries + disappearing patterns are not) protects against degenerate recomputes. Three-phase deployment: continuous ingest → offline compute → runtime lookup. Canonical wiki instance of patterns/per-domain-adaptive-config-learning + patterns/visual-fingerprint-based-parameter-classification + patterns/conservative-anomaly-gated-config-update + patterns/offline-compute-online-lookup-config.
systems/pinterest-url-normalizer — the runtime URL-processing component that loads the MIQPS map at init and does in-memory lookups per URL. Stacks four independent normalisation layers with OR semantics on keep-decisions: static platform allowlists (Shopify variants, Salesforce Commerce Cloud start / sz / prefn1 / prefv1) + regex patterns + MIQPS non-neutral set + conservative default. Parameter kept if any layer preserves it; stripped only if all layers agree. Canonical wiki instance of patterns/multi-layer-normalization-strategy.
systems/pinterest-content-ingestion-pipeline — Pinterest's content-acquisition pipeline from merchant domains. Dual role in MIQPS: upstream producer of per-domain URL corpus (writes each observed URL to S3 continuously as a side effect of normal processing) + downstream consumer of normalised URLs (runs fetch + render + process once per canonical URL rather than per URL variant). Framing: "rendering the same page dozens of times simply because its URLs differ in irrelevant parameters" is the cost driver URL normalisation eliminates. Canonical wiki URL-normalisation use case.
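The removal test and the layered keep-decision above can be sketched as below. This is a simplified reading under stated assumptions: `content_id` is a caller-supplied stand-in for rendering a page and fingerprinting its visual content, the threshold and minimum sample count are made-up values, and treating parameters with no MIQPS verdict as kept is one interpretation of the conservative default.

```python
import re
from typing import Callable, Dict, Iterable, List, Pattern, Set
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_param(url: str, param: str) -> str:
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))

def classify_param(param: str, sample_urls: Iterable[str],
                   content_id: Callable[[str], str],
                   threshold: float = 0.2, min_samples: int = 3) -> str:
    urls = list(sample_urls)
    if len(urls) < min_samples:
        return "non-neutral"          # conservative default for under-sampled parameters
    needed = threshold * len(urls)    # differing samples required to settle the verdict
    differing = 0
    for url in urls:
        if content_id(url) != content_id(strip_param(url, param)):
            differing += 1
            if differing >= needed:
                return "non-neutral"  # early exit: no need to render the rest
    return "neutral"

def normalize(url: str, static_allowlist: Set[str], keep_patterns: List[Pattern],
              verdicts: Dict[str, str]) -> str:
    """Keep a parameter if any layer preserves it; strip only if none does."""
    parts = urlsplit(url)
    kept = []
    for k, v in parse_qsl(parts.query, keep_blank_values=True):
        keep = (k in static_allowlist                       # static platform allowlist
                or any(p.fullmatch(k) for p in keep_patterns)  # regex layer
                or verdicts.get(k) == "non-neutral"         # learned MIQPS set
                or k not in verdicts)                       # no verdict: keep by default
        if keep:
            kept.append((k, v))
    return urlunsplit(parts._replace(query=urlencode(kept)))

def fake_content_id(url: str) -> str:
    # Toy renderer: page content depends on the path and "variant" only.
    parts = urlsplit(url)
    return f"{parts.path}#{dict(parse_qsl(parts.query)).get('variant', '')}"

base = "https://shop.example/p/1"
verdicts = {
    "utm_source": classify_param(
        "utm_source", [f"{base}?utm_source={s}&variant=red" for s in "abc"], fake_content_id),
    "variant": classify_param(
        "variant", [f"{base}?variant={c}" for c in ("red", "blue", "green")], fake_content_id),
}
canonical = normalize(f"{base}?utm_source=x&variant=red&start=12",
                      {"start"}, [re.compile(r"pref[nv]\d+")], verdicts)
```

In a deployment shaped like the one described, `classify_param` would run in the offline job and only `normalize` with a preloaded verdict map would run per URL.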

MCP ecosystem — hosted MCP servers + central registry + layered JWT/mesh auth (2026-03-19 MCP ecosystem post)¶

systems/pinterest-mcp-registry — Pinterest's central MCP registry, the source of truth for which MCP servers are approved for production. Dual-surface: Web UI for humans (owning team + support channels + security posture + live status + visible tools), API for AI clients (discover + validate + pre-flight authorize). "Only servers registered here count as approved for use in production."
systems/pinterest-presto-mcp-server — Pinterest's highest-traffic MCP server. Exposes Presto query tools to agents so data flows into agent workflows without dashboard context-switching. Subject to business-group-based access gating — Ads / Finance / specific infra teams only, despite broad surface reachability.
systems/pinterest-spark-mcp-server — underpins Pinterest's AI Spark debugging experience: diagnose Spark job failures, summarise logs, record structured RCAs. Channel-scoped tool visibility — "Spark MCP tools are only available in Airflow support channels."
systems/pinterest-knowledge-mcp-server — general-purpose knowledge endpoint used by Pinterest's internal AI bot for company Q&A + documentation + debugging across internal sources. Sibling shape to Dropbox's Dash MCP — unified-retrieval-tool over institutional knowledge.
systems/model-context-protocol — the protocol Pinterest operationalises at enterprise scale. First canonical wiki instance of the enterprise-SSO piggyback shape — Pinterest explicitly rejects the MCP OAuth spec's per-server consent flow for internal traffic.
systems/envoy — mesh data-plane + JWT validation + identity-header mapping point for all MCP traffic. "Envoy validates the JWT, maps it to X-Forwarded-User, X-Forwarded-Groups, and related headers, and enforces coarse-grained security policies." First wiki datum of Envoy-as-AI-agent-auth-enforcement.
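The identity-header mapping and group-based gating described above might look roughly like this on the receiving side. It is a sketch with several assumptions: JWT validation itself is out of scope and assumed to have happened already, the claim names `sub` and `groups` are guesses at the token layout, and the tool name and group names in `TOOL_GROUPS` are hypothetical.

```python
from typing import Dict, Mapping, Set

def claims_to_headers(claims: Mapping[str, object]) -> Dict[str, str]:
    """Mirror of the mesh step: turn already-validated claims into identity headers."""
    groups = claims.get("groups", [])
    return {
        "X-Forwarded-User": str(claims["sub"]),
        "X-Forwarded-Groups": ",".join(sorted(groups)),
    }

# Hypothetical per-tool policy: which business groups may call which tool.
TOOL_GROUPS: Dict[str, Set[str]] = {"presto.run_query": {"ads", "finance"}}

def authorize(tool: str, headers: Mapping[str, str]) -> bool:
    """Fine-grained check inside the MCP server, using only the forwarded headers."""
    allowed = TOOL_GROUPS.get(tool)
    if allowed is None:
        return False  # tools without a policy entry are denied
    caller = {g for g in headers.get("X-Forwarded-Groups", "").split(",") if g}
    return bool(caller & allowed)

alice = claims_to_headers({"sub": "alice", "groups": ["finance", "eng"]})
bob = claims_to_headers({"sub": "bob", "groups": ["eng"]})
alice_ok = authorize("presto.run_query", alice)
bob_ok = authorize("presto.run_query", bob)
```

Splitting the work this way keeps token handling in the mesh and leaves each server with a simple header-based decision.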

Client-side performance measurement — Android `BaseSurface` + `PerfView` interfaces (2026-04-08 Performance for Everyone post)¶

systems/pinterest-base-surface — Pinterest Android's base UI class every feature screen inherits from. Since 2026 the substrate for automatic Visually Complete measurement: walks the Android view tree from the root, inspects opt-in Perf* marker interfaces, emits a User Perceived Latency timestamp when all visible content-critical views report ready. Canonical wiki instance of patterns/base-class-automatic-instrumentation. 60+ Android surfaces continuously measured with zero per-surface instrumentation cost (down from the pre-platform two engineer-weeks per surface Pinterest disclosed).
systems/pinterest-perf-view — three opt-in marker interfaces (PerfImageView, PerfTextView, PerfVideoView) that product engineers tag content-critical views with; expose isDrawn() / isVideoLoadStarted() plus geometry methods (x() / y() / width() / height()) so the BaseSurface view-tree walk can filter to visible views and conjoin readiness. Canonical wiki instance of patterns/opt-in-performance-interface.
systems/pinterest-android-app — Pinterest's native Android client; deployment target of the 2026 Visually Complete system. Named surfaces in the post: Home Feed, Search Result Feed, Video Pin Closeup, Search Auto Complete.
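The view-tree walk described above can be sketched in a language-neutral way. The real system is Android code; the Python below is only a model of the logic, with hypothetical names: `content_critical` stands in for implementing one of the Perf* marker interfaces, `drawn` stands in for `isDrawn()` / `isVideoLoadStarted()`, and coordinates are assumed to be absolute screen positions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class View:
    x: int
    y: int
    width: int
    height: int
    children: List["View"] = field(default_factory=list)
    content_critical: bool = False  # stand-in for a Perf* marker interface
    drawn: bool = False             # stand-in for the readiness methods

def is_visible(view: View, screen_w: int, screen_h: int) -> bool:
    """True if the view's rectangle overlaps the screen rectangle."""
    return (view.x < screen_w and view.y < screen_h
            and view.x + view.width > 0 and view.y + view.height > 0)

def visually_complete(root: View, screen_w: int, screen_h: int) -> bool:
    """Walk the tree; complete when every visible content-critical view is ready."""
    stack = [root]
    seen_critical = False
    while stack:
        view = stack.pop()
        stack.extend(view.children)
        if view.content_critical and is_visible(view, screen_w, screen_h):
            seen_critical = True
            if not view.drawn:
                return False
    # Assumption: a surface with no visible critical views is not reported complete.
    return seen_critical

hero = View(0, 0, 1080, 600, content_critical=True, drawn=True)
title = View(0, 600, 1080, 100, content_critical=True, drawn=True)
below_fold = View(0, 5000, 1080, 600, content_critical=True, drawn=False)
surface = View(0, 0, 1080, 1920, children=[hero, title, below_fold])

complete_now = visually_complete(surface, 1080, 1920)
title.drawn = False
complete_after_reset = visually_complete(surface, 1080, 1920)
```

The off-screen view does not block completion, which is why the marker interfaces expose geometry as well as readiness.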

Home Feed multi-objective optimization (2026-04-07 MOO-evolution post)¶

systems/pinterest-home-feed-blender — Pinterest's Home Feed multi-objective optimization / blending layer. Three generations: V1 (2021, DPP in a backend node chain) → V2 (early 2025, SSD in PyTorch on company-wide model serving cluster) → V2+ (mid/late 2025, unified soft-spacing framework composed into SSD's utility equation for content-quality penalties). Canonical production instance of multi-objective reranking, SSD-over-DPP migration, backend-to-model-server infrastructure migration, and config-based soft-spacing framework.
systems/pinterest-home-feed — the product surface; cascaded-funnel framing now explicit (retrieval → pre-ranking → ranking → multi-objective optimization).
systems/pytorch — serving substrate for SSD + soft-spacing on Pinterest's model serving cluster. Canonical wiki instance of non-ML algorithmic logic riding a general model-serving substrate.
systems/pinclip — Pinterest's multimodal (image-text-aligned, graph-aware) foundational visual embedding; Q3 2025 replacement for prior visual signal in SSD's pairwise similarity. Near-real-time availability for recently-ingested Pins.
systems/graphsage — inductive graph-embedding method used for Pin similarity in both DPP (2021) and SSD (2025) eras.

Analytics Agent + PinCat + Vector DB (2026-03-06 Text-to-SQL post)¶

systems/pinterest-analytics-agent — the #1 agent at Pinterest (10× the next most-used agent, 40% analyst-population coverage in two months, target 50% by year-end). Four-layer architecture: Agent Orchestration (LLM with Pinterest-specific prompts) + MCP Integration (table search + query search + knowledge search + Presto execution) + Context (PinCat schemas + vector indexes + expert docs + query logs) + Execution (Presto with EXPLAIN-before-EXECUTE + bounded retry + default LIMIT 100). Design principles: asset-first, governance-aware ranking, schema-grounded SQL validation, conflict-resolution hierarchy (docs > schema > query patterns > general knowledge).
systems/pinterest-pincat — Pinterest's internal data catalog on DataHub. System of record for table tier tags (Tier 1 / 2 / 3), owners, retention, and column-level glossary terms. The load-bearing substrate the Analytics Agent grounds every SQL query against.
systems/pinterest-vector-db-service — internal Vector Database as a Service on AWS OpenSearch + Hive (source of truth) + Airflow (index creation + ingestion DAGs). JSON-schema config → production vector index in days. Millions of embeddings with daily incremental updates; hybrid semantic-plus-metadata filtering. Canonical wiki instance of patterns/internal-vector-db-as-service.
systems/pinterest-ai-table-documentation — AI-generated table + column descriptions from lineage + existing docs + glossary terms + representative QueryBook queries. Tier-1 human-in-the-loop; Tier-2 LLM-drafts-human-reviews. Paired with join-based glossary term propagation (auto-tagged >40% of columns) + search-based propagation. ~70% total manual-documentation-work reduction.
systems/pinterest-querybook — Pinterest's open-source collaborative SQL editor; origin of the query history indexed by the Analytics Agent.
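
The Execution layer's EXPLAIN-before-EXECUTE discipline can be sketched as follows. This is a minimal illustration, not Pinterest's implementation: it uses Python's built-in `sqlite3` as a stand-in for Presto, the `repair` callback is a hypothetical placeholder for the LLM correction step, and the retry bound of 3 is an assumption (the post names bounded retry but this page records no number). SQLite's `EXPLAIN` returns bytecode rather than a query plan, but it still forces the statement to compile, so invalid SQL fails before anything is executed.

```python
import re
import sqlite3
from typing import Callable

DEFAULT_LIMIT = 100  # default row cap named in the post
MAX_ATTEMPTS = 3     # assumed retry bound; not stated in the source


def with_default_limit(sql: str, limit: int = DEFAULT_LIMIT) -> str:
    """Append a LIMIT clause unless the statement already ends with one."""
    stripped = sql.strip().rstrip(";")
    if re.search(r"\blimit\s+\d+\s*$", stripped, flags=re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {limit}"


def explain_then_execute(
    conn: sqlite3.Connection,
    sql: str,
    repair: Callable[[str, str], str],  # hypothetical: (sql, error) -> new sql
    max_attempts: int = MAX_ATTEMPTS,
) -> list:
    """Validate with EXPLAIN first; only execute a statement that compiles."""
    candidate = sql
    for _ in range(max_attempts):
        try:
            conn.execute(f"EXPLAIN {candidate}").fetchall()
        except sqlite3.Error as err:
            # Feed the engine's error back to the repair step and try again.
            candidate = repair(candidate, str(err))
            continue
        return conn.execute(with_default_limit(candidate)).fetchall()
    raise RuntimeError(f"query still invalid after {max_attempts} attempts")
```

Validating before executing means a malformed query costs one cheap planning call rather than a full scan, and the default limit bounds the result size even when the generated SQL omits one.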

Ads engagement modeling (2026-03-03 unified model post)¶

systems/pinterest-ads-engagement-model — unified ads engagement / CTR-prediction model consolidating three surface-specific production models (Home Feed, Search, Related Pins) into a single architecture: shared trunk (MMoE + long-user-sequence Transformer) + surface-specific tower trees (HF + SR present; RP future work) + view-type-specific calibration + multi-task heads + surface-specific checkpoint exports. Serving efficiency: DCNv2 projection layer before downstream crossing; fused-kernel embedding + TF32; request-level user-embedding broadcasting with tested-unique-user-cap safety. Load-bearing finding: MMoE + long sequences only paid off when integrated into unified model trained on combined multi-surface data — neither component worked standalone.
systems/pinterest-home-feed / systems/pinterest-search / systems/pinterest-related-pins — three Pinterest ads surfaces with distinct user-intent + feature availability + CUDA throughput profiles. HF + SR unified first (similar throughput); RP deferred until efficiency work stabilised via staged unification.

Quota management platform (2026-02-24 Piqama post)¶

systems/pinterest-piqama — generic quota management ecosystem. REST + Thrift control plane; pluggable schema / validation / dispatch / enforcement hooks; one platform serves both capacity quotas (Moka) and rate-limit quotas (TiDB, KV Stores); feedback loop via Iceberg + Presto + auto-rightsizing service; budget integration with tier-weighted haircut on exceedance.
systems/pinterest-moka — next-gen Big Data Processing Platform on Apache Yunikorn; canonical Piqama capacity-quota integration; per-project Yunikorn queue fed by Piqama quota values via a Yunikorn Config Updater.
systems/apache-yunikorn — open-source resource scheduler underneath Moka.
systems/pinterest-pinconf — Pinterest's config distribution platform; canonical substrate for feature flags + dynamic service config + Piqama rate-limit-rule delivery.
systems/pinterest-spf — Service-Protection Framework; in-process rate-limit + throttling + concurrency-control library that makes local data-path decisions.

Storage substrate (2024-05-14 HBase deprecation post)¶

systems/hbase — Pinterest's default NoSQL store 2013-2021; peak 50 clusters / 9,000 EC2 instances / >6 PB data; deprecated for 5 named reasons (maintenance cost, missing functionality, system complexity, infra cost, waning community).
systems/tidb — Pinterest's chosen post-HBase NewSQL replacement for general-NoSQL workloads requiring transactions + rich query + secondary index. Also a named Piqama rate-limit integration target (2026-02-24 Piqama post).
systems/pinterest-kvstore — Pinterest's in-house KV store on systems/rocksdb + systems/rocksplicator; replaced HBase for KV workloads. Named Piqama rate-limit integration target.
systems/pinterest-zen — Pinterest's graph service (was on HBase; migration target).
systems/pinterest-ixia — Pinterest's indexed datastore built on HBase + Manas realtime.
systems/pinterest-goku — Pinterest's in-house time-series datastore; replaced HBase for time-series workloads.
systems/pinterest-ums — Pinterest's in-house wide-column store.

Key patterns / concepts¶

URL normalisation + content deduplication (2026-04-20 MIQPS post)¶

patterns/per-domain-adaptive-config-learning — hybrid head-curated + long-tail-learned configuration. Static rules for well-known platforms (Shopify, Salesforce Commerce Cloud) + empirical learning (MIQPS) for the long tail of merchant domains. Same structural shape as patterns/head-cache-plus-tail-finetuned-model (Instacart ML layer) applied at the config / rules layer.
patterns/visual-fingerprint-based-parameter-classification — empirical removal-test using a content-ID fingerprint as ground truth. For each parameter, sample URLs, render with/without, compare fingerprints, classify by mismatch rate. Pattern generalises to any "is this component material?" question where content fingerprinting is cheaper than understanding-the-meaning.
patterns/multi-layer-normalization-strategy — combine independent classifiers (static allowlist + regex + MIQPS + conservative default) with OR semantics on keep-decisions. Bias toward the tolerable failure mode: keeping a neutral parameter wastes a render (tolerable); stripping a non-neutral parameter silently merges distinct items (catastrophic). Every layer acts as an independent safety net.
patterns/conservative-anomaly-gated-config-update — publish-time safety gate. Compare new config against previous, count entries that changed in the "dangerous" direction only, reject update if above threshold. Asymmetric rules embody asymmetric costs. Canonical MIQPS instance: non-neutral → neutral flips are anomalies; new non-neutral entries + pattern-disappearances are fine.
patterns/offline-compute-online-lookup-config — three-phase architecture: continuous-ingest to durable corpus + offline batch compute → anomaly-gate → publish artefact → runtime in-memory lookup. Scaling-denominator shift: offline analysis scales with domain count while real-time would scale with URL count (orders of magnitude more expensive for Pinterest's hundreds-of-thousands-of-domains + billions-of-URLs regime).
concepts/url-normalization — collapsing URL variants into one canonical form before expensive downstream work (fetch / render / process / content-dedupe). Upstream of content-identity dedup, which catches duplicates only after paying render cost.
concepts/query-parameter-pattern — sorted set of parameter names in a URL. The grouping key for per-pattern classification, because the same parameter name can play different roles on different page types on the same domain (canonical Pinterest example: ref is neutral on a product page but non-neutral on a comparison page).
concepts/neutral-vs-non-neutral-parameter — the binary per-(domain, pattern, parameter) classification MIQPS assigns. Neutral = safe to strip; non-neutral = must preserve.
concepts/content-id-fingerprint — same-content → same-ID function over rendered pages. Pinterest uses a visual-representation hash; the algorithm is agnostic (DOM tree hashing, response body checksum, <title> + Open Graph metadata also valid).
concepts/canonical-url-unreliability — why <link rel="canonical"> isn't enough across the long tail of merchant domains. Three failure modes: omitted entirely / set incorrectly (homepage canonical default) / contaminated with tracking params. Canonical wiki framing of "trust the data, not the declaration" applied to URL canonicality.
concepts/anomaly-gated-config-update — concept framing for the discipline of comparing newly-computed config against previously-published config before allowing the update, with asymmetric rules that encode the underlying cost asymmetry.
concepts/offline-compute-online-lookup — concept framing for the architectural split between expensive offline analysis and cheap runtime lookup. Acceptable when the underlying phenomenon changes slowly (Pinterest: URL parameter conventions change on the order of weeks to months).
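
The runtime lookup and the publish-time gate described in this section can be sketched together. This is a simplified illustration under stated assumptions: the `Config` shape (a map from domain and parameter pattern to the set of neutral parameters), the function names, and the `max_flips` threshold are all hypothetical, and the static-allowlist and regex layers are omitted.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical published artefact: (domain, pattern) -> neutral parameter names.
# A pattern is the sorted tuple of parameter names present in the URL.
Config = dict[tuple[str, tuple[str, ...]], set[str]]


def pattern_key(url: str) -> tuple[str, tuple[str, ...]]:
    """Grouping key: lower-cased domain plus the sorted set of parameter names."""
    parts = urlsplit(url)
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return parts.netloc.lower(), tuple(sorted(names))


def normalize(url: str, config: Config) -> str:
    """Strip only parameters learned as neutral; unknown patterns keep everything."""
    parts = urlsplit(url)
    neutral = config.get(pattern_key(url), set())  # conservative default: strip nothing
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in neutral
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def passes_anomaly_gate(previous: Config, proposed: Config, max_flips: int) -> bool:
    """Count only non-neutral -> neutral flips; every other change is allowed."""
    flips = 0
    for key, old_neutral in previous.items():
        new_neutral = proposed.get(key)
        if new_neutral is None:
            continue  # pattern disappeared: not treated as an anomaly
        # Same pattern, so any newly neutral name was non-neutral before.
        flips += len(new_neutral - old_neutral)
    return flips <= max_flips
```

Both functions lean toward the tolerable failure mode: an unseen pattern keeps all of its parameters, and the gate blocks only changes that would start stripping a parameter previously judged material.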

Hosted MCP ecosystem + layered auth (2026-03-19 MCP ecosystem post)¶

patterns/hosted-mcp-ecosystem — the umbrella pattern: central registry + paved-path deployment + domain-decomposed servers + layered auth + owner-supplied time-saved metadata + human-in-the-loop. Pinterest's 66,000-invocations/month / 844-MAU / ~7,000-engineer-hours-saved-per-month ecosystem.
patterns/layered-jwt-plus-mesh-auth — two-layer authorization for AI-agent traffic: end-user JWT validated + header-mapped at Envoy, mesh identity for service-only flows, optional in-process per-tool decorator. Canonical wiki instance of enterprise-SSO-piggyback for MCP.
patterns/unified-mcp-deployment-pipeline — platform-engineering investment in a shared deployment pipeline so authoring an MCP server is business-logic-only. Collapses the per-server infrastructure boilerplate (deployment, scaling, auth wiring, observability, registry listing) into platform-handled concerns.
patterns/per-tool-authorization-decorator — @authorize_tool(policy='…') in-process decorator for fine-grained per-tool authorization, layered over coarse transport-level auth. Pinterest example: get_revenue_metrics callable only by Ads-eng groups even when the server is broadly reachable.
patterns/central-proxy-choke-point (extended) — Pinterest's Registry + Envoy together form an enterprise-internal MCP choke point. Differs from Cloudflare/Databricks external-LLM-API choke-points on two axes: internal-employee traffic (no upstream-credential-injection job) + policy-surface-via-registry-backed-review-outcomes (not a dashboard flag).
concepts/mcp-registry — organisation-internal authoritative catalog of approved MCP servers with dual human/agent surfaces and pre-flight authorization API. Sibling to MCP Server Card (per-server public) + concepts/api-catalog (HTTP-API public).
concepts/hosted-vs-local-mcp-server — the deliberate architectural-choice axis; Pinterest's "paved path" statement of hosted-first for production, local for experimentation.
concepts/business-group-authorization-gating — narrow the authenticated-user population at session-establishment time by business-group membership claims. Solves the wide-surface × data-heavy-server blast-radius problem without moving the server off the popular surface.
concepts/elicitation-gate (extended) — Pinterest's MCP-primitive + agent-guidance implementation contrasts with Cloudflare Agent Lee's Durable-Object-proxy implementation. Introduces batch approval as a legitimate HITL-cost-reduction mechanism.
systems/envoy (extended) — first wiki datum of Envoy as JWT validation + identity-header mapping for AI-agent traffic; coarse-grained policy enforcement for server-reachability.
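
A per-tool authorization decorator in the spirit of `@authorize_tool(policy='…')` might look like the sketch below. The source elides the policy argument, so the policy name `ads-eng-only`, the `POLICIES` table, and the convention of passing the caller's groups as the first argument are all assumptions made for illustration; in a real deployment the groups would presumably come from the identity headers set upstream.

```python
import functools
from typing import Callable

# Hypothetical policy table: policy name -> groups allowed to call the tool.
POLICIES: dict[str, set[str]] = {"ads-eng-only": {"ads-eng"}}


class ToolAuthorizationError(PermissionError):
    """Raised when the caller's groups do not satisfy the tool's policy."""


def authorize_tool(policy: str) -> Callable:
    allowed = POLICIES[policy]  # unknown policy names fail at decoration time

    def decorator(tool: Callable) -> Callable:
        @functools.wraps(tool)
        def wrapper(caller_groups: set[str], *args, **kwargs):
            # Fine-grained check, layered over coarse transport-level auth.
            if not (caller_groups & allowed):
                raise ToolAuthorizationError(
                    f"{tool.__name__} requires policy {policy!r}"
                )
            return tool(*args, **kwargs)

        return wrapper

    return decorator


@authorize_tool(policy="ads-eng-only")
def get_revenue_metrics(quarter: str) -> dict:
    # Placeholder payload; the real tool's return shape is not documented here.
    return {"quarter": quarter, "revenue": None}
```

The decorator keeps the authorization rule next to the tool definition, so a server can remain broadly reachable while individual tools stay restricted.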

Shopping conversion candidate generation (2026-04-27 conversion-CG post)¶

patterns/parallel-dcn-mlp-cross-layers — put DCNv2 cross network and an MLP deep network in parallel on the same raw input inside each tower, concatenate outputs before the head — avoids the information bottleneck of sequential DCNv2 → MLP stacking. Validated on shopping conversion CG with +11% offline recall@1000; subsequently adopted by all Pinterest production engagement retrieval models. Canonical wiki instance.
patterns/dual-positive-signal-for-sparse-labels — supplement sparse primary-task positives (conversions) with a denser auxiliary signal (clicks, repins) in the same positive set; reweight the auxiliary by a quality proxy to prevent noise from dominating. Pinterest uses w = f(log(1 + t / t_max)) click-duration reweighting. Canonical wiki instance.
patterns/unified-multi-task-over-multi-head — migrate from a multi-head architecture (separate task heads, only one head served) to a unified single-head multi-task architecture when per-head embeddings become unstable in sparse-label regions. Canonical Pinterest 2023-multi-head → 2025-unified migration.
patterns/auxiliary-engagement-task-for-conversion-retrieval — train engagement prediction as auxiliary task alongside conversion as primary task, sharing encoders; engagement's abundant gradient stabilises the shared trunk. "The crucial challenge is balancing the two tasks, ensuring the high-value conversion signal is not diluted by the more frequent engagement data."
concepts/shopping-conversion-candidate-generation — the retrieval-stage ads primitive decomposition: separate conversion-optimised retriever from engagement retriever to avoid under-weighting sparse conversion signal.
concepts/offsite-conversion-sparsity — the structural training-data regime: sparse + noisy + advertiser-reported + delayed. Parent failure mode that motivates the full stack of conversion-CG design choices.
concepts/parallel-cross-and-deep-network — concept framing for parallel-vs-sequential cross+deep composition; the information-bottleneck argument.
concepts/click-duration-reweighting — log-based per-click weight function (w = f(log(1 + t / t_max))) that converts noisy binary click labels into continuous-weight positives; bounce clicks weight ≈ 0, dwell-time-confirmed engagement weights saturate at t_max. Canonical wiki instance.
concepts/advertiser-level-loss — parallel training objective at advertiser granularity to reduce per-Pin conversion variance; canonical wiki instance. Stabilises training without abandoning per-item scoring.
concepts/ad-impression-as-hard-negative — served-but-not-engaged ad impressions as hard negatives stacked on top of in-batch negatives; reflects real served-distribution rather than random-catalog baseline.
concepts/auxiliary-task-regularization — parent concept: using abundant auxiliary signal to regularise a shared representation trained on sparse primary labels.
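
The click-duration weight w = f(log(1 + t / t_max)) leaves f unspecified. The sketch below picks one plausible form as an assumption: clamp t to [0, t_max] and divide by log 2, so that a zero-duration bounce gets weight 0 and any click lasting t_max or longer gets weight 1. The 30-second default for `t_max_s` is hypothetical.

```python
import math


def click_weight(duration_s: float, t_max_s: float = 30.0) -> float:
    """Map click dwell time to a positive-label weight in [0, 1].

    Assumed form of f: clamp the duration to [0, t_max], then normalise
    log(1 + t / t_max) by log(2) so the weight saturates at exactly t_max.
    """
    if t_max_s <= 0:
        raise ValueError("t_max_s must be positive")
    t = min(max(duration_s, 0.0), t_max_s)
    return math.log1p(t / t_max_s) / math.log(2.0)
```

Because the logarithm is concave, early seconds of dwell time move the weight more than later ones, which matches the intent of separating bounces from confirmed engagement without letting very long sessions dominate.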

Client-side performance platform (2026-04-08 Performance for Everyone post)¶

patterns/base-class-automatic-instrumentation — build measurement logic into the UI-framework base class every screen inherits from. Canonical wiki instance is Pinterest's Android BaseSurface. Collapses two-engineer-weeks-per-surface instrumentation cost to zero-per-surface at platform scale.
patterns/view-tree-walk-for-readiness-detection — iterate the UI element tree from the root, filter to visible views via geometry, conjoin per-view readiness through a uniform interface. How Pinterest's BaseSurface decides per-surface Visually Complete without per-surface code.
patterns/opt-in-performance-interface — product engineers tag content-critical views by implementing a small marker-plus-readiness interface (PerfImageView / PerfTextView / PerfVideoView); platform walks the tree and consumes. Solves the auto-detect-false-positives problem.
concepts/user-perceived-latency — time from user action until the user sees the content; Pinterest's product-level latency contract ("performance is the default feature"). Operationalised as concepts/visually-complete.
concepts/visually-complete — the per-surface operational predicate for User Perceived Latency. Canonical worked examples from the post: Video Pin Closeup = "full-screen video starts playing"; Home Feed = "all images rendered and videos playing"; Search Auto Complete = "search suggestions' text rendered along with avatar images."
concepts/client-side-performance-instrumentation — the broader category of in-app measurement; distinct from server-side observability because layout + decoding + buffering + hydration happen after the last server response.
concepts/instrumentation-engineering-cost — Pinterest's two-engineer-weeks-per-surface datum; the forcing function for the platform investment.
concepts/view-tree-traversal — walking a hierarchical UI element tree to compute a derived predicate; the substrate Pinterest's base class operates on.
concepts/base-class-instrumentation — inheritance-based realisation of cross-cutting instrumentation; collapses N-component cost to one-platform cost.
concepts/opt-in-marker-interface — interface implementation as opt-in declaration for cross-cutting framework behaviour; the language-level mechanism underneath PerfView.
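
The view-tree walk behind Visually Complete can be illustrated independently of Android. The sketch below is a language-neutral model written in Python: the `View` dataclass, its `tracked` flag (a stand-in for implementing PerfImageView / PerfTextView / PerfVideoView), and the rectangle-intersection visibility test are assumptions, not the BaseSurface API.

```python
from dataclasses import dataclass, field

Rect = tuple[int, int, int, int]  # left, top, right, bottom in screen pixels


@dataclass
class View:
    bounds: Rect
    children: list["View"] = field(default_factory=list)
    tracked: bool = False  # True when the view opts in to performance tracking
    ready: bool = True     # e.g. image decoded, text laid out, video playing


def intersects(bounds: Rect, screen: Rect) -> bool:
    """Geometry filter: does any part of the view fall inside the screen?"""
    left, top, right, bottom = bounds
    s_left, s_top, s_right, s_bottom = screen
    return left < s_right and right > s_left and top < s_bottom and bottom > s_top


def visually_complete(root: View, screen: Rect) -> bool:
    """Walk the tree from the root; every visible tracked view must be ready."""
    stack = [root]
    while stack:
        view = stack.pop()
        if view.tracked and intersects(view.bounds, screen) and not view.ready:
            return False
        stack.extend(view.children)
    return True
```

Untracked views never block readiness, which is the point of the opt-in interface: only views that product engineers marked as content-critical participate in the predicate.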

Home Feed multi-objective optimization (2026-04-07 MOO-evolution post)¶

patterns/multi-objective-reranking-layer — dedicated final funnel stage for slate composition; canonical wiki instance is Pinterest Home Feed Blender.
patterns/ssd-over-dpp-diversification — algorithm-migration pattern: swap DPP for SSD to gain PyTorch-native implementation + lower serving latency + signal-expansion capacity.
patterns/blending-logic-to-model-server — infrastructure-migration pattern: move feed-blending heuristics from backend service code to PyTorch-hosted components on company-wide model serving cluster for iteration velocity, local testability, and unified feature plumbing.
patterns/config-based-soft-spacing-framework — declarative configuration of sensitive-content classes and soft-spacing penalties; abstract single-class implementation into a platform as quality-axes grow.
patterns/multi-signal-pairwise-similarity — compose visual + text + graph + Semantic-ID signals into one similarity substrate; signal-expansion from (GraphSage + taxonomy) to (PinCLIP + text + GraphSage + Semantic ID) between 2021 and Q4 2025.
concepts/feed-diversification — slate-level reranking for topic/style variety; a long-term engagement lever. Canonical Pinterest ablation datum: >2% time-spent-impression drop week 1.
concepts/determinantal-point-process — DPP algorithm parametrized over relevance diagonal + similarity off-diagonal. Pinterest Home Feed V1 (2021-2024).
concepts/sliding-spectrum-decomposition — position-adaptive windowed spectral decomposition. Pinterest Home Feed V2 (2025 →).
concepts/soft-spacing-penalty — distance-weighted penalty on clustered sensitive-class content; graceful alternative to hard filtering.
concepts/semantic-id — hierarchical discrete content representation via coarse-to-fine quantisation; prefix-overlap penalty for stable category-like anti-clustering (Q4 2025).
concepts/feed-level-reranking — canonical stage-level framing of slate-composition reranking distinct from pointwise ranking. Extends retrieval → ranking funnel with a third production stage.
concepts/position-adaptive-diversification — diversification whose decisions condition on already-placed items (vs slate-global). SSD is the canonical production instance.
concepts/short-term-vs-long-term-engagement — canonical trade-off surfaced by the DPP-ablation study (day-1 engagement gain → week-2 negative retention).
concepts/quality-penalty-signal — classifier output flagging elevated-risk content for soft-penalty treatment (not hard filter). Consumed by concepts/soft-spacing-penalty.
concepts/exposure-bias-ml — closed-loop feedback mechanism behind the short-term-vs-long-term divergence: less-diverse content creates less-diverse engagement signals, training subsequent rankers on biased distributions, collapsing variety further. Extended in this post with the chronic-equilibrium variant (vs the acute A/B-window variant from the L1 CVR post).
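
A distance-weighted soft-spacing penalty can be sketched as a greedy, position-adaptive reranker. This is an illustrative simplification and not Pinterest's SSD-based implementation: the single boolean `sensitive` class, the `weight / distance` penalty shape, and the `window` size are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    pin_id: str
    score: float
    sensitive: bool  # flagged by a hypothetical quality-penalty classifier


def soft_spacing_penalty(
    position: int, placed: list[Candidate], weight: float, window: int
) -> float:
    """Sum weight / distance over sensitive items in the trailing window."""
    penalty = 0.0
    for prev in range(max(0, position - window), position):
        if placed[prev].sensitive:
            penalty += weight / (position - prev)
    return penalty


def rerank(
    candidates: list[Candidate], weight: float = 0.5, window: int = 3
) -> list[Candidate]:
    """Greedy slate fill: each pick conditions on the items already placed."""
    remaining = list(candidates)
    placed: list[Candidate] = []
    while remaining:
        position = len(placed)

        def adjusted(c: Candidate) -> float:
            if not c.sensitive:
                return c.score
            return c.score - soft_spacing_penalty(position, placed, weight, window)

        best = max(remaining, key=adjusted)
        placed.append(best)
        remaining.remove(best)
    return placed
```

Nothing is filtered out: a penalised item is pushed later in the slate and reappears once the penalty has decayed with distance, which is the difference between soft spacing and a hard filter.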

Text-to-SQL / Analytics Agent (2026-03-06 unified context-intent embeddings post)¶

patterns/sql-to-intent-encoding-pipeline — three-step pipeline (domain context injection → SQL-to-text → text-to-embedding) that converts historical SQL into a semantically searchable intent index.
patterns/analytical-intent-retrieval — at query time, embed the user's natural-language question and retrieve by similarity to analytical-question descriptions of past queries, not to table docs.
patterns/governance-tier-ranking-fusion — fuse semantic-similarity scores with governance metadata (tier, freshness, docs, ownership) when ranking retrieval candidates. Canonical wiki instance.
patterns/explain-before-execute-validation — EXPLAIN-first LLM SQL validation with bounded retry + default LIMIT 100. Canonical wiki instance.
patterns/internal-vector-db-as-service — platform pattern: OpenSearch + Hive + Airflow platform lets teams go zero-to-index in days. Canonical wiki instance.
concepts/text-to-sql — canonical wiki definition of the task + why naive schema-dump RAG breaks at 100K+ tables.
concepts/unified-context-intent-embedding — Pinterest's named contribution; single embedding space over natural-language descriptions of what historical queries were designed to answer.
concepts/analytical-intent-embedding — vector representation of the business question a query was designed to answer.
concepts/sql-to-text-transformation — the LLM step producing three outputs per query (summary / analytical questions / detailed breakdown).
concepts/analytical-question-bridge — the load-bearing indirection: match user questions to "analytical questions this query answers" rather than to table descriptions.
concepts/governance-aware-ranking — fuse similarity scores with trust signals so trustworthy tables rank above deprecated/undocumented ones.
concepts/glossary-term-propagation — propagate a well-documented column's glossary term to undocumented columns it joins to; auto-tagged >40% of Pinterest columns.
concepts/data-governance-tiering — Tier 1 / 2 / 3 discipline separating production / team-owned / legacy tables, enforced via PinCat.
concepts/asset-first-agent-design — surface existing trusted assets before generating new SQL; design principle favoring reuse + consistency.
concepts/query-history-knowledge-base — treat accumulated SQL query logs as a durable, searchable library of expert-authored analytical solutions. 2,500+ analysts continuously teach the system.
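
Governance-tier ranking fusion combines a semantic-similarity score with trust signals. The sketch below shows one way to do that as a weighted sum; every weight, the tier boosts, the freshness buckets, and the field names are hypothetical, since the source lists the signals (tier, freshness, docs, ownership) but not how they are combined.

```python
from dataclasses import dataclass

# Hypothetical tier boosts: Tier 1 tables are the most trusted.
TIER_BOOST = {1: 1.0, 2: 0.6, 3: 0.2}


@dataclass(frozen=True)
class TableCandidate:
    name: str
    similarity: float  # semantic similarity to the user's question, in [0, 1]
    tier: int
    documented: bool
    has_owner: bool
    days_since_update: int


def trust_score(c: TableCandidate) -> float:
    """Blend governance metadata into one trust value in [0, 1] (assumed weights)."""
    if c.days_since_update <= 7:
        freshness = 1.0
    elif c.days_since_update <= 90:
        freshness = 0.5
    else:
        freshness = 0.0
    return (
        0.5 * TIER_BOOST[c.tier]
        + 0.2 * float(c.documented)
        + 0.1 * float(c.has_owner)
        + 0.2 * freshness
    )


def fused_score(c: TableCandidate, alpha: float = 0.7) -> float:
    """alpha weights similarity; (1 - alpha) weights governance trust."""
    return alpha * c.similarity + (1.0 - alpha) * trust_score(c)


def rank(candidates: list[TableCandidate], alpha: float = 0.7) -> list[TableCandidate]:
    return sorted(candidates, key=lambda c: fused_score(c, alpha), reverse=True)
```

With these example weights, a well-governed Tier 1 table can outrank a slightly more similar but stale and undocumented Tier 3 table, which is the behaviour the pattern is meant to produce.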

Ads ranking / model unification (2026-03-03 post)¶

patterns/unified-multi-surface-model — one shared model, multiple product surfaces, shared representation; canonical wiki instance is Pinterest Ads Engagement Model.
patterns/surface-specific-tower-tree — surface-routed subnetworks inside a unified model with late-fusion surface-specific modules.
patterns/surface-specific-checkpoint-export — N checkpoints from one joint training run, one per surface.
patterns/request-level-user-embedding-broadcast — fetch heavy user embeddings once per unique user per batch, broadcast to original candidate-request layout; tested-unique-user-cap as operational safety.
patterns/staged-model-unification — sequence unification by CUDA throughput profile.
concepts/surface-specific-calibration — per-traffic-distribution calibration heads instead of a single shared one.
concepts/projection-layer-for-latency — DCNv2 bridge shrinking Transformer output width before downstream crossing.
concepts/request-level-embedding-broadcast — deduplicate entity lookups + broadcast to batch layout; concept form.
concepts/cuda-throughput-budget — GPU throughput profile as the cost axis for unification sequencing.
concepts/ctr-prediction — core scoring primitive.
concepts/multi-task-learning — task-head framing.
concepts/long-user-sequence-modeling — Transformer over user action history as feature encoder.
concepts/mixture-of-experts — MMoE as the expert-routing trunk component.
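
Request-level user-embedding broadcast reduces to deduplicating lookups and then restoring the original batch layout. The sketch below uses plain Python lists; the `fetch` callback, the function name, and the choice to reject a batch that exceeds the tested unique-user cap are assumptions (the source names the cap as a safety mechanism without describing what happens when it is exceeded).

```python
from typing import Callable, Sequence

Embedding = list[float]


def broadcast_user_embeddings(
    user_ids: Sequence[str],
    fetch: Callable[[list[str]], dict[str, Embedding]],  # hypothetical batch lookup
    max_unique_users: int,
) -> list[Embedding]:
    """Fetch each unique user's embedding once, then expand to one row per candidate."""
    unique_ids = list(dict.fromkeys(user_ids))  # dedupe, preserving first-seen order
    if len(unique_ids) > max_unique_users:
        # Assumed safety behaviour: refuse batches larger than what was load-tested.
        raise ValueError(
            f"{len(unique_ids)} unique users exceeds tested cap {max_unique_users}"
        )
    fetched = fetch(unique_ids)
    # Broadcast back to the original candidate-request layout.
    return [fetched[user_id] for user_id in user_ids]
```

The saving scales with the duplication ratio: a batch of many candidates for a handful of users pays for only a handful of heavy embedding fetches.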

Quota management (2026-02-24 Piqama post)¶

patterns/generic-quota-management-platform — one control plane with pluggable hooks for capacity + rate-limit + app-specific quotas.
patterns/async-centralized-quota-local-enforcement — central rule lifecycle + async distribution + in-process local rate-limit decisions.
patterns/historical-usage-auto-rightsizing — feedback loop via Iceberg + Presto + separate rightsizing service.
patterns/config-distribution-for-quota-rules — treat quota rules as dynamic config; ride existing PinConf substrate.
patterns/budget-enforced-quota-throttle — tier-weighted haircut on budget exceedance.
concepts/quota-lifecycle-management — full CRUD + authorization + validation + dispatch + enforcement + feedback surfaces.
concepts/capacity-vs-rate-limit-quota — two structurally different quota kinds unified under one control plane.
concepts/quota-auto-rightsizing — periodic adjustment from historical usage.
concepts/declarative-quota-rule — structured rules enabling validation + tooling + automation.
concepts/local-rate-limit-decision — request-time enforcement in-process rather than via global service.
concepts/entitlement-budget-quota-integration — three-layer governance: dollars → entitlements → quotas.
concepts/pluggable-validation-framework — custom schema + semantic + remote-service validation hooks.
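
The split between asynchronous central rule delivery and in-process enforcement can be sketched with a token bucket. This is a generic illustration, not the SPF library's API: the class name, the `update_rule` method (standing in for a rule pushed through a config-distribution channel), and the injectable `clock` are hypothetical.

```python
import threading
import time
from typing import Callable


class LocalRateLimiter:
    """Token bucket evaluated in-process; rules arrive asynchronously."""

    def __init__(
        self, rate_per_s: float, burst: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._lock = threading.Lock()
        self._clock = clock
        self._rate = rate_per_s
        self._burst = burst
        self._tokens = burst
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
        self._last = now

    def update_rule(self, rate_per_s: float, burst: float) -> None:
        """Apply a newly distributed quota rule without blocking the data path for long."""
        with self._lock:
            self._refill()  # settle tokens under the old rate first
            self._rate, self._burst = rate_per_s, burst
            self._tokens = min(self._tokens, burst)

    def allow(self, cost: float = 1.0) -> bool:
        """Request-time decision made locally, with no call to a central service."""
        with self._lock:
            self._refill()
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```

Because `allow` never leaves the process, an outage or slowdown in the control plane delays rule changes but does not add latency or failures to the request path.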

Storage deprecation (2024-05-14 HBase post)¶

patterns/nosql-to-newsql-deprecation — five-reason framework for retiring a load-bearing NoSQL store.
patterns/primary-standby-wal-replication — two-cluster deployment with WAL replication + cluster-level failover.
patterns/workload-specific-datastore-migration — decompose workloads by access pattern (OLAP / time-series / KV / transactional) and rehome each to a purpose-built store.
concepts/wal-replication — inter-cluster WAL-shipping replication.
concepts/primary-standby-failover — two-cluster online/offline deployment with cluster-level failover.
concepts/replica-cost-tradeoff — 6-replica primary-standby vs 3-replica single-cluster durability / availability / cost trade-off.
concepts/tech-debt-compounding — version lag → painful upgrade → more lag vicious cycle.

Recent articles¶

2026-05-21 — Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use. Pinterest Ads Feature Engineering Infra + Core ML Infra + ML Data + User Understanding teams (Ajay Venkatakrishnan, Le Zhang, Eric Shang, Pihui Wei, Connor Votroubek, Yi He, Camilo Munoz, Simin Li) document a platform-level redesign of the user-sequence data substrate that powers Pinterest's ranking, retrieval, and recommendation models — the same ~16K-token user-action sequences fed into Pinterest Foundation Model and TransAct. Definitional reframe: a user event sequence is "an ordered list of recent, relevant events for a user, along with the enrichments (signals) attached to each event" — embeddings, contextual features, derived attributes — produced by a four-stage pipeline (ingest → filter → enrich → assemble) rather than "the N latest events from a log table". Three workload types consume sequences: training datasets (offline batch reads of long history), offline analysis (data-scientist queries), online inference (real-time fetch at request time for ranking / retrieval). Sequence quality is multi-dimensional — freshness + completeness + consistent enrichment + stable schemas — with multi-tenancy forcing correctness, observability, and operability into first-class status alongside throughput / latency. Organising principle: one definition, many runtimes — the structural cure for the split-brain failure mode "where training pipelines build sequences one way from batch tables while serving systems assemble sequences a different way from online stores. Over time, those two views naturally drift apart in subtle ways." This is the substrate-side complement to the model-side O/O discrepancy diagnosis from 2026-02-27: O/O gets debugged at the model layer when the substrate isn't structurally aligned, and gets prevented at the substrate layer when it is. 
The new platform has six components: streaming + batch ingestion, shared enrichment + execution layer, real-time indexer, batch indexer + backfill pipeline, columnar time-partitioned storage, online serving API. From the consumer's perspective these collapse to one contract: "Request sequence X for user U, and you'll get a well-defined schema of enriched events, with a documented freshness and completeness profile." Four design decisions plus a migration discipline: (1) Configuration-as-code — Python configs with a defined schema describing sequence-feature, event-type, and enrichment definitions, validated and compiled to portable JSON in managed object storage, consumed by streaming + batch + serving runtimes. "New event types or enrichments can now be added primarily through configuration, plus small, isolated pieces of code where absolutely necessary, instead of via entirely new pipelines." Diff-friendly + reviewable + rollback-able + version-controlled in standard VCS. Clear separation: ML/product teams own what (events, features, filters); platform team owns how (reliable + efficient execution). (2) Shared execution engine + pluggable executors — framework owns data-source / sink wiring, concurrency, retries, backpressure, configuration parsing + validation; executors own per-event-type filtering + featurisation + raw → normalised mapping. "In plain terms, the executor is the 'business logic module' for a particular event type or grouping, while the execution engine handles everything around it." The same engine + same executors run in streaming and batch — collapses the historical lambda-architecture dual-code-path tax (Kreps's "Questioning the Lambda Architecture" objection) to "two scheduling shapes of one code path". (3) Lambda architecture for fresh + complete sequences — streaming path owns "now" (low-latency online inference view); batch path owns "fixing history" (late events, enrichment corrections, backfills, long sequences for training + analysis). 
"The two paths cooperate instead of competing." Made cheap by the shared executor logic. (4) Columnar time-partitioned storage with table semantics — replaces pre-redesign "large, consolidated 'enriched event' blobs" where "every online call or offline scan had to pull the whole payload — even if a model only needed a small subset of features." Each enrichment / feature → its own column; reads project only required columns; time-bucket partitioning bounds writes + scans as history grows; engineers query with familiar SQL / DataFrame abstractions for "comparing runs, versions, or backfill strategies by inspecting partitions". Adding a new enrichment becomes a column-add rather than a payload-shape change. Migration discipline: event-type-by-event-type shadow cutover — for each event type, run new pipeline parallel with legacy, generate shadow sequences, compare with two-tier validation (event-level field-by-field on matched events + sequence-level shadow vs legacy output), gate cutover on A/B experiments using new-data sequences, controlled cutover, iterate to next event type, deprecate legacy incrementally. Pinterest is explicit that 100% match is not the goal — "the data won't have a 100% match due to the nature of our online systems… approximately the same sequences when compared to the legacy system." Per-event-type granularity (not per-pipeline) bounds blast radius for a multi-tenant platform with many event types × many tenants. Operational readiness as a first-class workstream: dashboards for sequence freshness + lag, event + enrichment coverage, schema drift + configuration rollout status, serving latency + error rates — each axis maps directly to one of the four sequence-quality dimensions. 
Outcomes (qualitative per company policy): "significant infrastructure cost reductions" on storage / replication / network as event types migrated; onboarding time for new enrichments + event types "dropped substantially" (mostly config + small executor changes); "improved engagement metrics" on major recommendation surfaces post-migration. Future work: self-serve tooling (signal wizards, configuration static analysis, automated backfill orchestration); stronger correctness (anomaly detection over indexing + serving paths); richer signals (more event types + surfaces; session-level + object-level abstractions on top of raw event sequences) while preserving the "events → enriched signals → sequences" contract that keeps the platform coherent. Goals + non-goals are explicit: in scope = consistent contract / cost efficiency / fast + safe onboarding / batch + real-time parity; out of scope = downstream models or ranking architectures ("the focus is on the platform that feeds them") and the product definition of events ("those semantics remain owned by product and logging teams"). 
Canonical wiki contributions: first canonicalisation of the Pinterest user-sequence platform as a named system; first canonical wiki concept for user event sequence as a four-stage pipeline-output primitive; first canonical wiki concept for one-definition-many-runtimes as a platform organising principle; first canonical wiki concept for sequence quality dimensions as a multi-dimensional contract for multi-tenant ML data substrates; first canonical wiki concept for enrichment execution engine as the engine + pluggable-executor architectural primitive; new patterns — configuration-as-code feature pipeline, shared execution engine + pluggable executors, lambda architecture for fresh + complete sequences, columnar time-partitioned feature storage, event-type-by-event-type shadow cutover; online-offline discrepancy gains its second canonical instance — substrate-side prevention complementing the model-side diagnosis. Sibling to the sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication|2026-04-13 request-level deduplication post on a different axis of the user-sequence stack: this post is the substrate-platform redesign that produces the sequences; the dedup post is the model + lifecycle redesign that consumes them — the user-sequence platform is structurally upstream of the Foundation Model's 100× scaleup absorbed via request-level deduplication. Sibling to the sources/2026-05-08-pinterest-enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models|2026-05-08 contextual sequential CG post at a different stack level: that post is a model-side change to the user tower (context layer + hybrid offline/online inference); this post is the substrate-side platform that produces the offsite-conversion-history user sequences the Transformer encoder consumes. 
Caveats: no quantitative numbers (per company policy — cost %, onboarding-time, engagement deltas all withheld); no architecture diagrams reproduced (figures named in prose only); no latency / throughput SLOs disclosed; specific tooling unnamed (compute substrate, columnar format, object-storage substrate, serving substrate); schema versioning policy undocumented; multi-tenancy implementation undisclosed; lambda-merge semantics not specified; shadow tolerance bands not quantified; migration scope undisclosed; self-serve tooling deferred to future work; no security / governance content.
2026-05-08 — Enhancing Ad Relevance: Integrating Real-Time Context into Sequential Recommender Models. Pinterest Ads Vertical Modeling (Huiqin Xin, Lakshmi Manoharan, Karthik Jayasurya, Ziwei Guo, Alina Liviniuk) document the Contextual Sequential Two-Tower Model — a contextual evolution of Pinterest's prior offsite-conversion-history Transformer-based CG that adds real-time on-Pinterest context (the subject Pin the user is currently viewing) into the user tower at request time. Motivation is structural: the prior CG produced user embeddings purely "offline from offsite history" with no awareness of the current session, leaving its candidates uncompetitive on contextual surfaces — "less than 1% of impressions on Related Pins were attributed to this CG" before the change. The diagnostic is candidate survival rate, not recall: the CG was retrieving candidates the funnel kept dropping. Three integrated pieces fix this: (1) context layer added to the user tower consuming subject-Pin interest-category embeddings (weighted by confidence) + user demographics; (2) synthetic pseudo-context augmentation — pseudo-context derived from the positive label at training time so the model learns to use context, paired with high dropout on the context layer to prevent over-reliance on the leaked signal and preserve the historical-sequence signal; (3) hybrid offline/online user tower inference — the cost-heavy Transformer encoder runs offline (last hidden state cached daily in the feature store), while the context layer + final MLP head run online at ad-request time, fusing the cached offline state with real-time subject-Pin features. Pinterest considered using real onsite context training data merged with offsite history but rejected it: "(1) Merging onsite data with offsite data presents significant technical difficulties. (2) We cannot guarantee that a user has viewed ad impressions on Related Pins between two sequential offsite events." 
Production wins on Related Pins: 3x–10x Recall@K (offline), ~275–300% lift in median candidate relevance, +1.08% ads relevance overall, 2x more candidates retrieved delivered to impression, ~0.7% ROAS lift (~1.4% in top revenue countries). Future work proposed: context surface expansion to Search (where the search query becomes the context-layer input) and cross-attention fusion replacing concatenation — "the context layer embedding acts as the query and the sequence of encoded transformer outputs serves as the key/value" — for context-conditional re-weighting of history. Sibling to the 2026-04-27 shopping conversion CG post on a different lineage — both target offsite conversions with two-tower retrieval, but shopping CG is parallel-DCN+MLP + multi-task + advertiser-loss; this CG is sequential-Transformer + context-layer + hybrid-inference. Canonical wiki contributions: first canonicalisation of the context-layer-in-two-tower tower-internal primitive; first canonicalisation of hybrid tower inference split as the structural answer to "heavy historical encoder + real-time context features in the same user tower"; first canonicalisation of pseudo-context augmentation + companion high-dropout-on-augmented-feature-layer regularisation as the training-serving-parity solution for request-time-only features; first canonicalisation of candidate survival rate as the CG funnel diagnostic; first canonicalisation of subject Pin as a Pinterest-specific (but generalisable) request-time intent-feature concept. Caveats: hyperparameters undisclosed (Transformer topology, sequence length, context-layer dimensions, dropout rate); daily-refresh staleness impact unquantified; pseudo-context projection function unspecified; latency / compute envelope undisclosed; survival-rate-to-revenue ratio (2x candidate delivery → 0.7% ROAS) suggests the CG started from a low absolute floor on Related Pins.
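The hybrid offline/online split described above can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: the cache dict, the function names, the single linear layer standing in for the context layer + MLP head, and the normalisation of the confidence weights are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical stand-in for the feature store: user_id -> last hidden state of
# the offline Transformer encoder, refreshed daily by a batch job.
OFFLINE_STATE_CACHE: Dict[str, Vector] = {"user_1": [0.2, -0.1, 0.4]}


def context_embedding(categories: Sequence[Tuple[Vector, float]]) -> Vector:
    """Confidence-weighted mean of subject-Pin interest-category embeddings.

    Expects at least one (embedding, confidence) pair. Dividing by the total
    confidence is an assumption of this sketch, not a detail from the post.
    """
    dim = len(categories[0][0])
    total = sum(conf for _, conf in categories)
    if total == 0:
        return [0.0] * dim
    return [sum(emb[d] * conf for emb, conf in categories) / total for d in range(dim)]


def linear_head(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Single linear layer standing in for the context layer + final MLP head."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def user_embedding_online(user_id, categories, weights, bias) -> Vector:
    """Request-time path: fuse the cached offline state with real-time context."""
    cached = OFFLINE_STATE_CACHE[user_id]  # no Transformer forward pass here
    fused = cached + context_embedding(categories)  # list concatenation
    return linear_head(fused, weights, bias)
```

The point of the split is visible in `user_embedding_online`: the only per-request work is a lookup, a weighted average, and a small dense layer, while the expensive sequence encoder never runs on the request path.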
2026-05-01 — Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer. Guangtong Bai, Shantam Shorewala, Chi Zhang, Neha Upadhyay, and Haoyang Li (Pinterest Product ML Infrastructure + AI Platform) canonicalise the Feature Trimmer system that unblocked fleet downsizing on Pinterest's online-ML-serving root-leaf architecture by eliminating unused features from the per-request fan-out payload between CPU-bound root hosts and GPU-bound leaf partitions. Architectural framing canonicalises network-bound-not-compute-bound as Pinterest's pre-trimmer state: "we had to scale the system based on network usage rather than compute", with root hosts forced onto expensive AWS m6in network-optimised instance type and leaf-partition peak network significantly higher than peak GPU SM activity. Two-lever remedy: (1) fbthrift lz4 compression (the weaker lever) delivered −20% root-leaf bandwidth at +5% CPU + 5 ms (~10%) p90 latency; Pinterest's explicit framing — "a solid early win, but it didn't change the underlying problem: we were still shipping too much unused data." Canonical compression codec trade-off data point at RPC altitude, extending the wiki's prior Kafka-family instances into ML serving. (2) Send What You Use (the structural lever) — the root trims each fan-out RPC to exactly the feature allowlist that the destination leaf model actually consumes, sourced from the PyTorch model signature in module_info.json inside the .pt archive. Pinterest's explicit C++ IWYU analogy: "similar to C++'s 'include what you use' header management tool removing unnecessary #include's, we could potentially cut root-leaf network usage by ~50%." Allowlist beats blocklist for ML-feature management because the blocklist "is significantly larger for any given model and it is probable that it will grow faster than the allowlist in the future" — the ML feature universe monotonically grows with experimentation + deprecation churn. 
Deploy integration (patterns/artifact-rides-model-deploy-pipeline): per-bundle module_info.json aggregates into a per-bundle artefact that rides the existing staged model-deploy pipeline (canary → ACA → prod → rollback), with root configs deployed ahead of model configs so the allowlist is already in place when new leaf model versions arrive, and with a backward-compatible artefact carrying allowlists for both current and pending versions during rolling deploys. On-host mechanism (patterns/file-watcher-atomic-swap-consolidated-map): per-root-host module maintains per-bundle independent maps refreshed by file watchers, rebuilds and atomically swaps the consolidated model_name → version → allowlist map under a read-write lock (shared for reads, unique for swap); per-bundle isolation so a corrupt bundle doesn't poison others' updates. Safety (patterns/skip-on-missing-allowlist-for-safety): unknown model → untrimmed passthrough; version missing → latest-version fallback (works because signatures are version-stable — "a new model is forked from the original"); init parse failure → alert but don't block launch "to preserve our ability to respond to capacity-related incidents". Fail-open rather than fail-closed because the trimmer is an optimisation on the critical path, not a correctness gate. Production impact: Ads root cluster network 4 GBPS → <1.5 GBPS peak, fleet −27%; Ads leaf peak network 1000–1200 MBPS → <200 MBPS; Homefeed root outbound 1.2–2.1 → 0.45–1.1 GB/s, fleet −33%; Homefeed leaf inbound −65–75%; Ads AdMixer client p90 >90 ms → <80 ms peak; Related Pins p99 130–180 ms → 95–125 ms (−25–30%); Search / Notification egress −45% / −65% enabling instance-type downgrade with ≥30% cost reduction on both, $0.98M/yr saved on Search + Notification rightsizing alone; $4M+/yr total savings + +0.17% revenue lift from fewer timeout failures. 
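A minimal sketch of the on-host allowlist map and its fail-open lookup, under stated assumptions: the class and function names are hypothetical, versions are modelled as integers, and a plain mutex plus an atomic reference swap stands in for the read-write lock described in the post (Python's standard library has no shared/unique lock).

```python
import threading
from typing import Dict, FrozenSet, Optional

# model_name -> version -> allowlisted feature names
AllowlistMap = Dict[str, Dict[int, FrozenSet[str]]]


class AllowlistStore:
    def __init__(self) -> None:
        self._per_bundle: Dict[str, AllowlistMap] = {}  # one independent map per bundle
        self._consolidated: AllowlistMap = {}
        self._write_lock = threading.Lock()

    def update_bundle(self, bundle: str, allowlists: AllowlistMap) -> None:
        """Called by a (hypothetical) file watcher when one bundle's artefact changes."""
        with self._write_lock:
            self._per_bundle[bundle] = allowlists
            rebuilt: AllowlistMap = {}
            for bundle_map in self._per_bundle.values():
                for model, versions in bundle_map.items():
                    rebuilt.setdefault(model, {}).update(versions)
            # Swap the reference: readers see the old map or the new one, never a mix.
            self._consolidated = rebuilt

    def lookup(self, model: str, version: int) -> Optional[FrozenSet[str]]:
        snapshot = self._consolidated  # readers work on one consistent snapshot
        versions = snapshot.get(model)
        if not versions:
            return None  # unknown model: caller passes the payload through untrimmed
        if version in versions:
            return versions[version]
        return versions[max(versions)]  # version missing: latest-version fallback


def trim_features(store: AllowlistStore, model: str, version: int,
                  features: Dict[str, object]) -> Dict[str, object]:
    """Send What You Use: keep only the features the destination model consumes."""
    allowlist = store.lookup(model, version)
    if allowlist is None:
        return features  # fail open, since trimming is an optimisation
    return {name: value for name, value in features.items() if name in allowlist}
```

Because each bundle owns its own entry in `_per_bundle`, a watcher that fails to parse one bundle can simply skip its `update_bundle` call and leave the other bundles' allowlists untouched.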
End-state framing: "It effectively shifted the bottleneck from network to CPU cycles on the root cluster" — canonical bottleneck-relocation story. Canonical load-bearing wiki contributions: (1) first canonicalisation of Pinterest's ML-serving root-leaf architecture as a named system; (2) first wiki canonicalisation of Feature Trimmer + Send-What-You-Use principle + Model-signature-as-source-of-truth; (3) first wiki canonicalisation of fbthrift as Pinterest's RPC substrate; (4) first canonicalisation of feature-allowlist-over-blocklist at ML-feature-management altitude; (5) first canonicalisation of network-bound-vs-compute-bound as a first-class diagnostic concept with the bottleneck-relocation dynamic named; (6) new patterns — artifact-rides-model-deploy-pipeline + file-watcher-atomic-swap-consolidated-map + skip-on-missing-allowlist-for-safety; (7) extension of concepts/compression-codec-tradeoff to the RPC-altitude fbthrift-lz4 data point, complementing the Kafka-family lz4/zstd anchors. Sibling to the 2026-05-01 Netflix State of Routing in Model Serving post — same altitude (client-facing ML serving), same day, different decomposition axis: Netflix solves "which backend should handle this request?" via Lightbulb+Envoy routing split; Pinterest solves "what bytes should this request carry to its known backend?" via payload trimming. Both disclose the structural hazards of multi-model multi-surface production ML serving. Infrastructure-side complement to Pinterest's 2026-03-03 unified Ads engagement model post — that post describes the model-side unification of surface-specific engagement models; this post describes the infrastructure-side payload optimisation that makes the serving tier affordable at scale. 
Caveats: no measured trim-ratio distribution (the ~50% figure is theoretical); no trimmer-specific runtime miss-rate observability described; the version-stability invariant is a social convention, not a tooling check; Part II will address the client→root payload (a larger absolute payload carrying per-candidate features pre-fan-out).
2026-04-27 — From Clicks to Conversions: Architecting Shopping Conversion Candidate Generation. Pinterest Ads ML (Richard Huang, Yu Liu, Ziwei Guo, Andy Mao, Supeng Ge) two-generation retrospective on the shopping conversion candidate generation model — Pinterest's dedicated retrieval-stage two-tower model optimised for offsite shopping conversions (checkout, add-to-cart) rather than onsite engagement, deployed across Home Feed + Related Pins + Search to 600+ million MAUs. Motivated by conversion-label sparsity: "Because they occur offsite, conversion events are significantly sparser and noisier than onsite engagement signals." 2023 launch used a multi-head architecture (engagement + conversion heads, only conversion head served at inference) with a dual positive signal mixing conversions with w = f(log(1 + t / t_max)) click-duration-reweighted clicks and repins to broaden coverage, and served-but-not-engaged ad impressions as hard negatives on top of in-batch negatives. 2025 refresh migrated to a unified single-head multi-task architecture — "to better stabilize query embeddings in regions of low conversion coverage" — and added an advertiser-level loss as an additional training objective to address "high variance" of per-Pin conversion supervision. Architectural headline: parallel DCNv2 + 3-layer MLP cross layers inside both towers replaced sequential DCNv2→MLP stacking, eliminating the "information bottleneck" where "the MLP could only learn from features already processed by DCN v2, potentially losing valuable signals from the original input" — delivered +11% offline recall@1000 on conversion, then adopted by all Pinterest production engagement retrieval models. User-side features: real-time context (subject Pin visual embedding + GraphSage Pin-graph embedding) + preference/historical features (demographics + aggregated history + Transformer-encoded user action sequence). Pin-side: ID + multi-modal content + performance features. 
Production numbers (Pinterest Internal Data, US, 2023-2025): +2.3% shopping conversion volume and +2.7% impression-to-conversion rate at 2023 launch; +1.5% CTR and +2.2% CTR-over-30s as byproducts; +42% recall@100 for conversion tasks from the 2025 refresh; +3.1% RoAS for US shopping campaigns. Load-bearing quote on MTL balancing: "The crucial challenge is balancing the two tasks, ensuring the high-value conversion signal is not diluted by the more frequent engagement data." Sibling to the 2026-03-03 ads engagement model post on a different funnel stage: this post is retrieval-stage conversion-optimised CG; that post is ranking-stage multi-surface engagement prediction. Same core primitives (shared trunk, DCNv2, Transformer user-history, multi-task heads) used in different architectural roles — DCNv2 as parallel-cross primitive vs projection layer; multi-task heads as unified-over-multi-head vs per-surface heads with tower trees.
2026-04-15 — Finding zombies in our systems: A real-world story of CPU bottlenecks. Pinterest PinCompute (Kubernetes platform) + ML Platform teams' 3-month joint production incident retrospective. Ray-based distributed ML training jobs on GPU EC2 hosts crashed intermittently with "loss of network connectivity" errors that surfaced in kernel logs as AWS ENA driver Tx-paused-5 s device resets. The debugging journey: aggregate perf was useless on 96-vCPU machines; per-core mpstat -P ALL 1 revealed core 39 at 100% %sys for multiple seconds; a patterns/continuous-perf-record-for-time-travel|12-hour continuous perf record bash loop on a reserved-host repro env + Netflix's Flamescope time-travel view caught kubelet burning 6.5% of total CPU on mem_cgroup_nr_lru_pages a few seconds before each reset. /proc/cgroups showed 68,680 tracked memory cgroups vs 240 actually in /sys/fs/cgroup/memory/ — ~70,000 zombie memcgs leaked by a crash-looping ecs-agent systemd unit shipped by the AWS Deep Learning AMI (a base-image default-systemd-unit risk — Pinterest runs Kubernetes, the ECS agent has no cluster credentials, systemd restarts forever). Canonical CPU-starvation-induced network reset chain: zombie-memcg buildup → kubelet syscall iterates them → one CPU core pinned at 100% %sys → ENA driver Tx thread starved beyond 5 s → device reset → packet drops → Ray training crashes. AZ-disparity confession: one AZ was spared because an unrelated Kubernetes-binary-delivery bug caused the node bootstrap script to fail, gating the ecs-agent systemd unit from starting (two bugs accidentally cancelling — the "look closer at identical environments" meta-lesson). Fix: disable the ecs-agent systemd unit in the base image and reboot to purge zombie memcgs. Six mitigations that didn't work (TransparentHugePages, jemalloc, taskset CPU affinity, interrupt pinning, per-cluster configuration tweaks, host reboots) catalogued as diagnostic-by-negative. 
Key takeaways: aggregate CPU metrics hide single-core saturation at high vCPU counts (concepts/per-core-cpu-visibility); temporal profiling is the right instrument for sporadic events (concepts/temporal-profiling); ENA resets are a CPU-starvation symptom, not a network bug; reboot-fixes-it-for-a-week fingerprint points at accumulated kernel state; reproducible closed debugging environments unlock otherwise-intractable investigations; fleet-wide transient-metric collection is the ops prereq that made AZ correlation visible in the first place.
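The zombie-memcg check from this investigation (tracked cgroups in /proc/cgroups versus directories that actually exist) can be sketched as follows. The function names are hypothetical; the parser only assumes the standard four-column /proc/cgroups layout (subsys_name, hierarchy, num_cgroups, enabled).

```python
from typing import Dict


def parse_proc_cgroups(text: str) -> Dict[str, int]:
    """Map subsystem name -> num_cgroups from the contents of /proc/cgroups."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        name, _hierarchy, num_cgroups, _enabled = line.split()
        counts[name] = int(num_cgroups)
    return counts


def zombie_memcg_estimate(proc_cgroups_text: str, live_memcg_dirs: int) -> int:
    """Tracked memory cgroups minus the ones still visible in the cgroup filesystem."""
    tracked = parse_proc_cgroups(proc_cgroups_text).get("memory", 0)
    return max(tracked - live_memcg_dirs, 0)
```

On a cgroup v1 host, `live_memcg_dirs` could be obtained with something like `sum(1 for _ in os.walk("/sys/fs/cgroup/memory"))`, which counts one entry per directory; that path is the one quoted in the post, and the counting approach is an assumption of this sketch. With the post's figures (68,680 tracked, 240 live) the estimate comes to 68,440, consistent with the "~70,000 zombie memcgs" reading.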
2026-04-20 — Smarter URL Normalization at Scale: How MIQPS Powers Content Deduplication. Pinterest Content Acquisition and Media Platform (Shanhai Liao, Di Ruan, Evan Li) introduce MIQPS (Minimal Important Query Param Set) — a per-domain, per-query-parameter-pattern algorithm that learns which URL parameters affect content identity via a visual-content-ID removal test: sample up to S URLs with distinct parameter values, render the page with and without the parameter, classify non-neutral if content IDs differ in ≥T% of samples. Motivation: Pinterest's content ingestion pipeline otherwise wastes render capacity on URL variants that resolve to the same content (tracking params, session tokens); content-identity dedup catches them downstream but only after the render cost. Static allowlists cover known platforms (Shopify variants, Salesforce Commerce Cloud start / sz / prefn1 / prefv1) but "URL parameter conventions vary wildly" across the long tail. Per-pattern (not per-parameter-name-global) keying is load-bearing — canonical example: ref is neutral on a product page URL but non-neutral on a comparison page URL. <link rel="canonical"> is unreliable across the long tail (omitted / misconfigured / contaminated). Three-phase architecture (patterns/offline-compute-online-lookup-config): continuous ingest writes per-domain URL corpus to S3 → offline job runs MIQPS + anomaly detection against previous version (asymmetric rules: non-neutral → neutral flips are anomalies; new non-neutral entries + pattern-disappearances are fine; reject publish if >A% of entries flip dangerous direction) → publish to config store + archive to S3 → runtime URL Normalizer loads map at init and does in-memory lookup using OR semantics across four layers (static allowlist + regex + MIQPS + conservative default — parameter kept if any layer votes keep). 
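The removal test and the OR-semantics lookup can be sketched as below. This is an illustration under assumptions, not the MIQPS implementation: `content_id` is a hypothetical callable standing in for "render the page and compute its visual content ID", `min_samples` is a hypothetical name for the under-sampling cutoff, and the threshold is assumed to be greater than zero.

```python
import math
from typing import Callable, Iterable, Sequence
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_param(url: str, param: str) -> str:
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def is_non_neutral(sample_urls: Sequence[str], param: str,
                   content_id: Callable[[str], str],
                   threshold_pct: float, min_samples: int = 1) -> bool:
    """Removal test: non-neutral if content IDs differ in >= threshold_pct of samples."""
    if len(sample_urls) < min_samples:
        return True  # conservative default: under-sampled parameters are kept
    needed = math.ceil(len(sample_urls) * threshold_pct / 100)
    differing = 0
    for url in sample_urls:
        if content_id(url) != content_id(drop_param(url, param)):
            differing += 1
            if differing >= needed:
                return True  # early exit once non-neutral is clear
    return False


def keep_param(param: str, layers: Iterable[Callable[[str], bool]]) -> bool:
    """OR semantics: the parameter survives if any layer votes to keep it."""
    return any(layer(param) for layer in layers)
```

At runtime only `keep_param` would sit on the hot path, with each layer (static allowlist, regex, learned MIQPS map, conservative default) reduced to an in-memory predicate; the rendering cost stays entirely inside the offline `is_non_neutral` job.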
Offline-over-realtime rationale: render cost is seconds-per-page; realtime analysis scales with URL count (billions) while offline scales with domain count (hundreds of thousands); transient rendering failures are retryable offline but would block content processing in realtime; URL conventions change on weeks-to-months cadence making staleness acceptable. Asymmetric-cost reasoning threads every design choice: dropping a non-neutral parameter silently merges distinct items (catastrophic); keeping a neutral parameter wastes a render (tolerable). Early-exit stops testing once non-neutral is clear; conservative-default marks under-sampled parameters non-neutral; anomaly-detection rules treat only the dangerous flip direction as anomalous; multi-layer keeps if any layer preserves. No numerical wins disclosed (no dedup ratio, no compute savings, no latency delta); all five tunables K/S/T/N/A left abstract. First canonical URL-normalisation / content-deduplication post on the wiki — new canonical wiki instances of patterns/per-domain-adaptive-config-learning, patterns/visual-fingerprint-based-parameter-classification, patterns/multi-layer-normalization-strategy, patterns/conservative-anomaly-gated-config-update, patterns/offline-compute-online-lookup-config + concepts concepts/url-normalization, concepts/content-id-fingerprint, concepts/query-parameter-pattern, concepts/neutral-vs-non-neutral-parameter, concepts/canonical-url-unreliability, concepts/anomaly-gated-config-update, concepts/offline-compute-online-lookup. Complements request-level-deduplication post on a different axis of the deduplication-umbrella — that post is recsys-serving-compute dedup, this post is content-ingestion-compute dedup.
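The asymmetric publish gate can be sketched as a comparison between the previous and candidate maps. The names are hypothetical, and so is the choice of denominator: the post says only that publishing is rejected when more than A% of entries flip in the dangerous direction, so this sketch measures against the number of entries in the previous version.

```python
from typing import Dict, Hashable

# key: (query-parameter pattern, parameter name); value: True means non-neutral
ParamMap = Dict[Hashable, bool]


def dangerous_flip_pct(previous: ParamMap, candidate: ParamMap) -> float:
    """Percentage of previous entries that flipped from non-neutral to neutral."""
    if not previous:
        return 0.0
    flips = sum(
        1
        for key, was_non_neutral in previous.items()
        # A key missing from the candidate is a pattern disappearance, which is
        # fine; only an explicit flip to neutral (False) counts as dangerous.
        if was_non_neutral and candidate.get(key) is False
    )
    return 100.0 * flips / len(previous)


def should_publish(previous: ParamMap, candidate: ParamMap, max_flip_pct: float) -> bool:
    """Reject the new config if too many entries flipped in the dangerous direction."""
    return dangerous_flip_pct(previous, candidate) <= max_flip_pct
```

New non-neutral entries never enter the count, which mirrors the cost asymmetry: keeping an extra parameter wastes a render, while dropping a meaningful one merges distinct items.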
2026-03-19 — Building an MCP Ecosystem at Pinterest. Pinterest Agent Foundations (Tan Wang) publishes a one-year retrospective on Pinterest's MCP ecosystem: 66,000 invocations/month across 844 MAUs, saving an estimated ~7,000 engineer-hours/month as of January 2025. Six opinionated architectural choices: (1) hosted over local — paved path is cloud-deployed, not stdio-on-laptop, so central routing + security apply; (2) many small domain-specific servers (Presto, Spark, Airflow, Knowledge) over one monolith — per-server access control + context-window hygiene; (3) unified deployment pipeline so authoring is business-logic-only; (4) central MCP registry as source of truth for approved-for-production servers (dual Web UI + AI-client-API surfaces, pre-flight authorization); (5) layered JWT + SPIFFE mesh auth — Envoy validates JWT + maps to X-Forwarded-User / X-Forwarded-Groups headers for coarse-grained policy, in-process @authorize_tool(policy='…') decorator (patterns/per-tool-authorization-decorator) for fine-grained per-tool policy, SPIFFE mesh identity for service-only flows; (6) elicitation-gated HITL via MCP's elicitation primitive for mutating/expensive actions, with batch approval as HITL-cost-reduction. Three seed servers named: Presto MCP (highest-traffic, business-group gated to Ads/Finance/infra), Spark MCP (AI Spark debugging, channel-scoped to Airflow support channels), Knowledge MCP (general Q&A substrate). Three integration surfaces: internal LLM web chat, AI bots on internal chat platform (per-channel tool visibility), IDE plugins. Explicit rejection of the MCP OAuth spec's per-server consent flow for internal traffic — "users already authenticate against our internal auth stack when they open a surface like the AI chat interface, so we piggyback on that existing session." Canonical wiki instance of patterns/hosted-mcp-ecosystem + first canonical enterprise-SSO piggyback MCP shape.
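A per-tool authorization decorator in the spirit of @authorize_tool might look like the sketch below. Everything beyond the decorator name and the X-Forwarded-Groups header is an assumption: the policy table, the policy name "presto_query", the comma-separated header format, and the deny-on-unknown-policy behaviour are illustrative choices, not details from the post.

```python
import functools
from typing import Callable, Dict, FrozenSet, Mapping

# Hypothetical policy table: policy name -> groups allowed to invoke the tool.
POLICIES: Dict[str, FrozenSet[str]] = {
    "presto_query": frozenset({"ads", "finance", "infra"}),
}


class ToolAuthorizationError(PermissionError):
    """Raised when the caller's groups do not satisfy the tool's policy."""


def groups_from_headers(headers: Mapping[str, str]) -> FrozenSet[str]:
    # Assumes the proxy has already validated the JWT and set this header;
    # a comma-separated value is an assumption of this sketch.
    raw = headers.get("X-Forwarded-Groups", "")
    return frozenset(g.strip() for g in raw.split(",") if g.strip())


def authorize_tool(policy: str) -> Callable:
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(headers: Mapping[str, str], *args, **kwargs):
            allowed = POLICIES.get(policy, frozenset())  # unknown policy denies everyone
            if not (groups_from_headers(headers) & allowed):
                raise ToolAuthorizationError(f"caller not permitted by policy {policy!r}")
            return fn(headers, *args, **kwargs)
        return wrapper
    return decorator


@authorize_tool(policy="presto_query")
def run_presto_query(headers: Mapping[str, str], sql: str) -> str:
    return f"would run: {sql}"  # placeholder for the real tool body
```

The split of responsibilities matches the layering in the post: the proxy handles coarse-grained identity once per request, and the decorator keeps the fine-grained decision next to the tool it protects.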
2026-04-13 — Scaling Recommendation Systems with Request-Level Deduplication. Pinterest Engineering (Matt Lawhon, Filip Ryzner, Kousik Rajesh, Chen Yang, Saurabh Vishwas Joshi) retrospective framing request-level deduplication as a cross-cutting discipline that absorbed the 100× transformer dense parameter + 10× model dimension scaleup of the Pinterest Foundation Model (ACM RecSys 2025 oral spotlight) without proportional infrastructure growth. Canonical thesis: "the same fundamental redundancy exists at every layer" — user sequences (~16K tokens, powering the Foundation Model + TransAct) are identical across every candidate scored in a request, yet without explicit dedup get stored, loaded, trained on, and served once per item. Three-stage framework: storage ((user_id, request_id) sort on Iceberg → 10–50× columnar compression on user-heavy feature columns; also unlocks bucket joins + efficient backfills + incremental feature engineering + stratified user-level sampling); training — request-sorted batches break the IID assumption producing two distinct failure modes (1–2% offline-metric regression from BatchNorm-statistics fluctuation; ~0% → ~30% in-batch false-negative rate in retrieval), fixed with SyncBatchNorm (one-line fix for ranking, "communication overhead negligible compared to training-throughput gains") + user-level masking (x_k ≠ x_i extension to InfoNCE's identity mask for retrieval); plus deferred re-duplication at GPU as the shared data-loader discipline that keeps request-level data deduplicated through preprocessing + feature transforms and only expands on GPU / in-model. Serving — two-tower retrieval is deduplicable by construction; ranking gets DCAT (Deduplicated Cross-Attention Transformer) with custom Triton kernels — user sequence context-pass runs once per deduplicated request with KV cached per layer, each candidate cross-attends to cached KV. 
Pattern cached-KV cross-attention is the ranking-side analogue of the two-tower factorisation, and structurally identical to the LLM-inference KV cache primitive (shared-prefix KV populated once, per-item queries cross-attend). Production numbers (US 2025, Pinterest internal, citation "²"): 10–50× storage compression; 4× retrieval training speedup; ~2.8× ranking training speedup (40% from deduplicated data loading × 2× from DCAT); 7× ranking serving throughput — "what made it possible to deploy a 100× larger model without proportional serving cost increases, absorbing the full Foundation Model scaleup while holding infrastructure budgets in check." Three load-bearing lessons stated verbatim: (1) "Request-level deduplication is a cross-cutting technique" (same redundancy at every layer); (2) "Simple fixes unlock big wins" (SyncBatchNorm + user-level masking are minimal code changes with outsized impact — hardest part was identifying the problems); (3) "Impact compounds across the stack" (storage feeds data pipelines, training feeds experimentation velocity, serving feeds next-model capacity). Complements the 2026-03-03 ads engagement model post — that post canonicalised request-level user-embedding broadcasting as the serving-only narrow instance; this post generalises dedup to the full lifecycle.
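User-level masking can be illustrated with a small pure-Python InfoNCE. This is one reading of the technique, sketched under assumptions: the function names are hypothetical, and the mask simply drops every in-batch negative that belongs to the same user as the anchor, on top of the usual identity mask.

```python
import math
from typing import List, Sequence


def user_level_mask(user_ids: Sequence[str]) -> List[List[bool]]:
    """mask[i][k] is True when example k may serve as a negative for example i."""
    n = len(user_ids)
    return [
        [i != k and user_ids[i] != user_ids[k] for k in range(n)]  # identity + user mask
        for i in range(n)
    ]


def same_user_negative_fraction(user_ids: Sequence[str]) -> float:
    """Share of identity-masked negatives that come from the anchor's own user."""
    n = len(user_ids)
    pairs = [(i, k) for i in range(n) for k in range(n) if i != k]
    if not pairs:
        return 0.0
    return sum(user_ids[i] == user_ids[k] for i, k in pairs) / len(pairs)


def masked_info_nce(scores: Sequence[Sequence[float]], user_ids: Sequence[str]) -> float:
    """Mean InfoNCE loss where scores[i][i] is the positive for row i."""
    mask = user_level_mask(user_ids)
    losses = []
    for i, row in enumerate(scores):
        positive = math.exp(row[i])
        negatives = sum(math.exp(row[k]) for k in range(len(row)) if mask[i][k])
        losses.append(-math.log(positive / (positive + negatives)))
    return sum(losses) / len(losses)
```

`same_user_negative_fraction` shows why request-sorted batches need the mask: with shuffled data the fraction is close to zero, while a batch made of a few whole requests puts many same-user items next to each other.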
2026-04-08 — Performance for Everyone. Pinterest Android Performance Engineering (Lin Wang) retrospective on retrofitting automatic User Perceived Latency measurement across every Android surface by building Visually Complete detection into the UI base class (BaseSurface). Three opt-in marker interfaces — PerfImageView / PerfTextView / PerfVideoView — let product engineers tag content-critical views; the base class walks the view tree from the root, filters to visible Perf* instances via geometry, conjoins per-view readiness, and emits a timestamp automatically. Canonical data points: two engineer-weeks per surface hand-rolled cost → 60+ Android surfaces continuously measured with zero per-surface work; "all surfaces measured by the same standard" means fair cross-surface comparison for the first time; short-shelf-life surfaces (Christmas landing pages) previously excluded are now automatically covered. The pattern generalises: "following the success on Android, we have also extended the same concept to iOS and web platforms." Thesis: "Once the performance metrics are offered to product engineers for free, it makes Pinterest's performance more visible and encourages everyone to protect and optimize the User Perceived Latency on their surfaces." Canonical wiki instances of patterns/base-class-automatic-instrumentation, patterns/view-tree-walk-for-readiness-detection, patterns/opt-in-performance-interface. First client-side performance-platform post on the Pinterest wiki axis — complements existing server-side observability / quota / ML ranking axes with the "measurement platform" slice.
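The base-class detection logic can be sketched language-neutrally. The original is Android code; Python is used here only for illustration, the `View` dataclass and its fields are hypothetical, and treating a surface with no visible tagged views as not yet complete is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # left, top, right, bottom


@dataclass
class View:
    bounds: Rect
    is_perf_view: bool = False  # stands in for implementing a Perf* marker interface
    ready: bool = False         # e.g. image decoded, text laid out, video first frame
    children: List["View"] = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    """True when two rectangles overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def visible_perf_views(root: View, viewport: Rect) -> List[View]:
    """Walk the tree from the root and keep tagged views that fall inside the viewport."""
    found, stack = [], [root]
    while stack:
        view = stack.pop()
        if view.is_perf_view and intersects(view.bounds, viewport):
            found.append(view)
        stack.extend(view.children)
    return found


def is_visually_complete(root: View, viewport: Rect) -> bool:
    """Conjoin per-view readiness over all visible tagged views."""
    views = visible_perf_views(root, viewport)
    return bool(views) and all(v.ready for v in views)
```

A base class would call `is_visually_complete` whenever a tagged view reports a readiness change and emit the timestamp the first time it returns true, which is what removes the per-surface instrumentation work.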
2026-04-07 — Evolution of Multi-Objective Optimization at Pinterest Home Feed. Pinterest Homefeed + Content Quality teams retrospective on three generations of Home Feed's multi-objective optimization (MOO) / blending layer — the final funnel stage after retrieval / pre-ranking / ranking that determines feed composition rather than per-candidate engagement. V1 (2021) used DPP with GraphSage + categorical-taxonomy pairwise similarity inside a backend node chain. V2 (early 2025) replaced DPP with Sliding Spectrum Decomposition (SSD) hosted in PyTorch on Pinterest's company-wide model serving cluster — lower serving latency, numerically robust (no PSD enforcement / Cholesky failures), expandable similarity substrate (visual + text + graph + Q3-2025 PinCLIP multimodal + Q4-2025 Semantic ID prefix overlap). V2+ (mid/late 2025) added a unified soft-spacing penalty composed into SSD's utility equation for content-quality-risk classes, later abstracted into a config-based framework. Canonical production datum: removing DPP produced a >2% time-spent-impression drop within week 1, with day-1 engagement gains reversing by week 2 — canonical short-term-vs-long-term engagement trade-off and closed-loop feedback evidence. Sits alongside the ads engagement model post (upstream ranking) and L1 CVR diagnosis post (diagnosis methodology) to complete the three-axis wiki coverage of Pinterest's recommendation funnel.
2026-03-06 — Unified Context-Intent Embeddings for Scalable Text-to-SQL. Pinterest data platform team (Keqiang Li, Bin Yang) document how the Pinterest Analytics Agent evolved from a schema-grounded RAG-based Text-to-SQL prototype into the #1 agent at Pinterest (10× the next most-used, 40% analyst-population coverage in two months, target 50% year-end). Two central engineering claims: (1) unified context-intent embeddings — index natural-language descriptions of the business question each historical SQL query was designed to answer, not table docs; the SQL-to-text step generates explicit "analytical questions this query answers" creating a question-to-question bridge that sidesteps vocabulary mismatch between user phrasing and schema phrasing. (2) Structural + statistical patterns with governance-aware ranking — extract validated join keys + filters + aggregation patterns from query history, fuse with tier + freshness + documentation + ownership signals when ranking. Post also documents the full supporting stack: PinCat (internal catalog on DataHub) as system of record for tiers + glossary terms; AI Table Documentation + join-graph + search-based glossary propagation (~70% manual documentation work reduction, >40% columns auto-tagged); internal Vector DB as a Service on AWS OpenSearch + Hive + Airflow (zero-to-production-index in days, millions of embeddings, daily incremental updates, hybrid semantic-plus-metadata filtering); four-layer Analytics Agent architecture (Orchestration + MCP + Context + Execution) with EXPLAIN-before-EXECUTE validation + bounded retry + default LIMIT 100 + column-profiling-aware filter generation. Governance roadmap: 400K → ~100K table footprint reduction; "Governance and AI reinforce each other." Thesis: "your analysts already wrote the perfect prompt" — query history is the knowledge base, self-reinforcing as 2,500+ analysts continuously teach the system.
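The execution-layer guardrails (EXPLAIN before EXECUTE, bounded retry, default LIMIT 100) can be sketched against an in-memory SQLite database. SQLite is a stand-in chosen so the example runs anywhere, not the engine the Analytics Agent uses; the function names and the `repair` callable, which stands in for the agent rewriting its own SQL from the error message, are hypothetical.

```python
import re
import sqlite3
from typing import Callable, List, Optional

DEFAULT_LIMIT = 100


def with_default_limit(sql: str, limit: int = DEFAULT_LIMIT) -> str:
    """Append a LIMIT clause unless the statement already ends with one."""
    stripped = sql.strip().rstrip(";")
    if re.search(r"\blimit\s+\d+\s*$", stripped, flags=re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {limit}"


def validate(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Plan the query without running it; return the error text on failure."""
    try:
        conn.execute(f"EXPLAIN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def run_with_validation(conn: sqlite3.Connection, sql: str,
                        repair: Callable[[str, str], str],
                        max_attempts: int = 3) -> List[tuple]:
    """EXPLAIN first, execute only when the plan succeeds, retry a bounded number of times."""
    candidate = with_default_limit(sql)
    for _ in range(max_attempts):
        error = validate(conn, candidate)
        if error is None:
            return conn.execute(candidate).fetchall()
        candidate = with_default_limit(repair(candidate, error))
    raise RuntimeError(f"query still invalid after {max_attempts} attempts")
```

The bound on retries matters as much as the validation itself: a generated query that cannot be repaired fails fast with an error instead of looping against the warehouse.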
2026-03-03 — Unifying Ads Engagement Modeling Across Pinterest Surfaces. Pinterest Ads ML (Duna Zhan, Qifei Shen, Matt Meng, Jiacheng Li, Hongda Shen) consolidate three surface-specific CTR-prediction models (Home Feed, Search, Related Pins) into a unified engagement model with shared trunk (MMoE + long-user-sequence Transformer) + surface-specific tower trees + view-type-specific calibration + multi-task heads + surface-specific checkpoint exports. Serving efficiency paired with unification: DCNv2 projection layer, fused-kernel embedding, TF32, request-level user-embedding broadcasting. Staged unification by CUDA throughput — HF + SR first (similar cost), RP deferred until efficiency work stabilised. Load-bearing claim: MMoE + long sequences only paid off once integrated into the unified model with multi-surface training data.
2026-02-27 — Bridging the Gap: Diagnosing Online-Offline Discrepancy in Pinterest's L1 Conversion Models. Pinterest Ads ML production-retrospective on why a new L1 CVR model showed 20–45% offline LogMAE reduction but neutral / negative CPA online. Introduces three-layer diagnosis framework, rules out exposure bias / timeouts / offline-eval bugs, names feature parity gap + embedding version skew as the two concrete layer-2 causes, + funnel recall ceilings as layer-3 residual.
2026-02-24 — Piqama: Pinterest Quota Management Ecosystem. Pinterest Big Data Processing Platform + Online Systems jointly introduce Piqama, a generic quota management platform handling both capacity quotas (memory / vcore / concurrent-apps for Moka on Yunikorn) and rate-limit quotas (QPS / bandwidth for TiDB + KV Stores). Architecture: REST + Thrift control-plane portal; pluggable schema + validation (including remote-service hooks for cluster-capacity sum-checks) + authorization + dispatch + enforcement; pre-aggregated usage stats to Apache Iceberg on S3; separate auto-rightsizing service reading from Iceberg / Presto / user-defined sources.
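A capacity sum-check of the kind a validation hook might perform can be sketched as follows. The function name, the dictionary shapes, and the resource names are hypothetical; in Piqama the capacity figure would come from a remote service rather than a literal.

```python
from typing import Dict, List

Quota = Dict[str, int]  # resource name -> requested amount, e.g. {"memory_gb": 512}


def validate_capacity_quotas(requested: Dict[str, Quota],
                             cluster_capacity: Quota) -> List[str]:
    """Return one error per resource whose summed quotas exceed cluster capacity."""
    errors: List[str] = []
    for resource, capacity in cluster_capacity.items():
        total = sum(quota.get(resource, 0) for quota in requested.values())
        if total > capacity:
            errors.append(
                f"{resource}: requested {total} exceeds cluster capacity {capacity}"
            )
    return errors
```

Returning a list of errors instead of raising on the first one lets a control-plane portal show every violated resource in a single response.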
2024-05-14 — HBase Deprecation at Pinterest (Part 1). Pinterest Storage + Data Infrastructure: part 1 of a 3-part retrospective announcing the 2021 decision to deprecate HBase across Pinterest's entire production footprint, after running one of the largest HBase deployments in the world (peak ~50 clusters / ~9,000 EC2 instances / >6 PB of data). Five-reason deprecation framework: maintenance cost, missing functionality, system complexity, infra cost, waning community. Workload-axis migrations already in flight: OLAP → Druid + StarRocks; time-series → Goku; KV → KVStore on RocksDB + Rocksplicator. Remaining slot drove TiDB selection.

Architectural themes¶

Pinterest's wiki corpus currently spans five axes: storage (HBase deprecation, TiDB selection), quota governance (Piqama + Moka + PinConf + SPF), ads ML production debugging (online-offline discrepancy), ads model unification (one unified engagement model with surface-specific specialisation), and production LLM analytics (Analytics Agent + Text-to-SQL on top of PinCat governance + unified context-intent embeddings + internal Vector DB platform). A common thesis connects them: fragmentation is expensive; deliberate consolidation pays off when paired with efficiency work. The HBase deprecation retrospective frames this at the storage-substrate layer (too many bolt-on services on one datastore → workload-specific migration → NewSQL consolidation for the remainder); the ads engagement unification frames it at the ML-model layer (three surface-specific models → one unified model with surface-specific tower trees + calibration + checkpoints); the Analytics Agent frames it at the LLM-infrastructure layer (every team reinventing vector indexes + table search + ad-hoc RAG → one Vector DB platform + one governance catalog + one shared intent index).

Four operational levers appear across these posts: workload / surface specialisation (where generalisation fails, specialise narrowly via tower trees or workload-specific stores), per-segment refinement (surface-specific calibration is the ads analogue of workload-specific stores), async/decoupled control plane (Piqama's async rule distribution, MediaFM-style decoupled training/serving boundaries, HBase's standby cluster for offline workflows), and, new with the Analytics Agent post, governance as AI infrastructure (tier tags + glossary terms + lineage are not documentation hygiene, they are load-bearing inputs to the ranker and the SQL validator).
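The "one shared trunk, surface-specific towers, view-type-specific calibration" shape of the unified ads engagement model can be sketched as a toy forward pass. This is only an illustration of the routing structure: the trunk here is a two-number summary standing in for the MMoE + long-sequence Transformer, and every weight, the view-type names (`grid`, `closeup`), and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def shared_trunk(features: List[float]) -> List[float]:
    """Toy stand-in for the shared trunk: returns [mean, range] of the inputs."""
    mean = sum(features) / len(features)
    return [mean, max(features) - min(features)]


# Hypothetical per-surface towers: (weights over trunk output, bias).
TOWERS: Dict[str, Tuple[List[float], float]] = {
    "home_feed": ([1.0, 0.5], -1.0),
    "search": ([0.8, -0.2], 0.0),
    "related_pins": ([0.3, 0.9], -0.5),
}

# Hypothetical per-view-type calibration: (scale, shift) applied to the logit.
CALIBRATION: Dict[str, Tuple[float, float]] = {
    "grid": (1.0, 0.0),
    "closeup": (0.9, -0.1),
}


def predict_ctr(features: List[float], surface: str, view_type: str) -> float:
    """Shared trunk -> surface tower -> view-type calibration -> probability."""
    hidden = shared_trunk(features)
    weights, bias = TOWERS[surface]
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    scale, shift = CALIBRATION[view_type]
    return 1.0 / (1.0 + math.exp(-(scale * logit + shift)))
```

The same feature vector passes through one trunk regardless of surface, and only the final tower and calibration lookups differ, which is the consolidation-with-specialisation pattern the themes above describe.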
