Pinterest — Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use¶
Summary¶
Pinterest's Ads Feature Engineering Infra, Core ML Infra, ML Data, and User Understanding teams (Ajay Venkatakrishnan, Le Zhang, Eric Shang, Pihui Wei, Connor Votroubek, Yi He, Camilo Munoz, Simin Li) document a platform-level redesign of the user-sequence data substrate that powers Pinterest's ranking, retrieval, and recommendation models — including the ~16K-token user-action sequences fed into the Pinterest Foundation Model and TransAct. A user sequence here is "an ordered list of recent, relevant events for a user, along with the enrichments (signals) attached to each event" — embeddings, contextual features, derived attributes — assembled into a stable, well-defined representation. The redesign rests on four design decisions plus a migration discipline: (1) configuration-as-code for sequence + enrichment definitions in Python with a portable JSON compile target stored in object storage; (2) shared execution engine with pluggable executors — framework owns IO / concurrency / retries / backpressure / config validation, executors own per-event-type filtering + featurisation + raw-to-normalised mapping; (3) lambda architecture — streaming path serves the "now" view for online inference, batch path "fixes history" with corrections + backfills + long sequences for training and analysis, with the two paths cooperating rather than competing; (4) columnar, time-partitioned storage with table semantics — replacing pre-redesign "large, consolidated 'enriched event' blobs" where "every online call or offline scan had to pull the whole payload — even if a model only needed a small subset of features". Migration is event-type-by-event-type shadow cutover with two-tier comparison (event-level field-by-field + sequence-level shadow-vs-legacy) plus A/B experiments before consumer cutover, then iterating across event types until legacy is deprecated. 
Outcomes (qualitative only, per company policy): "significant infrastructure cost reductions" on storage / network / replication; the time to onboard new enrichments and event types "dropped substantially" (mostly config + small executor changes); "improved engagement metrics" on major recommendation surfaces post-migration.
Key takeaways¶
-
A user sequence is a multi-step pipeline output, not "the N latest events from a log table". "A high-quality sequence is not just 'the N latest events from a log table.' It is the result of a multi-step process: Ingest events from diverse sources, Filter down to the subset of events that matter, Enrich each event with additional signals (embeddings, metadata, and so on), and Finally assemble those enriched events into a stable, well-defined sequence representation." This four-stage decomposition (ingest → filter → enrich → assemble) is the structural pattern the rest of the platform is organised around. Each stage is owned by a distinct framework concern: sources for ingest, predicates for filter, enrichment services for enrich, and schema for assemble. The post thereby recasts user event sequences from a derived data product ("compute the last N rows for this user") into a first-class platform primitive with versioned schemas, configurable filtering, configurable enrichments, and assembly semantics.
-
Sequences sit underneath three distinct workload types and quality is multi-dimensional. "From an infrastructure point of view, they show up in three main places. In training datasets, offline pipelines pull long history windows of enriched events per user in order to build sequence features. In offline analysis, data scientists dissect user behavior across sessions, surfaces, or campaigns using sequence-level queries. And in online inference, real-time services fetch up-to-date user sequences at request time to feed ranking and retrieval models." Across these workloads, sequence quality is "multi-dimensional" — freshness (how quickly new events / enrichments show up), completeness (whether late-arriving events / corrections / backfills are eventually reflected — see late-arriving data), consistent enrichment (same enrichments available across streaming + batch, training + serving see aligned data — the online-offline guard), and stable schemas (versioned + predictable, not silently changed). One more constraint: "this is a multi-tenant platform. It has to support many teams and models, each with different needs and lifecycles. That makes correctness, observability, and operability just as important as raw throughput or latency."
-
The organising principle is "one definition, many runtimes". "Define a signal or event type once, then instantiate it consistently across multiple runtimes. A signal definition captures which raw events to use, which enrichments to apply, and how to assemble enriched events into a sequence. That same definition is then consumed by three different kinds of workloads: Real-time indexing for low-latency updates. Batch indexing and backfill for historical data and corrections. Online serving for fetching sequences at inference time." The explicit motivation is the split-brain failure mode: "This 'one definition, many runtimes' approach avoids the classic split-brain failure mode where training pipelines build sequences one way from batch tables while serving systems assemble sequences a different way from online stores. Over time, those two views naturally drift apart in subtle ways. Instead, we rely on a single configuration surface plus a shared execution engine to keep indexing, training and serving aligned." This is a structural fix for online-offline discrepancy at the data-substrate layer rather than the model layer: instead of debugging mismatched features after a launch, the platform removes the main source of divergence by having every runtime consume the same definition through the same engine.
-
Configuration-as-code for sequences + enrichments — Python configs compile to portable JSON in object storage. "We moved sequence and enrichment definitions into configuration-as-code, expressed in a regular programming language (Python) with a well-defined schema. Our configurations describe which sequence features exist, how they're named, and basic metadata such as owners, retention, and lifecycle stage. Event-type configuration describes, for each event type, which enrichments apply, what filtering logic to use, and what data sources to read from. Enrichment configuration explains how to fetch or derive additional signals (for example, embeddings) and how to map them into the event schema. These configurations are validated, compiled into a portable JSON format, stored in managed internal object storage, and then consumed by the shared execution engine across streaming, batch and serving jobs." The named benefits: "New event types or enrichments can now be added primarily through configuration, plus small, isolated pieces of code where absolutely necessary, instead of via entirely new pipelines. That significantly reduces the concept-to-production time for new signals." Plus the standard configuration-as-code wins: "Diffs are human-readable, code owners can review changes, rollbacks are straightforward, and version history lives in standard version control systems." Plus a clear separation of concerns: "ML and product teams focus on what they want (events, features, and filters) while platform teams focus on how to execute that configuration reliably and efficiently." Canonical configuration-as-code feature pipeline instance.
-
Shared execution engine + pluggable executors — framework / plugin separation. "We introduced a shared execution engine responsible for reading configuration, connecting to data sources (kafka, logs, tables, feature stores), running filtering and featurization, calling enrichment services or joining against offline tables, and finally writing enriched results to storage. Within this engine, an executor is a plugin that converts a raw event into one or more enriched records. In plain terms, the executor is the 'business logic module' for a particular event type or grouping, while the execution engine handles everything around it." The framework / plugin contract is explicit: "Framework responsibilities include wiring data sources and sinks, handling concurrency, retries, and backpressure, and parsing and validating configuration. Executors own the business-specific filtering and featurization logic and the mapping from raw events to normalized user-event representations." The shared engine is what makes one-definition-many-runtimes mechanically realisable: "the same core enrichment logic in both streaming jobs that handle near-real-time events and batch jobs that process historical data. That minimized code duplication and reduced drift between batch and real-time behavior." Canonical shared execution engine + pluggable executors instance.
-
Lambda architecture — streaming "now", batch "fixes history" — they cooperate. "Sequence consumers want two things that naturally pull in opposite directions. On one hand, they need freshness: 'I want this morning's actions reflected in ranking now.' On the other hand, they care about completeness and correctness: 'If late events show up tomorrow, I still want my sequences and training data to be right.' Real-world data is messy. Events arrive late. Enrichment sources are recomputed or corrected. Backfills introduce new historical coverage months after the fact." Pinterest's resolution: "A streaming path processes events as they arrive and maintains a near-real-time view of user sequences for online inference. A batch path periodically recomputes enriched events and sequences from raw historical data, producing long sequences and reusable datasets for backfills and offline analysis. The two paths cooperate instead of competing. The streaming path maintains the 'now' view of the world, while the batch path focuses on 'fixing history' and ensuring that training and long-term analytics see consistent, corrected data." The cooperation framing is load-bearing: classical lambda architecture critiques (Kreps, "Questioning the Lambda Architecture") targeted dual code-paths and dual reconciliation logic; Pinterest's design shares the executor logic across both paths via the shared execution engine, so the cost of maintaining two pipelines collapses to maintaining two scheduling shapes of the same logic. Canonical lambda architecture for fresh and complete sequences instance.
-
Columnar, time-partitioned storage replaces consolidated "enriched event" blobs. "Before this redesign, we stored sequences as large, consolidated 'enriched event' blobs. Every online call or offline scan had to pull the whole payload — even if a model only needed a small subset of features — so request fan-out turned directly into heavy payload size and I/O on our storage systems. We moved sequence storage to a columnar, time-partitioned layout that behaves like a set of tables. Each enrichment or feature lives in its own column, and reads can select only the columns they need for a given model or analysis. Data is partitioned by time bucket so that writes and scans stay constrained to relevant partitions as history grows. Engineers can query these datasets with familiar table abstractions, which makes it easy to compare runs, versions, or backfill strategies by inspecting partitions." Two distinct wins, efficiency ("columnar storage improves compression and reduces network bandwidth by avoiding wide 'enriched event' blobs when only a few features are needed. Time partitioning keeps I/O bounded even as the system accumulates long histories") and operability ("clear table semantics makes it much easier to inspect anomalous days or event types, validate new enrichments, and compare old and new pipelines side by side"). The structural insight: by moving enrichments into columns rather than into a fat blob, adding a new enrichment becomes a column-add rather than a payload-shape change — a cheap migration that compounds over time. Canonical columnar time-partitioned feature storage instance, sibling to columnar storage format applied to per-event ML feature substrates.
-
Event-type-by-event-type shadow cutover with two-tier comparison + A/B before consumer switch. "For a given event type, we first ran the new pipeline in parallel with the existing one and generated 'shadow' sequences. We then compared those shadow outputs to the legacy sequences over a defined period." The comparison is two-tier: "a strategy of using two tiers of comparisons, an event-level comparison, which compared field-by-field of events we matched between our old and new indexing jobs, as well as a sequence-level comparison, comparing the shadow sequence output with the legacy sequence output." Pinterest is explicit that a 100% match is not the goal: "we are regenerating the data using completely new jobs, we had to accept that the data won't have a 100% match due to the nature of our online systems. As a result, we had to have thorough validations to prove that our new system was producing approximately the same sequences when compared to the legacy system." Cutover only happens after shadow validation + A/B: "Alongside performing A/B experiments using our new data, these validations gave us the confidence that we could safely swap our pipelines with no impact. Once we were confident in the behavior, we performed a controlled cutover by shifting consumers to read from the new architecture. We then iterated the same process across additional event types, steadily deprecating the legacy path." Canonical event-type-by-event-type shadow cutover instance. It extends the wiki's existing parallel run / shadow migration / side-by-side runtime validation family with a per-event-type granularity discipline that lets a multi-tenant platform migrate incrementally.
-
Operational readiness as a first-class workstream. "We invested heavily in observability and operational hygiene. We set up dashboards tracking sequence freshness and lag, event and enrichment coverage, schema drift and configuration rollout status, and serving latency and error rates. These foundations turned out to be crucial. A platform that many teams rely on will eventually have bad days; the difference between a minor blip and a major incident often comes down to whether you can quickly see what went wrong and where." Three of the four dashboard tracks map directly onto sequence-quality dimensions from the second takeaway: freshness/lag → freshness; coverage → completeness; schema drift / config rollout → stable schemas. The fourth track, serving latency + error rates, covers tenant-facing serving health rather than a sequence-quality dimension, and the post names no dashboard dedicated to the remaining dimension, consistent enrichment. Observability is not an afterthought added once incidents happen; it largely mirrors the quality contract the platform promises consumers.
-
Migration outcomes (qualitative-only by company policy). "On cost, we saw significant infrastructure cost reductions once large event types were fully migrated, primarily because of more efficient storage formats, fewer replicas where appropriate, and lower network transfer per request. On productivity, the time to onboard new enrichments and event types dropped substantially. Most changes moved from bespoke pipeline work to configuration updates and small, composable executors. On quality, our major recommendation surfaces saw improved engagement metrics after switching to sequences produced by the new platform, while still staying within quality and safety expectations." Three axes of return: cost (storage / replication / network), productivity (onboarding velocity), quality (engagement metrics on serving surfaces). The quality axis is the structural payoff of "one definition, many runtimes": better train-serving alignment translates into better online performance.
-
Future work surfaces three transferable directions. "We want richer self-serve tooling so that adding new signals feels more like filling out a template than editing infrastructure code. That includes wizards for new signals, static analysis for configurations, and automated backfill orchestration for common patterns. We are also interested in stronger correctness guarantees. Anomaly detection over both indexing and serving paths would further harden the system. Finally, we plan to broaden coverage and add richer signals. That includes extending sequence coverage to more event types and surfaces and adding higher-level behavioral abstractions on top of raw event sequences, such as session-level or object-level views. The challenge is to do that while preserving the core 'events → enriched signals → sequences' contract that keeps the platform coherent." The platform's coherence claim — preserving "events → enriched signals → sequences" as the user-facing contract regardless of how rich the internal abstractions get — is the architectural-leverage argument for why platform investments compound.
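The configuration-as-code takeaway above (Python definitions, validated, then compiled to portable JSON) can be illustrated with a minimal sketch. Every class name, field name, lifecycle value, and source identifier below is a hypothetical stand-in; the post does not publish Pinterest's actual schema, and the upload to object storage is omitted.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EnrichmentConfig:
    name: str     # column the enrichment is written to (hypothetical)
    source: str   # hypothetical identifier of a service or table


@dataclass(frozen=True)
class EventTypeConfig:
    event_type: str
    sources: tuple        # where raw events are read from
    filter_name: str      # name of a filter implemented by an executor
    enrichments: tuple = ()


@dataclass(frozen=True)
class SequenceConfig:
    name: str
    owner: str
    retention_days: int
    lifecycle: str
    event_types: tuple = ()


# Assumed set of lifecycle stages; the post only says the metadata exists.
ALLOWED_LIFECYCLES = {"experimental", "production", "deprecated"}


def validate(cfg: SequenceConfig) -> None:
    """Reject definitions that would be ambiguous at runtime."""
    if cfg.lifecycle not in ALLOWED_LIFECYCLES:
        raise ValueError(f"unknown lifecycle stage: {cfg.lifecycle!r}")
    if cfg.retention_days <= 0:
        raise ValueError("retention_days must be positive")
    if not cfg.event_types:
        raise ValueError("a sequence needs at least one event type")
    for et in cfg.event_types:
        names = [e.name for e in et.enrichments]
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate enrichment in {et.event_type!r}")


def compile_to_json(cfg: SequenceConfig) -> str:
    """Validate, then emit a deterministic JSON document."""
    validate(cfg)
    return json.dumps(asdict(cfg), sort_keys=True, indent=2)


cfg = SequenceConfig(
    name="user_engagement_seq",
    owner="example-team",
    retention_days=90,
    lifecycle="production",
    event_types=(
        EventTypeConfig(
            event_type="click",
            sources=("stream:clicks", "table:click_logs"),
            filter_name="drop_bot_traffic",
            enrichments=(EnrichmentConfig("item_embedding", "svc:embeddings"),),
        ),
    ),
)
compiled = compile_to_json(cfg)
```

Because the compiled artefact is plain JSON with sorted keys, two versions of a definition diff cleanly in version control, which is the review and rollback benefit the post describes.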
Architecture¶
High-level shape¶
             ┌─────────────────────────────────────┐
             │ Configuration-as-code (Python)      │
             │ sequence defs + event-type defs     │
             │ + enrichment defs                   │
             └──────────────────┬──────────────────┘
                                │ validate + compile
                                ▼
             ┌─────────────────────────────────────┐
             │ Portable JSON config                │
             │ in managed object storage           │
             └──────────────────┬──────────────────┘
                                │
      ┌─────────────────────────┼─────────────────────────┐
      │                         │                         │
      ▼                         ▼                         ▼
┌───────────┐             ┌───────────┐             ┌───────────┐
│ Streaming │             │   Batch   │             │  Online   │
│ ingestion │             │ ingestion │             │  serving  │
│  (Kafka)  │             │ (DW logs) │             │    API    │
└─────┬─────┘             └─────┬─────┘             └─────┬─────┘
      │                         │                         │
      ▼                         ▼                         ▼
 ┌────────────────────────────────────────────────────────────┐
 │ Shared execution engine + pluggable executors              │
 │  • framework: IO, concurrency, retries, backpressure       │
 │  • executor: filter + featurise + map raw→normalised event │
 │  • enrichment: embeddings, metadata, derived attrs         │
 └──────────────────────────────┬─────────────────────────────┘
                                │
                                ▼
 ┌────────────────────────────────────────────────────────────┐
 │ Columnar, time-partitioned storage (table semantics)       │
 │ one column per enrichment/feature; partitioned by          │
 │ time bucket; selective column scan; long-history-safe      │
 └──────────────────────────────┬─────────────────────────────┘
                                │
                                ▼
 ┌────────────────────────────────────────────────────────────┐
 │ Online serving API                                         │
 │ "Request sequence X for user U": well-defined schema,      │
 │ documented freshness + completeness profile                │
 └────────────────────────────────────────────────────────────┘
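The split between framework and executor in the diagram above, together with the column-per-feature storage it writes to, can be sketched in a few dozen lines. This is a toy illustration under stated assumptions: `Executor`, `ClickExecutor`, `Engine`, `ColumnStore`, `TransientError`, the event fields, and the one-day bucket size are all hypothetical names and choices, not Pinterest's API, and concurrency and backpressure are left out.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Iterable, List


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying."""


class Executor(ABC):
    """Plugin: business logic for one event type."""

    @abstractmethod
    def process(self, raw: dict) -> List[dict]:
        """Turn one raw event into zero or more enriched records."""


class ClickExecutor(Executor):
    def process(self, raw: dict) -> List[dict]:
        if raw.get("is_bot"):                      # filtering
            return []
        return [{                                  # raw -> normalised mapping
            "user_id": raw["uid"],
            "ts": raw["ts"],
            "item_id": raw["item"],
            "dwell_bucket": min(raw.get("dwell_ms", 0) // 1000, 10),  # featurisation
        }]


class ColumnStore:
    """Toy columnar layout: day bucket -> column name -> values."""

    def __init__(self) -> None:
        self._parts: Dict[int, Dict[str, list]] = defaultdict(lambda: defaultdict(list))

    def append(self, record: dict) -> None:
        bucket = record["ts"] // 86_400            # assumed one-day partitions
        for column, value in record.items():
            self._parts[bucket][column].append(value)

    def read(self, bucket: int, columns: Iterable[str]) -> Dict[str, list]:
        """Return only the requested columns of one partition."""
        part = self._parts.get(bucket, {})
        return {c: list(part.get(c, [])) for c in columns}


class Engine:
    """Framework: owns executor lookup, retries, and sink writes."""

    def __init__(self, store: ColumnStore, max_attempts: int = 3) -> None:
        self._store = store
        self._executors: Dict[str, Executor] = {}
        self._max_attempts = max_attempts

    def register(self, event_type: str, executor: Executor) -> None:
        self._executors[event_type] = executor

    def _process_with_retries(self, executor: Executor, raw: dict) -> List[dict]:
        for attempt in range(1, self._max_attempts + 1):
            try:
                return executor.process(raw)
            except TransientError:
                if attempt == self._max_attempts:
                    raise
        return []

    def run(self, events: Iterable[dict]) -> int:
        written = 0
        for raw in events:
            executor = self._executors.get(raw.get("type"))
            if executor is None:
                continue                           # no plugin for this event type
            for record in self._process_with_retries(executor, raw):
                self._store.append(record)
                written += 1
        return written


store = ColumnStore()
engine = Engine(store)
engine.register("click", ClickExecutor())
n = engine.run([
    {"type": "click", "uid": 1, "ts": 86_400 + 5, "item": "a", "dwell_ms": 2500},
    {"type": "click", "uid": 1, "ts": 86_400 + 9, "item": "b", "is_bot": True},
    {"type": "view", "uid": 1, "ts": 86_400 + 12, "item": "c"},
])
cols = store.read(bucket=1, columns=["item_id", "dwell_bucket"])
```

The read at the end touches only two columns of one partition, which is the property the post credits for lower payload size and bounded I/O; a production system would get the same effect from a real columnar format rather than nested dictionaries.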
Lambda paths cooperating¶
Streaming path                  Batch path
──────────────                  ──────────
Kafka                           Data warehouse / log archives
  │                               │
  ▼                               ▼
Real-time indexer               Batch indexer + backfill
(executor logic, same           (executor logic, same
 as batch via engine)            as streaming via engine)
  │                               │
  ▼                               ▼
"Now" view of sequences         Long-history view +
(online inference               reusable intermediate
 freshness)                     datasets, fixes late
  │                             arrivals & corrections
  ▼                               │
Online serving API                ▼
                                Training datasets +
                                offline analysis +
                                historical coverage
The two paths share executor logic but have different scheduling shapes: streaming for freshness, batch for "fixing history" (late events / corrections / backfills). How the two outputs are reconciled is not described in the post (see Caveats). One reading consistent with the storage design is that online reads are served from the latest streaming writes in each time bucket, and that the batch path later rewrites the same time-bucket columns once corrections settle.
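To make the cooperation between the two paths concrete, here is a minimal sketch in which both paths call the same `enrich` function and a merged view prefers batch output wherever a bucket has been recomputed. The merge rule is an assumption made purely for illustration, since the post does not disclose its reconciliation policy; the function names, event fields, and one-hour bucket size are likewise hypothetical.

```python
from typing import Dict, Iterable, List


def enrich(event: dict) -> dict:
    """Shared executor logic used by both paths (toy featurisation)."""
    return {
        "user_id": event["uid"],
        "ts": event["ts"],
        "action": event["action"].lower(),
    }


def index(events: Iterable[dict], bucket_seconds: int = 3600) -> Dict[int, List[dict]]:
    """Group enriched events into time buckets, ordered by timestamp."""
    buckets: Dict[int, List[dict]] = {}
    for event in events:
        rec = enrich(event)
        buckets.setdefault(rec["ts"] // bucket_seconds, []).append(rec)
    for recs in buckets.values():
        recs.sort(key=lambda r: r["ts"])
    return buckets


def merged_view(streaming: Dict[int, List[dict]],
                batch: Dict[int, List[dict]]) -> List[dict]:
    """ASSUMED policy: a batch-recomputed bucket replaces the streaming one."""
    view = dict(streaming)
    view.update(batch)
    return [rec for bucket in sorted(view) for rec in view[bucket]]


live = [
    {"uid": 1, "ts": 10, "action": "CLICK"},
    {"uid": 1, "ts": 4000, "action": "SAVE"},
]
late = {"uid": 1, "ts": 20, "action": "VIEW"}   # arrives after streaming indexed hour 0

streaming = index(live)                 # the "now" view
batch = index([live[0], late])          # batch recompute of the first hour
sequence = merged_view(streaming, batch)
timestamps = [rec["ts"] for rec in sequence]
```

In this toy run the late event appears only after the batch recompute, while the second hour is still served from streaming output. Other policies, such as per-column reconciliation, would fit the same structure.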
Migration shape (per event type)¶
Event type E:
  1. Run new pipeline in parallel → "shadow sequences"
       │
       ▼
  2. Two-tier comparison:
       a. event-level field-by-field on matched events
       b. sequence-level shadow-vs-legacy
       │
       ▼
  3. A/B experiments with new-data sequences
       │
       ▼
  4. Controlled cutover: shift consumers to new arch
       │
       ▼
  5. Iterate to next event type; deprecate legacy path
The discipline is per-event-type, not per-pipeline. A multi-tenant platform with many event types migrates by walking the per-event-type list and converging each one independently — bounding blast radius and letting riskier event types soak longer in shadow.
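Step 2 of the migration shape, the two-tier comparison, can be sketched as two small functions plus a gate. The matching key, the positional sequence score, and both thresholds are hypothetical: the post says only that outputs had to be "approximately the same" and does not disclose how mismatches were measured or thresholded.

```python
from typing import Dict, List, Tuple


def event_level_diff(legacy: List[dict], shadow: List[dict],
                     key: str = "event_id") -> Tuple[int, Dict[str, int]]:
    """Tier 1: field-by-field comparison of events matched by key."""
    shadow_by_key = {e[key]: e for e in shadow}
    matched = 0
    mismatches: Dict[str, int] = {}
    for old in legacy:
        new = shadow_by_key.get(old[key])
        if new is None:
            continue                      # unmatched events are not compared
        matched += 1
        for field in old.keys() | new.keys():
            if old.get(field) != new.get(field):
                mismatches[field] = mismatches.get(field, 0) + 1
    return matched, mismatches


def sequence_level_match(legacy_seq: list, shadow_seq: list) -> float:
    """Tier 2: fraction of positions where the two sequences agree."""
    longest = max(len(legacy_seq), len(shadow_seq))
    if longest == 0:
        return 1.0
    same = sum(a == b for a, b in zip(legacy_seq, shadow_seq))
    return same / longest


def ready_for_ab(matched: int, mismatches: Dict[str, int], seq_score: float,
                 max_field_mismatch_rate: float = 0.01,
                 min_seq_score: float = 0.99) -> bool:
    """Gate with ASSUMED thresholds; real tolerance bands are not published."""
    if matched == 0:
        return False
    worst = max(mismatches.values(), default=0) / matched
    return worst <= max_field_mismatch_rate and seq_score >= min_seq_score


legacy_events = [
    {"event_id": 1, "item": "a", "surface": "home"},
    {"event_id": 2, "item": "b", "surface": "search"},
]
shadow_events = [
    {"event_id": 1, "item": "a", "surface": "home"},
    {"event_id": 2, "item": "b", "surface": "SEARCH"},
    {"event_id": 3, "item": "c", "surface": "home"},
]
matched, mismatches = event_level_diff(legacy_events, shadow_events)
score = sequence_level_match(["a", "b", "c", "d"], ["a", "b", "x", "d"])
go = ready_for_ab(matched, mismatches, score)
```

Reporting mismatches per field, rather than as one aggregate number, makes it easier to tell a systematic mapping bug (here, a casing difference in `surface`) from the unavoidable noise the post attributes to "the nature of our online systems".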
Operational numbers¶
Pinterest discloses qualitative outcomes only in this post (per company policy):
- Cost: "significant infrastructure cost reductions once large event types were fully migrated". Three named drivers: "more efficient storage formats, fewer replicas where appropriate, and lower network transfer per request."
- Productivity: the time to onboard new enrichments + event types "dropped substantially". Most changes are now "configuration updates and small, composable executors" rather than "bespoke pipeline work".
- Quality: "major recommendation surfaces saw improved engagement metrics after switching to sequences produced by the new platform, while still staying within quality and safety expectations."
The user-sequence substrate this post documents is the same ~16K-token user-sequence input consumed by Pinterest Foundation Model and TransAct (per the 2026-04-13 request-level deduplication post); the platform-side investments documented here are upstream of the model-side scaling work documented there.
Goals (headlines quoted, glosses paraphrased for the wiki)¶
Goals:
- "Provide a consistent 'events → enriched signals → sequences' contract." Stable, well-defined interface independent of the underlying runtime.
- "Improve cost-efficiency at scale." Reduce storage and network use while keeping latency and reliability appropriate for online use.
- "Make onboarding new event types and enrichments faster and safer." Configuration + small executor changes, not new bespoke pipelines.
- "Support both real-time and batch production paths." Low-latency updates for serving + batch backfills for historical coverage and corrections, with a clear policy for how the two paths merge.
Non-goals:
- Not redesigning downstream models or ranking architectures — "the focus is on the platform that feeds them."
- Not changing the product definition of events — "those semantics remain owned by product and logging teams."
Caveats¶
- No quantitative numbers anywhere. Cost reduction percentages, onboarding-time-before/after, engagement-metric deltas are all withheld per company policy. The wiki entry's takeaways therefore have a directional/qualitative character that other Pinterest posts (the 2026-04-13 dedup post, the 2026-05-01 Feature Trimmer post) make quantitative.
- No architecture diagrams reproduced. Pinterest published two figures (system architecture, storage layout) in the original post; the markdown ingest does not include them. The shapes shown above are reconstructed from the prose.
- No latency / throughput SLOs disclosed. Online serving API freshness target, batch backfill cadence, time-partition granularity, columnar format choice (Parquet? ORC? proprietary?), partition lifecycle management — none specified.
- Specific tooling not named. "Streaming" is implied to be Kafka-fed (one mention), but the streaming compute substrate (Flink? Spark Structured Streaming? in-house?) is undisclosed. Batch compute substrate undisclosed. Object-storage substrate undisclosed. Serving-layer substrate undisclosed (likely a feature-store flavour given context).
- Schema versioning policy undocumented. "Stable schemas" is one of the four quality dimensions, and "schema drift" is one of the dashboarded signals, but the actual versioning + deprecation contract for sequence schemas isn't specified.
- Multi-tenancy implementation undisclosed. Tenant boundaries, isolation guarantees, per-tenant SLOs, fair-share / quota mechanisms (cf. Piqama) — all unspecified.
- Lambda-merge semantics not specified. "With a clear policy for how the two paths merge" is named as a goal but the actual reconciliation rule (last-write-wins on time bucket? batch-overrides-streaming after T elapsed? per-column reconciliation?) is not described.
- Shadow tolerance bands not quantified. "Approximately the same sequences" is the validation bar; the post doesn't disclose how event-level mismatches were thresholded or how sequence-level mismatches were aggregated and approved.
- Migration scope undisclosed. Number of event types migrated, percentage of legacy retired, in-flight tenants — all unspecified.
- Self-serve tooling deferred to future work. The wizards / config static-analysis / automated-backfill-orchestration are named as not yet shipped; today the platform is config-driven but not template-driven.
- No security or governance content. PII handling, access control, audit, retention policy on sequence storage — unaddressed in this post.
Source¶
- Original: https://medium.com/pinterest-engineering/making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use-2a56a928cae1?source=rss----4c5a5f6279b6---4
- Raw markdown: raw/pinterest/2026-05-21-making-user-sequence-data-more-cost-efficient-faster-and-eas-4db89d92.md
Related¶
- companies/pinterest
- systems/pinterest-user-sequence-platform — the platform this post documents.
- systems/pinterest-foundation-model — primary downstream consumer of the ~16K-token user-sequence substrate.
- systems/pinterest-transact — sequential user-action Transformer model; second primary consumer.
- systems/pinterest-related-pins, systems/pinterest-search, systems/pinterest-home-feed — recommendation surfaces that benefit from improved sequence quality.
- systems/transformer — the architectural class that consumes user sequences as input.
- concepts/user-event-sequence — the data primitive: ingest → filter → enrich → assemble.
- concepts/one-definition-many-runtimes — the organising principle.
- concepts/sequence-quality-dimensions — freshness / completeness / consistent enrichment / stable schemas, the multi-dimensional quality contract.
- concepts/enrichment-execution-engine — the engine + pluggable-executor architectural concept.
- concepts/online-offline-discrepancy — the failure mode this platform structurally prevents at the data-substrate layer.
- concepts/late-arriving-data — the problem the batch path is designed to absorb.
- concepts/columnar-storage-format — substrate for the new storage layer.
- concepts/feature-store — sibling concept for online-feature-fetch substrates.
- patterns/configuration-as-code-feature-pipeline — Python configs → portable JSON → object storage → consumed by all runtimes.
- patterns/shared-execution-engine-pluggable-executors — framework owns IO + concurrency + retries; executors own per-event-type business logic.
- patterns/lambda-architecture-for-fresh-and-complete-sequences — streaming "now" + batch "fixes history", cooperating via shared executor logic.
- patterns/columnar-time-partitioned-feature-storage — column-per-enrichment + time-bucket partitioning + table semantics.
- patterns/event-type-by-event-type-shadow-cutover — per-event-type incremental migration with two-tier comparison + A/B.
- sources/2026-04-13-pinterest-scaling-recommendation-systems-with-request-level-deduplication — model-side scaling work that consumes the user-sequence substrate this post describes.
- sources/2026-05-08-pinterest-enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models — sibling Pinterest post on the model side; the contextual sequential CG consumes user sequences.