NETFLIX

Thinking Fast & Slow for a Personalized Notification System

Summary

Netflix describes a hierarchical "Slow/Fast" architecture for its personalized notification system, which sends hundreds of millions of push, email, and in-app notifications. Inspired by Kahneman's dual-process theory, a Slow Policy makes strategic, personalized decisions about each member's weekly messaging plan (frequency and pacing per channel), while a Fast Policy makes real-time tactical decisions about which specific message to send at each opportunity. The two layers communicate asynchronously through a feature store, so each can evolve independently while members still get a consistent experience.

Key Takeaways

  1. Short-term reward horizons create blind spots: The previous single-policy system optimized for immediate post-notification engagement, missing cumulative effects like fatigue and opt-out risk that only surface over weeks. (Source: "Short-Term Reward Horizons" section)

  2. Coupled ranking + pacing prevents true personalization: When a single model decides both whether to send and what to send, per-member frequency becomes an implicit byproduct of a global relevance threshold rather than an explicit personalized control variable. Adjusting frequency also changes message quality, and vice versa. (Source: "Coupled Ranking and Pacing Decisions" section)

  3. Hierarchical decomposition solves the coupling problem: The Slow Policy defines a personalized pacing plan (frequency per channel over a week); the Fast Policy selects the optimal message within those constraints. This decouples frequency planning from content selection entirely. (Source: "The Proposed Method" section)

  4. Utility function with universal message cost: The Slow Policy maximizes U(member, action) = Σ wₖ·Rewardₖ − Cost(action), where positive signals capture engagement likelihood and negative signals capture fatigue/opt-out. A universal message cost term is added because empirical negative feedback is too sparse; without it, the policy degenerates to "always send." (Source: "The Utility Function" section)

  5. Discretized action space keeps optimization tractable: The Slow Policy's action space covers on the order of 100 distinct combinations of push and email frequency, making it expressive enough to differentiate members while small enough to keep evaluation tractable. (Source: "The Slow Policy" section)

  6. Feature store as asynchronous bridge: The Slow Policy writes pacing plans to a low-latency feature store; the Fast Policy reads them as features at send time. This decouples execution cadences: the Slow Policy runs once per defined period, while the Fast Policy executes on every notification opportunity. (Source: "Policy-to-Policy Communication" section)

  7. Uniform pacing as robust baseline: Frequency targets translate into per-opportunity send probabilities (weighted coin flips), producing organically randomized patterns matching the target send rate. The framework extends to non-uniform profiles (day-of-week, user-activity-conditioned bursts). (Source: "Pacing Strategy" section)

  8. Largest production metric lift to date: The lift was most pronounced among infrequent ("casual") viewers, a critical cohort where timely, relevant notifications drive the most incremental awareness. (Source: "Key Results" section)

  9. Independent evolution of layers: The two-layer architecture allows A/B testing of pacing strategies and content-ranking models as independent, clean variables without cross-contamination. (Source: "Key Results" section)
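
To make takeaways 4 and 5 concrete, here is a minimal Python sketch of a Slow Policy that scores every plan in a small discretized action space and picks the one with the highest utility. Everything specific in it is an assumption for illustration, not Netflix's implementation: the 10 x 10 frequency grid, the `WEIGHTS` values, the toy `predicted_rewards` curve, and the per-message cost of 0.04. The opt-out estimate is pinned at zero to mimic sparse negative feedback.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PacingAction:
    push_per_week: int
    email_per_week: int

    @property
    def total(self) -> int:
        return self.push_per_week + self.email_per_week


# Hypothetical discretization: 10 push levels x 10 email levels = 100 plans.
ACTIONS = [PacingAction(p, e) for p, e in product(range(10), range(10))]

# Hypothetical signal weights: engagement counts for, opt-out counts against.
WEIGHTS = {"engagement": 1.0, "opt_out": -5.0}


def predicted_rewards(action: PacingAction) -> dict:
    """Toy stand-in for learned reward models (an assumption, not the real ones).

    Engagement has diminishing returns in weekly volume. The opt-out
    estimate is fixed at zero to mimic sparse negative feedback.
    """
    return {"engagement": 1.0 - 0.7 ** action.total, "opt_out": 0.0}


def utility(action: PacingAction, cost_per_message: float) -> float:
    """U = sum_k w_k * Reward_k - Cost(action), with a per-message cost."""
    rewards = predicted_rewards(action)
    weighted = sum(WEIGHTS[k] * rewards[k] for k in WEIGHTS)
    return weighted - cost_per_message * action.total


def slow_policy(cost_per_message: float) -> PacingAction:
    """Exhaustively score every plan and return the highest-utility one."""
    return max(ACTIONS, key=lambda a: utility(a, cost_per_message))


no_cost_plan = slow_policy(cost_per_message=0.0)
costed_plan = slow_policy(cost_per_message=0.04)
print(no_cost_plan.total, costed_plan.total)
```

In this toy setup, the zero-cost policy selects the maximum-volume plan (the "always send" degeneration), while a small per-message cost pulls the optimum back to a moderate weekly volume. Because the action space is this small, a brute-force argmax over all plans is enough; no search heuristic is needed.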

Architectural Details

  • Scale: Hundreds of millions of personalized notifications per day across push, email, and in-app channels.
  • Slow Policy cadence: Runs at weekly granularity (configurable); writes strategic intent to feature store.
  • Fast Policy cadence: Executes on every notification send opportunity (real-time).
  • Action space: On the order of 100 distinct cross-channel pacing strategies.
  • Communication: Asynchronous via low-latency feature store (no synchronous coupling between policies).
  • Previous system: Single causal model predicting single-message incrementality with a calibrated relevance threshold.
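
The asynchronous hand-off between the two layers, together with uniform pacing, can be sketched roughly as follows. This is a simplified single-channel illustration under stated assumptions: the in-memory `feature_store` dict stands in for a real low-latency feature store, and `slow_policy_write`, `fast_policy_decide`, `opportunities_per_week`, and the candidate scores are hypothetical names and values, not part of the described system.

```python
import random

# In-memory stand-in for a low-latency feature store (hypothetical).
feature_store: dict = {}


def slow_policy_write(member_id: str, sends_per_week: int,
                      opportunities_per_week: int) -> None:
    """Slow layer: turn a weekly frequency target into a send probability."""
    prob = min(1.0, sends_per_week / opportunities_per_week)
    feature_store[member_id] = {"send_prob": prob}


def fast_policy_decide(member_id: str, scores: dict, rng: random.Random):
    """Fast layer: read the plan, flip a weighted coin, then pick a message.

    `scores` maps candidate message ids to relevance scores from some
    ranking model (assumed here). Returns a message id, or None to skip.
    """
    plan = feature_store.get(member_id)
    if plan is None or not scores:
        return None
    if rng.random() >= plan["send_prob"]:
        return None  # pacing says skip this opportunity
    return max(scores, key=scores.get)


rng = random.Random(0)
slow_policy_write("member-1", sends_per_week=3, opportunities_per_week=21)
candidates = {"new_season": 0.9, "top_10": 0.4}
decisions = [fast_policy_decide("member-1", candidates, rng)
             for _ in range(70_000)]
send_rate = sum(d is not None for d in decisions) / len(decisions)
print(round(send_rate, 3))
```

Over many opportunities the observed send rate should land close to the target of 3/21, while the individual send times stay randomized. The two functions never call each other; they share only the stored plan, which is what lets the layers run on different cadences.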
