
PATTERN Cited by 1 source

Async queue for on-demand feature compute

Front the feature store with a generic async-capable compute substrate rather than a dedicated microservice. Requests enqueue the missing features; a cache on the compute substrate holds the results. This avoids building and operating a bespoke on-demand feature-computation service for every feature-store use case.

Problem

An on-demand feature store (e.g. for media features, where precompute is infeasible) needs to:

  1. Receive feature requests from many callers.
  2. Compute the feature dependency graph.
  3. Schedule the actual compute asynchronously (compute is expensive and bursty).
  4. Cache results so repeated requests are cheap.
  5. Let callers fetch after a while.

Building a dedicated microservice that covers all of this gets operationally expensive fast: queuing infrastructure, a compute pool, a cache layer, a scheduler, retries, observability.
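The caller-facing contract implied by the five requirements can be sketched as a tiny in-process stand-in (all names here are hypothetical, for illustration only; the real protocol depends on the serving layer):

```python
import uuid

class OnDemandFeatureStore:
    """Minimal in-process stand-in for the service described above."""

    def __init__(self):
        self._cache = {}      # feature_key -> computed value
        self._pending = {}    # request_id -> feature_key

    def request_feature(self, feature_key):
        """1. Receive a request; return a ticket the caller polls later."""
        if feature_key in self._cache:
            return {"status": "ready", "value": self._cache[feature_key]}
        request_id = str(uuid.uuid4())
        self._pending[request_id] = feature_key
        return {"status": "pending", "request_id": request_id}

    def _run_pending(self):
        """3. Stand-in for the async compute pool draining the queue."""
        for request_id, key in list(self._pending.items()):
            self._cache[key] = f"computed:{key}"   # 4. cache the result
            del self._pending[request_id]

    def fetch(self, feature_key):
        """5. Caller fetches 'after a while'; None if not computed yet."""
        return self._cache.get(feature_key)
```

The first request returns `pending`; after the compute pool runs, a fetch hits the cache, and repeated requests are cheap (requirement 4).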

Solution

Reuse a generic async-capable compute substrate — at Netflix, Metaflow Hosting — that already provides:

  • A request queue.
  • Elastic compute that scales based on queue depth (including scale-to-zero for GPU-backed compute).
  • A response cache.
  • Monitoring / alerting / tracing.
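Queue-depth-driven scaling, including scale-to-zero, can be sketched as a trivial autoscaler policy (illustrative only; the real scaling behavior is an operational property of Metaflow Hosting, and the parameters here are made up):

```python
def desired_replicas(queue_depth, per_replica_throughput=10, max_replicas=8):
    """Scale the compute pool with queue depth; an empty queue scales to zero."""
    if queue_depth == 0:
        return 0                                          # scale-to-zero for idle GPU pools
    needed = -(-queue_depth // per_replica_throughput)    # ceiling division
    return min(needed, max_replicas)
```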

Then the feature store's logic shrinks to:

  1. Resolve the feature dependency graph.
  2. Send async requests to the compute substrate for missing features.
  3. Poll / fetch from the substrate's cache.
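The three steps above can be sketched against a generic substrate client (the `submit`/`get` interface is a hypothetical stand-in, not Metaflow Hosting's actual API):

```python
class FakeSubstrate:
    """Stand-in for a generic async compute substrate with a response cache."""

    def __init__(self):
        self.cache = {}
        self.queue = []

    def submit(self, feature):            # enqueue a compute request
        self.queue.append(feature)

    def drain(self):                      # elastic compute draining the queue
        while self.queue:
            f = self.queue.pop(0)
            self.cache[f] = f"value({f})" # substrate caches the response

    def get(self, feature):               # cached-result lookup
        return self.cache.get(feature)


def resolve_dependency_graph(feature, deps):
    """1. Post-order walk so dependencies come before dependents."""
    order, seen = [], set()
    def visit(f):
        if f in seen:
            return
        seen.add(f)
        for d in deps.get(f, []):
            visit(d)
        order.append(f)
    visit(feature)
    return order


def serve(feature, deps, substrate):
    """2. Enqueue missing features; 3. return whatever is cached so far."""
    for f in resolve_dependency_graph(feature, deps):
        if substrate.get(f) is None:
            substrate.submit(f)
    return substrate.get(feature)
```

The first call enqueues the missing features in dependency order and returns nothing; once the substrate has drained the queue, the same call is a cache hit.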

Netflix frames the trade-off explicitly:

"We could have built a dedicated microservice just for this use case, but thanks to the flexibility of Metaflow Hosting, we were able to ship the feature faster with no additional operational burden." (Source: sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix)

Canonical flow — Amber at Netflix

  service ──(request feature F)──▶ Amber (feature store)
                                     │ 1. compute dependency graph for F
                                     │ 2. send async request(s) for
                                     │    missing features
                                     ▼
                              Metaflow Hosting queue
                                     │ 3. triggers compute when
                                     │    resources become available
                                     ▼
                              feature compute flow (Metaflow)
                                     │ 4. result cached
                                     ▼
                              Metaflow Hosting cache
                                     │ 5. Amber fetches "after a while"
                                     ▼
                                   Amber

From the post:

"When a service requests a feature from Amber, it computes the feature dependency graph and then sends one or more asynchronous requests to Metaflow Hosting, which places the requests in a queue, eventually triggering feature computations when compute resources become available. Metaflow Hosting caches the response, so Amber can fetch it after a while."

What the pattern pushes to the compute substrate

  • Queueing — back-pressure handling.
  • Scaling — including scale-to-zero for GPU workloads.
  • Caching — including TTL management.
  • Observability — request logs, tracing hooks.

The feature store keeps only the feature-specific logic: the dependency graph, the calling convention, and the "fetch after a while" contract with callers.
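The "fetch after a while" contract is essentially a bounded poll against the substrate's cache. A minimal sketch (hypothetical helper; real callers would use whatever async mechanism the platform provides):

```python
import time

def fetch_after_a_while(get_cached, key, timeout_s=5.0, interval_s=0.05):
    """Poll the substrate's cache until the feature appears or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        value = get_cached(key)
        if value is not None:
            return value
        time.sleep(interval_s)
    raise TimeoutError(f"feature {key!r} not computed within {timeout_s}s")
```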

Trade-offs

  • Coupling to the substrate — if Metaflow Hosting's semantics change, the feature store adapts. That's fine as long as the substrate is a supported platform.
  • Latency — async semantics mean the first caller for a new feature pays full compute latency; hot features are lookup-speed after warm-up.
  • Cache consistency — the substrate's cache is the feature cache. Invalidation on upstream model changes is the feature store's problem.
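Since the substrate's cache is the feature cache, one common way to handle the invalidation trade-off is to fold the upstream model version into the cache key, so a model change naturally misses rather than requiring explicit invalidation (an illustrative convention, not something the post prescribes):

```python
def feature_cache_key(entity_id, feature_name, model_version):
    """New model version => new key => cold cache, no explicit invalidation."""
    return f"{feature_name}/{model_version}/{entity_id}"
```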

When not to use it

  • If the feature-compute substrate is not flexible enough to serve features — e.g. it can't cache, or can't scale to zero.
  • If features truly do need real-time eval with sub-100ms p99 — dedicated synchronous serving is a better fit.

Seen in
