PATTERN Cited by 1 source
Async queue for on-demand feature compute¶
Back the feature store with a generic async-capable compute substrate rather than a dedicated microservice. Requests enqueue the missing features; a cache on the compute substrate holds the results. This avoids building and operating a bespoke on-demand feature-computation service for every feature-store use case.
Problem¶
An on-demand feature store (e.g. for media features, where precompute is infeasible) needs to:
- Receive feature requests from many callers.
- Compute the feature dependency graph.
- Schedule the actual compute asynchronously (compute is expensive and bursty).
- Cache results so repeated requests are cheap.
- Let callers fetch the results after a while.
Building a dedicated microservice for each such use case gets operationally expensive fast: queuing infrastructure, a compute pool, a cache layer, a scheduler, retries, observability.
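A minimal sketch of the caller-facing contract those requirements imply. Every name here is hypothetical; this is not Amber's actual API.

```python
# Hypothetical caller-facing contract implied by the requirements above.
# All names are illustrative; this is not Amber's actual API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    """Handle the caller polls with until the feature is ready."""
    request_id: str


class OnDemandFeatureStore:
    def request(self, feature: str, entity_id: str) -> Ticket:
        """Resolve dependencies, enqueue compute for whatever is
        missing, and return immediately with a ticket."""
        raise NotImplementedError

    def fetch(self, ticket: Ticket) -> Optional[bytes]:
        """Return the cached result, or None while compute is still
        pending; callers "fetch after a while" by polling this."""
        raise NotImplementedError
```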
Solution¶
Reuse a generic async-capable compute substrate — at Netflix, Metaflow Hosting — that already provides:
- A request queue.
- Elastic compute that scales based on queue depth (including scale-to-zero for GPU-backed compute).
- A response cache.
- Monitoring / alerting / tracing.
Then the feature store's logic shrinks to:
- Resolve the feature dependency graph.
- Send async requests to the compute substrate for missing features.
- Poll / fetch from the substrate's cache.
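A sketch of that shrunken logic, under stated assumptions: the `substrate` client and its `cache_get` / `enqueue` calls are hypothetical stand-ins, since Metaflow Hosting's actual interface is internal to Netflix.

```python
# Sketch of the feature store's remaining logic once queueing, scaling,
# and caching live in the substrate. `substrate.cache_get` and
# `substrate.enqueue` are hypothetical stand-ins for Metaflow Hosting,
# whose real interface is internal to Netflix.
import time


def get_features(substrate, entity_id: str, requested: list[str],
                 dependency_graph: dict[str, list[str]],
                 timeout_s: float = 300.0, poll_s: float = 2.0) -> dict:
    # 1. Resolve the dependency closure of the requested features.
    needed, stack = set(), list(requested)
    while stack:
        f = stack.pop()
        if f not in needed:
            needed.add(f)
            stack.extend(dependency_graph.get(f, []))

    # 2. Cache hits are already done; send async requests for the misses.
    results, pending = {}, {}
    for f in needed:
        cached = substrate.cache_get(feature=f, entity=entity_id)
        if cached is not None:
            results[f] = cached
        else:
            pending[f] = substrate.enqueue(feature=f, entity=entity_id)

    # 3. "Fetch after a while": poll the substrate's cache for the misses.
    deadline = time.monotonic() + timeout_s
    while pending and time.monotonic() < deadline:
        time.sleep(poll_s)
        for f in list(pending):
            cached = substrate.cache_get(feature=f, entity=entity_id)
            if cached is not None:
                results[f] = cached
                del pending[f]
    if pending:
        raise TimeoutError(f"features still computing: {sorted(pending)}")
    return {f: results[f] for f in requested}
```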
Netflix frames the trade-off explicitly:
"We could have built a dedicated microservice just for this use case, but thanks to the flexibility of Metaflow Hosting, we were able to ship the feature faster with no additional operational burden." (Source: sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix)
Canonical flow — Amber at Netflix¶
service ── request feature F ──▶ Amber (feature store)
                                     │
                                     │ 1. compute dependency graph for F
                                     │ 2. send async request(s) for
                                     │    missing features
                                     ▼
                            Metaflow Hosting queue
                                     │
                                     │ 3. triggers when compute available
                                     ▼
                         feature compute flow (Metaflow)
                                     │
                                     │ 4. result cached
                                     ▼
                            Metaflow Hosting cache
                                     │
                                     │ 5. Amber fetches "after a while"
                                     ▼
                                   Amber
From the post:
"When a service requests a feature from Amber, it computes the feature dependency graph and then sends one or more asynchronous requests to Metaflow Hosting, which places the requests in a queue, eventually triggering feature computations when compute resources become available. Metaflow Hosting caches the response, so Amber can fetch it after a while."
What the pattern pushes to the compute substrate¶
- Queueing — back-pressure handling.
- Scaling — including scale-to-zero for GPU workloads.
- Caching — including TTL management.
- Observability — request logs, tracing hooks.
The feature store keeps only the feature-specific logic: the dependency graph, the calling convention, and the "fetch after a while" contract with callers.
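As one concrete illustration of the scaling responsibility, a queue-depth-driven policy with scale-to-zero can be this small. The policy is hypothetical, not Netflix's.

```python
# Hypothetical queue-depth scaling policy with scale-to-zero: no queued
# requests means no (GPU) workers running. Not Netflix's actual policy.
import math


def desired_workers(queue_depth: int, per_worker: int = 4,
                    max_workers: int = 32) -> int:
    if queue_depth == 0:
        return 0  # scale to zero: idle GPU capacity costs nothing
    return min(max_workers, math.ceil(queue_depth / per_worker))
```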
Trade-offs¶
- Coupling to the substrate — if Metaflow Hosting's semantics change, the feature store adapts. That's fine as long as the substrate is a supported platform.
- Latency — async semantics mean the first caller for a new feature pays full compute latency; hot features are lookup-speed after warm-up.
- Cache consistency — the substrate's cache is the feature cache. Invalidation on upstream model changes is the feature store's problem.
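One common way to handle that last trade-off is to fold the upstream model version into the cache key, so a model change stops hitting stale entries and lets them age out via the substrate's TTL. A sketch, with a key scheme that is illustrative rather than Amber's:

```python
# Sketch: version the cache key so upstream model changes stop hitting
# stale entries, which then age out via the substrate's TTL. The key
# scheme is illustrative, not Amber's.
import hashlib


def feature_cache_key(feature: str, entity_id: str,
                      model_version: str) -> str:
    raw = f"{feature}|{entity_id}|{model_version}"
    return hashlib.sha256(raw.encode()).hexdigest()
```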
When not to use it¶
- If the feature-compute substrate is not flexible enough to serve features — e.g. it can't cache, or can't scale to zero.
- If features truly need real-time evaluation with a sub-100ms p99, dedicated synchronous serving is a better fit.
Seen in¶
- sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix — Netflix Amber + Metaflow Hosting (canonical).