CONCEPT Cited by 2 sources
Feature store¶
Definition¶
A feature store is a class of ML-infrastructure system that manages and delivers feature data — the numerical and categorical signals a model consumes at both training time and inference time — as a first-class shared substrate across teams.
Concretely, a feature store typically provides:
- Feature definitions — a declarative spec for each feature (name, type, source, transformation, freshness SLO).
- Offline store — historical values for training (often cheap object storage + a table format).
- Online store — low-latency lookup at inference time (often a KV store like DynamoDB, Redis, or a DynamoDB-compatible internal store like Dynovault).
- Ingestion — batch + streaming + direct-write pipes from source systems into both stores.
- Serving API — runtime interface model-serving code calls to fetch features for a request.
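The five pieces above hang off the declarative spec. A minimal sketch of such a spec and a shared registry, in Python — the names (`FeatureSpec`, `freshness_slo_s`, `register`) are illustrative, not Feast's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class FeatureSpec:
    name: str                 # unique feature name
    dtype: str                # "float", "int", "categorical", ...
    source: str               # upstream table / stream / writer
    transformation: Callable  # how raw source rows become feature values
    freshness_slo_s: int      # max acceptable staleness, in seconds

# One shared registry: both the offline and online stores read
# definitions from here, so there is a single source of truth.
REGISTRY: dict[str, FeatureSpec] = {}

def register(spec: FeatureSpec) -> FeatureSpec:
    REGISTRY[spec.name] = spec
    return spec

clicks_7d = register(FeatureSpec(
    name="user_clicks_7d",
    dtype="int",
    source="events.click_stream",
    transformation=lambda rows: sum(r["clicks"] for r in rows),
    freshness_slo_s=3600,
))
```

Ingestion jobs and the serving API both resolve features through the registry, which is what makes the definitions "first-class" rather than buried in pipeline code.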
Why feature stores exist as a separate thing¶
Models don't ship one feature at a time. A ranker typically wants dozens of features per candidate × hundreds of candidates per query — thousands of lookups. Having every ML team build their own ingestion, store, and serving gets expensive and inconsistent fast. A shared feature store:
- Decouples feature engineering from model serving — ML engineers write transformations; serving infrastructure is abstracted away.
- Keeps training and serving on the same feature values — the canonical failure mode without a feature store is training/serving skew (a feature computed one way at training time and a slightly different way at serving time). Shared definitions plus offline/online stores that share a lineage fix this; adjacent to concepts/training-serving-boundary but not the same axis — this is feature-data consistency, not compute-fleet unification.
- Amortizes ingestion across many models — one pipeline feeds the features; N models consume.
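The skew point above, as code: one transformation function is the single source of truth, and both the offline (training) path and the online (serving) path call it. Function and field names here are hypothetical stand-ins:

```python
def session_count_bucket(raw_count: int) -> int:
    """Shared feature transformation: bucket raw session counts 0-9."""
    return min(raw_count // 10, 9)

def materialize_offline(history: list[dict]) -> list[dict]:
    # Batch path: compute the feature for every historical row,
    # producing training data.
    return [{"user": r["user"], "bucket": session_count_bucket(r["sessions"])}
            for r in history]

def serve_online(raw_count: int) -> int:
    # Request path: the same function, so served values agree
    # with what the model saw at training time.
    return session_count_bucket(raw_count)
```

The failure mode this prevents is each path carrying its own copy of the bucketing logic and the copies drifting apart.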
Landscape¶
Named in the Dropbox post as evaluated options:
- Feast — open-source.
- Hopsworks — open-source with managed offering.
- Featureform — open-source.
- Feathr — open-source (LinkedIn origin).
- Databricks Feature Store — platform-native.
- Tecton — commercial.
Dropbox chose Feast for its clean definitions/infra separation and adapter ecosystem, then layered their own serving tier (Go + Dynovault) behind it — a common pattern for large orgs whose on-prem or bespoke infrastructure doesn't match any vendor's shape. (Source: sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash)
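The "own serving tier behind the definitions layer" pattern can be sketched as a thin facade that batch-reads an online KV store. This is a hypothetical illustration, not Dropbox's implementation: `InMemoryKV` stands in for Dynovault/DynamoDB, and `FeatureServingTier` for the Go service:

```python
class InMemoryKV:
    """Stand-in for an online KV store (Dynovault, DynamoDB, Redis)."""
    def __init__(self) -> None:
        self._data: dict[str, dict[str, float]] = {}

    def put(self, key: str, features: dict[str, float]) -> None:
        self._data[key] = features

    def batch_get(self, keys: list[str]) -> dict[str, dict[str, float]]:
        # One round trip serves many entities; missing keys yield {}.
        return {k: self._data.get(k, {}) for k in keys}

class FeatureServingTier:
    """Facade model-serving code calls; hides the store behind it."""
    def __init__(self, kv: InMemoryKV) -> None:
        self.kv = kv

    def get_features(self, entity_ids: list[str],
                     feature_names: list[str]) -> dict[str, dict]:
        rows = self.kv.batch_get(entity_ids)
        # Absent features come back as None so callers can apply defaults.
        return {eid: {f: rows[eid].get(f) for f in feature_names}
                for eid in entity_ids}
```

Because the model-serving code only sees the facade, the store behind it can be swapped (vendor KV, in-house store) without touching feature definitions or callers.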
Load-bearing design axes¶
When evaluating or building a feature store, the knobs that dominate:
- Latency budget — sub-100ms is typical for ranking; sub-10ms is aggressive but occasionally needed for search autocomplete.
- Fan-out shape — per-query how many feature lookups? Per-user? Per-candidate? Amplifies all other axes.
- Freshness requirement — some features can be days stale (content embeddings); some must be seconds-fresh (recent-interaction signals).
- Ingestion shape — does data arrive in bulk (batch), as a stream (CDC, event), or directly written by another model (precomputed scores)? See patterns/hybrid-batch-streaming-ingestion.
- Training/serving consistency — same feature values available in both paths, same semantics.
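The fan-out axis dominates because it multiplies everything else. A back-of-envelope sketch, with illustrative numbers (not from the source):

```python
# Ranker fan-out: dozens of features per candidate, hundreds of
# candidates per query.
features_per_candidate = 40
candidates_per_query = 300
queries_per_second = 100

lookups_per_query = features_per_candidate * candidates_per_query
lookups_per_second = lookups_per_query * queries_per_second

# With a 100 ms ranking budget and ~1 ms per round trip to the online
# store, 12,000 sequential lookups per query is impossible: lookups
# must be batched per candidate set, not issued one at a time.
```

This is why the fan-out shape feeds directly back into the latency budget and the online-store choice.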
Related¶
- systems/dash-feature-store — the Dropbox realization.
- systems/feast — open-source reference implementation.
- systems/dynovault — one online-store choice (DynamoDB-compatible).
- concepts/feature-freshness — one dimension of the evaluation.
- concepts/training-serving-boundary — adjacent ML-infra concept; feature-store concerns overlap but differ.
- concepts/compute-storage-separation — architectural precursor: same discipline applied to analytics.
Seen in¶
- sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash — canonical in-wiki introduction; Dropbox's Dash feature-store stack as the worked example, including landscape citation and in-house hybrid-build rationale.
- sources/2026-01-06-expedia-powering-vector-embedding-capabilities — the feature-store online/offline duality transplanted to an embedding-platform context: online store = vector DB for interactive similarity search, offline store = historical dataset repository for analytics / experimentation / training / backup, with an explicit restore path from offline → online gated on creation date / time range / SQL. Expedia layers systems/feast on top, registering not feature views but embedding collections — evidence that the feature-store design pattern (definitions registry + online/offline split + orchestrated ingestion) is a shape, not a schema, and carries over cleanly to vectors.