CONCEPT Cited by 2 sources

Feature store

Definition

A feature store is a class of ML-infrastructure system that manages and delivers feature data — the numerical/categorical signals a model consumes at both training time and inference time — as a first-class shared substrate across teams.

Concretely, a feature store typically provides:

  • Feature definitions — a declarative spec for each feature (name, type, source, transformation, freshness SLO).
  • Offline store — historical values for training (often cheap object storage + a table format).
  • Online store — low-latency lookup at inference time (often a KV store like DynamoDB, Redis, or a DynamoDB-compatible internal store like Dynovault).
  • Ingestion — batch + streaming + direct-write pipes from source systems into both stores.
  • Serving API — runtime interface model-serving code calls to fetch features for a request.
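The first component — the declarative feature definition — can be sketched as a minimal registry. Everything here (the field names, `user_ctr_7d`, the transformation snippet) is an illustrative assumption, not any particular product's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureDefinition:
    """Declarative spec: everything downstream infra needs to build
    both the offline (training) and online (serving) copies."""
    name: str             # e.g. "user_ctr_7d"
    dtype: str            # "float", "int", "string", ...
    source: str           # upstream table or stream
    transformation: str   # SQL/DSL snippet producing the value
    freshness_slo_s: int  # max acceptable staleness at serving time

# A tiny in-memory registry: one definition feeds both stores,
# which is what prevents skew at the spec level.
REGISTRY: dict[str, FeatureDefinition] = {}

def register(f: FeatureDefinition) -> None:
    REGISTRY[f.name] = f

register(FeatureDefinition(
    name="user_ctr_7d",
    dtype="float",
    source="events.clicks",
    transformation="clicks_7d / max(impressions_7d, 1)",
    freshness_slo_s=3600,
))
```

The point of the frozen dataclass is that the spec is the single source of truth: ingestion, the offline store, and the serving API all derive from the same registered object.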

Why feature stores exist as a separate thing

Models don't ship one feature at a time. A ranker typically wants dozens of features per candidate × hundreds of candidates per query — thousands of lookups. Having every ML team build their own ingestion, store, and serving gets expensive and inconsistent fast. A shared feature store:

  1. Decouples feature engineering from model serving — ML engineers write transformations; serving infrastructure is abstracted away.
  2. Keeps training and serving on the same feature values — the canonical failure mode without a feature store is training/serving skew (a feature computed one way at training time and a slightly different way at serving time). Shared definitions plus offline/online stores that share a lineage fix this. Adjacent to concepts/training-serving-boundary but not the same axis — this is feature-data consistency, not compute-fleet unification.
  3. Amortizes ingestion across many models — one pipeline feeds the features; N models consume.
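The fan-out arithmetic above — dozens of features × hundreds of candidates per query — is why serving APIs are batched. A sketch of a batched online lookup, with hypothetical names (real stores back this with a multi-get or pipelined KV call rather than per-key reads):

```python
def get_features(store: dict,
                 entity_ids: list[str],
                 feature_names: list[str]) -> list[list[float]]:
    """One batched call instead of len(entity_ids) * len(feature_names)
    point lookups; missing values default to 0.0 in this toy version."""
    return [[store.get((eid, f), 0.0) for f in feature_names]
            for eid in entity_ids]

# 300 candidates x 40 features = 12,000 values for one ranking request.
candidates = [f"doc_{i}" for i in range(300)]
features = [f"f_{j}" for j in range(40)]
store = {("doc_0", "f_0"): 1.0}  # sparse toy online store
matrix = get_features(store, candidates, features)
assert len(matrix) == 300 and len(matrix[0]) == 40
```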

Landscape

Named in the Dropbox post as evaluated options:

  • Feast — open-source.
  • Hopsworks — open-source with managed offering.
  • Featureform — open-source.
  • Feathr — open-source (LinkedIn origin).
  • Databricks Feature Store — platform-native.
  • Tecton — commercial.

Dropbox chose Feast for its clean definitions/infra separation and adapter ecosystem, then layered their own serving tier (Go + Dynovault) behind it — a common pattern for large orgs whose on-prem or bespoke infrastructure doesn't match any vendor's shape. (Source: sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash)

Load-bearing design axes

When evaluating or building a feature store, the knobs that dominate:

  • Latency budget — sub-100ms is typical for ranking; sub-10ms is aggressive but occasionally needed for search autocomplete.
  • Fan-out shape — how many feature lookups per request: per-query, per-user, per-candidate? Fan-out amplifies every other axis.
  • Freshness requirement — some features can be days stale (content embeddings); some must be seconds-fresh (recent-interaction signals).
  • Ingestion shape — does data arrive in bulk (batch), as a stream (CDC, event), or directly written by another model (precomputed scores)? See patterns/hybrid-batch-streaming-ingestion.
  • Training/serving consistency — same feature values available in both paths, same semantics.
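One way to make these axes concrete is as fields in an evaluation worksheet, plus a first-cut rule for whether a dedicated online store is even needed. The threshold values and names are illustrative assumptions, not a recommendation:

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    latency_budget_ms: float  # end-to-end feature-fetch budget
    lookups_per_request: int  # fan-out: candidates x features
    freshness_s: int          # max staleness tolerated at serving
    ingestion: str            # "batch", "stream", or "direct-write"

def needs_online_store(r: Requirements) -> bool:
    # Rough cut (thresholds are assumptions): batch-only, hours-stale
    # features can often be served from snapshots or a cache; anything
    # latency-sensitive or freshness-sensitive needs a real online store.
    return r.latency_budget_ms < 1000 or r.freshness_s < 3600

ranking = Requirements(latency_budget_ms=100,       # sub-100ms budget
                       lookups_per_request=300 * 40,
                       freshness_s=60,              # seconds-fresh signals
                       ingestion="stream")
assert needs_online_store(ranking)
```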

Seen in

  • sources/2025-12-18-dropbox-feature-store-powering-real-time-ai-dash — canonical in-wiki introduction; Dropbox's Dash feature-store stack as the worked example, including landscape citation and in-house hybrid-build rationale.
  • sources/2026-01-06-expedia-powering-vector-embedding-capabilities — the feature-store online/offline duality transplanted to an embedding-platform context: online store = vector DB for interactive similarity search, offline store = historical dataset repository for analytics / experimentation / training / backup, with an explicit restore path from offline → online gated on creation date / time range / SQL. Expedia layers systems/feast on top to register not feature views but embedding collections — evidence that the feature-store design pattern (definitions registry + online/offline split + orchestrated ingestion) is a shape, not a schema, and carries over cleanly to vectors.
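The Expedia restore path above (offline → online, gated on a time window) can be sketched as a filter-and-copy loop. This is a simplified stand-in for the date/time-range/SQL gates the source describes; all names are hypothetical:

```python
from datetime import datetime, timezone

def restore(offline_rows: list[dict], online_store: dict,
            start: datetime, end: datetime) -> int:
    """Copy offline rows back into the online store, keeping only
    rows whose creation time falls in [start, end)."""
    n = 0
    for row in offline_rows:
        if start <= row["created_at"] < end:
            online_store[row["id"]] = row["embedding"]
            n += 1
    return n

offline = [
    {"id": "e1", "embedding": [0.1, 0.2],
     "created_at": datetime(2025, 6, 1, tzinfo=timezone.utc)},
    {"id": "e2", "embedding": [0.3, 0.4],
     "created_at": datetime(2025, 9, 1, tzinfo=timezone.utc)},
]
online: dict = {}
restored = restore(offline, online,
                   datetime(2025, 5, 1, tzinfo=timezone.utc),
                   datetime(2025, 7, 1, tzinfo=timezone.utc))
assert restored == 1  # only e1 falls in the window
```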