CONCEPT Cited by 1 source
Feature discoverability
Definition
Feature discoverability is the property of a feature store (or adjacent ML data platform) whereby engineers and modelers can find an existing feature before creating a new one, usually through a search-first catalog UI indexed on feature name, owner, tag, lineage, data type, and natural-language description.
Without it, the default failure mode of a large ML org is duplicate feature definitions across teams, each with slightly different semantics and each drifting independently over time. The cost is threefold: wasted engineering work, duplicated compute pipelines, and model-quality risk when two teams build rankers against features they believed were identical but were not.
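To make "find before you create" concrete, the sketch below checks a proposed feature against existing catalog entries before it is registered. It is a minimal sketch, assuming an in-memory list of hypothetical `CatalogEntry` records and a plain token-overlap (Jaccard) score with an arbitrary 0.5 threshold; production catalogs typically rely on a full-text search index rather than this heuristic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical catalog record; real catalogs store far more metadata.
    name: str
    owner: str
    description: str


def tokens(text: str) -> set[str]:
    # Lowercase and split on any non-alphanumeric run, so "ride_count_7d"
    # and "ride count 7d" yield the same token set.
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def jaccard(a: set[str], b: set[str]) -> float:
    # Share of tokens the two sets have in common (0.0 when both are empty).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_possible_duplicates(
    name: str, description: str, catalog: list[CatalogEntry], threshold: float = 0.5
) -> list[CatalogEntry]:
    # Score every existing entry against the proposed feature and return
    # those at or above the (arbitrary) threshold, most similar first.
    query = tokens(name) | tokens(description)
    scored = []
    for entry in catalog:
        score = jaccard(query, tokens(entry.name) | tokens(entry.description))
        if score >= threshold:
            scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


catalog = [
    CatalogEntry("driver_rides_completed_7d", "team-a",
                 "Number of rides completed by a driver in the last 7 days"),
    CatalogEntry("rider_cancellation_rate_30d", "team-b",
                 "Fraction of a rider's ride requests cancelled in the last 30 days"),
]

# Team C is about to define what is probably the same feature under a new name.
matches = find_possible_duplicates(
    "driver_completed_rides_7d", "Rides a driver completed in the last 7 days", catalog
)
print([m.name for m in matches])  # prints ['driver_rides_completed_7d']
```

Lexical overlap only catches near-identical names and descriptions; two features with the same semantics described in different words would slip through, which is one reason rich, consistent metadata matters as much as the search itself.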
What makes discoverability work
- Rich feature metadata — ownership, urgency tier, transformation logic, value semantics, data type, versioning, lineage. The feature definition must declare what it is, not just compute a number.
- Automatic catalog population — the pipeline that produces the feature should also publish the metadata to the catalog. Human-maintained catalogs rot.
- Search UX — free-text search over feature names, descriptions, and tags; plus structured filters over owner, tier, and lineage.
- Accessible to both engineering personas — the catalog should serve ML modelers (who design features) and software engineers (who integrate them into services) alike.
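The properties in the list above can be tied together in a small sketch: a feature declares its metadata alongside its compute function, declaring it is also what publishes that metadata to a catalog, and the catalog supports both free-text and structured search. Everything here (the `@feature` decorator, `FeatureCatalog`, the field names, and the example features) is hypothetical and in-memory; it is not Lyft's Feature Store or Amundsen API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FeatureDefinition:
    # Metadata fields mirror the list above; names are illustrative.
    name: str
    owner: str
    tier: str                  # e.g. an urgency tier label
    dtype: str
    version: int
    description: str
    upstream: tuple[str, ...]  # lineage: source tables or features
    tags: tuple[str, ...] = ()


def _tokens(text: str) -> set[str]:
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


class FeatureCatalog:
    def __init__(self) -> None:
        # Keyed by (name, version) so new versions sit alongside old ones.
        self._entries: dict[tuple[str, int], FeatureDefinition] = {}

    def publish(self, definition: FeatureDefinition) -> None:
        self._entries[(definition.name, definition.version)] = definition

    def search(self, text: str = "", owner: Optional[str] = None,
               tier: Optional[str] = None,
               upstream: Optional[str] = None) -> list[FeatureDefinition]:
        # Free-text part: every query word must appear as a token in the
        # name, description, or tags. Structured part: exact-match filters.
        words = _tokens(text)
        hits = []
        for f in self._entries.values():
            haystack = _tokens(" ".join((f.name, f.description, *f.tags)))
            if not words <= haystack:
                continue
            if owner is not None and f.owner != owner:
                continue
            if tier is not None and f.tier != tier:
                continue
            if upstream is not None and upstream not in f.upstream:
                continue
            hits.append(f)
        return sorted(hits, key=lambda f: (f.name, f.version))


CATALOG = FeatureCatalog()


def feature(**metadata) -> Callable:
    # Decorator: declaring the feature's compute function is also what
    # publishes its metadata, so the catalog cannot silently fall behind.
    def decorator(compute_fn: Callable) -> Callable:
        definition = FeatureDefinition(**metadata)
        CATALOG.publish(definition)
        compute_fn.definition = definition
        return compute_fn
    return decorator


@feature(name="driver_rides_completed_7d", owner="marketplace-ml", tier="tier-1",
         dtype="int64", version=1, upstream=("rides",), tags=("driver", "supply"),
         description="Rides completed by a driver in the last 7 days")
def driver_rides_completed_7d(rides: list[dict]) -> int:
    return sum(1 for r in rides if r["status"] == "completed")


@feature(name="rider_cancellation_rate_30d", owner="rider-ml", tier="tier-2",
         dtype="float64", version=1, upstream=("ride_requests",), tags=("rider",),
         description="Fraction of a rider's ride requests cancelled in the last 30 days")
def rider_cancellation_rate_30d(requests: list[dict]) -> float:
    if not requests:
        return 0.0
    return sum(1 for r in requests if r["cancelled"]) / len(requests)


print([f.name for f in CATALOG.search("driver 7 days")])  # prints ['driver_rides_completed_7d']
print([f.name for f in CATALOG.search(tier="tier-2")])    # prints ['rider_cancellation_rate_30d']
```

Tying publication to definition means there is no separate catalog step to forget. Token matching (rather than raw substring matching) is deliberate: a substring search for "rider" would also hit every feature mentioning "driver". A production system would more likely publish from its deployment pipeline than at import time.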
Seen in
- sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution — Lyft's Feature Store outsources discoverability to Amundsen (Lyft's own open-source data-discovery platform). The auto-generated Airflow DAGs tag feature metadata in Amundsen as a pipeline side-effect. Rohan Varshney frames it plainly: "Once features are generating data, discoverability is the next crucial step. [...] This integration allows users to easily search for existing features, a critical step in preventing the duplication of efforts and reducing wasted engineering work."
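The Lyft arrangement, where metadata tagging is a task inside each generated pipeline rather than a separate process, can be sketched without any orchestration framework. This is a minimal illustration under stated assumptions: the config format, `build_pipeline`, and `InMemoryCatalog` are hypothetical stand-ins, and none of it uses Airflow's or Amundsen's real APIs.

```python
from typing import Callable

# Hypothetical per-feature config; a real setup might read this from YAML.
FEATURE_CONFIG = {
    "name": "driver_rides_completed_7d",
    "owner": "marketplace-ml",
    "description": "Rides completed by a driver in the last 7 days",
    "tags": ["driver", "supply"],
    "query": "<feature SQL>",  # placeholder; this sketch never executes SQL
}


class InMemoryCatalog:
    """Stand-in for a discovery service such as Amundsen (not its API)."""

    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}

    def upsert(self, name: str, metadata: dict) -> None:
        self.entries[name] = metadata


Task = tuple[str, Callable[[], None]]


def build_pipeline(config: dict, catalog: InMemoryCatalog,
                   run_query: Callable[[str], None]) -> list[Task]:
    # Generate the task list from config. The metadata task is part of
    # every generated pipeline, so no feature ships without a catalog entry.
    name = config["name"]

    def compute() -> None:
        run_query(config["query"])

    def publish_metadata() -> None:
        catalog.upsert(name, {k: config[k] for k in ("owner", "description", "tags")})

    return [(f"{name}.compute", compute), (f"{name}.publish_metadata", publish_metadata)]


def run(tasks: list[Task]) -> None:
    # Run tasks in order; an exception stops the pipeline, so metadata is
    # only published once the compute step has succeeded.
    for _task_name, fn in tasks:
        fn()


executed: list[str] = []
catalog = InMemoryCatalog()
run(build_pipeline(FEATURE_CONFIG, catalog, executed.append))
print(sorted(catalog.entries))  # prints ['driver_rides_completed_7d']
```

Ordering the publish step after compute follows the framing in the quote above ("once features are generating data, discoverability is the next crucial step"); the source does not say whether Lyft's generated DAGs order their tasks this way.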
Related
- concepts/feature-store
- concepts/feature-freshness
- systems/amundsen
- systems/lyft-feature-store
- patterns/config-driven-dag-generation — the mechanism that makes automatic catalog population work at Lyft: metadata tagging happens inside the generated DAG itself, not as a separate step.