Metric Semantic Layer: How Lyft Governs and Scales Key Data Definitions
Summary
Lyft built an internal Metric Semantic Layer (MSL) — a centralized Python package that serves as the single, authoritative repository for every "Golden Metric" definition in the company. Each metric is codified as a YAML configuration with Jinja-templated SQL, exposed through Python methods and APIs. The system solves the problem of metric definition drift that emerges when teams independently define the same metrics with subtly different SQL, leading to inconsistent decisions. MSL enforces consistency through three pillars: (1) simplified onboarding via config-driven definitions that propagate changes automatically to all downstream consumers, (2) intentional governance via a dual-ownership model (Business Owner + Operational Owner) with required mutual approval for changes, and (3) transparency through API access, data catalog integration with Amundsen, self-service UI, and MCP (Model Context Protocol) integration for AI agents.
Key takeaways
- Metric definition drift is the core problem: As Lyft scaled, different teams used different SQL definitions for the same metric, leading to inconsistent decision-making. Without centralized version control, outdated definitions crept into production dashboards and ML models.
- Implementation as a versioned Python package: MSL is a Python package (not a service), distributed and versioned, with automated code refactors pushing updates to all dependent applications. This gives deterministic, reproducible SQL generation without a runtime dependency on a separate service.
- YAML + Jinja for DRY metric definitions: Each metric is a YAML file declaring metadata (owners, data sources, dimensions, granularities), with SQL logic stored in Jinja templates. Jinja was chosen over other templating frameworks for its lower learning curve and its ability to avoid redundant SQL declarations across granularities (the DRY principle).
- "Golden Metrics" selection criteria gate onboarding: Not all metrics enter MSL; only "Golden Metrics" with at least two distinct use cases or applications qualify. This prevents the package from becoming a dumping ground for niche, team-specific metrics.
- Dual-ownership model enforces accountability: Every Golden Metric requires both a Business Owner (data analyst/scientist, responsible for metric health, definitions, and limitations) and an Operational Owner (data engineer, responsible for data health, ETL pipelines, DQ checks, and backfills). Both must approve changes, which provides informed coordination and quality control.
- Team-based ownership for resilience: Owners must be teams, never individuals. This is a deliberate design choice so that ownership survives org changes, team rotations, and attrition.
- Multi-channel access pattern: The Python API is the core, but MSL is also surfaced through Amundsen (discoverability/search), a self-service Metric UI (no-code SQL generation), and an MCP server for AI agents (natural-language metric queries with reduced hallucination).
- AI-native by architecture: Because metric definitions are stored as clean, structured YAML, they serve as a knowledge base for AI agents with "greater accuracy and fewer hallucinations." Guardrails are built in using ground-truth evaluation and LLM-as-a-judge techniques.
- Automated downstream propagation: When a metric definition changes, the updated package version is deployed to all dependent applications through automated code refactors, so consumers don't need to take manual action.
- Evolving toward vendor-managed solutions: Lyft acknowledges MSL as a "great foundation" but is exploring vendor-managed options for broader third-party BI/database integration and AI-driven analysis.
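The DRY idea behind the templated SQL can be illustrated with a small sketch: one SQL body is written once and rendered for each granularity and dimension set. This is not Lyft's code. To stay dependency-free it uses Python's standard-library `string.Template` as a stand-in for Jinja, and the function name, table, and column names are all hypothetical.

```python
from string import Template

# Hypothetical shared template: one SQL body reused for every granularity.
# string.Template stands in for Jinja here; table and column names are invented.
METRIC_SQL = Template(
    "SELECT DATE_TRUNC('$granularity', $time_attribute) AS period$dimension_select,\n"
    "       SUM($measure) AS $metric_name\n"
    "FROM $source\n"
    "GROUP BY $group_by"
)

ALLOWED_GRANULARITIES = ("day", "week", "month")


def render_metric_sql(metric_name, measure, source, time_attribute,
                      granularity, dimensions=()):
    """Render one SQL statement from the shared template.

    Identifiers are assumed to come from trusted, reviewed config,
    not from end-user input.
    """
    if granularity not in ALLOWED_GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    dimension_select = "".join(f", {d}" for d in dimensions)
    # Group by ordinal position: the period column first, then each dimension.
    group_by = ", ".join(str(i) for i in range(1, len(dimensions) + 2))
    return METRIC_SQL.substitute(
        metric_name=metric_name,
        measure=measure,
        source=source,
        time_attribute=time_attribute,
        granularity=granularity,
        dimension_select=dimension_select,
        group_by=group_by,
    )


if __name__ == "__main__":
    # The same definition rendered at two granularities without duplicating SQL.
    for g in ("day", "month"):
        print(render_metric_sql("total_rides", "ride_count", "rides_daily",
                                "ds", g, dimensions=("region",)))
        print()
```

Because the rendering is a pure function of the config and the requested granularity, the same package version always yields the same SQL, which is the reproducibility property the package-based design is meant to provide.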
Systems / concepts / patterns extracted
Systems
- systems/lyft-metric-semantic-layer — the MSL Python package itself (new)
- systems/amundsen — integrated for metric discoverability (existing)
Concepts
- concepts/headless-bi-semantic-layer — MSL is a canonical real-world instance (existing)
- concepts/golden-metric-selection-criteria — the ≥2 use-case threshold for onboarding (new)
- concepts/dual-owner-metric-governance — Business Owner + Operational Owner model (new)
- concepts/metric-definition-as-code — treating metric SQL as versioned, package-distributed code (new)
- concepts/metric-definition-drift — the anti-pattern MSL solves (new)
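The Golden Metric threshold (at least two distinct use cases) is simple enough to express as a check. The sketch below is illustrative only: the function name and the way use cases are normalized are assumptions, not anything the article describes.

```python
def qualifies_as_golden_metric(use_cases, minimum=2):
    """Return True when a metric has at least `minimum` distinct use cases.

    Use cases are compared case-insensitively after trimming whitespace,
    so the same use case listed twice is only counted once.
    """
    distinct = {u.strip().lower() for u in use_cases if u.strip()}
    return len(distinct) >= minimum
```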
Patterns
- patterns/yaml-config-driven-metric-definitions — YAML + Jinja templates for DRY SQL generation (new)
- patterns/jinja-templated-sql-generation — parameterized SQL via Jinja for time granularity + dimensions (new)
- patterns/dual-owner-approval-for-metric-changes — mandatory dual sign-off as governance gate (new)
- patterns/metric-semantic-layer-as-ai-knowledge-base — structured YAML definitions as grounding context for AI agents/MCP (new)
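The dual-owner approval gate can be sketched as a small predicate over the set of teams that have approved a change. The article does not say how Lyft enforces the gate, so the class, field names, and team handles here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricOwners:
    # Both owners are team handles, never individuals (names are hypothetical).
    business_owner: str
    operational_owner: str


def change_is_approved(owners, approving_teams):
    """A change passes only if both owning teams have approved it."""
    approved = set(approving_teams)
    return (owners.business_owner in approved
            and owners.operational_owner in approved)
```

For example, with `MetricOwners("rides-analytics", "rides-data-eng")`, an approval from `rides-analytics` alone is not enough; the change passes only once `rides-data-eng` has approved as well.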
Operational numbers
- YAML config per metric includes: `operational_owner`, `data_sources`, `view` (SQL template), `time_attribute`, `time_granularity` (day/week/month), `dimensions`, and `metric` definitions with type annotations (Additive, etc.)
- Python package is versioned with periodic automated deployment to all consumers
- MCP integration uses ground truth + LLM-as-a-judge evaluation guardrails
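A config with the fields listed above lends itself to a simple structural check before a metric is accepted. The sketch below represents the config as a Python dict mirroring the YAML so that it needs only the standard library. The field names come from the list above, but the value shapes, the example values, and the validation function are assumptions.

```python
REQUIRED_FIELDS = (
    "operational_owner", "data_sources", "view", "time_attribute",
    "time_granularity", "dimensions", "metric",
)
ALLOWED_GRANULARITIES = {"day", "week", "month"}


def validate_metric_config(config):
    """Return a list of problems; an empty list means the config passed."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in config]
    # Assumes time_granularity is a list of strings (an assumption, not confirmed).
    for g in config.get("time_granularity", []):
        if g not in ALLOWED_GRANULARITIES:
            problems.append(f"unsupported time_granularity: {g}")
    return problems


# Hypothetical config; every value below is invented for illustration.
example_config = {
    "operational_owner": "rides-data-eng",
    "data_sources": ["rides_daily"],
    "view": "total_rides.sql.j2",
    "time_attribute": "ds",
    "time_granularity": ["day", "week", "month"],
    "dimensions": ["region"],
    "metric": {"name": "total_rides", "type": "Additive"},
}

if __name__ == "__main__":
    print(validate_metric_config(example_config))
```

Returning a list of problems rather than raising on the first one lets a review tool report everything wrong with a config in a single pass.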
Caveats
- The article does not disclose scale numbers (how many Golden Metrics exist, how many downstream consumers, latency of SQL generation).
- MSL is a Python package, not a service — which means it requires periodic re-deployment to consumers (push model), not real-time resolution (pull model). The article frames this as a feature (deterministic) but doesn't discuss the propagation delay.
- The article acknowledges MSL is a "homegrown solution" and Lyft is actively evaluating vendor-managed alternatives for broader integration, suggesting the approach may not scale to all BI tool integrations.
Source
- Original: https://eng.lyft.com/metric-semantic-layer-how-lyft-governs-and-scales-key-data-definitions-56bee3643c29?source=rss----25cd379abb8---4
- Raw markdown: raw/lyft/2026-06-10-metric-semantic-layer-how-lyft-governs-and-scales-key-data-d-3c5b3076.md
Related
- sources/2026-01-06-lyft-feature-store-architecture-optimization-and-evolution — previous Lyft data platform article; feature store uses Amundsen for discoverability
- concepts/headless-bi-semantic-layer — the general concept MSL instantiates
- concepts/semantic-layer-of-business-concepts — Zalando's MDM framing of the same idea
- concepts/data-governance-tiering — Pinterest's complementary governance approach
- systems/amundsen — Lyft's data catalog, integrated with MSL