PATTERN Cited by 1 source
Language-neutral schema abstraction¶
Pattern: store detailed, language-specific facts underneath, and define language-neutral views over them in the schema language itself, so that cross-cutting clients get a uniform query surface without forcing the underlying data to be lowest-common-denominator.
Canonical wiki instance: Meta's Glean + Angle (Source: sources/2025-01-01-meta-indexing-code-at-scale-with-glean).
Problem¶
When indexing multiple programming languages, two clients pull in opposite directions:
- Language-specific tools (C++ dead-include detection, Python decorator-aware tracing, Rust macro expansion) need full fidelity — the specific fields and cross-references their language's semantics require.
- Language-agnostic tools (code browser outlines, cross-language "give me all declarations in this file", cross-language doc rendering) need a uniform shape — they shouldn't have to learn every language's schema.
A naive design forces a choice:
- Union-everything schema: lose language-specific fidelity, kill the C++-specific analyses.
- Lowest-common-denominator schema: uniform but shallow, can't express C++-specific dead-`using` detection.
Mechanism¶
Keep the language-specific schemas. Define the uniform view in the schema language as derived predicates — a view layer that projects over the underlying facts.
From Meta's post:
"Glean can provide this language-neutral view of the data by defining an abstraction layer in the schema itself — the mechanism is similar to SQL views if you're familiar with those. This means that we don't have to compromise between having detailed language-specific data or a lowest-common-denominator language-neutral view; we can have both." (Source: sources/2025-01-01-meta-indexing-code-at-scale-with-glean.)
Properties:
- Language-specific indexers populate native fact tables. Each language owns its schema.
- Derived predicates project/union/transform those facts into cross-language views (e.g. `Declaration` defined as "a C++ `FunctionDeclaration`, OR a Python `FunctionDef`, OR a Rust `Fn`, …").
- Clients query the view. A code-browser outline query hits the uniform view without needing to dispatch on language.
- The original data is preserved. C++-specific analyses still query the C++ fact tables directly.
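The properties above can be sketched in a few lines of Python. This is an illustration of the idea only, not Angle syntax or Glean's real schemas: the record types, field names, and sample facts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


# Language-specific fact tables. Shapes are hypothetical, chosen only to show
# that each language keeps fields the others do not have.
@dataclass(frozen=True)
class CxxFunctionDeclaration:
    file: str
    name: str
    is_virtual: bool  # C++-only detail


@dataclass(frozen=True)
class PythonFunctionDef:
    file: str
    name: str
    decorators: Tuple[str, ...]  # Python-only detail


# Row shape of the language-neutral view.
@dataclass(frozen=True)
class Declaration:
    file: str
    name: str
    language: str


CXX_FACTS = [CxxFunctionDeclaration("a.cpp", "draw", True)]
PY_FACTS = [PythonFunctionDef("b.py", "render", ("cache",))]


def declarations() -> Iterator[Declaration]:
    """Derived view: the union of one projection per language."""
    for fact in CXX_FACTS:
        yield Declaration(fact.file, fact.name, "cpp")
    for fact in PY_FACTS:
        yield Declaration(fact.file, fact.name, "python")


def declarations_in(file: str) -> List[Declaration]:
    """A client query against the view; it never dispatches on language."""
    return [d for d in declarations() if d.file == file]
```

The view drops `is_virtual` and `decorators`, but the underlying records still carry them, so a language-specific analysis can read `CXX_FACTS` or `PY_FACTS` directly.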
Two-layer consumer split¶
Glean's consumer structure maps cleanly onto the two layers:
- Glass queries the language-neutral view layer for code-navigation operations (`documentSymbols`, Find References, Call Hierarchy). Consumers: code browser, code review, doc UI.
- Specialised analyses (dead-`using` detection, build-graph analysis) query the language-specific layer. Each analysis lives close to the language whose facts it touches.
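A minimal sketch of the split, assuming toy in-memory tables: one consumer reads only the neutral view, the other reads only C++ facts. The names `Symbol`, `CxxUsingDecl`, and `CxxNameUse` are hypothetical. The "dead `using`" rule here is deliberately simplistic (a declared name that is never referenced in the same file) and does not describe how Meta's analysis works.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Symbol:  # hypothetical row of the language-neutral view
    file: str
    name: str
    kind: str


@dataclass(frozen=True)
class CxxUsingDecl:  # hypothetical C++-specific fact: a `using` declaration
    file: str
    name: str


@dataclass(frozen=True)
class CxxNameUse:  # hypothetical C++-specific fact: a name referenced in a file
    file: str
    name: str


SYMBOLS = [
    Symbol("a.cpp", "draw", "function"),
    Symbol("b.py", "render", "function"),
]
USINGS = [CxxUsingDecl("a.cpp", "std::vector"), CxxUsingDecl("a.cpp", "std::map")]
USES = [CxxNameUse("a.cpp", "std::vector")]


def document_symbols(file: str) -> List[str]:
    """Navigation-style consumer: touches only the neutral view."""
    return sorted(s.name for s in SYMBOLS if s.file == file)


def dead_usings(file: str) -> List[str]:
    """Specialised consumer: touches only C++ facts."""
    used = {u.name for u in USES if u.file == file}
    return [d.name for d in USINGS if d.file == file and d.name not in used]
```

Neither function needs to know the other layer exists, which is the point of the split.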
Extension to non-code data¶
The pattern generalises past programming-language fact schemas: "Glean can store arbitrary non-programming-language data too." Examples on the wiki (named in the post, not deeply specified): build-dependency graphs, API-migration progress, test-coverage metadata. Each gets its own schema; derived-predicate views compose them with code-level facts.
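As a rough illustration of composing non-code facts with code-level facts, the sketch below joins a hypothetical test-coverage table against a language-neutral declaration list. The table shapes, the coverage ratios, and the threshold are assumptions made for the example; the post does not specify how such schemas look.

```python
from typing import Dict, List, Tuple

DeclKey = Tuple[str, str]  # (file, declaration name)

# Rows of a language-neutral declaration view (hypothetical sample data).
DECLARATIONS: List[DeclKey] = [("a.cpp", "draw"), ("b.py", "render")]

# Non-code facts with their own schema: coverage ratio per declaration.
COVERAGE: Dict[DeclKey, float] = {("a.cpp", "draw"): 0.9}


def poorly_covered(threshold: float = 0.5) -> List[DeclKey]:
    """Join the two fact sets; declarations with no coverage fact count as 0.0."""
    return [d for d in DECLARATIONS if COVERAGE.get(d, 0.0) < threshold]
```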
Variants¶
- View-at-query-time — derivation evaluated per query. Cheap to maintain, pays at query latency. Works for light projections.
- View-materialised-ahead-of-time — derivation materialised into a derived-predicate table during indexing. More storage; amortised query cost. Glean supports both ("deriving information automatically, either on-the-fly at query time or ahead of time").
- Hybrid — materialise hot views; leave cold ones on-the-fly. The 2024-12-19 post doesn't disclose Glean's policy.
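The first two variants can be contrasted with a small sketch. `DerivedView` is a hypothetical helper, not a Glean API: it either re-runs its derivation on every read or stores the result once at construction time, and a call counter makes the cost difference visible.

```python
from typing import Callable, Iterable, List, Optional


class DerivedView:
    """A view that is derived per query, or materialised once up front."""

    def __init__(self, derive: Callable[[], Iterable[str]], materialise: bool = False):
        self._derive = derive
        # Materialised variant: pay the derivation cost once, keep the rows.
        self._rows: Optional[List[str]] = list(derive()) if materialise else None

    def rows(self) -> List[str]:
        if self._rows is not None:
            return self._rows  # stored result, no recomputation
        return list(self._derive())  # on-the-fly: recompute on every query


calls = {"n": 0}
base_facts = ["draw", "render"]


def derive_names() -> Iterable[str]:
    calls["n"] += 1  # count how often the derivation actually runs
    return (name.upper() for name in base_facts)


lazy = DerivedView(derive_names)
eager = DerivedView(derive_names, materialise=True)
```

In this sketch the materialised view is a snapshot: it will not reflect facts appended to `base_facts` after construction, while the on-the-fly view will. That is the storage and freshness cost traded for cheaper reads.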
Trade-offs¶
- View drift. Add a new language → update the derived-predicate view to include its shape. Not free, but touches the schema layer only, not consumer code.
- Query planner complexity. Derivations mean more decisions for the query engine (is this view pre-materialised? can the join be pushed into the language-specific layer?). Angle owns this complexity.
- View-shape ergonomics. The uniform view's schema is the intersection of what matters across languages, which can feel thin; specialised clients must drop to the language-specific layer anyway.
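The view-drift point can be made concrete with a sketch in which the view is a registry of per-language projections. The registry, `register`, and the sample rows are hypothetical stand-ins for a schema-level change. Adding a language means adding one projection, and the consumer function is not edited.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (file, declaration name)

# The "schema layer": one projection per language feeds the neutral view.
PROJECTIONS: Dict[str, Callable[[], Iterable[Row]]] = {}


def register(language: str, projection: Callable[[], Iterable[Row]]) -> None:
    """Extend the view with a new language's projection."""
    PROJECTIONS[language] = projection


def declarations() -> List[Tuple[str, str, str]]:
    """The neutral view: (language, file, name) across all registered languages."""
    return [
        (language, file, name)
        for language, projection in sorted(PROJECTIONS.items())
        for file, name in projection()
    ]


def outline(file: str) -> List[str]:
    """Consumer code: written once, unaware of which languages exist."""
    return [name for _, f, name in declarations() if f == file]


register("cpp", lambda: [("a.cpp", "draw")])
register("python", lambda: [("b.py", "render")])
```

Calling `register("rust", ...)` later makes Rust files show up in `outline` without touching it. The thin-view trade-off is also visible: the view carries only `(language, file, name)`, so anything richer requires dropping to the language-specific facts.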
Why this matters architecturally¶
The pattern flips "do we build for generality or specialisation?" into "we build both, and the schema language makes it cheap." Meta's Glean post is the canonical statement: the generality decision is what let Glean extend from code-navigation to dead-code detection to RAG without rewrites.
Seen in¶
- sources/2025-01-01-meta-indexing-code-at-scale-with-glean — the canonical wiki reference for the pattern. Meta explicitly frames it as "similar to SQL views."