SYSTEM Cited by 1 source
Glean¶
Glean is Meta's open-source code-indexing system: a centralized service that collects, derives, and queries structured facts about source code. Glean was open-sourced in August 2021 and is the substrate for Meta's code browsing, code search, auto-generated docs, code review, IDE acceleration, dead-code detection, API-migration tracking, test selection, automated data removal, and RAG in AI coding assistants (Source: sources/2025-01-01-meta-indexing-code-at-scale-with-glean).
Architecture¶
Four layers:
- Indexers — per-language collectors that walk source and emit facts conforming to that language's schema. Indexing is "heavily parallelized and we may have many indexing jobs running concurrently."
- Fact database — facts are stored in RocksDB, "providing good scalability and efficient retrieval." Databases are "replicated across the query service machines and also backed up centrally."
- Query service — "widely distributed to support load from many clients that are also distributed." Exposes Angle queries over the network.
- Schemas — one per language (plus arbitrary non-language schemas). Each schema defines predicates (≈ SQL tables) whose instances are facts (≈ SQL rows). Schemas also compose: the schema language supports "deriving information automatically, either on-the-fly at query time or ahead of time" — Glean's mechanism for defining language-neutral views over language-specific facts.
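The predicate/fact analogy and the idea of a derived, language-neutral view can be sketched in a few lines. This is a toy in-memory model, not Glean's schema language or storage: `CxxFunctionDeclaration`, `PyFunctionDeclaration`, `AnyFunction`, and `FactDb` are hypothetical names invented for illustration, and the derivation runs on the fly at query time.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class CxxFunctionDeclaration:
    """Hypothetical C++ predicate; each instance is one fact (a 'row')."""
    name: str
    namespace: str
    file: str


@dataclass(frozen=True)
class PyFunctionDeclaration:
    """Hypothetical Python predicate with its own language-specific shape."""
    name: str
    module: str
    file: str


@dataclass(frozen=True)
class AnyFunction:
    """Hypothetical language-neutral view derived from the predicates above."""
    language: str
    qualified_name: str
    file: str


class FactDb:
    """Toy fact store: one set of facts per predicate (a 'table')."""

    def __init__(self) -> None:
        self._facts: dict[type, set] = {}

    def add(self, fact) -> None:
        self._facts.setdefault(type(fact), set()).add(fact)

    def facts(self, predicate: type) -> Iterator:
        return iter(self._facts.get(predicate, set()))


def derive_any_function(db: FactDb) -> Iterator[AnyFunction]:
    """Derive AnyFunction facts at query time; nothing is stored."""
    for f in db.facts(CxxFunctionDeclaration):
        yield AnyFunction("cpp", f"{f.namespace}::{f.name}", f.file)
    for f in db.facts(PyFunctionDeclaration):
        yield AnyFunction("python", f"{f.module}.{f.name}", f.file)


db = FactDb()
db.add(CxxFunctionDeclaration("parse", "json", "json/parse.h"))
db.add(PyFunctionDeclaration("parse", "jsonlib", "jsonlib/__init__.py"))
derived = sorted(a.qualified_name for a in derive_any_function(db))
print(derived)
```

A client that only knows `AnyFunction` never has to learn the per-language predicates, which is the point of a language-neutral view. Deriving ahead of time would instead store the `AnyFunction` facts when the database is built.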
Core design bets¶
- Centralize indexing, network-query the result. The IDE-local model breaks down at monorepo scale; Glean is this wiki's canonical instance of the centralized ahead-of-time (AOT) indexing pattern.
- Don't decide the data model for users. "Glean doesn't decide for you what data you can store" — each language owns its schema; arbitrary non-language data is supported. Trade-off: a lowest-common-denominator model would have been faster to build but would not have enabled the dead-code / build-graph / data-removal / RAG use cases Glean accreted after launch.
- Declarative logic-based query language. Angle is general enough to express schema-level derivation, cross-language views, and transitive closures (e.g. C++ `#include` fanout is a Glean query).
- Incremental indexing. Target O(changes), realistic floor O(fanout); implemented via stacked immutable databases.
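One way to picture stacked immutable databases is the sketch below. It assumes that each increment records the files it re-indexed and hides those files' facts in the layers underneath; that hiding rule, and the names `Layer`, `full_index`, `stack_increment`, and `visible_facts`, are illustrative assumptions rather than Glean's actual implementation.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class Layer:
    """One immutable database in the stack (hypothetical model).

    facts_by_file: facts this layer contributes, grouped by owning file.
    hidden_files:  files whose facts in lower layers are superseded.
    base:          the layer underneath, or None for a full index.
    """
    facts_by_file: Mapping[str, frozenset]
    hidden_files: frozenset
    base: Optional["Layer"] = None


def full_index(files):
    """Build the bottom layer from a {path: [fact, ...]} mapping."""
    facts = {path: frozenset(defs) for path, defs in files.items()}
    return Layer(MappingProxyType(facts), frozenset())


def stack_increment(base, changed, deleted=()):
    """Index only the changed files and stack the result on top of base."""
    facts = {path: frozenset(defs) for path, defs in changed.items()}
    hidden = frozenset(changed) | frozenset(deleted)
    return Layer(MappingProxyType(facts), hidden, base)


def visible_facts(top):
    """Walk the stack top-down, skipping facts hidden by higher layers."""
    result = set()
    hidden = set()
    layer = top
    while layer is not None:
        for path, facts in layer.facts_by_file.items():
            if path not in hidden:
                result |= {(path, fact) for fact in facts}
        hidden |= layer.hidden_files
        layer = layer.base
    return result


base = full_index({"a.h": ["f", "g"], "b.cpp": ["main"]})
top = stack_increment(base, changed={"a.h": ["f", "h"]})
print(sorted(visible_facts(top)))
print(sorted(visible_facts(base)))  # the base layer is untouched
```

The work done by `stack_increment` is proportional to the files passed in, which is the O(changes) target. In practice the caller would also have to pass every file affected by the change (for C++, everything that transitively includes an edited header), which is where the O(fanout) floor comes from.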
What Glean differs from¶
Named alternative: LSIF (Language Server Index Format), the LSP-ecosystem format IDEs use to cache navigation data. Glean "wasn't tied either to particular programming languages or to any particular use case" — an explicit generality contrast vs LSIF's LSP-centric feature set.
Consumers at Meta¶
- Glass — Meta's symbol server built on top of Glean. Uniform code-navigation API; used by the internal code browser (embedded Monaco) and by Phabricator code review.
- C++ IDE augmentation. Meta's VS Code C++ extension serves go-to-definition / find-references / hovercards from Glean at IDE startup, before clangd finishes analysing the working set, then blends Glean and clangd as files load.
- Documentation generation. API structure + doc comments extracted into Glean → rendered client-side; every symbol gets a stable symbol ID so doc URLs survive code motion.
- Diff sketches. Glean indexes diffs to produce a diff sketch; downstream static analysis, lint rules, commit-level semantic search, and review-time go-to-definition all consume sketches. See patterns/diff-based-static-analysis.
- Ad-hoc + post-launch uses. Build-dependency graph analysis, dead-code detection, API-migration progress, code-complexity metrics, test coverage + test selection, automated data removal, RAG in AI coding assistants.
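The C++ IDE behaviour described above (answer from the central index at startup, then blend in the local analyser as files load) might look roughly like this. It is a sketch under stated assumptions: `BlendedNavigator`, the two resolver callables, and the per-file "loaded" rule are hypothetical and do not reflect the real extension's API or blending policy.

```python
from typing import Callable, Optional

Location = tuple[str, int]  # (file, line)
# A resolver maps (current file, symbol name) to a definition location.
Resolver = Callable[[str, str], Optional[Location]]


class BlendedNavigator:
    """Hypothetical blend of a central index and a local analyser."""

    def __init__(self, indexed: Resolver, local: Resolver) -> None:
        self._indexed = indexed  # stand-in for a Glean-backed lookup
        self._local = local      # stand-in for a clangd-backed lookup
        self._loaded: set[str] = set()

    def mark_loaded(self, path: str) -> None:
        """Record that the local analyser has finished with this file."""
        self._loaded.add(path)

    def go_to_definition(self, path: str, symbol: str) -> Optional[Location]:
        # Prefer the local analyser once it has loaded the file, because it
        # sees the working copy; otherwise fall back to the central index.
        if path in self._loaded:
            hit = self._local(path, symbol)
            if hit is not None:
                return hit
        return self._indexed(path, symbol)


# Stand-in data: the local answer differs because of an unsaved edit.
index_answers = {("app.cpp", "parse"): ("json/parse.h", 10)}
local_answers = {("app.cpp", "parse"): ("json/parse.h", 12)}

nav = BlendedNavigator(
    indexed=lambda path, sym: index_answers.get((path, sym)),
    local=lambda path, sym: local_answers.get((path, sym)),
)
at_startup = nav.go_to_definition("app.cpp", "parse")
nav.mark_loaded("app.cpp")
after_load = nav.go_to_definition("app.cpp", "parse")
print(at_startup, after_load)
```

The design choice illustrated is that navigation never blocks on the local analyser: the index answers immediately, and the local result replaces it file by file.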
Published performance¶
From the 2024-12-19 post (illustrative, not load-tested):
| Query shape | Latency |
|---|---|
| FunctionDeclaration by name + namespace | "about a millisecond" |
| Inheritance-chain + overriding-method lookup | "first results in a few milliseconds", streamed incrementally |
No fleet, throughput, or index-size numbers are disclosed.
Query language taste (Angle)¶
Define predicates as type records; query by prefix of fields.
A query specifying `name` and `namespace` is prefix-indexed and fast
because the schema declares that field order. Angle also supports
more complex queries, e.g. "classes that inherit from `exception` and
override a method called `what`", with incremental streaming of
results from the query server.
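Why field order matters can be shown with a sorted key list and a binary search. This is a minimal sketch of prefix lookup in general, written in Python rather than Angle, and not Glean's storage layout: `PrefixIndex` and the sample facts are hypothetical, and the generator stands in for incremental streaming of results.

```python
import bisect
from typing import Iterator


class PrefixIndex:
    """Toy index over facts stored as (name, namespace, file) tuples.

    Keys are kept sorted, so any query that fixes a leading run of
    fields can binary-search to the first match and scan forward.
    """

    def __init__(self, facts) -> None:
        self._keys = sorted(facts)

    def query(self, *prefix: str) -> Iterator[tuple]:
        """Yield matching facts one at a time, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i]
            i += 1


index = PrefixIndex([
    ("what", "std", "exception.h"),
    ("what", "folly", "Exception.h"),
    ("parse", "json", "parse.h"),
    ("whatever", "std", "x.h"),
])

by_name_and_namespace = list(index.query("what", "std"))
by_name = list(index.query("what"))
print(by_name_and_namespace)
print(by_name)
```

A query that fixes only `namespace` gets no help from this ordering and would have to scan every key, which is why the declared field order decides which queries are cheap.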
See systems/angle-query-language for the full treatment.
Open source¶
- Project site: glean.software
- GitHub: facebookincubator/Glean
- Glass source: `glean/glass`
Seen in¶
- sources/2025-01-01-meta-indexing-code-at-scale-with-glean — the 2024-12-19 Meta Engineering architecture overview; the canonical reference for every claim on this page.
Related¶
- systems/angle-query-language · systems/glass-symbol-server · systems/rocksdb · systems/lsif
- concepts/code-indexing · concepts/symbol-id · concepts/incremental-indexing · concepts/stacked-immutable-databases · concepts/diff-sketch
- patterns/centralized-ahead-of-time-indexing · patterns/language-neutral-schema-abstraction · patterns/diff-based-static-analysis
- concepts/monorepo · companies/meta