CONCEPT Cited by 1 source
Stacked immutable databases¶
Stacked immutable databases is a mechanism for representing changes on top of a base database without destructively modifying the base. Each layer in the stack is itself immutable; layers can non-destructively add information to, or hide information from, the layers below. From the consuming client's perspective, the whole stack behaves as a single database.
This is the representation substrate Meta uses in Glean to support incremental indexing at monorepo scale while keeping multiple revisions queryable simultaneously.
The problem it solves¶
Meta's constraint:
"We don't want to destructively modify the original data, because we would like to be able to provide data at multiple revisions of the repository, and to do that without storing multiple full-sized copies of the data. So we would like to store the changes in such a way that we can view the whole index at both revisions simultaneously." (Source: sources/2025-01-01-meta-indexing-code-at-scale-with-glean.)
Three requirements in one sentence:
- Preserve prior revisions — queries at old revisions must still work.
- No full-size duplication — per-revision storage must be delta- sized, not snapshot-sized.
- Single-view semantics — clients see one logical database per revision, not a manual union.
Mechanism¶
From the post: "Glean solves the first problem with an ingenious method of stacking immutable databases on top of each other. A stack of databases behaves just like a single database from the client's perspective, but each layer in the stack can non-destructively add information to, or hide information from, the layers below."
Two operations per layer:
- Add — layer N introduces new facts that weren't in layers N-1…0. Semantically: the observable fact-set at revision N is fact-set at N-1 ∪ new facts at N.
- Hide — layer N suppresses facts from layers below (e.g. a function that was deleted in this revision). The mechanism is non-destructive: the underlying layer is untouched; the hide is a per-layer mask applied at query time.
Querying the stack: a query evaluated against the top layer sees the merged observable view of every layer, top-down, with hides pruning the result set.
Properties¶
- Immutability per layer — once written, a layer doesn't mutate. Layers can be shared between revision graphs (if revision B branches from A, B's stack reuses A's layers as its base).
- Revision-delta storage — storage cost is proportional to changes, not number of revisions.
- Cheap branching — a new branch is a new layer on top of the shared ancestor; no snapshot copy.
- Multi-revision concurrent queries — different client sessions can query different revisions by choosing different top layers.
Trade-offs¶
- Query cost grows with stack depth. Each query must consult every active layer. Ways to bound this:
- Compact / squash older layers periodically.
- Index-level summaries so top-layer queries can skip empty layers.
- Bloom-filter-style existence hints per layer. The post doesn't disclose which of these Glean uses.
- Hide semantics must be sound. A careless hide at layer N can leak a below-layer fact if the predicate derivation doesn't route through the same hide gate. Schema-level derivation must be hide-aware.
- Storage fits the LSM-tree grain. The mechanism maps naturally onto RocksDB-style immutable SSTable organisation — each stacked layer is itself one or more SSTables. RocksDB's compaction is not identical to database-stack compaction (RocksDB compacts to reduce read amplification within one logical DB; stack compaction merges across logical revisions), but the alignment lets Glean reuse RocksDB primitives.
Relation to other layered designs¶
Stacked immutable databases generalise a pattern that recurs across storage systems:
- LSM trees (concepts/lsm-tree) — immutable SSTables at multiple levels; newer writes shadow older values at the same key. Glean's stacking applies the same idea at the revision layer, not the write layer.
- Overlay filesystems (e.g. OverlayFS) — writable upper layer + read-only lower layers, with whiteouts for delete.
- Git (systems/git) — each commit is an immutable delta on top of its parent tree; queries at a revision walk from the leaf commit back through the ancestry. Glean's revision graph conceptually mirrors the VCS it's indexing.
Caveats¶
- Full details are deferred. The Meta post stops at the paragraph quoted above, with a pointer to the Incremental indexing with Glean blog post. Not ingested here — deeper mechanisms (compaction strategy, hide-propagation through derived predicates, multi-revision query planner) are out of scope.
- No disclosed numbers. Per-layer storage sizes, compaction frequency, maximum stack depth — not in the post.
Seen in¶
- sources/2025-01-01-meta-indexing-code-at-scale-with-glean — canonical wiki reference for the mechanism and its role in Glean's incremental-indexing story.
Related¶
- systems/glean — canonical consumer.
- systems/rocksdb — storage substrate aligned with per-layer immutability.
- concepts/incremental-indexing · concepts/code-indexing · concepts/monorepo