CONCEPT Cited by 1 source
Diff sketch¶
A diff sketch is a machine-readable summary of a code change: which classes, methods, fields, and calls were introduced, removed, or modified. It is produced by running a code indexer on a diff. The term comes from Meta's Glean (Source: sources/2025-01-01-meta-indexing-code-at-scale-with-glean).
Definition¶
From the Meta post:
"Glean indexing runs on diffs (think, 'pull requests') to extract a mechanical summary of the changeset that we call a diff sketch. For example, a diff might introduce a new class, remove a method, add a field to a type, introduce a new call to a function, and so on. The diff sketch lists all of these changes in a machine-readable form."
A diff sketch is semantic, not textual:
- Textual diff: lines added/removed.
- Diff sketch: entities added/removed (classes, methods, fields), edges added/removed (calls, inheritance, type references), each identified by symbol handles from the code index.
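To make the contrast concrete, here is a minimal sketch of what such a record could look like. The post does not publish Glean's actual diff-sketch schema, so every name here (`DiffSketch`, `EntityChange`, `EdgeChange`, the symbol-handle strings, the diff id) is a hypothetical illustration, not Meta's format.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangeKind(Enum):
    ADDED = "added"
    REMOVED = "removed"
    MODIFIED = "modified"


@dataclass(frozen=True)
class EntityChange:
    symbol: str        # hypothetical symbol handle from the code index
    entity_type: str   # e.g. "class", "method", "field"
    kind: ChangeKind


@dataclass(frozen=True)
class EdgeChange:
    source: str        # symbol handle where the edge starts (e.g. the caller)
    target: str        # symbol handle where the edge ends (e.g. the callee)
    edge_type: str     # e.g. "call", "inherits", "type_ref"
    kind: ChangeKind


@dataclass
class DiffSketch:
    diff_id: str
    entities: list[EntityChange] = field(default_factory=list)
    edges: list[EdgeChange] = field(default_factory=list)

    def touched_symbols(self) -> set[str]:
        """Every symbol handle mentioned by an entity or edge change."""
        symbols = {e.symbol for e in self.entities}
        for edge in self.edges:
            symbols.add(edge.source)
            symbols.add(edge.target)
        return symbols


# Illustrative data only: a diff that adds a class, removes a method,
# and introduces one new call.
sketch = DiffSketch(
    diff_id="D123",
    entities=[
        EntityChange("app.billing.Invoice", "class", ChangeKind.ADDED),
        EntityChange("app.billing.legacy_total", "method", ChangeKind.REMOVED),
    ],
    edges=[
        EdgeChange("app.checkout.pay", "app.billing.Invoice.total",
                   "call", ChangeKind.ADDED),
    ],
)
```

Nothing in the record refers to line numbers or text hunks. Consumers work purely in terms of symbols and the relationships between them.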
What downstream consumers do with it¶
The post names four primary use cases:
- Static analysis on changes. "Diff sketches are used to drive a simple static analysis that can identify potential issues that might require further review." — analysis that would be too expensive to run repo-wide becomes cheap to run per-diff.
- Non-trivial lint rules. These are rules that depend on semantic context, such as "this diff adds a call to a deprecated API" or "this diff removes the last call site of function X", which a text-level linter can't express.
- Rich notifications. Teams can subscribe to semantic events ("notify me when anyone modifies function Y") rather than path globs.
- Semantic search over commits. "One example of the latter is connecting a production stack trace to recent commits that modified the affected function(s), to help root-cause performance issues or new failures." — the post's headline application.
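Two of these consumers can be sketched in a few lines each. The code below is an illustration under assumptions, not Glean's API: `EdgeChange`, `new_deprecated_calls`, and `commits_touching` are hypothetical names, and the per-commit data is assumed to be a mapping from commit id to the set of symbols its sketch reports as modified, ordered newest first.

```python
from typing import NamedTuple


class EdgeChange(NamedTuple):
    source: str     # hypothetical symbol handle of the caller
    target: str     # hypothetical symbol handle of the callee
    edge_type: str  # e.g. "call"
    kind: str       # "added" or "removed"


def new_deprecated_calls(edges, deprecated):
    """Lint rule: flag calls this diff *adds* to a deprecated symbol."""
    return [
        e for e in edges
        if e.kind == "added" and e.edge_type == "call" and e.target in deprecated
    ]


def commits_touching(frame_symbols, modified_by_commit):
    """Semantic search: commits whose sketch modified any stack-trace symbol.

    Relies on dict insertion order, so results keep the input ordering.
    """
    wanted = set(frame_symbols)
    return [
        commit for commit, modified in modified_by_commit.items()
        if wanted & modified
    ]


edges = [
    EdgeChange("app.checkout.pay", "app.billing.legacy_total", "call", "added"),
    EdgeChange("app.checkout.pay", "app.billing.Invoice.total", "call", "added"),
    EdgeChange("app.cart.sum", "app.billing.legacy_total", "call", "removed"),
]
violations = new_deprecated_calls(edges, {"app.billing.legacy_total"})

modified_by_commit = {
    "c3": {"app.cart.sum"},
    "c2": {"app.checkout.pay", "app.billing.Invoice.total"},
    "c1": {"app.search.query"},
}
suspects = commits_touching(
    ["app.http.handle", "app.checkout.pay"], modified_by_commit
)
```

Both functions only filter data that is already in the sketch, which is why this kind of check is cheap to run per diff. A rule such as "removes the last call site" would additionally need a query against the full index.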
Code-review nav as a sketch consumer¶
Separately from the four use cases above, the same diff indexing also powers review-time code navigation:
"Indexing diffs also powers code navigation in our code review tools, giving code reviewers access to accurate go-to-definition on the code changes being reviewed, along with other code insights such as type-on-hover and documentation. This is a powerful lift to the code review process, making it easier for reviewers to understand the changes and provide valuable review feedback."
Meta's code-review tool (Phabricator) surfaces Glean-fed navigation on the diff for C++, Python, PHP, JavaScript, Rust, Erlang, Thrift, and Haskell.
Sketch vs raw index queries¶
A raw code-index query (e.g. "all references to foo()") is answered against the whole repository at a given revision. A diff sketch is scoped to what a single changeset touched, which makes it cheaper to compute, easier to reason about, and the natural unit for review-time tooling. In effect it is a pre-aggregation over the index, built specifically for consumers that work at changeset granularity.
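The difference in scope can be illustrated with a toy model. The post does not describe how Glean computes a sketch; the code below simply assumes an index can be represented as a set of `(source, edge_type, target)` facts and derives the changeset-scoped view as a set difference between the base and head revisions. All names and data are hypothetical.

```python
# A toy "index": a set of (source, edge_type, target) facts at one revision.
Edge = tuple[str, str, str]


def references_to(index: set[Edge], target: str) -> set[Edge]:
    """Raw index query: scans every fact in the revision."""
    return {edge for edge in index if edge[2] == target}


def sketch_edges(base: set[Edge], head: set[Edge]) -> dict[str, set[Edge]]:
    """Changeset-scoped view: only the facts the change added or removed."""
    return {"added": head - base, "removed": base - head}


base = {
    ("a.f", "call", "b.g"),
    ("a.h", "call", "b.g"),
}
head = {
    ("a.f", "call", "b.g"),
    ("a.h", "call", "b.k"),  # a.h now calls b.k instead of b.g
}

repo_wide = references_to(head, "b.g")  # answered against the whole revision
sketch = sketch_edges(base, head)       # answered against the change only
```

The raw query result grows with the repository, while the sketch grows only with the size of the change.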
Why name it a "sketch"¶
"Sketch" captures that it's lossy by design:
- Includes what the changeset did to the code graph — entities, edges, signatures.
- Excludes body-level detail that doesn't move the semantic graph (local variable renames, comment edits, whitespace). Review-time navigation still needs that detail; analysis usually doesn't. Different consumers can therefore read different projections of the same underlying diff index.
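The idea of projections can be sketched as filters over one list of changes. This is an assumption-laden illustration: the `Change` record, the `what` labels, and the `GRAPH_LEVEL` set are hypothetical and are not taken from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    symbol: str  # hypothetical symbol handle
    what: str    # hypothetical label for the kind of change


# Change kinds assumed to alter the semantic graph (entities, edges, signatures).
GRAPH_LEVEL = {
    "entity_added", "entity_removed", "signature",
    "call_added", "call_removed",
}


def sketch_projection(changes):
    """Lossy view for analysis: keep only graph-level changes."""
    return [c for c in changes if c.what in GRAPH_LEVEL]


def nav_projection(changes):
    """Full view for review-time navigation: keep everything."""
    return list(changes)


diff_index = [
    Change("app.billing.Invoice", "entity_added"),
    Change("app.checkout.pay", "call_added"),
    Change("app.checkout.pay", "local_rename"),
    Change("app.cart.sum", "comment"),
    Change("app.cart.sum", "whitespace"),
]

for_analysis = sketch_projection(diff_index)
for_review = nav_projection(diff_index)
```

Here the analysis view drops the rename, comment, and whitespace changes, while the navigation view keeps all five entries.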
Seen in¶
- sources/2025-01-01-meta-indexing-code-at-scale-with-glean — the term "diff sketch" and every claim on this page originates here.
Related¶
- systems/glean — the code-indexing substrate producing sketches.
- systems/phabricator — code-review consumer of sketches at Meta.
- patterns/diff-based-static-analysis — the pattern that generalises the downstream-consumer list above.
- concepts/code-indexing · concepts/symbol-id