CONCEPT Cited by 1 source

Diff sketch

A diff sketch is a machine-readable summary of a code change (which classes, methods, fields, and calls were introduced, removed, or modified), produced by running a code indexer on a diff. The term originates in Meta's Glean (Source: sources/2025-01-01-meta-indexing-code-at-scale-with-glean).

Definition

From the Meta post:

"Glean indexing runs on diffs (think, 'pull requests') to extract a mechanical summary of the changeset that we call a diff sketch. For example, a diff might introduce a new class, remove a method, add a field to a type, introduce a new call to a function, and so on. The diff sketch lists all of these changes in a machine-readable form."

A diff sketch is semantic, not textual:

  • Textual diff: lines added/removed.
  • Diff sketch: entities added/removed (classes, methods, fields), edges added/removed (calls, inheritance, type references), each identified by symbol handles from the code index.
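To make the entity/edge framing concrete, here is a minimal Python sketch of how such a summary could be modelled. It assumes each revision has already been indexed into a set of symbol handles and a set of typed edges; the names `Edge`, `CodeGraph`, `DiffSketch`, and `make_sketch`, the edge kinds, and the handle format (`fn:...`, `cls:...`) are all hypothetical and are not taken from Glean's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A directed relationship between two symbol handles."""
    kind: str    # hypothetical edge kinds: "call", "inherits", "type_ref"
    source: str  # symbol handle of the referencing entity
    target: str  # symbol handle of the referenced entity


@dataclass
class CodeGraph:
    """Indexed entities and edges at one revision (assumed to exist already)."""
    entities: set[str] = field(default_factory=set)
    edges: set[Edge] = field(default_factory=set)


@dataclass
class DiffSketch:
    """What a changeset did to the code graph, not to the text."""
    entities_added: set[str]
    entities_removed: set[str]
    edges_added: set[Edge]
    edges_removed: set[Edge]


def make_sketch(before: CodeGraph, after: CodeGraph) -> DiffSketch:
    # Plain set differences: anything present on only one side is a change.
    return DiffSketch(
        entities_added=after.entities - before.entities,
        entities_removed=before.entities - after.entities,
        edges_added=after.edges - before.edges,
        edges_removed=before.edges - after.edges,
    )


# Toy changeset: Cart.total stops calling legacy_tax and calls compute_tax.
before = CodeGraph(
    entities={"cls:Cart", "fn:Cart.total", "fn:legacy_tax"},
    edges={Edge("call", "fn:Cart.total", "fn:legacy_tax")},
)
after = CodeGraph(
    entities={"cls:Cart", "fn:Cart.total", "fn:compute_tax"},
    edges={Edge("call", "fn:Cart.total", "fn:compute_tax")},
)
sketch = make_sketch(before, after)
```

In this toy model, a change that only reformats code leaves both graphs identical, so the resulting sketch is empty even though the textual diff is not.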

What downstream consumers do with it

The post names four primary use cases:

  1. Static analysis on changes. "Diff sketches are used to drive a simple static analysis that can identify potential issues that might require further review." — analysis that would be too expensive to run repo-wide becomes cheap to run per-diff.
  2. Non-trivial lint rules. Rules that depend on semantic context ("this diff adds a call to a deprecated API" / "this diff removes the last call site of function X") — rules a text-level linter can't express.
  3. Rich notifications. Teams can subscribe to semantic events ("notify me when anyone modifies function Y") rather than path globs.
  4. Semantic search over commits. "One example of the latter is connecting a production stack trace to recent commits that modified the affected function(s), to help root-cause performance issues or new failures." This is the one use case the post illustrates with a concrete example.
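Three of the consumers listed above (lint rules, notifications, and commit search) can be illustrated with a small Python sketch. It assumes a simplified per-commit sketch that records only added call edges and modified symbols; the `DiffSketch` shape, the function names, and the symbol handles are hypothetical and do not reflect Meta's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class DiffSketch:
    """Simplified, hypothetical per-commit sketch."""
    commit: str
    calls_added: set = field(default_factory=set)  # {(caller, callee)}
    modified: set = field(default_factory=set)     # symbol handles changed


def lint_deprecated_calls(sketch, deprecated):
    """Lint rule: flag newly added calls whose target is deprecated."""
    return sorted(
        f"{caller} adds a call to deprecated {callee}"
        for caller, callee in sketch.calls_added
        if callee in deprecated
    )


def teams_to_notify(sketch, subscriptions):
    """Notification: teams subscribed to any symbol this diff modified."""
    return sorted(
        {team for sym in sketch.modified for team in subscriptions.get(sym, ())}
    )


def commits_touching(stack_symbols, sketches):
    """Commit search: commits that modified a function seen in a stack trace."""
    wanted = set(stack_symbols)
    return [s.commit for s in sketches if s.modified & wanted]


history = [
    DiffSketch(
        "c1",
        calls_added={("fn:Cart.total", "fn:legacy_tax")},
        modified={"fn:Cart.total"},
    ),
    DiffSketch("c2", modified={"fn:Inventory.reserve"}),
    DiffSketch("c3", modified={"fn:Cart.total", "fn:Cart.add"}),
]

warnings = lint_deprecated_calls(history[0], {"fn:legacy_tax"})
teams = teams_to_notify(history[2], {"fn:Cart.add": ["checkout-team"]})
suspects = commits_touching(["fn:Cart.total", "fn:main"], history)
```

Each consumer here is a lookup over symbol handles rather than over file paths or text, which is what separates these rules from what a text-level linter or a path-glob subscription can express.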

Code-review nav as a sketch consumer

Distinct from the four above, the post notes that indexing diffs also powers review-time code navigation:

"Indexing diffs also powers code navigation in our code review tools, giving code reviewers access to accurate go-to-definition on the code changes being reviewed, along with other code insights such as type-on-hover and documentation. This is a powerful lift to the code review process, making it easier for reviewers to understand the changes and provide valuable review feedback."

Meta's code-review tool (Phabricator) surfaces Glean-fed navigation on the diff for C++, Python, PHP, JavaScript, Rust, Erlang, Thrift, and Haskell.

Sketch vs raw index queries

A raw code-index query (e.g. "all references to foo()") is answered against the whole repository at a given revision. A diff sketch is scoped to what a single changeset touched, which makes it cheaper to compute, easier to reason about, and the natural unit for review-time tooling. In effect, it is a pre-aggregation over the index for consumers that work at changeset granularity.

Why name it a "sketch"

"Sketch" captures that it's lossy by design:

  • Includes what the changeset did to the code graph — entities, edges, signatures.
  • Excludes body-level detail that doesn't move the semantic graph (local variable renames, comment edits, whitespace).

Review-time navigation wants that body-level detail; analysis usually does not. Different consumers can therefore query different projections of the underlying diff index.
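The idea of different consumers reading different projections can be sketched as two filters over one list of change records. This is an illustration under assumptions: the `Change` record, the change kinds, and the split into graph-level versus body-level kinds are invented for the example and are not described in the Meta post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    symbol: str  # symbol handle the change belongs to
    kind: str    # hypothetical change kind, see GRAPH_KINDS below


# Assumed taxonomy: kinds that alter entities, edges, or signatures.
GRAPH_KINDS = frozenset({
    "entity_added",
    "entity_removed",
    "signature_changed",
    "call_added",
    "call_removed",
})


def analysis_projection(changes):
    """Lossy view: keep only changes that move the semantic graph."""
    return [c for c in changes if c.kind in GRAPH_KINDS]


def navigation_projection(changes):
    """Full view: review-time navigation also needs body-level detail."""
    return list(changes)


changes = [
    Change("fn:Cart.total", "signature_changed"),
    Change("fn:Cart.total", "local_renamed"),
    Change("fn:Cart.add", "comment_edited"),
    Change("fn:Cart.total", "call_added"),
]

for_analysis = analysis_projection(changes)
for_navigation = navigation_projection(changes)
```

Keeping one underlying record list and projecting per consumer means the lossy sketch and the detailed navigation data cannot drift apart, at the cost of storing the body-level records even when only the sketch is needed.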
