
PATTERN

Hybrid lexical+vector interleaving (min-max normalized, exact-match boosted)

Definition

Hybrid lexical+vector interleaving is a specific score-fusion tactic for combining results from two independent search indexes (one lexical / BM25-or-fuzzy-match, one dense-vector / semantic) into a single ranked result list:

  1. Query both indexes in parallel.
  2. Collect top-K results from each, with their native scores (which are not directly comparable across the two indexes).
  3. Apply min-max normalization per index, remapping each score list to [0, 1].
  4. Boost exact lexical matches — give them a score bonus so they rank above semantic neighbours of equal raw relevance.
  5. Interleave results by updated score, de-duplicating on canonical item ID.

The defining features: two independently maintained indexes, per-index re-scoring, and exact lexical match treated as a hard signal. This is distinct from the general hybrid-retrieval idea (which just says "combine both signals"); this pattern pins down how the scores get combined.

(Source: sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma)

Intent

Three forces that push on hybrid search:

  1. Exact matches must win. A user searching "Mouse" who has a component literally named "Mouse" expects it as the top result. A pure-vector ranker can demote it below thematically-similar cursor icons if their embeddings happen to cluster more tightly. The pattern encodes "exact lexical match trumps semantic similarity of the same magnitude" as an explicit product rule, not an emergent ranker property.
  2. Semantic neighbours still matter. The same user wants the cursor-adjacent icons too — just below "Mouse". So the pattern can't be "lexical first, vector fallback"; it has to combine, with lexical bonused.
  3. Cross-index scores aren't comparable. BM25 scores and vector cosine distances live on entirely different scales. You can't just sum or weighted-average the raw numbers and expect sensible results. Normalization is the pre-condition.
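The normalization pre-condition is easy to see with invented numbers: BM25 scores in the tens and cosine similarities in [0, 1] cannot be summed directly, but min-max remaps both top-K lists onto the same [0, 1] scale. A minimal sketch (all scores made up for illustration):

```python
def minmax_normalize(scores, eps=1e-9):
    # Remap a single index's score list to [0, 1]; eps guards the
    # degenerate case where every score in the list is identical.
    lo, hi = min(scores), max(scores)
    return [(s - lo) / max(hi - lo, eps) for s in scores]

bm25_scores = [12.7, 9.3, 4.1]      # typical BM25 magnitudes (invented)
cosine_scores = [0.83, 0.79, 0.41]  # cosine similarities (invented)

# After normalization both lists live on [0, 1]: each index's best
# hit maps to 1.0 and its worst to 0.0, so the lists can be merged.
print(minmax_normalize(bm25_scores))
print(minmax_normalize(cosine_scores))
```

Summing the raw lists would let the BM25 side dominate every slot; after normalization each index contributes on equal footing.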

Mechanism

# Assumed available: lexical_index, vector_index, embed(), is_exact_match().
query_q = user_query
lexical_hits = lexical_index.search(query_q, topK=K)
vector_hits  = vector_index.search(embed(query_q), topK=K)

EPS = 1e-9                 # guards divide-by-zero when all scores in a list tie
EXACT_MATCH_BOOST = 0.2    # small positive constant, tuned against the eval set

def minmax(hits):
    # Remap each index's native scores to [0, 1] so they are comparable.
    if not hits:
        return
    s = [h.score for h in hits]
    lo, hi = min(s), max(s)
    for h in hits:
        h.norm_score = (h.score - lo) / max(hi - lo, EPS)

minmax(lexical_hits)
minmax(vector_hits)

# Exact-match boost applied to lexical-side results only.
for h in lexical_hits:
    if is_exact_match(h.title, query_q):
        h.norm_score += EXACT_MATCH_BOOST

# Interleave by updated score, de-duplicating on canonical item ID.
combined = []
seen = set()
for h in sorted(lexical_hits + vector_hits, key=lambda x: -x.norm_score):
    if h.item_id in seen:
        continue
    seen.add(h.item_id)
    combined.append(h)

The actual interleaving/fusion step is the sort-by-norm-score over the union of the two lists, de-duplicating on item ID. Different implementations will diverge on:

  • What counts as "exact match" (full-string-equals? case-insensitive? tokenized-substring?).
  • Boost size (small positive constant? multiplier?).
  • Top-K on each side (same? weighted by expected precision?).
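The "exact match" predicate is where implementations diverge most. Three illustrative definitions (function names are mine, not from the source):

```python
import re

def exact_full(title: str, query: str) -> bool:
    # Strictest reading: full-string equality.
    return title == query

def exact_ci(title: str, query: str) -> bool:
    # Case-insensitive, whitespace-trimmed equality.
    return title.strip().lower() == query.strip().lower()

def exact_token(title: str, query: str) -> bool:
    # Loosest reading: the query appears as a whole token in the title.
    return query.lower() in re.findall(r"\w+", title.lower())

print(exact_full("Mouse", "mouse"))          # False
print(exact_ci("Mouse", "mouse"))            # True
print(exact_token("Mouse cursor", "mouse"))  # True
```

The tokenized variant as written assumes a single-word query; multi-word queries would need phrase or token-set matching. The choice changes which hits receive the boost, so it belongs in the same eval loop as the boost constant.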

Figma AI Search instance

"Searches are performed simultaneously against both the lexical index and the new embeddings index. Since raw scores from the independent OpenSearch indexes are not directly comparable, the result sets are assigned new scores based on a min-max normalization, with exact lexical matches receiving a boost. Results are then interleaved according to their updated scores.

Now a query returns not just the lexical results, but also appropriate results based on a semantic understanding as well. For example, "mouse" returns not just an icon that is specifically titled "Mouse" but also cursor-adjacent icons."

Context: lexical search (fuzzy string matching over component names and descriptions) predated AI-powered search in Figma Assets search. The hybrid ranker let the team safely roll out the new semantic index without regressing the existing behaviour — the pattern is a migration-compatible retrieval shape, not just a ranking policy.

Why min-max over reciprocal rank fusion (RRF)?

RRF — score each item as sum(1 / (k + rank_in_list)) — is a widely-cited alternative. Trade-offs:

  • What's preserved: min-max keeps the relative score magnitudes within each list; RRF keeps only the ranks, not the scores.
  • Exact-match boost: easy under min-max (add a constant to the norm score); awkward under RRF (ranks carry no magnitude to boost).
  • Sensitivity to tail: min-max is sensitive (an outlier top score compresses the rest of the normalization); RRF is robust.
  • Simplicity: both are very simple.
  • Tunables: one boost constant (min-max) vs. one k parameter (RRF).

Figma's choice of min-max + boost is explicit in the post. RRF would be a reasonable alternative for a product without the "exact lexical match must win" rule.
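For contrast, RRF is small enough to sketch inline. The k = 60 default is a convention from the RRF literature, not a value from the Figma post; the result lists are invented:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    # Each ranked list is a sequence of item IDs, best first. An item's
    # fused score is sum(1 / (k + rank)) over every list it appears in;
    # only ranks matter, so no per-index normalization is needed.
    scores = defaultdict(float)
    for hits in ranked_lists:
        for rank, item_id in enumerate(hits, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["mouse", "mouse-pad", "trackball"]
vector = ["cursor", "mouse", "pointer"]
# "mouse" appears in both lists, so it wins despite topping only one.
print(rrf_fuse([lexical, vector]))
```

Note there is no natural place to attach the exact-match boost: fused scores are rank-derived and carry no magnitude on the original scale, which is the awkwardness noted in the comparison above.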

Compared to pure-vector retrieval with lexical fallback

  • Exact match: hybrid interleave always surfaces it at the top; fallback surfaces it only when the vector pass fails.
  • Implementation: two indexes plus a fusion step vs. one index plus a conditional re-query.
  • Latency: two parallel queries plus a merge vs. one query plus a possible second.
  • Migration safety: hybrid preserves the existing lexical behaviour; fallback degrades it.

For a product rolling out AI search on top of an existing lexical search users already rely on, the hybrid pattern is the safer migration.

Comparison to Dropbox Dash

Dash also runs hybrid BM25 + vector retrieval (sources/2026-01-28-dropbox-knowledge-graphs-mcp-dspy-dash), with its own fusion / reranker layer on top (Dash uses a learned ranker over multiple ranking passes including BM25, vector, knowledge-graph edges, ACL, and personalisation). The retrieval-pattern shapes match, but Dash's post is qualitative on fusion ("we found BM25 was very effective on its own with some relevant signals … hybrid retrieval"), while Figma's post gives the specific min-max-normalization + exact-match-boost mechanism. Both are canonical instances from different angles.

When to apply

  • Existing production lexical search + new vector search being rolled out; preserving lexical behaviour is required.
  • Product where "exact match must win" is a hard rule (most naming / title / component-library retrieval).
  • Small-to-moderate corpus where maintaining two indexes is acceptable.
  • OpenSearch-shaped stack where both indexes live in the same cluster / service and parallel querying is cheap.

Don't apply when:

  • You have only one signal (purely semantic corpus, or purely lexical).
  • A learned fusion model (LambdaRank or another learning-to-rank / deep ranker) is available and benchmarks better; the learned ranker then replaces the min-max + boost fusion.
  • Latency budget can't absorb two parallel queries + merge.

Caveats

  • "Exact match" ambiguity. Fuzzy-lexical-index scores themselves are approximate; deciding which result is an "exact" match needs a separate check (title equality, ID equality) that can't just use the lexical score threshold.
  • De-duplication is load-bearing. The same item can surface from both indexes — the union must dedupe on canonical ID or the top-K gets crowded with twins.
  • Min-max is sensitive to outliers. A single high-score outlier compresses the rest of the normalised scores. Clipping or log-normalization are common defences.
  • Boost constant is a hyperparameter. Needs tuning against the eval set; too small and exact matches lose; too large and semantic neighbours get buried.
  • Queries that only one index serves well. A pure typo query (lexical strong, semantic weak) or a pure paraphrase query (semantic strong, lexical weak) is still handled, since both lists contribute; but because normalization is per list, the weak list's best hit still gets a score near 1.0 regardless of its absolute relevance. Usually acceptable; occasionally surprising.
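The outlier sensitivity called out above has a standard defence: clip (winsorize) at a quantile before normalizing. A sketch; clip_pct is an assumed tunable and the scores are invented:

```python
def clipped_minmax(scores, clip_pct=0.75, eps=1e-9):
    # Normalize against the clip_pct-quantile instead of the true max,
    # then clamp into [0, 1], so one outlier top score cannot compress
    # every other score toward zero.
    ordered = sorted(scores)
    hi = ordered[int(clip_pct * (len(ordered) - 1))]
    lo = ordered[0]
    return [min(max((s - lo) / max(hi - lo, eps), 0.0), 1.0) for s in scores]

scores = [100.0, 2.0, 1.9, 1.5, 1.0]  # one outlier at 100
# Plain min-max would push everything but the outlier below 0.011;
# the clipped variant keeps the non-outliers spread across [0, 1].
print(clipped_minmax(scores))
```

The outlier and the clip-quantile hit both saturate at 1.0; log-normalization is the other common defence when score distributions are heavy-tailed rather than single-outlier.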
