CONCEPT Cited by 1 source
Ranking via election¶
Definition¶
Ranking via election is a tournament-style prompt-structure pattern for applying an LLM ranker to a candidate set larger than any single prompt can hold. Instead of ranking all N candidates in one pass, the population is split into batches of size B; each batch is ranked in its own prompt; the top-K survivors from each batch are aggregated; the process repeats until only the desired number of candidates remain.
Canonical wiki reference¶
Meta's web-monorepo RCA system (2024-06; sources/2024-08-23-meta-leveraging-ai-for-efficient-incident-response) uses ranking-via-election with B=20, K=5:
"We structure prompts to contain a maximum of 20 changes at a time, asking the LLM to identify the top five changes. The output across the LLM requests are aggregated and the process is repeated until we have only five candidates left."
Given a retriever output of a "few hundred" candidates, the election collapses to 5 in O(log N) rounds.
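At B=20, K=5 each round keeps K/B = 1/4 of the field, so the round count grows logarithmically in N. A minimal sketch of the arithmetic (batch and keep sizes are from the Meta quote; the helper name is ours):

```python
import math

def election_rounds(n, batch=20, keep=5):
    """Rounds needed to shrink n candidates down to `keep`,
    cutting the field by batch/keep (here 4x) each round."""
    rounds = 0
    while n > keep:
        # each full batch of `batch` candidates keeps `keep` survivors
        n = math.ceil(n / batch) * keep
        rounds += 1
    return rounds

election_rounds(320)  # 3 rounds: 320 -> 80 -> 20 -> 5
```

Any "few hundred" starting population lands on 3 rounds with these parameters.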
Why it exists¶
Two constraints drive the design:
- Context-window budget. A Llama 2 (7B) prompt holding a few hundred code changes (each with diff + metadata) runs out of context. Meta's B=20 lets each prompt fit comfortably within the ranker's usable context.
- Reasoning-quality degradation with N. Even if the context fits, LLM ranking quality over large lists is lower than ranking over small ones (attention dilutes, position bias grows). Small-N prompts produce cleaner ordering.
The three round shapes¶
round 1: [20 cands] → top 5 (× k prompts in parallel)
[20 cands] → top 5
...
round 2: aggregate all top-5s (5k candidates)
[20 cands] → top 5 (× k/4 prompts)
...
round n: 5 candidates remain → return
Rounds are deliberately shallow — at B=20/K=5, each round cuts the population by 4×. For a starting population of 320, 3 rounds suffice (320 → 80 → 20 → 5).
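The whole tournament is a short loop. A minimal sketch, assuming a `rank_batch` callable that stands in for one LLM ranking prompt (it must return the batch in best-first order; mocked here with a numeric sort). The re-shuffle between rounds is the position-bias mitigation discussed under Caveats:

```python
import random

def election_rank(candidates, rank_batch, batch=20, keep=5, rng=None):
    """Tournament ranking: split into batches, rank each batch via one
    prompt, keep each batch's top `keep`, aggregate, repeat."""
    rng = rng or random.Random(0)
    pool = list(candidates)
    while len(pool) > keep:
        rng.shuffle(pool)  # re-shuffle each round to spread position bias
        survivors = []
        for i in range(0, len(pool), batch):
            # one "prompt": rank up to `batch` candidates, keep its top-K
            survivors.extend(rank_batch(pool[i:i + batch])[:keep])
        pool = survivors
    return pool  # the last round is a single ranked batch, so order is final

# Mock ranker standing in for the LLM: rank by numeric score, descending.
top5 = election_rank(range(320), lambda b: sorted(b, reverse=True))
# top5 == [319, 318, 317, 316, 315]
```

With B=20/K=5 this makes exactly 16 + 4 + 1 = 21 `rank_batch` calls for N=320, matching the round counts above.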
Trade-offs vs alternatives¶
- vs pointwise scoring. Score each candidate independently; sort. Avoids position bias but loses cross-candidate reasoning ("X is a better fit than Y because their diffs overlap").
- vs pairwise preference. Ask the LLM "is A better than B?" over all pairs; aggregate the wins into a ranking (e.g. by win count or a Bradley-Terry-style model). Quadratic in N; high precision; very expensive.
- vs listwise in one prompt. Dump all N into one prompt and ask for top-K. Limited by context window + reasoning quality at large N.
- vs hierarchical / logprob-based ranking. Score via a dedicated logprob-producing SFT format (Meta does this alongside election). More uniform calibration; requires a second fine-tuning round.
Ranking-via-election sits at a sweet spot: cross-candidate reasoning preserved, linear rather than quadratic work, each prompt bounded in size.
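The "linear rather than quadratic" claim is easy to check with back-of-envelope call counts for N=320 (the pointwise and pairwise formulas are the standard ones; the election count follows the B=20/K=5 geometry above):

```python
import math

N, B, K = 320, 20, 5

pointwise = N                  # one scoring call per candidate
pairwise = N * (N - 1) // 2    # one comparison per unordered pair
listwise = 1                   # single giant prompt, context permitting

# election: ceil(n/B) prompts per round; field shrinks to ceil(n/B) * K
election, n = 0, N
while n > K:
    prompts = math.ceil(n / B)
    election += prompts
    n = prompts * K

print(pointwise, pairwise, election)  # 320 51040 21
```

Election does ~21 calls where pairwise needs ~51,000, while still letting the model compare candidates within each prompt.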
Caveats¶
- Shuffle candidates between rounds. LLM position bias makes candidates near the top of a list more likely to survive; re-shuffling batch composition and order each round mitigates this without eliminating it.
- Elimination is irreversible. Candidates eliminated in round 1 never reappear; an early mistake is permanent. Meta's retrieve-then-rank pipeline compensates with a retriever that already narrows to high-likelihood candidates before the election starts.
- Cost multiplier. The number of LLM calls is a geometric series: ⌈N/B⌉ prompts in round 1, then roughly K/B of that each subsequent round, summing to about (N/B)/(1 − K/B). At B=20, K=5, N=320 that's 16 + 4 + 1 = 21 ranker calls per investigation. Higher than a single-prompt approach; much cheaper than pairwise.
- Tie-handling is underspecified. Meta does not disclose how ties are broken across prompts in the aggregation step, or how the logprob-ranked list integrates with the election output.
Generalisation¶
Ranking-via-election generalises beyond RCA to any problem where:
- The candidate population is larger than one prompt can hold.
- An LLM's cross-candidate reasoning is valuable (not replaceable by pointwise scoring).
- The output needs to be a ranked short-list, not a single answer.
Candidates include: code-review-comment prioritisation, bug-triage queues, test-flake clustering, log-anomaly triage, and legal/policy-review pipelines where deep document comparisons outperform pairwise similarity.
Seen in¶
- sources/2024-08-23-meta-leveraging-ai-for-efficient-incident-response — canonical RCA ranking-via-election with B=20, K=5.
Related¶
- concepts/llm-based-ranker — the architectural role this prompt structure realises.
- concepts/context-window-as-token-budget — the underlying constraint.
- concepts/heuristic-retrieval — the upstream stage whose output the election consumes.
- patterns/retrieve-then-rank-llm — the end-to-end pattern.
- systems/meta-rca-system — canonical instance.