

DoLa (Decoding by Contrasting Layers)

DoLa (Decoding by Contrasting Layers, arXiv:2309.03883, code at voidism/DoLa) is a factuality-decoding method that improves LLM factual accuracy by contrasting the next-token distribution from a mature (late) transformer layer against the distribution from a premature (early) layer, then decoding from that contrast rather than from the final layer alone. Before SLED, DoLa was the best-performing factuality-decoding method in the category (Source: sources/2025-09-17-google-sled-making-llms-more-accurate-by-using-all-of-their-layers).
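The contrast step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is invented, and the `alpha` cutoff is an illustrative stand-in for the adaptive plausibility constraint described in the DoLa paper, with a default chosen here for demonstration only.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def dola_next_token(mature_logits, premature_logits, alpha=0.1):
    """Pick the next token by contrasting a mature layer against a premature one.

    Tokens are scored by log p_mature - log p_premature, restricted to
    tokens whose mature-layer probability is at least alpha times the
    mature layer's top probability (a plausibility filter so the contrast
    cannot promote tokens the mature layer itself rules out).
    """
    p_mature = softmax(np.asarray(mature_logits, dtype=float))
    p_premature = softmax(np.asarray(premature_logits, dtype=float))
    plausible = p_mature >= alpha * p_mature.max()
    scores = np.full(p_mature.shape, -np.inf)
    scores[plausible] = np.log(p_mature[plausible]) - np.log(p_premature[plausible])
    return int(np.argmax(scores))

# A premature layer that leans on a frequency prior (token 0) while the
# mature layer has shifted mass toward token 1: the contrast picks token 1.
choice = dola_next_token([2.0, 1.9, -5.0], [2.0, 0.0, -5.0])
```

The toy example shows the intended effect: greedy decoding from the mature layer alone would still pick token 0, but the contrast rewards the token whose probability grew most between the premature and mature layers.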

DoLa and SLED share a structural premise: intermediate transformer layers carry signal that the final layer sometimes overrides in favour of training-data frequency bias. DoLa extracts that signal via a pairwise contrast between one premature and one mature layer; SLED extracts it via a weighted average across all layers. This wiki entry is scoped to the context surfaced by the 2025-09-17 Google Research SLED post, which treats DoLa as its principal comparator.
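The structural difference can be shown side by side. This is a sketch under the high-level descriptions above only: the function names are invented, and the uniform weights in the SLED-style variant are an illustrative assumption, not the method's actual weighting.

```python
import numpy as np

def _softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def dola_style_scores(layer_logits, premature_idx):
    """Pairwise contrast: final layer vs one chosen premature layer."""
    p_mature = _softmax(layer_logits[-1])
    p_premature = _softmax(layer_logits[premature_idx])
    return np.log(p_mature) - np.log(p_premature)

def sled_style_scores(layer_logits, weights=None):
    """Weighted average over every layer's distribution.

    Uniform weights are an illustrative placeholder; the point is only
    that all layers contribute, not just one contrastive pair.
    """
    probs = np.stack([_softmax(l) for l in layer_logits])
    if weights is None:
        weights = np.full(len(layer_logits), 1.0 / len(layer_logits))
    return np.log(np.average(probs, axis=0, weights=weights))

# Three layers over a two-token vocabulary: the pairwise contrast and the
# all-layer average can disagree on the winning token.
layers = np.array([[0.5, 0.1], [1.0, 0.4], [2.0, 1.8]])
dola = dola_style_scores(layers, premature_idx=0)
sled = sled_style_scores(layers)
```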

Role in the SLED comparison

The SLED blog post positions DoLa as the prior state of the art among competing decoding methods, the baseline against which SLED is measured. Two headline numbers from the comparison (blog-sourced, paper-authoritative):

The composability claim the SLED post makes — "SLED can be flexibly integrated with other factuality decoding methods" — explicitly includes DoLa-style contrast; the two are not architecturally exclusive.

What the SLED source discloses about DoLa

The Google Research SLED post names DoLa once as the prior SOTA and links to the GitHub repo and arXiv paper, but doesn't reproduce DoLa's mechanism. Per the linked arXiv paper's abstract (referenced for context, not reconstructed as claims here), DoLa's rule is to amplify logit differences between a mature and a premature layer — no additional decoding-time machinery beyond the contrast. Concrete mechanism detail lives in the paper; this wiki page stops at what the SLED source confirms.
