CONCEPT
Temperature-zero for deterministic code generation¶
Definition¶
Temperature=0 is the LLM-sampling setting that makes the decoder deterministically pick the highest-probability token at every step (greedy decoding), collapsing the output distribution from stochastic to deterministic. It is the canonical lever for achieving reproducibility of LLM outputs in code-generation pipelines, where the same input must produce the same output across runs.
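A minimal sketch of why temperature=0 collapses sampling to a deterministic argmax. The `logits` dict is a toy stand-in for a model's next-token scores, not any real model's output:

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick the next token from a toy logits dict.

    temperature=0 means greedy decoding: always take the top-1 token,
    so repeated calls give identical results. temperature>0 means a
    softmax over scaled logits followed by random sampling.
    """
    if temperature == 0:
        # Greedy: deterministic argmax, no randomness involved.
        return max(logits, key=logits.get)
    # Softmax over temperature-scaled logits, then sample.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    z = sum(math.exp(l) for l in scaled.values())
    r = rng.random()
    acc = 0.0
    for tok, l in scaled.items():
        acc += math.exp(l) / z
        if r < acc:
            return tok
    return tok  # numerical-edge fallback

logits = {"return": 2.1, "yield": 1.9, "raise": 0.3}
rng = random.Random(42)
# temperature=0 yields the same token on every call:
assert all(sample_token(logits, 0, rng) == "return" for _ in range(100))
```

At temperature>0 the `"yield"` token would sometimes win the draw; at 0 it never can, which is exactly the property regression tests rely on.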
Why it's load-bearing for code-migration pipelines¶
Code-migration tools have two properties that make reproducibility non-negotiable:
- Prompt-regression tests must be stable. The same input example must produce the same output — otherwise a CI test fails sometimes and not others, and prompt drift is undetectable against the noise floor.
- Debugging is intractable without reproduction. If a transformation fails on one run but succeeds on the next, engineers can't bisect the prompt to find the cause.
Zalando encountered both problems in the hackathon phase:
"Initially, we noticed varying outputs for the same input, making testing and validation challenging. Changing LLM settings, like setting the temperature parameter to 0 made the LLM's output to be more deterministic and reproducible." (Source: sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries)
What "deterministic" means in practice¶
- Same input + same model version + temperature=0 → same output for that request. Deterministic enough to pin a golden output in a regression test.
- Same input across time at temperature=0 can still vary because the model's backend may change. Zalando also describes "moody behaviour" — "LLM tools occasionally produced inconsistent outputs. These issues appeared without any clear reason, sometimes simply by rerunning the same prompt on the same file at a different time." This is provider-side non-determinism (batching, routing, MoE-expert selection, fp16/fp32 differences between GPU nodes) that temperature=0 cannot suppress. Determinism is best-effort, not absolute.
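Because determinism is best-effort, it is worth verifying rather than assuming. A sketch of a rerun-and-fingerprint check; `call_llm` is a hypothetical prompt-to-completion function, not any specific vendor API:

```python
import hashlib

def output_fingerprints(call_llm, prompt, runs=5):
    """Rerun the same prompt several times and collect distinct output
    hashes. At temperature=0 you expect exactly one fingerprint; more
    than one indicates provider-side non-determinism (batching,
    routing, GPU-node differences) that sampling settings can't fix."""
    return {
        hashlib.sha256(call_llm(prompt).encode()).hexdigest()
        for _ in range(runs)
    }

# Deterministic stub standing in for a temperature=0 call:
stable = lambda prompt: f"migrated({prompt})"
assert len(output_fingerprints(stable, "Button.tsx")) == 1
```

Running this periodically in CI gives an early signal that the provider's backend has drifted before golden-file tests start flaking.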
Tradeoffs¶
- Loses creativity on open-ended tasks. Greedy decoding can produce less diverse outputs for creative writing or brainstorming tasks. For code-migration — where correctness is binary and creativity is a liability — this is a feature, not a bug.
- Susceptible to degenerate loops. Greedy decoding sometimes gets stuck in repetition loops (the top-1 token at each step leads back to itself). In practice this is rare for well-instructed code-generation tasks but happens occasionally with buggy prompts.
- Doesn't prevent model updates from changing outputs. When the provider rolls out a new checkpoint, the golden outputs need to be regenerated. Lock the model version (e.g. `gpt-4o-2024-08-06` rather than `gpt-4o`) to defer this.
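A minimal sketch of pinning both levers in one place. The request dict mirrors common chat-completion APIs, but the exact field names here are illustrative, not any one vendor's schema:

```python
def build_request(prompt: str) -> dict:
    """Build a reproducible LLM request: a dated model checkpoint plus
    temperature=0. The field names mimic typical chat-completion APIs
    and are illustrative only."""
    return {
        # Dated snapshot, not the floating "gpt-4o" alias, so provider
        # checkpoint rollouts don't silently change golden outputs.
        "model": "gpt-4o-2024-08-06",
        # Greedy decoding for run-to-run reproducibility.
        "temperature": 0,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Migrate this component to the new library.")
assert req["model"] != "gpt-4o"  # the alias would drift under you
assert req["temperature"] == 0
```

Centralizing this in one builder function keeps individual pipeline steps from accidentally overriding either setting.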
Interaction with prompt caching¶
Temperature=0 is independent of prompt caching — caching works on the prefix regardless of sampling settings, and determinism works on the output regardless of caching. But they're often paired in production: caching is the cost/latency lever, temperature=0 is the reproducibility lever; together they produce a cheap, fast, predictable LLM call.
Role in regression testing¶
With temperature=0, the same input → same output invariant makes prompt-regression tests tractable: CI runs the toolkit over the golden example library and diffs the output against a checked-in expected file. Without temperature=0, the diffs would flake and the tests would be ignored.
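The CI check described above can be sketched as an exact string diff; `migrate` is a hypothetical stand-in for the LLM-backed transformation step:

```python
def check_golden(migrate, example_src: str, expected: str) -> bool:
    """Compare the transformation's output for one golden example with
    the checked-in expected text. With temperature=0 the comparison can
    be an exact equality check rather than a fuzzy match; without it,
    this test would pass on some runs and fail on others."""
    return migrate(example_src) == expected

# Deterministic stub playing the role of the temperature=0 LLM call:
migrate = lambda src: src.replace("OldButton", "NewButton")

assert check_golden(migrate, "<OldButton/>", "<NewButton/>")        # passes
assert not check_golden(migrate, "<OldButton/>", "<OldButton/>")    # drift caught
```

Exact equality is the point: any diff, however small, signals either prompt drift or a provider-side change worth investigating.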
Seen in¶
- sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries — canonical wiki instance. Zalando set temperature=0 explicitly to fix reproducibility issues encountered during toolkit development.
Related¶
- concepts/prompt-cache-consistency — orthogonal cost/latency lever often paired with this
- concepts/llm-generated-prompt-regression-test — the CI discipline this setting enables
- patterns/llm-only-code-migration-pipeline — the pattern this setting sits inside
- systems/zalando-component-migration-toolkit — the production tool
- companies/zalando