
CONCEPT

Temperature-zero for deterministic code generation

Definition

Temperature=0 is the LLM-sampling setting that makes the decoder deterministically pick the highest-probability token at every step (greedy decoding), collapsing the output distribution from stochastic to deterministic. It is the canonical lever for reproducible LLM outputs in code-generation pipelines, where the same input must produce the same output across runs.
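Greedy decoding is just an argmax over the model's next-token scores. A minimal sketch, using a toy stand-in for a real model's logits (the `toy_logits` function and its token scheme are invented for illustration):

```python
# Sketch: temperature=0 collapses sampling to greedy decoding (argmax).
# `toy_logits` is a hypothetical stand-in for a model's next-token scores.
def toy_logits(tokens):
    # Toy scoring rule: favor (last_token + 1) mod 5; token 4 acts as EOS.
    last = tokens[-1]
    return [1.0 if t == (last + 1) % 5 else 0.0 for t in range(5)]

def greedy_decode(logits, context, eos, max_steps=10):
    out = list(context)
    for _ in range(max_steps):
        scores = logits(out)
        # argmax: always take the single highest-scoring token, no randomness
        next_token = max(range(len(scores)), key=scores.__getitem__)
        if next_token == eos:
            break
        out.append(next_token)
    return out

# Same input -> same output on every run: the reproducibility property.
print(greedy_decode(toy_logits, [0], eos=4))  # [0, 1, 2, 3]
```

With sampling (temperature > 0) the next token is drawn from the score distribution instead, so reruns can diverge at any step.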

Why it's load-bearing for code-migration pipelines

Code-migration tools have two properties that make reproducibility non-negotiable:

  1. Prompt-regression tests must be stable. The same input example must produce the same output — otherwise a CI test fails sometimes and not others, and prompt drift is undetectable against the noise floor.
  2. Debugging is intractable without reproduction. If a transformation fails on one run but succeeds on the next, engineers can't bisect the prompt to find the cause.

Zalando encountered both problems in the hackathon phase:

"Initially, we noticed varying outputs for the same input, making testing and validation challenging. Changing LLM settings, like setting the temperature parameter to 0 made the LLM's output to be more deterministic and reproducible." (Source: sources/2025-02-19-zalando-llm-powered-migration-of-ui-component-libraries)

What "deterministic" means in practice

  • Same input + same model version + temperature=0 → same output for that request. Deterministic enough to pin a golden output in a regression test.
  • Same input across time at temperature=0 can still vary because the model's backend may change. Zalando also describes "moody behaviour": "LLM tools occasionally produced inconsistent outputs. These issues appeared without any clear reason, sometimes simply by rerunning the same prompt on the same file at a different time." This is provider-side non-determinism (batching, routing, MoE expert selection, fp16/fp32 differences between GPU nodes) that temperature=0 cannot suppress. Determinism is best-effort, not absolute.

Tradeoffs

  • Loses creativity on open-ended tasks. Greedy decoding can produce less diverse outputs for creative writing or brainstorming tasks. For code-migration — where correctness is binary and creativity is a liability — this is a feature, not a bug.
  • Susceptible to degenerate loops. Greedy decoding sometimes gets stuck in repetition loops (the top-1 token at each step leads back to itself). In practice this is rare for well-instructed code-generation tasks but happens occasionally with buggy prompts.
  • Doesn't prevent model updates from changing outputs. When the provider rolls out a new checkpoint, the golden outputs need to be regenerated. Lock the model version (e.g. gpt-4o-2024-08-06 rather than gpt-4o) to defer this.
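The degenerate-loop tradeoff can be guarded against mechanically: check whether the tail of the output is the same n-gram repeated. A sketch (the function name and thresholds are illustrative, not from the source):

```python
# Sketch: detect the repetition loops greedy decoding can fall into,
# by testing whether the tail of the token stream is one n-gram repeated.
def has_repetition_loop(tokens, ngram=4, repeats=3):
    """True if the last `repeats` windows of length `ngram` are all identical."""
    tail_len = ngram * repeats
    if len(tokens) < tail_len:
        return False
    tail = tokens[-tail_len:]
    windows = [tuple(tail[i:i + ngram]) for i in range(0, tail_len, ngram)]
    return len(set(windows)) == 1
```

A guard like this lets a pipeline abort and flag a buggy prompt instead of emitting looping junk into a migration.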

Interaction with prompt caching

Temperature=0 is independent of prompt caching — caching works on the prefix regardless of sampling settings, and determinism works on the output regardless of caching. But they're often paired in production: caching is the cost/latency lever, temperature=0 is the reproducibility lever; together they produce a cheap, fast, predictable LLM call.

Role in regression testing

With temperature=0, the same input → same output invariant makes prompt-regression tests tractable: CI runs the toolkit over the golden example library and diffs the output against a checked-in expected file. Without temperature=0, the diffs would flake and the tests would be ignored.
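The diff-against-golden check can be sketched in a few lines (the `migrate` pipeline callable and the file layout are assumptions for illustration):

```python
# Sketch of a prompt-regression check, assuming a deterministic
# `migrate(source) -> str` pipeline call (hypothetical) and one
# checked-in golden output file per example.
from pathlib import Path

def check_golden(example: Path, golden: Path, migrate) -> bool:
    """Return True iff the pipeline's output matches the pinned golden output."""
    actual = migrate(example.read_text())
    expected = golden.read_text()
    return actual == expected
```

With temperature=0, a `False` here signals real prompt or model drift; without it, the same check fires on sampling noise and gets muted.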
