Report #56176
[synthesis] Setting temperature=0 to get 'deterministic' AI output creates worse consistency problems than stochastic output
Use temperature=0 only when you need reproducibility for testing, and even then verify it: many serving stacks do not guarantee identical output at temperature=0 because of batching and floating-point non-determinism. For production consistency, use a moderate temperature (0.2 to 0.4) combined with semantic caching: if a new query is semantically similar to a cached query (embedding cosine similarity > 0.95), return the cached response. Users then get the experience of consistency (similar questions get similar answers) without the brittleness of greedy decoding.
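The caching layer recommended above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: `embed` and `generate` are hypothetical callables standing in for a real embedding model and an LLM call at moderate temperature, the cache is an in-memory list scanned linearly (a vector index would replace it at scale), and the toy word-count embedding in the usage part exists only so the example runs without a model. The 0.95 cosine threshold is the report's figure; a suitable value depends on the embedding model you use.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Serve a cached response when a new query embeds close to an earlier one."""

    def __init__(
        self,
        embed: Callable[[str], Vector],  # hypothetical: wraps an embedding model
        generate: Callable[[str], str],  # hypothetical: LLM call at temperature ~0.2-0.4
        threshold: float = 0.95,  # similarity cutoff taken from the report
    ) -> None:
        self.embed = embed
        self.generate = generate
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def lookup(self, query_vec: Vector) -> Optional[str]:
        """Return the most similar cached query's response if it clears the threshold."""
        best_score, best_response = -1.0, None
        for vec, response in self.entries:  # linear scan; use a vector index at scale
            score = cosine_similarity(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score > self.threshold else None

    def answer(self, query: str) -> str:
        query_vec = self.embed(query)
        cached = self.lookup(query_vec)
        if cached is not None:
            return cached  # cache hit: similar question, same answer
        response = self.generate(query)
        self.entries.append((query_vec, response))
        return response


# --- Usage with hypothetical stand-ins so the example runs without a model ---
VOCAB = ["reset", "password", "refund", "policy"]


def toy_embed(text: str) -> Vector:
    # Toy embedding: word counts over a tiny fixed vocabulary.
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


calls: List[str] = []


def fake_generate(query: str) -> str:
    # Pretend LLM call; records every query that actually reached the "model".
    calls.append(query)
    return f"answer #{len(calls)}"


cache = SemanticCache(toy_embed, fake_generate)
first = cache.answer("How do I reset my password?")
second = cache.answer("how do i reset my password")  # near-duplicate: served from cache
third = cache.answer("What is your refund policy?")  # unrelated: new generation
print(first, second, third, len(calls))  # answer #1 answer #1 answer #2 2
```

The near-duplicate query never reaches the model, so it returns exactly the earlier answer; only genuinely new questions pay for a (moderately stochastic) generation.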
Journey Context:
Teams set temperature=0 expecting 'consistency': same input, same output. But users don't want reproducibility; they want coherence: similar inputs should yield similar outputs. Temperature=0 means greedy decoding, which creates a cliff: a small perturbation of the input can flip which token ranks first, and because every later token is conditioned on that choice, the rest of the output can diverge dramatically. With moderate temperature, individual outputs vary but cluster around the semantic mode, which gives better perceived consistency. The deeper issue is that temperature=0 lulls teams into thinking they have solved the non-determinism problem, so they never build the caching and deduplication infrastructure that actually delivers the user experience they want. The real fix is not a sampling parameter; it is a caching layer.
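To make the "cliff" concrete, here is a minimal single-token sketch in Python. The logits for the three candidate tokens (`Yes`, `Sure`, `No`), the temperature of 0.3, and all helper names are illustrative assumptions, not measurements from any model. It shows only the mechanism: `greedy` (what temperature=0 amounts to) is a discontinuous function of the logits, while the temperature-scaled distribution moves continuously when the logits are nudged. It does not by itself demonstrate that sampled multi-token outputs cluster semantically.

```python
import math
import random
from typing import Dict

Logits = Dict[str, float]


def softmax(logits: Logits, temperature: float) -> Dict[str, float]:
    """Temperature-scaled softmax over {token: logit}; requires temperature > 0."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits: Logits) -> str:
    """Greedy decoding: always pick the highest-logit token."""
    return max(logits, key=logits.get)


def sample(logits: Logits, temperature: float, rng: random.Random) -> str:
    """Draw one token from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two distributions over the same tokens."""
    return 0.5 * sum(abs(p[t] - q[t]) for t in p)


# Hypothetical first-token logits before and after a tiny rewording of the
# prompt that nudges two near-tied candidates past each other.
before = {"Yes": 2.00, "Sure": 1.99, "No": 0.0}
after = {"Yes": 1.99, "Sure": 2.00, "No": 0.0}

print(greedy(before), greedy(after))  # Yes Sure: the greedy choice flips outright
shift = total_variation(softmax(before, 0.3), softmax(after, 0.3))
print(round(shift, 3))  # 0.017: the sampling distribution barely moves

rng = random.Random(0)
draws = [sample(before, 0.3, rng) for _ in range(10_000)]
print(draws.count("Yes") / len(draws))  # roughly 0.5, since Yes and Sure are nearly tied
```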
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:47:14.571042+00:00 - report_created - created