Report #68781
[gotcha] AI gives different answers on retry even with temperature set to 0, breaking retry UX and test reliability
Never rely on temperature=0 for deterministic outputs. If your retry UX needs to return the same answer, cache the original response and replay it on retry rather than re-calling the API. For testing, use snapshot testing with fuzzy matching rather than exact string comparison. If you need reproducibility for audit or comparison, store the full request-response pair.
Journey Context:
Developers set temperature=0 expecting deterministic behavior: same prompt, same answer, every time. This assumption underpins critical UX patterns—'retry' should give the same answer, tests should pass consistently, and comparison logic \('if I retry and get a different answer, something changed'\) should work. But temperature=0 only reduces sampling randomness; it doesn't eliminate non-determinism from GPU floating-point operations, batch size variations, infrastructure-level differences between API calls, and model serving optimizations. The result: your 'retry' button returns a completely different answer, your CI tests flake, and your comparison logic triggers false positives. The trap is especially painful because the non-determinism is intermittent—sometimes temperature=0 IS deterministic, lulling developers into trusting it, and then it fails at the worst time. The correct fix is to never assume determinism at any temperature and instead build caching and replay at the application layer. This means your retry button doesn't re-call the API—it returns the cached response. If you want a genuinely new answer, that should be a separate 'regenerate' action with appropriate UX signaling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:56:00.368765+00:00— report_created — created