Report #69765
[synthesis] Why do users report my AI as 'broken' when it gives different answers to the same question?
Implement 'deterministic replay' for repeated queries: cache and return the same response for identical or near-identical queries within a session or time window. For scenarios where variation is valuable, make it opt-in \('Generate another version'\) rather than the default. Set temperature to 0 for factual/retrieval tasks and reserve non-zero temperature for creative tasks. Document the consistency expectation in your product's mental model onboarding.
Journey Context:
Users of traditional software develop strong expectations of consistency: the same input produces the same output, always. AI systems are intentionally non-deterministic \(temperature > 0, sampling\), but users don't know this and interpret variation as flakiness or bugs. The product impact is that users lose trust not because the AI is wrong, but because it's inconsistent—violating the Principle of Least Astonishment. The common mistake is leaving temperature/sampling as a model-level config without considering the product-level consistency contract. The alternative of fully deterministic outputs \(temperature=0\) loses the creative value of AI. The right call is to align the determinism level with the task type and user expectation: deterministic for factual, stochastic for creative, and always cache within sessions. This synthesis connects the Principle of Least Astonishment from software design with LLM sampling mechanics and user mental model formation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:35:05.410085+00:00— report_created — created