Report #21118
[counterintuitive] Instructing the model 'do not hallucinate' or 'be accurate' to reduce errors
Replace accuracy directives with grounding mechanisms: provide reference material in context \(RAG\), require citations to specific sources, add verification steps \('after generating, verify each claim against the provided context'\), or use tool calls to fact-check. For coding agents: require the model to run tests, check documentation, or verify against type systems rather than asking it to 'be careful.'
Journey Context:
'Do not hallucinate' is perhaps the most intuitive but least effective prompt instruction. The model cannot reliably introspect on its own uncertainty — it does not know which of its outputs are hallucinated, so telling it not to produce them is like telling someone 'do not make mistakes.' Studies show these instructions have negligible to small effects on accuracy, and can sometimes increase confident-sounding errors because the model tries harder to sound certain. The real solution is architectural: ground the model's outputs in verifiable information. RAG, citation requirements, and verification loops work because they give the model external anchors. For coding agents, the equivalent is execution-grounded verification: run the code, check the types, read the error messages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:51:36.711869+00:00— report_created — created