Report #95185
[counterintuitive] Including 'do not hallucinate' or 'only use the provided context' to prevent hallucinations
Replace anti-hallucination instructions with structural safeguards: (a) require the model to cite specific passages ('quote the source text that supports each claim'), (b) provide an explicit escape hatch ('if the answer is not in the context, respond with: Not found in provided context'), (c) separate generation and verification into distinct steps, (d) use retrieval-augmented generation with source attribution built into the output schema.
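A minimal Python sketch of safeguards (a), (b), and (d), under stated assumptions: the names `Claim`, `Answer`, `build_prompt`, and `parse_answer` are hypothetical, and so is the 'claim || quote' reply format. No model API is called; the sketch only shows how the citation requirement, the escape hatch, and source attribution can be built into the prompt and the output schema.

```python
from dataclasses import dataclass, field
from typing import List

# Escape-hatch string taken from the report text.
NOT_FOUND = "Not found in provided context"


@dataclass
class Claim:
    text: str   # the statement the model makes
    quote: str  # source passage the model offers as support


@dataclass
class Answer:
    claims: List[Claim] = field(default_factory=list)
    not_found: bool = False  # True when the model used the escape hatch


def build_prompt(context: str, question: str) -> str:
    """Assemble a prompt that asks for quoted support and offers an escape hatch."""
    return (
        "Context:\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "For each claim in your answer, quote the source text that supports it.\n"
        'Write one claim per line as: claim || "supporting quote"\n'
        f"If the answer is not in the context, respond with: {NOT_FOUND}"
    )


def parse_answer(raw: str) -> Answer:
    """Parse a reply in the hypothetical 'claim || quote' line format."""
    if raw.strip() == NOT_FOUND:
        return Answer(not_found=True)
    claims = []
    for line in raw.splitlines():
        if "||" in line:
            text, quote = line.split("||", 1)
            claims.append(Claim(text.strip(), quote.strip().strip('"')))
    return Answer(claims=claims)
```

Lines without a `||` separator are dropped by `parse_answer`, so a claim that arrives without a quote never reaches the caller. A stricter variant could reject the whole reply instead.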
Journey Context:
'Don't hallucinate' treats hallucination as a behavioral choice the model can opt out of. In reality, hallucination is an epistemic limitation: the model does not have reliable internal access to whether its output is grounded. Studies consistently show anti-hallucination prompts have minimal effect on actual hallucination rates. The mechanism: (a) the model cannot reliably distinguish its parametric knowledge from retrieved context at generation time, (b) 'don't hallucinate' is a negative instruction that doesn't specify what to do instead, (c) the model may comply superficially while still generating ungrounded content. What works is changing the output structure: citations force grounding, escape hatches reduce confabulation pressure, and verification steps catch errors. Structural > instructional.
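One way the verification step could look in Python, as a sketch rather than a definitive implementation: the function name `ungrounded_claims` and the (claim, quote) pair format are assumptions made for illustration. Because the check runs outside the model, it does not depend on the model knowing whether its own output is grounded.

```python
import re
from typing import List, Tuple


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_claims(context: str, claims: List[Tuple[str, str]]) -> List[str]:
    """Return the claims whose quoted support does not appear in the context.

    `claims` is a list of (claim_text, supporting_quote) pairs.
    """
    haystack = _normalize(context)
    failed = []
    for claim_text, quote in claims:
        needle = _normalize(quote)
        # An empty quote, or one absent from the context, counts as ungrounded.
        if not needle or needle not in haystack:
            failed.append(claim_text)
    return failed
```

A substring match only confirms that the quoted passage exists in the context. It does not confirm that the passage actually supports the claim, so a separate check (for example, a second model pass or human review) would still be needed for that.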
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:20:51.506826+00:00— report_created — created