Report #29196
[counterintuitive] Adding 'don't hallucinate' or 'be accurate' instructions to prevent errors
Replace vague accuracy prohibitions with concrete grounding mechanisms: provide reference documents and instruct the model to cite them, add explicit verification steps, use retrieval-augmented generation (RAG), or instruct the model to express uncertainty ('if unsure, say so and explain what you don't know').
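A minimal sketch of that replacement in Python, assuming the caller already holds the reference documents as plain strings keyed by an ID. The function name `build_grounded_prompt`, the ID scheme, and the `[doc_id: "quoted passage"]` citation format are hypothetical choices for illustration, not a standard. The sketch only assembles prompt text; it does not call any model API.

```python
def build_grounded_prompt(question: str, documents: dict[str, str]) -> str:
    """Assemble a prompt that grounds the model in supplied sources
    instead of relying on a bare accuracy prohibition.

    Hypothetical helper for illustration; the citation format is an assumption.
    """
    # Label each reference document with an ID the model can cite.
    sources = "\n\n".join(
        f"[{doc_id}]\n{text.strip()}" for doc_id, text in documents.items()
    )
    instructions = (
        "Answer using only the reference documents below.\n"
        "After each claim, cite the supporting document ID and quote the exact "
        "passage in the form [doc_id: \"quoted passage\"].\n"
        "If the documents do not contain the answer, or you are unsure, say so "
        "and explain what information is missing."
    )
    return f"{instructions}\n\nReference documents:\n{sources}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = {"policy-v2": "Refunds are available within 30 days of purchase."}
    print(build_grounded_prompt("What is the refund window?", docs))
```

The resulting prompt contains no "don't hallucinate" line. Instead it supplies the evidence, asks for citations that can be checked, and gives the model an explicit way out when the evidence is missing. In a retrieval-augmented setup, `documents` would be filled by the retrieval step rather than by hand.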
Journey Context:
'Don't hallucinate' is perhaps the most common and least effective prompt instruction. The model has no internal 'hallucinate' flag it can switch off: it generates tokens from learned patterns and cannot reliably distinguish its own correct outputs from incorrect ones. Telling it not to hallucinate is like telling a calculator not to make errors: the instruction names the symptom but does nothing about the mechanism. What actually reduces hallucination: (1) providing ground-truth context so the model doesn't have to rely on parametric memory, (2) instructing verification against the provided sources ('cite the specific passage that supports your claim'), (3) instructing hedging behavior ('state your confidence level', 'if uncertain, flag it'), and (4) structural approaches such as generating claims first and then verifying each one. Each of these gives the model an actionable procedure rather than an impossible blanket prohibition. The model can't reliably self-diagnose hallucination, but it can be instructed to check its work against provided evidence.
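Verification against provided sources can also be checked mechanically after generation. The sketch below assumes the model was asked to cite in the hypothetical `[doc_id: "quoted passage"]` format, and flags any citation whose quote does not appear in the named document. The names `verify_citations` and `unsupported_citations` are illustrative. The check is deliberately simple: a substring match (after collapsing whitespace and case) only shows that the quoted text exists in the source, not that it supports the claim, so it catches fabricated quotes rather than every kind of hallucination.

```python
import re

# Matches citations in the hypothetical format [doc_id: "quoted passage"].
CITATION = re.compile(r'\[([\w.-]+):\s*"([^"]+)"\]')


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so line wrapping or
    # capitalization differences are not reported as mismatches.
    return " ".join(text.split()).lower()


def verify_citations(answer: str, documents: dict[str, str]) -> list[tuple[str, str, bool]]:
    """Return (doc_id, quote, supported) for every citation found in the answer."""
    results = []
    for doc_id, quote in CITATION.findall(answer):
        source = documents.get(doc_id)
        # Supported only if the cited document exists and contains
        # the quoted passage (after normalization).
        supported = source is not None and _normalize(quote) in _normalize(source)
        results.append((doc_id, quote, supported))
    return results


def unsupported_citations(answer: str, documents: dict[str, str]) -> list[tuple[str, str]]:
    """Return only the citations that could not be matched to a source."""
    return [(d, q) for d, q, ok in verify_citations(answer, documents) if not ok]


if __name__ == "__main__":
    docs = {"policy-v2": "Refunds are available within 30 days of purchase."}
    answer = (
        'The refund window is 30 days [policy-v2: "within 30 days of purchase"]. '
        'Shipping is free [policy-v2: "free shipping on all orders"].'
    )
    for doc_id, quote in unsupported_citations(answer, docs):
        print(f"Unsupported citation in {doc_id}: {quote!r}")
```

Flagged citations can be sent back to the model for revision or shown to a reviewer. An answer with no citations at all yields an empty list, so a caller that requested citations may also want to treat "no citations" as a failure.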
Lifecycle
2026-06-18T03:23:52.783552+00:00— report_created — created