Report #97504
[counterintuitive] Telling the model 'do not hallucinate' reduces hallucination
Replace negative constraints with positive instructions and verifiable grounding. Tell the model what to do instead: 'Cite the specific file/line for every claim', 'If unsure, set confidence to low and ask', 'Only use information in the provided context'.
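A minimal Python sketch of a prompt built this way, assuming a retrieval step has already produced labeled context snippets. `Snippet`, `INSTRUCTIONS`, `build_grounded_prompt` and the `source_id` labels are hypothetical names for illustration, not part of any library. Every rule states the desired behavior rather than forbidding one, and each snippet carries an ID the model can cite.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """A retrieved piece of context. source_id is a hypothetical label such as 'src/app.py:42'."""
    source_id: str
    text: str


# Positive instructions: each one names the behavior we want, not the one we fear.
INSTRUCTIONS = [
    "Answer using only information in the CONTEXT section below.",
    "Cite the source_id of the snippet that supports every claim.",
    'If the context lacks the answer, set confidence to "low" and ask a clarifying question.',
]


def build_grounded_prompt(question: str, snippets: list[Snippet]) -> str:
    """Assemble a prompt from positive instructions plus labeled context snippets."""
    rules = "\n".join(f"- {rule}" for rule in INSTRUCTIONS)
    context = "\n".join(f"[{s.source_id}]\n{s.text}" for s in snippets)
    return f"INSTRUCTIONS:\n{rules}\n\nCONTEXT:\n{context}\n\nQUESTION:\n{question}\n"


# Usage with a single hypothetical snippet.
prompt = build_grounded_prompt(
    "What port does the server listen on?",
    [Snippet("config/server.yaml:3", "port: 8080")],
)
print(prompt)
```

Labeling snippets with IDs is what makes "cite the specific file/line" enforceable: the citation can later be looked up instead of trusted.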
Journey Context:
Negative instructions like 'do not hallucinate' or 'never make things up' are vague and unenforceable: they name a behavior to avoid without describing the behavior to produce, and they can even prime the model to mention the forbidden behavior. Anthropic's prompting guidance explicitly recommends telling the model what to do rather than what not to do, because models tend to follow a stated desired behavior more reliably than they avoid a negated one. Combine positive instructions with grounding (retrieved snippets, tool calls) and output constraints (confidence fields, source citations) so that reliability becomes measurable: each claim can be checked against the source it cites.
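To show how output constraints make reliability checkable, here is a sketch of a post-hoc validator in Python. It assumes the model was asked to return JSON with `answer`, `confidence` and `citations` fields; that schema and the name `check_grounded_answer` are hypothetical. Each citation's quote must appear verbatim in the snippet it names, so a fabricated citation fails mechanically rather than depending on the model's restraint.

```python
import json

ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def check_grounded_answer(raw: str, context: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical output schema this sketch assumes:
    {"answer": str, "confidence": "high" | "medium" | "low",
     "citations": [{"source_id": str, "quote": str}, ...]}
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    problems = []
    confidence = data.get("confidence")
    if not isinstance(confidence, str) or confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"invalid confidence: {confidence!r}")

    citations = data.get("citations", [])
    if not isinstance(citations, list):
        problems.append("citations is not a list")
        citations = []
    # An uncited answer is only acceptable when the model has flagged low confidence.
    if not citations and confidence != "low":
        problems.append("no citations, but confidence is not 'low'")

    for i, cite in enumerate(citations):
        if not isinstance(cite, dict):
            problems.append(f"citation {i}: not an object")
            continue
        source_id = cite.get("source_id")
        quote = cite.get("quote")
        if not isinstance(source_id, str) or source_id not in context:
            problems.append(f"citation {i}: unknown source_id {source_id!r}")
        elif not isinstance(quote, str) or not quote or quote not in context[source_id]:
            # Verbatim match: the quoted text must exist in the cited snippet.
            problems.append(f"citation {i}: quote not found in {source_id}")
    return problems


context = {"config/server.yaml:3": "port: 8080"}
good = ('{"answer": "8080", "confidence": "high", "citations": '
        '[{"source_id": "config/server.yaml:3", "quote": "port: 8080"}]}')
bad = ('{"answer": "443", "confidence": "high", "citations": '
       '[{"source_id": "config/server.yaml:3", "quote": "port: 443"}]}')
print(check_grounded_answer(good, context))  # []
print(check_grounded_answer(bad, context))   # ['citation 0: quote not found in config/server.yaml:3']
```

Verbatim matching is deliberately strict; a real pipeline might normalize whitespace or accept near matches, which this sketch does not attempt. Counting how often answers pass such a check across a test set is one way to put a number on "measurable reliability".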
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:14:01.245890+00:00— report_created — created