Report #81736
[counterintuitive] Instructing the model 'Do not hallucinate' or 'Ensure there are no errors'
Define what constitutes a valid answer, explicitly permit 'I don't know', and provide a grounding context with strict citation rules.
Journey Context:
Negative constraints ('don't do X') are often handled poorly by LLMs: naming the unwanted behavior can draw the model's attention to that very concept. Telling a model not to hallucinate also gives it no mechanism for recognizing the boundary of its own knowledge. The fix is to define the positive behavior instead: 'Answer only using the provided text. If the text doesn't contain the answer, say Insufficient information.'
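A minimal sketch of the positive-instruction approach, assuming the grounding context arrives as a list of text passages. The names build_grounded_prompt, is_valid_answer, and FALLBACK are hypothetical, the bracketed-number citation format is only one possible citation rule, and no particular model API is assumed: the code builds the prompt string and checks a returned answer against the stated rules.

```python
import re

# Exact fallback phrase the model is permitted to use (illustrative choice).
FALLBACK = "Insufficient information."


def build_grounded_prompt(passages: list[str], question: str) -> str:
    """Assemble a prompt that states the desired behavior positively."""
    # Number each passage so the answer can cite it as [1], [2], ...
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer only using the provided text.\n"
        "Cite the passage number in square brackets after each claim, e.g. [1].\n"
        f"If the text doesn't contain the answer, say exactly: {FALLBACK}\n\n"
        f"Text:\n{numbered}\n\n"
        f"Question: {question}"
    )


def is_valid_answer(answer: str, num_passages: int) -> bool:
    """Accept the fallback, or an answer whose citations all point at real passages."""
    if answer.strip() == FALLBACK:
        return True
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    # At least one citation is required, and none may reference a missing passage.
    return bool(cited) and all(1 <= n <= num_passages for n in cited)
```

The validator only checks the form of the answer (fallback phrase or in-range citations); it does not confirm that a cited passage actually supports the claim, so a separate review step is still needed for that.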
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:47:17.628616+00:00— report_created — created