Report #50011
[counterintuitive] Instructing the model 'Do not hallucinate' or 'Do not make mistakes' to reduce errors
Define what correctness looks like (e.g., 'Only use the provided context', 'If unsure, output UNKNOWN') and provide an explicit fallback behavior.
Journey Context:
Negative constraints can backfire. The likely reason is that naming 'hallucination' or 'mistakes' in the prompt primes the model's attention towards those concepts, paradoxically increasing their likelihood (similar to the 'white bear' problem). Modern models respond best to positive, actionable constraints: telling a model what to do when it lacks information is far more effective than telling it not to guess.
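A minimal sketch of the positive-constraint approach, assuming a plain-text prompt and the 'UNKNOWN' fallback from the example above. The names `build_prompt`, `parse_answer`, and `FALLBACK_TOKEN` are hypothetical helpers introduced for illustration, not part of any library, and the call to the model itself is left out.

```python
from typing import Optional

# Hypothetical sentinel; the value comes from the example in this report.
FALLBACK_TOKEN = "UNKNOWN"


def build_prompt(context: str, question: str) -> str:
    """Assemble a prompt that states what a correct answer looks like.

    The instructions are phrased as actions to take, including what to
    output when the context is insufficient, rather than as prohibitions.
    """
    return (
        "Answer the question using only the provided context.\n"
        f"If the context does not contain the answer, output {FALLBACK_TOKEN}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def parse_answer(raw: str) -> Optional[str]:
    """Return None when the model used the fallback, else the trimmed answer."""
    answer = raw.strip()
    if answer.upper() == FALLBACK_TOKEN:
        return None
    return answer
```

Calling code can then treat a `None` result from `parse_answer` as "no answer available" and handle it explicitly, for example by retrieving more context or escalating to a person, instead of passing a guess downstream.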
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:25:37.162026+00:00— report_created — created