Report #70020
[counterintuitive] Instructing a model 'Do not hallucinate' or 'Do not make mistakes' reduces errors
Define what a correct answer looks like using positive constraints (e.g., 'Only use the provided functions') and provide a verification rubric.
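The following is a minimal sketch of this workaround. It assumes a tool-calling setup in which the model must reply with a single JSON object that names one function. The registry entries `get_weather` and `convert_units`, the JSON reply shape, and the helpers `build_system_prompt`, `verify`, and `RubricResult` are hypothetical illustrations and are not part of any specific API. The prompt states what a correct answer is (positive constraints). The rubric then checks the reply against those same boundaries, so correctness does not depend on an instruction like 'Do not hallucinate'.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical tool registry: function name -> required argument names.
ALLOWED_FUNCTIONS: dict[str, set[str]] = {
    "get_weather": {"city"},
    "convert_units": {"value", "from_unit", "to_unit"},
}


def build_system_prompt(allowed: dict[str, set[str]]) -> str:
    """Describe a correct answer with positive constraints only (no 'do not ...')."""
    lines = [
        "Answer by calling exactly one of the functions listed below.",
        "Use exactly the argument names listed for that function.",
        'Reply with one JSON object: {"function": <name>, "arguments": {...}}.',
        "Allowed functions:",
    ]
    for name, params in sorted(allowed.items()):
        lines.append(f"- {name}({', '.join(sorted(params))})")
    return "\n".join(lines)


@dataclass
class RubricResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


def verify(raw_output: str, allowed: dict[str, set[str]]) -> RubricResult:
    """Check a model reply against the same boundaries the prompt stated."""
    try:
        call = json.loads(raw_output)
    except json.JSONDecodeError:
        return RubricResult(False, ["reply is not valid JSON"])
    if not isinstance(call, dict):
        return RubricResult(False, ["reply is not a JSON object"])

    failures: list[str] = []
    name = call.get("function")
    args = call.get("arguments")
    # Criterion 1: the function must come from the allowed set.
    if not isinstance(name, str) or name not in allowed:
        failures.append(f"function {name!r} is not in the allowed set")
    # Criterion 2: arguments must be an object with exactly the listed names.
    if not isinstance(args, dict):
        failures.append("arguments must be a JSON object")
    elif isinstance(name, str) and name in allowed:
        unexpected = set(args) - allowed[name]
        missing = allowed[name] - set(args)
        if unexpected:
            failures.append(f"unexpected arguments: {sorted(unexpected)}")
        if missing:
            failures.append(f"missing arguments: {sorted(missing)}")
    return RubricResult(not failures, failures)


if __name__ == "__main__":
    print(build_system_prompt(ALLOWED_FUNCTIONS))
    good = '{"function": "get_weather", "arguments": {"city": "Oslo"}}'
    bad = '{"function": "search_web", "arguments": {"q": "Oslo"}}'
    print(verify(good, ALLOWED_FUNCTIONS).passed)  # True
    print(verify(bad, ALLOWED_FUNCTIONS).failures)
```

Because the prompt and the rubric are generated from the same registry, the stated boundaries and the checks stay in sync. A reply that fails the rubric could be rejected, or returned to the model together with its listed failures.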
Journey Context:
LLMs tend to handle negation poorly when it appears in isolation. An instruction like 'Don't do X' still places X in the context, which can draw the model's attention to X and make it more likely to appear in the output. Specifying the exact boundaries of correct behavior (positive constraints) tends to work better, because it directs the model's attention toward the desired output distribution rather than toward the behavior to avoid.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:07:00.913155+00:00 · report_created · created