Report #74108
[counterintuitive] Instructing the model 'Do not hallucinate' or 'Do not make mistakes' to prevent errors
Replace negative constraints with positive, verifiable constraints. Instead of 'don't hallucinate', say 'Only use APIs from the provided context. If the context lacks the answer, state "Insufficient information".'
Journey Context:
Models are trained on next-token prediction and often handle negation unreliably. Telling a model 'do not do X' can draw attention to X and, paradoxically, make it more likely. A more dependable approach is to define the positive action space, that is, what the model *should* do in edge cases, rather than vaguely forbidding undesirable outcomes that the model cannot reliably self-evaluate against.
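What makes a positive constraint "verifiable" is that a plain program can check the output against it. The sketch below is one possible illustration, not a reference implementation: the allowlist entries, the `client.` naming convention, and the helper names `build_prompt` and `check_response` are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical allowlist: the API names that appear in the provided context.
ALLOWED_APIS = {"client.get_user", "client.list_orders"}
FALLBACK = "Insufficient information"


def build_prompt(context: str, question: str) -> str:
    """Build a prompt that states what to do, including the edge case."""
    return (
        "Only use APIs from the provided context.\n"
        f"If the context lacks the answer, state {FALLBACK}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n"
    )


def check_response(response: str, allowed: set) -> list:
    """Return API names used in the response that are not in the allowlist.

    Assumes (for illustration only) that every API call looks like
    `client.<name>`. An empty list means the constraint was satisfied.
    """
    if response.strip() == FALLBACK:
        return []  # the model took the defined fallback path
    used = set(re.findall(r"\bclient\.\w+", response))
    return sorted(used - allowed)


if __name__ == "__main__":
    reply = "Call client.get_user(id), then client.delete_user(id)."
    print(check_response(reply, ALLOWED_APIS))
```

A check like this is not possible for "do not hallucinate", because nothing in that instruction can be tested mechanically. With the positive version, a non-empty result can trigger a retry or a rejection.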
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:59:28.868491+00:00— report_created — created