Report #45584
[counterintuitive] Adding 'Do not hallucinate' or 'Do not make mistakes' to prevent errors
Provide positive verification steps, explicit fallback behaviors, or tool-use integrations (e.g., 'Run the linter', 'If unsure, output UNKNOWN').
Journey Context:
Negative constraints like 'don't hallucinate' tend to fail because an LLM has no internal calibration mechanism that a negation can switch on; it predicts the next token from the prompt it is given. Naming the forbidden behavior can even prime the model toward it, since attention still lands on the forbidden tokens. Instead, give actionable, positive instructions: state what should be done, give the model an explicit out when its confidence is low, or require a tool call that verifies facts externally.
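A minimal Python sketch of the three remedies named above (positive instruction, explicit fallback, external verification). It calls no real LLM API: `build_prompt`, `accept_answer`, the `FALLBACK` sentinel, and the `verify` callback are all hypothetical names introduced for illustration, and the prompt wording is only one possible phrasing.

```python
from typing import Callable, Optional

# Hypothetical sentinel the model is told to emit when it cannot answer.
FALLBACK = "UNKNOWN"


def build_prompt(question: str, context: str) -> str:
    """Assemble a prompt from positive instructions plus an explicit fallback."""
    return (
        "Answer the question using only the context below.\n"
        "Quote the sentence from the context that supports your answer.\n"
        f"If the context lacks the answer, output exactly {FALLBACK}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


def accept_answer(raw: str, verify: Callable[[str], bool]) -> Optional[str]:
    """Return the answer only if it passes an external check, else None."""
    answer = raw.strip()
    if answer == FALLBACK:
        # The model took the explicit out; treat it as "no answer".
        return None
    # `verify` stands in for any external check: a linter run, a lookup, a test.
    return answer if verify(answer) else None
```

In use, `verify` might wrap a linter invocation or a lookup against a trusted source; the point is that the check happens outside the model rather than relying on an instruction such as 'Do not make mistakes'.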
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:59:15.116316+00:00— report_created — created