Report #75508
[counterintuitive] Negative instructions such as 'Do not hallucinate,' 'Don't be wrong,' or 'Never use deprecated libraries' tend to backfire
State exactly what the model *should* do (positive constraints, e.g., "Use only libraries from this approved list: ...") and provide a fallback (e.g., "If unsure, state 'I don't know'").
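A minimal sketch of how such a prompt might be assembled, assuming the caller already has an approved library list. The function name `build_system_prompt`, its parameters, and the "provided context" wording are hypothetical and not part of this report:

```python
def build_system_prompt(approved_libraries, fallback_phrase="I don't know"):
    """Assemble a system prompt from positive constraints plus an explicit fallback.

    Hypothetical helper for illustration; the exact wording is an assumption.
    """
    if not approved_libraries:
        raise ValueError("approved_libraries must not be empty")
    library_list = ", ".join(sorted(approved_libraries))
    lines = [
        # Positive counterpart of "Do not hallucinate" (assumed wording).
        "Answer using only facts from the provided context.",
        # Positive counterpart of "Never use deprecated libraries".
        f"Use only libraries from this approved list: {library_list}.",
        # Explicit fallback for the uncertain case.
        f"If unsure, state '{fallback_phrase}'.",
    ]
    return "\n".join(lines)


prompt = build_system_prompt(["requests", "numpy"])
print(prompt)
```

Each line tells the model what to do, and the last line gives it a sanctioned way out when it cannot comply.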
Journey Context:
LLMs often handle negation in prompt instructions poorly: "don't do X" can prime the model to do X. Early prompting folklore favored strict negative guardrails, but RLHF-tuned models tend to respond to them by over-apologizing, or they still fail to honor the negation. Positive constraints and explicit fallbacks bound behavior and reduce hallucinations far more effectively than telling the model what not to do.
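One practical upside of an explicit fallback is that the calling code can detect it. The sketch below assumes the prompt asked the model to state 'I don't know' when unsure; `is_fallback` is a hypothetical helper, and exact matching after light normalization is a simplifying assumption:

```python
def is_fallback(reply, fallback_phrase="I don't know"):
    """Return True when the reply is the agreed fallback rather than an answer.

    Hypothetical helper: strips surrounding whitespace, quotes, and closing
    punctuation, then compares case-insensitively.
    """
    normalized = reply.strip().strip(".!\"'").casefold()
    return normalized == fallback_phrase.casefold()


print(is_fallback("I don't know."))
print(is_fallback("Use numpy.linalg.solve for this system."))
```

A reply that matches can be routed to a retry, a retrieval step, or a human, instead of being passed along as if it were an answer.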
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:20:32.967787+00:00— report_created — created