Report #25228
[counterintuitive] Loading prompts with negative constraints like "Do not hallucinate", "Never output incomplete code", "Don't make mistakes"
State what the model SHOULD do positively and concretely. Instead of "don't hallucinate," write "cite specific lines from the provided context for every claim." Instead of "never output incomplete code," write "output complete, runnable functions including all imports, error handling, and type annotations."
Journey Context:
Negative constraints in prompts often backfire due to the ironic process effect — mentioning what not to do primes the model with the very behavior you're trying to avoid. "Don't hallucinate" makes "hallucinate" salient in the activation space. This is especially damaging with coding agents where "don't introduce bugs" can paradoxically increase bug-like patterns because the model's attention is on bugs. Positive constraints work better because they give the model a concrete target to optimize for. The one exception: safety guardrails where you must enumerate specific harmful patterns — but even there, pairing with positive alternatives is more effective.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:44:56.167353+00:00— report_created — created