Report #27411
[counterintuitive] Relying on negative instructions like 'do not hallucinate,' 'never use deprecated APIs,' or 'don't be verbose' to suppress unwanted behavior
Replace each negative instruction with a positive counterpart that states the desired behavior. 'Do not hallucinate' → 'if you are uncertain about an API, say so explicitly and suggest where to verify.' 'Never use deprecated APIs' → 'use only APIs documented in the current version of [specific docs URL].' 'Don't be verbose' → 'keep responses under 200 words; omit explanations unless requested.' If a prohibition has to stay, pair it with what to do instead.
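The rewrite step above can be sketched as a small lookup plus a lint pass over a prompt's instruction lines. This is only an illustration: `POSITIVE_REWRITES`, `NEGATIVE_OPENER`, and `rewrite_instructions` are hypothetical names, the table holds just the three examples from this report, and the opener regex is a rough heuristic rather than a reliable detector of negative phrasing.

```python
import re

# Hypothetical table built from the three examples in this report.
# Keys are lowercased instructions without the trailing period.
POSITIVE_REWRITES = {
    "do not hallucinate": "If you are uncertain about an API, say so explicitly and suggest where to verify.",
    "never use deprecated apis": "Use only APIs documented in the current version of [specific docs URL].",
    "don't be verbose": "Keep responses under 200 words; omit explanations unless requested.",
}

# Rough heuristic: instructions that open with a prohibition.
NEGATIVE_OPENER = re.compile(r"^\s*(do not|don't|never|avoid)\b", re.IGNORECASE)


def rewrite_instructions(lines):
    """Swap known negative instructions for their positive counterparts.

    Returns (rewritten, unresolved). `unresolved` lists lines that still
    open with a prohibition and have no entry in POSITIVE_REWRITES, so a
    human can write the positive version by hand.
    """
    rewritten, unresolved = [], []
    for line in lines:
        key = line.strip().rstrip(".").lower()
        if key in POSITIVE_REWRITES:
            rewritten.append(POSITIVE_REWRITES[key])
        else:
            rewritten.append(line)
            if NEGATIVE_OPENER.match(line):
                unresolved.append(line)
    return rewritten, unresolved
```

Calling `rewrite_instructions(["Do not hallucinate.", "Never guess file paths."])` would replace the first line and report the second as unresolved, which is the cue to write a positive instruction for it manually.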
Journey Context:
Negative instructions are prone to an ironic process effect: naming what not to do can prime the model toward that very behavior, much as 'don't think of a pink elephant' makes you think of a pink elephant. 'Do not hallucinate' puts 'hallucination' into the context as an active concept and can paradoxically make it more likely. 'Never use deprecated APIs' requires the model to know which APIs are deprecated, and if it is wrong about that, the instruction is worse than useless. The fix is positive specification: tell the model exactly what to do, what to verify, and what to cite. This gives the model a concrete target to optimize toward rather than a behavior to suppress. The one exception: brief negatives that prevent common format errors ('no markdown fences around the JSON') can work when paired with structured output, but even these are inferior to schema enforcement.
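As a rough illustration of schema enforcement in place of a format prohibition, the sketch below validates a model reply on the receiving side using only the standard library. The field names in `REQUIRED_FIELDS` and the function `parse_reply` are assumptions made up for this example, not part of any particular API; a real setup might instead use a provider's structured-output feature or a dedicated schema validator.

```python
import json

# Hypothetical schema: required keys and the Python type expected for each.
REQUIRED_FIELDS = {"api_name": str, "confidence": str, "verify_at": str}


def parse_reply(raw):
    """Parse a model reply as JSON and check it against REQUIRED_FIELDS.

    Raises ValueError when the reply is not a JSON object with the
    expected fields, for example when it is wrapped in markdown fences.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} is missing or not a {expected.__name__}")
    return data
```

A reply that fails this check can be rejected or retried automatically, so the prompt does not need to carry the 'no markdown fences' prohibition at all.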
Lifecycle
2026-06-18T00:24:26.025643+00:00— report_created — created