Report #58464
[counterintuitive] Using negative constraints like 'Do not hallucinate', 'Don't make mistakes', or 'Do not use deprecated functions'
State what the model should do instead, and provide explicit fallbacks. For example: 'Only use functions from the provided documentation. If a function is not found, respond with Unknown_Function'.
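A minimal sketch of the recommendation, assuming you call the model yourself and can inspect its reply. The prompt states the positive constraint and the fallback token. The post-check is an extra safeguard not in the report: it accepts the fallback, or else requires every called name to be in the documented set. DOCUMENTED_FUNCTIONS, build_prompt and check_reply are hypothetical names, and the call-detection regex is a crude heuristic.

```python
import re

# Hypothetical allowlist: in practice, derive it from the documentation
# you paste into the prompt.
DOCUMENTED_FUNCTIONS = {"load_config", "parse_args", "write_report"}
FALLBACK = "Unknown_Function"


def build_prompt(task: str, documented: set[str]) -> str:
    """Phrase the constraint positively and give the model an explicit fallback."""
    docs = ", ".join(sorted(documented))
    return (
        f"Only use functions from the provided documentation: {docs}.\n"
        f"If a function is not found, respond with {FALLBACK}.\n\n"
        f"Task: {task}"
    )


def check_reply(reply: str, documented: set[str]) -> str:
    """Accept the fallback as-is; otherwise replace any reply that calls an
    undocumented function with the fallback token."""
    if reply.strip() == FALLBACK:
        return FALLBACK
    # Heuristic: any identifier immediately followed by '(' counts as a call.
    called = set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", reply))
    return FALLBACK if called - documented else reply


print(build_prompt("Load settings from config.toml", DOCUMENTED_FUNCTIONS))
print(check_reply("load_config(path)", DOCUMENTED_FUNCTIONS))
print(check_reply("os.remove(path)", DOCUMENTED_FUNCTIONS))
```

The prompt never names the unwanted behavior. The model is told what to use and what to say when it cannot comply. The post-check reflects the report's own caveat that outputs should still be verified.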
Journey Context:
Developers naturally write prompts as if instructing a human, which leads to 'don't do X' instructions. Counterintuitively, negative constraints in LLM prompts often draw attention to the forbidden behavior and can increase its likelihood (the 'pink elephant' problem). Modern RLHF training also struggles to reward the absence of a behavior. The correct mental model is that LLMs predict the next token based on the provided context, so explicitly mentioning 'hallucination' or 'deprecated functions' places those concepts in the context window and primes them.
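Negative phrasing tends to creep in when prompts are written as if for a human. A quick lint pass can surface those lines so they can be rewritten as positive instructions with a fallback, as recommended at the top of this report. This is a minimal sketch: the pattern list is an assumption and deliberately small. It will miss other phrasings and curly apostrophes. find_negative_constraints is a hypothetical helper name.

```python
import re

# Hypothetical, deliberately small set of negative-constraint markers.
NEGATIVE_PATTERN = re.compile(r"\b(do not|don't|never|avoid)\b", re.IGNORECASE)


def find_negative_constraints(prompt: str) -> list[str]:
    """Return each prompt line that contains a negative constraint."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if NEGATIVE_PATTERN.search(line)
    ]


prompt = (
    "Do not hallucinate.\n"
    "Only use functions from the provided documentation.\n"
    "Don't make mistakes."
)
for line in find_negative_constraints(prompt):
    print("rewrite as a positive instruction:", line)
```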
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:37:12.829319+00:00— report_created — created