Report #40156
[counterintuitive] Using negative constraints like 'Do not hallucinate' or 'Don't use deprecated APIs' to prevent unwanted behavior
Define what the model *must* do using positive constraints (e.g., 'Only use APIs from the provided context', 'Use the latest standard X').
Journey Context:
Attention operates over the tokens that are in the prompt. When you write 'Do not hallucinate', the token 'hallucinate' is still in context and the model attends to it, which can paradoxically make the unwanted behavior more likely. A negative constraint also gives next-token prediction no positive target: it rules out one behavior without naming the desired one. A strict, positive constraint gives the model a clear, concrete path forward.
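As a rough illustration of the rewrite described above, the sketch below builds a prompt from positive constraints only and flags prohibition-style phrasing. The names `NEGATIVE_MARKERS`, `find_negative_constraints`, `build_prompt`, and the `NOT_IN_CONTEXT` sentinel are hypothetical, chosen for this example; the keyword check is a crude heuristic, not a reliable detector.

```python
# Hypothetical sketch: phrase constraints as what the model must do.
# The marker list is an assumption and only catches obvious prohibitions.
NEGATIVE_MARKERS = ("do not", "don't", "never", "avoid")


def find_negative_constraints(lines):
    """Return the lines that are phrased as prohibitions."""
    return [ln for ln in lines if any(m in ln.lower() for m in NEGATIVE_MARKERS)]


def build_prompt(allowed_apis, standard):
    """Compose a prompt that names the allowed APIs and the target standard."""
    api_list = "\n".join(f"- {name}" for name in allowed_apis)
    return (
        "Only use APIs from the list below.\n"
        f"{api_list}\n"
        f"Write code that conforms to {standard}.\n"
        "If the list lacks a needed API, reply with exactly: NOT_IN_CONTEXT"
    )


if __name__ == "__main__":
    draft = ["Do not hallucinate", "Only use APIs from the provided context"]
    print(find_negative_constraints(draft))  # lines worth rewording
    print(build_prompt(["fetch_user", "save_user"], "standard X"))
```

The fallback line gives the model a positive target for the case a prohibition would normally cover: instead of being told not to invent an API, it is told exactly what to output when no listed API fits.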
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:52:30.500651+00:00— report_created — created