Report #86741
[counterintuitive] Does telling the model 'Do not hallucinate' or 'Ensure no bugs' reduce errors?
Not reliably. Instead, provide positive constraints and explicit verification steps (e.g., 'Verify the API exists in the provided documentation', 'Write a test for the edge case').
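A minimal sketch of what this can look like when prompts are assembled in code. The prompt states what to do, names the documentation to stay grounded in, and lists the checks to perform. It does not list forbidden behaviors. `PromptSpec`, `build_prompt`, the section labels, and the `client.fetch`/`client.close` documentation snippet are all hypothetical, chosen for illustration. This is not a library API or a prescribed template.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for a task prompt; field names are illustrative."""
    task: str
    context: str  # documentation the model must stay grounded in
    constraints: list[str] = field(default_factory=list)         # positive "do X" rules
    verification_steps: list[str] = field(default_factory=list)  # checks the model performs


def build_prompt(spec: PromptSpec) -> str:
    """Assemble a prompt that says what to do and how to check it,
    rather than listing what to avoid."""
    lines = [f"Task: {spec.task}", "", "Reference documentation:", spec.context, ""]
    if spec.constraints:
        lines.append("Requirements:")
        lines.extend(f"- {c}" for c in spec.constraints)
        lines.append("")
    if spec.verification_steps:
        lines.append("Before answering, complete these checks:")
        lines.extend(f"{i}. {step}" for i, step in enumerate(spec.verification_steps, 1))
    return "\n".join(lines)


# Hypothetical example: a positive constraint plus two verification steps.
spec = PromptSpec(
    task="Write a function that paginates results from the client library.",
    context="client.fetch(page: int) -> list[dict]\nclient.close() -> None",
    constraints=["Use only functions listed in the reference documentation."],
    verification_steps=[
        "Verify that every API you call exists in the provided documentation.",
        "Write a test for the edge case where a page is empty.",
    ],
)
print(build_prompt(spec))
```

The instruction "Use only functions listed in the reference documentation" covers the same ground as "Do not hallucinate APIs". The difference is that it points the model at the text to check against rather than at the behavior to avoid.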
Journey Context:
Negative constraints ('don't do X') are poorly weighted in RLHF training. Models tend to focus on the tokens representing the forbidden action. Positive instructions that establish a verification loop or ground the model in provided context are far more effective at preventing the undesired behavior than simply commanding the model not to do it.
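One way to close the verification loop outside the model is sketched below, under stated assumptions. It statically lists every function call in generated Python code and compares the result against the set of names in the provided documentation. Any call that is not documented is fed back as a positive follow-up instruction. This `ast`-based check is an illustration chosen here, not a method from this report. The documented names (`client.fetch`, `client.close`) and the helpers `called_names`, `undocumented_calls`, and `feedback_prompt` are hypothetical, and the actual model call is left out.

```python
import ast
import builtins

# Builtins such as print() or len() are never undocumented project APIs, so skip them.
_BUILTINS = set(dir(builtins))


def _dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base is not None else None
    return None


def called_names(source: str) -> set[str]:
    """Collect the dotted name of every call expression in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = _dotted_name(node.func)
            if name is not None:
                names.add(name)
    return names


def undocumented_calls(source: str, documented: set[str]) -> list[str]:
    """Calls in `source` that are neither documented nor Python builtins."""
    return sorted(called_names(source) - documented - _BUILTINS)


def feedback_prompt(missing: list[str]) -> str:
    """Hypothetical follow-up message that sends the model back to the docs."""
    listed = ", ".join(missing)
    return (
        f"These calls are not in the provided documentation: {listed}. "
        "Replace each one with a documented function, then re-run this check."
    )


# Hypothetical model output that invents client.fetch_all().
documented = {"client.fetch", "client.close"}
generated = (
    "rows = client.fetch(1)\n"
    "rows += client.fetch_all()\n"
    "print(len(rows))\n"
    "client.close()\n"
)
missing = undocumented_calls(generated, documented)
print(missing)
if missing:
    print(feedback_prompt(missing))
```

Because the check is purely static, calls on local variables (for example `resp.json()`) show up under the variable's name. A real pipeline would need to map those to documented types. The sketch only shows the shape of the loop: generate, check against the provided context, then return a concrete instruction to fix the problem.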
⚠ Workarounds are unverified: always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:11:11.880733+00:00 | report_created | created