Report #31098
[counterintuitive] Using negative constraints like 'Do not write buggy code' or 'Do not hallucinate'
State what the model should do in the affirmative. Replace 'Do not use loops' with 'Use vectorized operations.' Replace 'Do not hallucinate' with 'Only use information present in the provided context.'
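A minimal sketch of the rewrite above, assuming constraints are kept as plain strings before being joined into a system prompt. The two rewrite pairs come from this report; the names `AFFIRMATIVE_REWRITES`, `rewrite_constraints`, and `build_system_prompt` are hypothetical and not part of any library.

```python
# Hypothetical lookup table: negative phrasing -> affirmative replacement.
# Both pairs are taken from the report text; extend with your own.
AFFIRMATIVE_REWRITES = {
    "Do not use loops": "Use vectorized operations.",
    "Do not hallucinate": "Only use information present in the provided context.",
}


def rewrite_constraints(constraints):
    """Replace known negative constraints with their affirmative form.

    Constraints without a known rewrite are returned unchanged.
    """
    rewritten = []
    for constraint in constraints:
        # Normalize surrounding whitespace and a trailing period before lookup.
        key = constraint.strip().rstrip(".")
        rewritten.append(AFFIRMATIVE_REWRITES.get(key, constraint))
    return rewritten


def build_system_prompt(constraints):
    """Join the rewritten constraints into a bulleted instruction block."""
    lines = ["Follow these instructions:"]
    lines += [f"- {c}" for c in rewrite_constraints(constraints)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(["Do not use loops.", "Do not hallucinate", "Return JSON."]))
```

A fixed lookup table only covers phrasings you have already seen. In practice each negative constraint likely needs a hand-written affirmative counterpart.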
Journey Context:
LLMs often handle negation poorly. Telling a model 'do not X' still places 'X' in the prompt and can activate its representation in the latent space, which may increase the likelihood of the exact behavior you are trying to avoid. Affirmative constraints describe the desired behavior directly, so they steer the model's output distribution toward the goal rather than toward the thing being prohibited.
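One way to act on this is to scan existing prompts for negative phrasings that are candidates for an affirmative rewrite. The sketch below is a rough heuristic, not a reliable detector: the cue list and the names `NEGATION_PATTERN` and `find_negative_constraints` are assumptions made for illustration.

```python
import re

# Assumed cue list; it is deliberately small and will miss many phrasings.
NEGATION_PATTERN = re.compile(r"\b(do not|don't|never|avoid)\b", re.IGNORECASE)


def find_negative_constraints(prompt):
    """Return the prompt lines that contain one of the negation cues."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if NEGATION_PATTERN.search(line)
    ]


if __name__ == "__main__":
    prompt = "Do not hallucinate.\nUse vectorized operations.\nNever guess."
    for line in find_negative_constraints(prompt):
        print("Consider rephrasing:", line)
```

Flagged lines still need human review, since some negative statements (for example hard safety limits) may be intentional.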
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:35:14.868379+00:00— report_created — created