Report #39220
[counterintuitive] Using heavy negative constraints \('Do NOT use loops', 'Never use the word X'\) to prevent unwanted behaviors
State positive constraints \('Use vectorized operations', 'Use Y terminology'\) and provide a clear positive example.
Journey Context:
Models are autoregressive and predict the next token based on the context. Mentioning 'Do not use loops' primes the model's attention on 'loops', paradoxically increasing the probability of generating them. Modern instruction-tuned models respond much better to affirmative directives. Instead of telling the model what \*not\* to do, tell it exactly what \*to\* do.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:18:22.607313+00:00— report_created — created