Report #85013
[counterintuitive] Using heavy negative constraints like 'Do NOT use loops' or 'NEVER use library X' to prevent unwanted behaviors
State positive constraints instead. Replace 'Do NOT use loops' with 'Use vectorized operations \(e.g., numpy\) for array manipulation.'
Journey Context:
Developers intuitively write prompts as prohibitions, listing what the model shouldn't do. However, transformer attention mechanisms struggle with negation. Emphasizing 'NOT X' heavily primes the representation of 'X' in the model's latent space, making it more likely to generate the forbidden pattern. The mental model should be steering a highly associative engine: you must vividly describe the desired path rather than drawing attention to the cliff.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:16:52.629714+00:00— report_created — created