Report #45738
[counterintuitive] Why does the model use the library or pattern I explicitly told it NOT to use?
Frame instructions positively: specify what TO do rather than what NOT to do. Replace 'don't use pandas' with 'use polars'. Replace 'avoid verbose output' with 'output exactly 2 sentences'. When negation is unavoidable, place the negated item far from the action instruction and reinforce the positive alternative multiple times.
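As a rough illustration of the advice above, the sketch below rewrites known negative phrasings into positive ones and lists any negation cues that remain for manual review. The names `POSITIVE_REWRITES`, `NEGATION_CUES`, `rewrite_positive`, and `remaining_negations` are hypothetical helpers invented for this example. The mapping holds only the two replacements mentioned in this report, and the cue list is an assumption, not an exhaustive set.

```python
import re

# Hypothetical mapping from negative phrasings to positive replacements.
# The two entries mirror the examples given in this report.
POSITIVE_REWRITES = {
    "don't use pandas": "use polars",
    "avoid verbose output": "output exactly 2 sentences",
}

# Illustrative negation cues (an assumption, not an exhaustive list).
NEGATION_CUES = re.compile(r"\b(don't|do not|never|avoid|without)\b", re.IGNORECASE)


def rewrite_positive(prompt: str) -> str:
    """Replace known negative phrasings with their positive alternatives."""
    for negative, positive in POSITIVE_REWRITES.items():
        prompt = re.sub(re.escape(negative), positive, prompt, flags=re.IGNORECASE)
    return prompt


def remaining_negations(prompt: str) -> list[str]:
    """Return negation cues still present so they can be reviewed by hand."""
    return [match.group(0).lower() for match in NEGATION_CUES.finditer(prompt)]


if __name__ == "__main__":
    draft = "Load the CSV. Don't use pandas. Do not include comments."
    cleaned = rewrite_positive(draft)
    print(cleaned)
    print(remaining_negations(cleaned))
```

A check like this only catches phrasings already in the mapping. Anything reported by `remaining_negations` still needs a hand-written positive alternative.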
Journey Context:
Developers write prompts like 'don't use pandas, use polars' or 'do not include comments' and are surprised when the model does the opposite. This isn't defiance; it is a side effect of how attention and next-token prediction work. Mentioning a concept activates its representation in the model whether or not a negation precedes it: the 'pandas' tokens in the context raise the probability of pandas-related next tokens even when they follow 'don't'. Negation is a comparatively weak, high-level semantic signal competing against a strong, low-level associative activation. This resembles ironic process theory in psychology, where trying to suppress a thought makes it more accessible. In LLMs the mechanism is architectural: the model must represent 'pandas' to understand the negation, and that representation tends to increase the probability of generating pandas-related code. Positive framing tends to outperform negative framing because the unwanted concept never appears in the prompt, so the instruction itself does not activate it.
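Since the explanation above is a claim about model behavior, it can be checked empirically for a given model. The following is a minimal sketch of such a check, assuming a caller-supplied `generate` function that wraps whatever model API is in use. `violation_rate`, `banned_markers`, and `fake_generate` are hypothetical names, and the stub is a deterministic stand-in whose output is not evidence of how any real model behaves.

```python
from typing import Callable, Sequence


def violation_rate(
    generate: Callable[[str], str],
    prompt: str,
    banned_markers: Sequence[str],
    trials: int = 20,
) -> float:
    """Fraction of completions containing any banned marker (case-insensitive)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    hits = 0
    for _ in range(trials):
        completion = generate(prompt).lower()
        if any(marker.lower() in completion for marker in banned_markers):
            hits += 1
    return hits / trials


def fake_generate(prompt: str) -> str:
    """Deterministic stand-in for a real model call; for demonstration only."""
    if "pandas" in prompt.lower():
        return "import pandas as pd"
    return "import polars as pl"


if __name__ == "__main__":
    markers = ["import pandas"]
    # Swap fake_generate for a real model wrapper to compare the two framings.
    print(violation_rate(fake_generate, "Read data.csv. Don't use pandas.", markers))
    print(violation_rate(fake_generate, "Read data.csv using polars.", markers))
```

With a real model behind `generate`, running both framings over enough trials gives a direct comparison for the specific model and task. Substring matching on markers is a crude detector and may need tightening for other libraries or patterns.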
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:14:43.685933+00:00 - report_created - created