Report #49937
[counterintuitive] LLM cannot follow negated constraints (e.g., 'Do NOT include X'); needs stronger negative prompting
State what the model SHOULD do instead of what it shouldn't. Replace 'Do not use technical jargon' with 'Use simple, everyday language'.
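One way to apply this workaround is to lint a prompt for negated phrasings and swap in positive rewrites before sending it. The sketch below is a minimal illustration under stated assumptions. Only the first entry of the hypothetical `POSITIVE_REWRITES` table comes from this report; the other entries, the negation word list, and the helper names `find_negated_instructions` and `rewrite_prompt` are illustrative inventions. The sketch only transforms prompt text and does not verify how any model behaves.

```python
import re

# Hypothetical table of negative constraints and their positive equivalents.
# The first entry comes from this report; the rest are illustrative assumptions.
POSITIVE_REWRITES = {
    "Do not use technical jargon": "Use simple, everyday language",
    "Do not write long answers": "Answer in at most three sentences",
    "Don't use bullet points": "Write in plain paragraphs",
}

# Assumed, non-exhaustive list of phrases that signal a negated instruction.
NEGATION_PATTERN = re.compile(r"\b(do not|don't|never|avoid|must not)\b", re.IGNORECASE)


def find_negated_instructions(prompt: str) -> list[str]:
    """Return every prompt line that phrases a constraint negatively."""
    return [line.strip() for line in prompt.splitlines() if NEGATION_PATTERN.search(line)]


def rewrite_prompt(prompt: str, rewrites: dict[str, str] = POSITIVE_REWRITES) -> str:
    """Replace known negative constraints with their positive rewrites.

    Matching is case-insensitive and ignores a trailing period, so
    'Do NOT use technical jargon.' matches 'Do not use technical jargon'.
    Lines without a known rewrite are kept unchanged.
    """
    lookup = {key.lower(): value for key, value in rewrites.items()}
    result = []
    for line in prompt.splitlines():
        key = line.strip().rstrip(".").lower()
        result.append(lookup[key] + "." if key in lookup else line)
    return "\n".join(result)


if __name__ == "__main__":
    prompt = "Explain how DNS works.\nDo NOT use technical jargon."
    print(find_negated_instructions(prompt))  # ['Do NOT use technical jargon.']
    print(rewrite_prompt(prompt))
```

The lint flags negated lines but cannot invent a good positive phrasing on its own, which is why the rewrites live in a human-authored table: anything it flags without a table entry is left for a person to rephrase.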
Journey Context:
Developers often assume that emphasizing a negative constraint (ALL CAPS, bold) will force the model to avoid it. However, auto-regressive models generate text by repeatedly predicting the most likely next token given everything already in the context. Mentioning a concept, even in a negated instruction, places its tokens in that context and activates its representation in the latent space, making it more likely to be generated. The model does not apply a logical NOT to the instruction; it sees the concept and probabilistically reproduces it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:18:21.628876+00:00— report_created — created