Report #95195
[counterintuitive] Using negative instructions ('don't use jargon,' 'don't be verbose,' 'don't make mistakes') to shape model behavior
Replace every negative instruction with a positive specification: 'don't use jargon' → 'use language a first-year undergraduate would understand'; 'don't be verbose' → 'keep each response under 150 words'; 'don't make mistakes' → 'verify each calculation against the original formula before stating the result'; 'don't include markdown' → 'output plain text with no formatting'. Always tell the model what to do, not what to avoid.
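A minimal sketch of how the substitutions above might be applied mechanically when assembling a prompt. The table holds only the pairs listed in this report; the names POSITIVE_REWRITES, rewrite_instruction, and build_system_prompt are hypothetical and exist only for illustration, not as part of any library.

```python
# Hypothetical lookup table; the pairs are the ones listed in this report.
POSITIVE_REWRITES = {
    "don't use jargon": "use language a first-year undergraduate would understand",
    "don't be verbose": "keep each response under 150 words",
    "don't make mistakes": (
        "verify each calculation against the original formula "
        "before stating the result"
    ),
    "don't include markdown": "output plain text with no formatting",
}


def rewrite_instruction(instruction: str) -> str:
    """Return the positive specification for a known negative instruction.

    Instructions that are not in the table are returned unchanged,
    so a human can review and rewrite them by hand.
    """
    key = instruction.strip().rstrip(".").lower()
    return POSITIVE_REWRITES.get(key, instruction)


def build_system_prompt(instructions: list[str]) -> str:
    """Rewrite each instruction and join them, one sentence per line."""
    lines = []
    for item in instructions:
        text = rewrite_instruction(item).strip().rstrip(".")
        if text:
            # Capitalize the first letter and end with a period.
            lines.append(text[0].upper() + text[1:] + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(["Don't use jargon.", "don't be verbose"]))
```

A fixed table like this only covers phrasings someone has already rewritten; anything unrecognized passes through untouched and still needs a manual positive rewrite.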
Journey Context:
Negative instructions are problematic for three reasons: (a) models tend to follow negated instructions less reliably than affirmative ones; much as 'don't think of a white bear' brings the bear to mind for a human reader, naming the unwanted behavior can prime it; (b) negative instructions are underspecified: they say what to avoid but give no target to optimize toward, leaving the model to guess what 'not-jargon' or 'not-verbose' means; (c) in attention-based architectures, the tokens describing the unwanted behavior remain in the context and can still influence generation. Positive specifications give the model a concrete, actionable target. Prompt engineering guides from major providers make the same recommendation, and it is one of the highest-signal, lowest-effort improvements available.
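To find candidates for rewriting, a prompt can be scanned for negation cues before it is sent. The sketch below is a rough heuristic under stated assumptions: the cue list in NEGATION_PATTERN is an assumed starting point rather than an exhaustive or validated set, it matches only straight apostrophes, and the function name find_negative_instructions is hypothetical.

```python
import re

# Assumed cue list; extend or trim it for your own prompts.
NEGATION_PATTERN = re.compile(
    r"\b(don't|do not|never|avoid|without|no)\b",
    re.IGNORECASE,
)


def find_negative_instructions(prompt: str) -> list[str]:
    """Return the lines of `prompt` that contain a negation cue."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if NEGATION_PATTERN.search(line)
    ]


if __name__ == "__main__":
    draft = (
        "Don't use jargon.\n"
        "Keep each response under 150 words.\n"
        "Never guess."
    )
    for flagged in find_negative_instructions(draft):
        print("rewrite as a positive specification:", flagged)
```

A flagged line is only a prompt for human review: the scan cannot tell whether a negation is an instruction to the model or part of quoted material.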
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:21:51.575395+00:00— report_created — created