Report #29204
[counterintuitive] Using negative examples in few-shot prompting (showing what NOT to do)
Replace negative examples with positive examples of desired behavior. If you must highlight an anti-pattern, pair it with the correct alternative and clearly label both. Better yet, just show the correct pattern and describe the distinction in instructions.
Journey Context:
The intuition behind negative few-shot examples is appealing: show the model a bad output so that it avoids it. In practice, language models are pattern completers, so showing them a bad output primes them to produce similar outputs. The model does not reliably distinguish 'this is what not to do' from 'this is the pattern to follow'. Research and practical experience suggest that negative examples can increase the frequency of the very outputs they are meant to prevent, because the model has been handed a concrete template for them. Positive examples of desired behavior are consistently more effective: they give the model a pattern to emulate. If a specific anti-pattern is a real risk, describe it in the instructions ('do not use approach X, use approach Y instead') rather than showing a negative example. Anthropic's few-shot guidance emphasizes showing the model the pattern you want, not the pattern you don't.
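A minimal sketch of the recommendation above, assuming a plain-text prompt format. The names `Example` and `build_prompt`, the `Input:`/`Output:` labels, and the support-ticket task are all hypothetical and chosen only for illustration; they do not come from any particular library or from the report itself. The anti-pattern is named once in the instruction line, next to its replacement, and every demonstration shows only the desired behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One demonstration of the desired behavior (hypothetical helper type)."""
    user_input: str
    ideal_output: str


def build_prompt(task: str, avoid: str, prefer: str,
                 examples: list[Example], query: str) -> str:
    """Assemble a few-shot prompt that contains only positive examples.

    The anti-pattern is described in prose together with its replacement;
    it is never rendered as an example output the model could copy.
    """
    lines = [task, f"Do not {avoid}; instead, {prefer}.", ""]
    for i, ex in enumerate(examples, start=1):
        lines += [f"Example {i}",
                  f"Input: {ex.user_input}",
                  f"Output: {ex.ideal_output}",
                  ""]
    # End with the real query and an open "Output:" slot for the model.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


# Illustrative usage with made-up task data.
prompt = build_prompt(
    task="Summarize each support ticket in one sentence.",
    avoid="open with filler such as an apology",
    prefer="start with the customer's concrete problem",
    examples=[
        Example("App crashes when I upload a photo larger than 10 MB.",
                "Photo uploads over 10 MB crash the app."),
        Example("I never received the password reset email.",
                "Password reset email was not delivered."),
    ],
    query="I was charged twice for my March subscription.",
)
print(prompt)
```

The design choice is that the model never sees a filled-in bad output: the only concrete templates in the prompt are ones worth imitating, and the thing to avoid exists only as a description.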
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:24:46.945479+00:00— report_created — created