Report #55670
[counterintuitive] Why does adding more few-shot examples to the prompt make the model worse at my task?
Start zero-shot, or with at most 1-2 examples. Add examples only to demonstrate the output format, not to teach content. Beyond that, more examples can hurt performance through attention dilution and spurious pattern matching.
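A minimal sketch of the "format only, capped at 1-2" advice, in Python. The function name `build_prompt`, the constant `MAX_EXAMPLES`, and the prompt wording are hypothetical and chosen for illustration; they are not part of any library or of this report's original workaround.

```python
from typing import Sequence

# Assumption: the cap mirrors the "at most 1-2 examples" guidance in this report.
MAX_EXAMPLES = 2


def build_prompt(
    task: str,
    format_examples: Sequence[str] = (),
    max_examples: int = MAX_EXAMPLES,
) -> str:
    """Assemble a prompt from a task description and a capped number of format samples.

    Extra samples beyond `max_examples` are silently dropped. With no samples,
    the result is a zero-shot prompt.
    """
    kept = list(format_examples)[:max_examples]
    parts = [task.strip()]
    for i, example in enumerate(kept, start=1):
        # Label each sample as a format illustration, not as content to copy.
        parts.append(f"Example output {i} (format only):\n{example.strip()}")
    parts.append("Now produce the output for the task above.")
    return "\n\n".join(parts)
```

Calling `build_prompt("Summarize the diff.")` with no samples gives a zero-shot prompt; passing three samples keeps only the first two.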
Journey Context:
Conventional wisdom says more few-shot examples improve performance by showing the model what you want. Research on in-context learning suggests this is often wrong. The counterintuitive finding (Min et al., 2022, on classification and multiple-choice tasks): replacing the labels in few-shot examples with random labels barely hurts performance. This suggests the model is not learning much from the input-to-label mapping in the examples; it is mostly picking up the format, the label space, and the kind of input to expect. Adding many examples can hurt for three reasons: (1) attention dilution: the model spends capacity processing examples instead of the actual problem; (2) spurious correlations: the model picks up on superficial patterns in the examples rather than the underlying task logic; (3) context consumption: examples eat into the context window available for the actual task and the model's reasoning. For coding agents, 1-2 examples showing the desired output format are often optimal. Don't spend time crafting many detailed examples. The model already knows how to code; it needs to understand what YOU want, and a clear format example communicates that better than volume.
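Rather than assuming a fixed number of examples, the effect can be measured on your own task. The sketch below sweeps the number of examples from zero upward and reports accuracy on a small held-out set. Everything here is an assumption made for illustration: `model` is a hypothetical callable that maps a prompt string to a completion string (wrap whatever client you use), the "Input:/Output:" layout is arbitrary, and exact-match scoring only suits tasks with short, unambiguous answers.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected output)


def accuracy_by_shot_count(
    model: Callable[[str], str],  # hypothetical: prompt in, completion out
    examples: Sequence[Example],
    dev_set: Sequence[Example],
    max_shots: int = 4,
) -> Dict[int, float]:
    """Return exact-match accuracy on `dev_set` for k = 0..max_shots examples."""
    if not dev_set:
        raise ValueError("dev_set must not be empty")
    scores: Dict[int, float] = {}
    for k in range(min(max_shots, len(examples)) + 1):
        # Use the first k examples as the few-shot prefix (empty when k == 0).
        shots = "".join(f"Input: {x}\nOutput: {y}\n\n" for x, y in examples[:k])
        correct = 0
        for x, expected in dev_set:
            prompt = f"{shots}Input: {x}\nOutput:"
            if model(prompt).strip() == expected:
                correct += 1
        scores[k] = correct / len(dev_set)
    return scores


def smallest_best_k(scores: Dict[int, float]) -> int:
    """Pick the fewest examples that reach the best observed accuracy."""
    best = max(scores.values())
    return min(k for k, s in scores.items() if s == best)
```

Preferring the smallest k on ties reflects the advice above: if zero or one example does as well as four, the shorter prompt leaves more context for the actual task. Keep `dev_set` disjoint from `examples`, otherwise the sweep rewards copying.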
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:56:15.223218+00:00— report_created — created