Report #36758
[counterintuitive] More few-shot examples in the prompt always improve model performance on the task
Start with 2 to 5 well-chosen, diverse examples and test incrementally. If performance plateaus or degrades as you add more, the bottleneck is not example quantity: it may be task clarity, a fundamental model limitation, or attention dilution from a long prompt.
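The incremental procedure can be sketched as a small sweep over example counts. This is a minimal sketch under stated assumptions, not a reference implementation. `model_fn` stands in for whatever call sends a prompt to your model and returns its text. The `Input:`/`Output:` template, the `ks` schedule and the `min_gain` threshold are illustrative choices. `toy_model` is a hypothetical fake rigged to plateau so the stopping rule is visible. Diversity is approximated by round-robin selection across labels.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


def pick_diverse(pool: Sequence[Example], k: int) -> List[Example]:
    """Round-robin across labels so any prefix of the result covers as many labels as possible."""
    by_label: Dict[str, List[Example]] = {}
    for ex in pool:
        by_label.setdefault(ex[1], []).append(ex)
    buckets = list(by_label.values())
    picked: List[Example] = []
    i = 0
    while len(picked) < k and any(buckets):
        bucket = buckets[i % len(buckets)]
        if bucket:
            picked.append(bucket.pop(0))
        i += 1
    return picked


def build_prompt(examples: Sequence[Example], query: str) -> str:
    # Hypothetical template: one Input/Output pair per example, query last.
    shots = "".join(f"Input: {x}\nOutput: {y}\n\n" for x, y in examples)
    return f"{shots}Input: {query}\nOutput:"


def accuracy(model_fn: Callable[[str], str], examples: Sequence[Example],
             dev_set: Sequence[Example]) -> float:
    hits = sum(model_fn(build_prompt(examples, x)).strip() == y for x, y in dev_set)
    return hits / len(dev_set)


def sweep_k(model_fn, pool, dev_set, ks=(2, 3, 4, 5, 8), min_gain=0.01):
    """Add examples step by step; stop as soon as accuracy fails to improve by min_gain."""
    ordered = pick_diverse(pool, max(ks))
    history: List[Tuple[int, float]] = []
    best_k, best_acc = None, -1.0
    for k in ks:
        if k > len(ordered):
            break  # not enough examples in the pool
        acc = accuracy(model_fn, ordered[:k], dev_set)
        history.append((k, acc))
        if best_k is not None and acc < best_acc + min_gain:
            break  # plateau or degradation: quantity is not the bottleneck
        best_k, best_acc = k, acc
    return best_k, history


# Hypothetical stand-in for a real model call, rigged to mimic the pattern described
# above: too few shots underperform, 3-5 work, longer prompts degrade.
def toy_model(prompt: str) -> str:
    n_shots = prompt.count("Input: ") - 1
    query = prompt.rsplit("Input: ", 1)[1]
    if 3 <= n_shots <= 5:
        return "pos" if "good" in query else "neg"
    return "neg"


pool = [("good movie", "pos"), ("bad plot", "neg"), ("good cast", "pos"),
        ("bad acting", "neg"), ("good score", "pos"), ("bad ending", "neg"),
        ("good pacing", "pos"), ("bad sound", "neg")]
dev_set = [("good film", "pos"), ("bad film", "neg")]

best_k, history = sweep_k(toy_model, pool, dev_set)
print(best_k, history)
```

Each step reuses the same diversity-ordered prefix, so a drop at a larger k points at the extra examples rather than at a different selection. With a real model, the dev set needs to be large enough that a one-point change is not just noise.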
Journey Context:
The intuition from ML training (more data = better) does not transfer to in-context learning. In-context examples compete for attention, and beyond a handful, additional examples can actively hurt performance by diluting attention from the actual query, introducing conflicting patterns if the examples aren't perfectly consistent, pushing important content such as instructions or the query into the 'lost in the middle' zone of a long prompt, and consuming context window needed for the output. In-context learning is not gradient-based learning: no weights are updated, so it is more akin to priming, and priming has sharply diminishing returns. The GPT-3 paper showed performance scaling with the number of examples, but the gains saturate quickly and the effect is task-dependent. For complex tasks, 3 well-chosen examples often outperform 20 mediocre ones. The widespread practice of stuffing dozens of examples into a prompt is usually counterproductive.
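Several of these failure modes (inconsistent demonstrations, an exhausted context window, a query buried mid-prompt) can be guarded against when the prompt is assembled. The sketch below is hypothetical. `approx_tokens` is a crude word count standing in for a real tokenizer. `assemble`, `find_conflicts`, `context_limit` and `reserve_for_output` are illustrative names, not any library's API. The conflict check only catches the same input paired with different labels. Subtler inconsistencies, such as differing output formats, still need manual review.

```python
from typing import Dict, List, Sequence, Set, Tuple

Example = Tuple[str, str]  # (input text, expected label)


def approx_tokens(text: str) -> int:
    # Assumption: roughly one token per whitespace-separated word.
    # A real setup should count with the target model's own tokenizer.
    return len(text.split())


def find_conflicts(examples: Sequence[Example]) -> List[str]:
    """Inputs that appear with more than one label, i.e. inconsistent demonstrations."""
    labels_by_input: Dict[str, Set[str]] = {}
    for x, y in examples:
        labels_by_input.setdefault(x.strip().lower(), set()).add(y)
    return sorted(x for x, labels in labels_by_input.items() if len(labels) > 1)


def assemble(examples: Sequence[Example], query: str,
             context_limit: int, reserve_for_output: int) -> Tuple[str, int]:
    """Build a few-shot prompt that fits the budget and ends with the query."""
    conflicts = find_conflicts(examples)
    if conflicts:
        raise ValueError(f"conflicting labels for: {conflicts}")
    tail = f"Input: {query}\nOutput:"
    budget = context_limit - reserve_for_output - approx_tokens(tail)
    kept: List[str] = []
    used = 0
    for x, y in examples:  # assumes examples are already ordered best-first
        block = f"Input: {x}\nOutput: {y}\n\n"
        cost = approx_tokens(block)
        if used + cost > budget:
            break  # stop rather than eat into the space reserved for the answer
        kept.append(block)
        used += cost
    # The query goes last, next to the generation point, instead of being
    # buried in the middle of a long run of examples.
    return "".join(kept) + tail, len(kept)


examples = [("good movie", "pos"), ("bad plot", "neg"),
            ("good cast", "pos"), ("bad acting", "neg")]
prompt, n_kept = assemble(examples, "nice film", context_limit=20, reserve_for_output=5)
print(n_kept)  # 2 with these toy numbers
conflicts = find_conflicts([("good movie", "pos"), ("Good movie ", "neg")])
print(conflicts)
```

Truncating from the end of a best-first list means the budget, not a fixed count, decides how many examples survive, which keeps room for the output even when individual examples are long.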
Lifecycle
2026-06-18T16:10:31.879826+00:00— report_created — created