Report #85009
[counterintuitive] Adding more few-shot examples to the prompt always improves in-context learning performance
Use 3-5 carefully chosen, diverse examples rather than maximizing example count. Beyond a small number, additional examples can hurt performance due to attention dilution and context noise.
Journey Context:
The instinct is clear: if 3 examples help, 10 should help more, and 30 should help more still. That does not hold in general. In-context learning performance often plateaus after a handful of examples and can then degrade. Several reasons contribute: (1) more examples consume context window space, leaving less room for the actual query; (2) attention is spread across more tokens, diluting focus on the most relevant patterns; (3) additional examples can introduce minor inconsistencies that confuse the model; (4) the lost-in-the-middle effect means examples in the middle of a long prompt tend to be used less effectively than those near the start or end. Surprisingly, some studies also find that the content of examples can matter less than their format: replacing the correct labels with random ones degrades performance only modestly on some classification tasks, suggesting that examples primarily teach the model the input and output format rather than the task logic.
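One way to act on the "few, diverse examples" advice is to pick a small subset from a larger pool before building the prompt. The sketch below is illustrative only: the helper names (`jaccard_distance`, `select_diverse`, `build_prompt`), the word-overlap distance, and the `Input:`/`Output:` template are all assumptions made for this example, not something the report prescribes. A real setup might use embedding similarity instead and should confirm the chosen example count against a held-out set.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, expected output); hypothetical shape


def jaccard_distance(a: str, b: str) -> float:
    """1 - Jaccard similarity over lowercase word sets (a crude diversity proxy)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def select_diverse(pool: List[Example], k: int = 4) -> List[Example]:
    """Greedy farthest-point selection.

    Starts from the first example, then repeatedly adds the candidate whose
    minimum distance to the already chosen examples is largest.
    """
    if not pool or k <= 0:
        return []
    chosen = [pool[0]]
    remaining = list(pool[1:])
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda c: min(jaccard_distance(c[0], s[0]) for s in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


def build_prompt(examples: List[Example], query: str) -> str:
    """Format every example identically, then append the query in the same format."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


pool = [
    ("the movie was great", "positive"),
    ("the movie was great fun", "positive"),
    ("terrible service and cold food", "negative"),
    ("battery life is average", "neutral"),
    ("the movie was really great", "positive"),
]
shots = select_diverse(pool, k=3)
prompt = build_prompt(shots, "shipping took forever")
```

With this pool, the near-duplicate "movie" reviews are skipped in favor of the dissimilar negative and neutral examples. Keeping the template identical across examples reflects the observation above that consistent formatting is a large part of what examples convey.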
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:16:16.385651+00:00— report_created — created