Report #61655
[counterintuitive] Adding more few-shot examples to the prompt will improve model performance
Optimize for signal-to-noise ratio, not example count. Start with 2-3 high-quality, diverse examples that clearly demonstrate the pattern. Test whether adding examples helps or hurts. Use RAG to retrieve only relevant examples dynamically rather than stuffing static examples into the system prompt.
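A minimal sketch of dynamic example selection, assuming a small in-memory pool of labeled examples. The names `EXAMPLE_POOL`, `select_examples`, and `build_prompt` are hypothetical, and the bag-of-words cosine similarity is only a stand-in for whatever embedding-based retriever a real RAG setup would use.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical example pool; a real system would load this from a store.
EXAMPLE_POOL: List[Dict[str, str]] = [
    {"input": "refund my order please", "output": "billing"},
    {"input": "the app crashes on login", "output": "bug"},
    {"input": "how do I reset my password", "output": "account"},
    {"input": "charged twice for my order", "output": "billing"},
    {"input": "login page shows error 500", "output": "bug"},
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_examples(query: str, pool: List[Dict[str, str]], k: int = 3) -> List[Dict[str, str]]:
    """Return the k pool examples whose inputs are most similar to the query."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(ex["input"]))), i, ex)
        for i, ex in enumerate(pool)
    ]
    # Highest similarity first; ties keep the original pool order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [ex for _, _, ex in scored[:k]]


def build_prompt(query: str, examples: List[Dict[str, str]]) -> str:
    """Format the selected examples followed by the unanswered query."""
    parts = [f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    query = "I was charged twice, refund my order"
    print(build_prompt(query, select_examples(query, EXAMPLE_POOL, k=2)))
```

Only the selected examples reach the prompt, so the pool can grow without the prompt growing with it. Swapping `cosine` for an embedding similarity should not require changing the rest of the sketch.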
Journey Context:
The 'more examples equals better' intuition leads developers to stuff prompts with dozens of few-shot examples. Research on in-context learning points the other way: in many cases, the content of the labels in few-shot examples matters less than the format and input distribution. The model primarily learns the shape of the task (input format, output space, distribution) from demonstrations, not the specific input-output mappings. Adding many examples can hurt by: (1) diluting attention across examples so the relevant pattern is less salient, (2) introducing conflicting signals if the examples are not fully consistent, (3) consuming context window that could hold task-relevant information, and (4) triggering the lost-in-the-middle effect, where examples in the middle of the prompt are effectively ignored. Three well-chosen examples that clearly illustrate the desired pattern, edge cases, and output format tend to outperform twenty mediocre ones. Quality and diversity of examples matter far more than quantity.
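One way to check whether extra examples help or hurt on a given task is to sweep the shot count against a held-out dev set. The sketch below assumes a caller-supplied `complete` function that sends a prompt to the model and returns its text; that function, along with `format_prompt` and `accuracy_by_shot_count`, is hypothetical and not part of any particular library.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected output)


def format_prompt(shots: Sequence[Example], query: str) -> str:
    """Render the few-shot examples followed by the unanswered query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in shots]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def accuracy_by_shot_count(
    complete: Callable[[str], str],   # assumed wrapper around the model call
    shots: Sequence[Example],
    dev_set: Sequence[Example],
    shot_counts: Sequence[int],
) -> Dict[int, float]:
    """Measure exact-match accuracy on dev_set for each number of shots."""
    if not dev_set:
        raise ValueError("dev_set must not be empty")
    results: Dict[int, float] = {}
    for k in shot_counts:
        prompt_shots = shots[:k]  # use the first k examples only
        correct = 0
        for query, expected in dev_set:
            prediction = complete(format_prompt(prompt_shots, query)).strip()
            correct += prediction == expected
        results[k] = correct / len(dev_set)
    return results
```

Calling it with something like `shot_counts=[0, 2, 3, 8, 20]` gives one accuracy figure per prompt size. If accuracy stops improving or drops as the count rises, the smaller prompt is likely the better choice. Exact-match scoring is an assumption here and may need replacing for free-form outputs.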
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:58:43.626852+00:00— report_created — created