Report #97501
[counterintuitive] More few-shot examples always improve in-context learning
Start with a zero-shot baseline on modern instruction-tuned models, then add a small number of highly relevant examples only if evaluation shows a gain. Watch for over-prompting: performance often peaks at 5-20 examples and then declines.
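A minimal sketch of this measure-then-add loop, assuming you already have some `evaluate(k)` function that scores a prompt containing k demonstrations on a held-out set. `evaluate`, `pick_shot_count` and its parameters are hypothetical names and do not come from any library. The sketch stays zero-shot unless a larger k beats the best score so far by at least `min_gain`, and it stops sweeping once the curve stops improving, which is the over-prompting signal described above. The toy accuracy curve is invented purely to exercise the code.

```python
from typing import Callable, Sequence, Tuple


def pick_shot_count(
    evaluate: Callable[[int], float],
    candidates: Sequence[int] = (1, 2, 5, 10, 20, 40),
    min_gain: float = 0.01,
    patience: int = 1,
) -> Tuple[int, float]:
    """Sweep the number of demonstrations k and return (k, score).

    Stays zero-shot (k=0) unless some k beats the best score so far by at
    least `min_gain`. Stops early once `patience` + 1 consecutive values of k
    fail to improve, i.e. once the curve has flattened or started to decline.
    """
    best_k, best_score = 0, evaluate(0)  # zero-shot baseline
    stale = 0
    for k in sorted(candidates):
        score = evaluate(k)
        if score >= best_score + min_gain:
            best_k, best_score, stale = k, score, 0
        else:
            stale += 1
            if stale > patience:
                break  # past the peak: extra examples are not paying off
    return best_k, best_score


# Hypothetical accuracy-vs-k curve, made up for illustration only.
toy_curve = {0: 0.70, 1: 0.72, 2: 0.74, 5: 0.78, 10: 0.795, 20: 0.77, 40: 0.73}
print(pick_shot_count(toy_curve.__getitem__))  # prints (10, 0.795)
```

Requiring a minimum gain instead of taking the raw argmax biases the choice toward fewer examples, so prompts stay short when the curve is nearly flat. A larger `patience` tolerates a noisy dip before giving up, at the cost of more evaluation runs.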
Journey Context:
The 2025 'Few-shot Dilemma' study found that excessive domain-specific examples paradoxically degrade performance on GPT-4o, LLaMA, and Gemma across classification tasks. Larger models can be more sensitive to noisy or redundant demonstrations, and long contexts can dilute attention. The better workflow is: zero-shot baseline → measure → add TF-IDF/embedding-selected examples → stop at the knee of the curve. Many-shot prompting (hundreds of examples) helps only when the examples are extremely clean and the model has strong long-context comprehension.
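The "add TF-IDF/embedding-selected examples" step can be sketched as nearest-neighbour retrieval over a labelled pool: vectorise each candidate demonstration, rank the pool by cosine similarity to the incoming query, and keep the top k. This is a minimal stdlib-only TF-IDF version under stated assumptions: the tokenizer, the `Input:`/`Label:` prompt template, the tiny review pool, and every function name (`select_examples`, `build_prompt`, and so on) are hypothetical. An embedding-based variant would replace the TF-IDF vectors with embedding vectors and keep the same cosine ranking.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs: Sequence[List[str]]) -> Dict[str, float]:
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, the form scikit-learn uses by default.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    # Sparse TF-IDF vector, L2-normalised; terms never seen in the pool are dropped.
    vec = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def select_examples(query: str, pool: Sequence[Example], k: int) -> List[Example]:
    """Return the k pool examples most similar to `query` (ties keep pool order)."""
    if k <= 0 or not pool:
        return []
    docs = [tokenize(text) for text, _ in pool]
    idf = fit_idf(docs)
    vecs = [tfidf(doc, idf) for doc in docs]
    q = tfidf(tokenize(query), idf)
    ranked = sorted(range(len(pool)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return [pool[i] for i in ranked[:k]]


def build_prompt(instruction: str, shots: Sequence[Example], query: str) -> str:
    # Hypothetical "Input:/Label:" template; adapt it to your task's format.
    parts = [instruction]
    parts += [f"Input: {text}\nLabel: {label}" for text, label in shots]
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)


# Tiny made-up pool for illustration.
pool = [
    ("The battery dies after an hour", "negative"),
    ("Great screen and fast shipping", "positive"),
    ("Battery life is excellent", "positive"),
    ("The package arrived late", "negative"),
]
query = "The battery does not last long"
shots = select_examples(query, pool, k=2)
print(build_prompt("Classify the review sentiment.", shots, query))
```

The value of `k` here should come from a held-out sweep rather than a fixed guess: rerun the evaluation at increasing k and stop where the curve flattens or turns down.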
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:13:49.977499+00:00— report_created — created