Report #99939
[counterintuitive] Few-shot examples always beat zero-shot with current models.
Start with zero-shot for strong instruction-tuned and reasoning models. Add few-shot examples only to align output format, to bootstrap very small models, or to handle an idiosyncratic task distribution; keep examples minimal (1 to 3) and evaluate them against the zero-shot baseline.
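A minimal Python sketch of the zero-shot-first workflow, assuming exact-match scoring on a small labeled eval set. `build_prompt`, `accuracy`, `choose_prompting`, `toy_model`, and `chatty_model` are hypothetical names for illustration, not any library's API. The two stub models stand in for real model calls: one ignores exemplars, the other needs them only to match the output format.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Example:
    input: str
    output: str


def build_prompt(instruction: str, query: str, shots: Sequence[Example] = ()) -> str:
    """Zero-shot when `shots` is empty; otherwise insert 1-3 exemplars before the query."""
    if len(shots) > 3:
        raise ValueError("keep few-shot exemplars minimal (1 to 3)")
    parts = [instruction.strip()]
    parts += [f"Input: {ex.input}\nOutput: {ex.output}" for ex in shots]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def accuracy(model: Callable[[str], str], instruction: str,
             eval_set: Sequence[Example], shots: Sequence[Example] = ()) -> float:
    """Exact-match accuracy of `model` on `eval_set` for one prompt configuration."""
    hits = sum(model(build_prompt(instruction, ex.input, shots)).strip() == ex.output
               for ex in eval_set)
    return hits / len(eval_set)


def choose_prompting(model: Callable[[str], str], instruction: str,
                     eval_set: Sequence[Example], shots: Sequence[Example],
                     min_gain: float = 0.0) -> Tuple[str, float, float]:
    """Measure the zero-shot baseline first; keep exemplars only if they beat it by > min_gain."""
    zero = accuracy(model, instruction, eval_set)
    few = accuracy(model, instruction, eval_set, shots)
    return ("few-shot" if few - zero > min_gain else "zero-shot"), zero, few


# Hypothetical stand-ins for real model calls, for illustration only.
def toy_model(prompt: str) -> str:
    """Follows the instruction regardless of exemplars: uppercases the final query."""
    query = prompt.rsplit("Input: ", 1)[1].split("\nOutput:", 1)[0]
    return query.upper()


def chatty_model(prompt: str) -> str:
    """Same answer, but matches the bare output format only when exemplars are present."""
    answer = toy_model(prompt)
    return answer if prompt.count("Input: ") > 1 else f"Sure! {answer}"


INSTRUCTION = "Uppercase the input."
EVAL_SET = [Example("abc", "ABC"), Example("hi", "HI")]
SHOTS = [Example("dog", "DOG")]

if __name__ == "__main__":
    for m in (toy_model, chatty_model):
        print(m.__name__, choose_prompting(m, INSTRUCTION, EVAL_SET, SHOTS))
```

Running it prints `('zero-shot', 1.0, 1.0)` for `toy_model` and `('few-shot', 0.0, 1.0)` for `chatty_model`. The strict `>` comparison means a tie falls back to zero-shot, because exemplars that add nothing only consume context. In a real evaluation you would likely set `min_gain` above run-to-run noise.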
Journey Context:
Classic few-shot prompting (Brown et al., 2020) was essential for GPT-3 before instruction tuning, but recent strong models (the Qwen2.5 family, DeepSeek-R1, the OpenAI o-series) often perform as well as or better with zero-shot prompts. An EMNLP 2025 Findings paper on math reasoning reported that few-shot chain-of-thought (CoT) exemplars did not improve reasoning over zero-shot CoT; the models tended to ignore the exemplars and attend to the instructions. Exemplars mainly serve to align output format. Extra shots consume context, can introduce recency bias, and can degrade performance.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:19:14.138437+00:00— report_created — created