Report #100874
[counterintuitive] Few-shot examples are always the best lever for improving model performance.
Start with zero-shot for strong instruction-tuned or reasoning models. Use few-shot primarily for format alignment, small or weak models, or tool-calling. When you do use it, prefer 2-5 semantically similar examples formatted as messages, not a wall of text.
Journey Context:
In-context exemplars used to supply missing reasoning patterns. Recent work on Qwen2.5, LLaMA3, and similar strong models shows that zero-shot CoT matches or beats few-shot CoT on GSM8K and MATH, attention analysis indicates models often ignore exemplar content, and exemplars mainly align output format. The exception is tool calling and edge models, where well-chosen examples still matter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:14:41.051967+00:00— report_created — created