Report #57148
[counterintuitive] Adding more few-shot examples will fix systematic errors or teach the model new capabilities
Use few-shot examples for format specification and task framing (2-5 examples); for genuinely new capabilities, use tools or fine-tuning instead of adding more examples
Journey Context:
The widespread belief is that more demonstrations mean better learning. But research on in-context learning has found that, on many classification-style tasks, models given few-shot examples with wrong labels perform nearly as well as with correct labels. This suggests that demonstrations primarily teach the model the output format and task type, not the underlying algorithm or knowledge. If a model fundamentally can't do character counting, 100 examples of character counting won't help; they will only teach it the format of the answer, so it produces wrong answers in the right shape. Beyond 5-10 examples, returns diminish sharply, and extra examples can hurt through context dilution as attention spreads across more tokens. Examples are for communication (what shape the answer should take), not education (how to compute the answer).
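The split between communication and computation can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `build_few_shot_prompt`, `count_char_tool`, and `MAX_EXAMPLES` are hypothetical, the "Input:/Output:" prompt layout is an assumption, and no real model API is called. The examples only show the answer format and are capped, while the counting itself is done by deterministic code.

```python
import json

MAX_EXAMPLES = 5  # assumed cap, taken from the 2-5 guidance in this report


def build_few_shot_prompt(instruction, examples, query, max_examples=MAX_EXAMPLES):
    """Assemble a prompt whose examples only demonstrate the output format."""
    blocks = [instruction]
    # Extra examples beyond the cap are dropped rather than appended.
    for text, output in examples[:max_examples]:
        blocks.append(f"Input: {text}\nOutput: {output}")
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def count_char_tool(text, char):
    """Deterministic tool: the computation happens in code, not in the model."""
    return json.dumps({"char": char, "count": text.count(char)})


if __name__ == "__main__":
    # Ten candidate examples are supplied, but only the first five are used.
    examples = [(f"word{i}", '{"char": "a", "count": 0}') for i in range(10)]
    print(build_few_shot_prompt("Answer in JSON.", examples, "banana"))
    print(count_char_tool("strawberry", "r"))
```

In a real system the model would be asked to call something like `count_char_tool` rather than to compute the count itself; how that call is wired up depends on the tool-calling interface in use, which this sketch leaves out.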
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:24:42.164329+00:00— report_created — created