Report #92349
[counterintuitive] Model can't do the task — add more few-shot examples to teach it
Few-shot examples demonstrate format and task structure; they don't create new capabilities. If the model fundamentally lacks a capability (character counting, precise arithmetic, spatial reasoning), examples won't help. Diagnose whether the failure is format confusion (fixable with examples) or capability absence (requires tools or architecture changes).
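One rough way to run that diagnosis is to measure accuracy at several few-shot counts and check whether it moves at all. The sketch below is only illustrative: the function name diagnose_failure, the min_gain threshold, and the accuracy figures are assumptions made for this example, not part of any library or of the original report.

```python
from typing import Dict


def diagnose_failure(accuracy_by_shots: Dict[int, float],
                     min_gain: float = 0.10) -> str:
    """Classify a failure as "format" or "capability".

    accuracy_by_shots maps a few-shot count (0, 4, 16, ...) to the accuracy
    measured on a held-out set at that count. The min_gain threshold is an
    illustrative assumption; tune it for your own evaluation noise.
    """
    if len(accuracy_by_shots) < 2:
        raise ValueError("need accuracy for at least two shot counts")

    shots = sorted(accuracy_by_shots)
    baseline = accuracy_by_shots[shots[0]]            # fewest examples
    best_with_examples = max(accuracy_by_shots[k] for k in shots[1:])

    # If adding examples buys a clear gain, the model had the capability
    # and was confused about format or task framing.
    if best_with_examples - baseline >= min_gain:
        return "format"
    # A flat curve suggests examples are not the missing ingredient.
    return "capability"


# Made-up numbers, for illustration only.
print(diagnose_failure({0: 0.20, 4: 0.80, 16: 0.85}))
print(diagnose_failure({0: 0.31, 4: 0.33, 16: 0.30, 100: 0.32}))
```

A flat curve is evidence, not proof: it is worth confirming with a differently worded prompt before concluding the capability is absent.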
Journey Context:
A widespread belief is that enough few-shot examples can teach a model any task. Research on in-context learning indicates otherwise: examples primarily help the model recognize which existing capability to apply and in what format, and they do not create new capabilities. A striking finding is that replacing the labels in few-shot examples with random labels barely hurts performance on many tasks, which suggests that demonstrations mainly communicate format and task structure, not task knowledge. If a model can't count characters zero-shot, 100 examples won't fix it, because the model still processes tokens, not characters. The examples only help the model map the task onto what it already knows. This distinction is critical: format failures respond to examples; capability failures don't. Misdiagnosing a capability failure as a format failure leads to endlessly adding examples while accuracy never improves.
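For a capability failure such as character counting, the usual remedy is to move the work out of the model and into a deterministic tool. The sketch below assumes a hypothetical helper named count_char that a tool-calling setup could expose; how it gets registered with a particular model API is left out, since that varies by provider.

```python
def count_char(text: str, target: str) -> int:
    """Count case-sensitive occurrences of a single character in text.

    Hypothetical tool function: the model would decide when to call it,
    and ordinary code, which sees characters rather than tokens, does
    the counting.
    """
    if len(target) != 1:
        raise ValueError("target must be exactly one character")
    return text.count(target)


print(count_char("strawberry", "r"))  # prints 3
```

With this split, the prompt only has to get the model to call the tool with the right arguments, which is a format problem that examples can help with.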
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:35:51.634354+00:00— report_created — created