Report #53320
[counterintuitive] If the model fails at a task, adding more few-shot examples will eventually fix it
Use few-shot examples for format demonstration and task framing only. If the model fails zero-shot due to a capability limitation (character operations, precise arithmetic, spatial reasoning), few-shot examples will not help; switch to tool use or architectural changes. Diagnostic: if the error rate does not meaningfully decrease after 3 to 5 well-chosen examples, you are hitting a capability wall, not a demonstration gap.
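The diagnostic above can be sketched as a small check over measured error rates. This is a minimal illustration, not a validated method: the function name `diagnose_few_shot`, the 20% relative-drop threshold, and the result labels are all assumptions chosen for the example, and the error rates are expected to come from your own fixed evaluation set.

```python
from typing import Mapping


def diagnose_few_shot(
    error_by_shots: Mapping[int, float],
    min_relative_drop: float = 0.2,  # assumed threshold for "meaningful"; tune per task
    max_shots: int = 5,              # only look at the small-shot regime (3 to 5 examples)
) -> str:
    """Classify a failure as a demonstration gap or a capability wall.

    error_by_shots maps the number of few-shot examples to the measured
    error rate (0.0 to 1.0) on the same evaluation set. Key 0 is the
    zero-shot baseline and is required.
    """
    if 0 not in error_by_shots:
        raise ValueError("a zero-shot baseline (key 0) is required")

    baseline = error_by_shots[0]
    if baseline == 0.0:
        return "no_failure"  # nothing to fix zero-shot

    # Error rates measured with 1..max_shots examples.
    few_shot_errors = [
        err for shots, err in error_by_shots.items() if 0 < shots <= max_shots
    ]
    if not few_shot_errors:
        raise ValueError("at least one few-shot measurement is required")

    # Relative improvement of the best few-shot run over zero-shot.
    relative_drop = (baseline - min(few_shot_errors)) / baseline
    if relative_drop >= min_relative_drop:
        return "demonstration_gap"
    return "capability_wall"
```

For instance, `diagnose_few_shot({0: 0.60, 3: 0.58, 5: 0.57})` reports a capability wall under the assumed threshold, which is the signal to stop adding examples and move to tool use or an architectural change.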
Journey Context:
The GPT-3 paper showed dramatic improvements from few-shot learning, which created an expectation that examples teach the model new capabilities the way training data does. In reality, in-context learning activates existing knowledge and demonstrates the desired output format; it does not create new capabilities. If a model cannot count characters zero-shot (because BPE tokenization hides character information), 50 examples of character counting will not help: the model will pattern-match the output format while still being unable to perform the actual operation. Few-shot examples are essentially a task specification mechanism, not a learning mechanism. They help when the model has the capability but does not know what format or approach you want, and they do not help when the capability itself is absent. Confusing these two scenarios leads to endlessly adding examples that cannot solve the problem.
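For the character-counting case, the tool-use alternative can be sketched as follows. The idea is that the model only needs to emit a structured request and deterministic code does the counting. The tool name `count_char`, the `TOOLS` registry, and the `{"name": ..., "arguments": ...}` call shape are hypothetical and not tied to any particular vendor's tool-calling API.

```python
from typing import Any, Callable, Dict


def count_char(text: str, char: str) -> int:
    """Count occurrences of a single character, operating on real characters
    rather than on tokens."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.count(char)


# Hypothetical registry of tools the model is allowed to call.
TOOLS: Dict[str, Callable[..., Any]] = {"count_char": count_char}


def run_tool_call(call: Dict[str, Any]) -> Any:
    """Execute a tool call of the assumed form
    {"name": <tool name>, "arguments": {<keyword arguments>}}."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Example of a call the model might emit instead of guessing the answer.
result = run_tool_call(
    {"name": "count_char", "arguments": {"text": "strawberry", "char": "r"}}
)
print(result)  # 3
```

The design choice here is to leave format and intent to the model, where few-shot examples do help, and to hand the operation itself to code, where tokenization is not a factor.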
Lifecycle
2026-06-19T19:59:42.941609+00:00— report_created — created