Report #91648
[counterintuitive] Adding more few-shot examples to a prompt always improves model performance
Start with zero-shot or 1-2 carefully chosen examples. Add more only if they measurably improve results on your specific task. Prefer a few diverse examples that cover edge cases over many similar ones. Always benchmark zero-shot against few-shot; zero-shot with clearer instructions often wins.
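One way to run that benchmark is sketched below. It is a minimal harness, not a prescribed method: `model` is a hypothetical callable that takes a prompt string and returns the model's text, and the `Input:`/`Output:` template, the exact-match scoring, and the function names are all assumptions made for illustration.

```python
from typing import Callable, Sequence

Example = tuple[str, str]  # (input text, expected label)


def build_prompt(instruction: str, shots: Sequence[Example], query: str) -> str:
    """Assemble the instruction, optional demonstrations, and the query."""
    parts = [instruction]
    for text, label in shots:
        parts.append(f"Input: {text}\nOutput: {label}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


def accuracy(
    model: Callable[[str], str],  # hypothetical: prompt in, completion out
    instruction: str,
    shots: Sequence[Example],
    eval_set: Sequence[Example],
) -> float:
    """Fraction of eval items whose normalized output matches the label."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = 0
    for query, expected in eval_set:
        answer = model(build_prompt(instruction, shots, query))
        hits += answer.strip().lower() == expected.strip().lower()
    return hits / len(eval_set)


def compare_shot_counts(
    model: Callable[[str], str],
    instruction: str,
    shot_pool: Sequence[Example],
    eval_set: Sequence[Example],
    counts: Sequence[int] = (0, 1, 2, 4),
) -> dict[int, float]:
    """Score the same eval set using the first k examples of shot_pool, for each k."""
    return {
        k: accuracy(model, instruction, shot_pool[:k], eval_set)
        for k in counts
        if k <= len(shot_pool)
    }
```

Keep the eval set disjoint from the example pool, and treat small differences between shot counts as noise unless the eval set is large enough to tell them apart.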
Journey Context:
A widespread practice is to stuff prompts with as many few-shot examples as possible, on the assumption that more demonstrations mean better pattern recognition. This is frequently counterproductive. First, examples consume context window space that could hold task-relevant information or retrieved documents. Second, models overfit to superficial patterns in the examples: if every example answer happens to start with 'The', the model is biased toward that format regardless of the actual query. Third, inconsistent or contradictory examples confuse the model more than they help. Fourth, the lost-in-the-middle effect means the model may not attend to examples in the middle of a long few-shot block. Research on in-context learning suggests that the label space and format of the examples matter far more than whether each example is labeled correctly: examples with random labels but the correct format can improve performance almost as much as correctly labeled ones. This suggests that few-shot prompting works primarily by demonstrating the output format, not by teaching task logic. For modern instruction-tuned models, zero-shot with clear instructions is often competitive or superior.
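Two of the failure modes above, shared surface patterns and context consumption, can be checked mechanically before any model call. The sketch below is illustrative only: the function names are hypothetical, and whitespace word counts are a crude stand-in for real token counts, which depend on the model's tokenizer.

```python
from collections import Counter
from typing import Sequence


def first_word_share(outputs: Sequence[str]) -> tuple[str, float]:
    """Return the most common leading word and the share of outputs starting with it."""
    firsts = [o.split()[0].lower() for o in outputs if o.split()]
    if not firsts:
        return "", 0.0
    word, count = Counter(firsts).most_common(1)[0]
    return word, count / len(firsts)


def examples_budget_share(examples: Sequence[str], context_words: int) -> float:
    """Rough share of the context taken by examples.

    Assumption: whitespace-separated words approximate tokens. Use the
    target model's tokenizer for a real budget.
    """
    if context_words <= 0:
        raise ValueError("context_words must be positive")
    used = sum(len(e.split()) for e in examples)
    return used / context_words
```

A high share from `first_word_share` on the example answers is a hint that the examples may teach an accidental format. Any threshold for acting on it is a judgment call for the task at hand.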
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:25:15.730479+00:00— report_created — created