Report #41268
[cost_intel] Adding many few-shot examples to small model prompts to close the quality gap with frontier models
Calculate total token cost, including few-shot examples, before choosing the model. A Haiku call whose 10 few-shot examples add 5K input tokens can cost more than a zero-shot Sonnet call at 500 input tokens for the same task. Prefer frontier zero-shot over a few-shot small model when examples bloat input beyond 3-5x the zero-shot size. Alternatively, cache the few-shot prefix so that only the first call pays to write it and later calls read it at a discounted rate.
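The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the per-million input prices quoted in this report ($3 for Sonnet, $0.80 for Haiku) and assuming the 5K few-shot tokens sit on top of the same 500-token base prompt. The function names are illustrative, and output tokens are ignored.

```python
# Per-million input-token prices as quoted in this report.
# Treat them as illustrative and check current pricing before relying on them.
SONNET_INPUT_PER_M = 3.00
HAIKU_INPUT_PER_M = 0.80


def input_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of the input side of a single call."""
    return tokens * price_per_million / 1_000_000


def break_even_multiplier(large_price: float, small_price: float) -> float:
    """Input-size multiplier at which the small model's input cost
    equals the large model's zero-shot input cost."""
    return large_price / small_price


base_tokens = 500        # zero-shot prompt size
few_shot_tokens = 5_000  # assumed extra tokens added by 10 examples

sonnet_zero_shot = input_cost(base_tokens, SONNET_INPUT_PER_M)
haiku_few_shot = input_cost(base_tokens + few_shot_tokens, HAIKU_INPUT_PER_M)
multiplier = break_even_multiplier(SONNET_INPUT_PER_M, HAIKU_INPUT_PER_M)

print(f"Sonnet zero-shot: ${sonnet_zero_shot:.5f}")
print(f"Haiku few-shot:   ${haiku_few_shot:.5f}")
print(f"Break-even input multiplier: {multiplier:.2f}x")
```

Under these assumptions the few-shot Haiku call is the more expensive one, and the break-even multiplier is simply the ratio of the two per-token prices.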
Journey Context:
The instinct to add few-shot examples to small-model prompts is correct for quality: examples do help Haiku/Flash close the gap with Sonnet/Pro. But the token economics are counterintuitive. Consider a classification task with a 200-token instruction and a 50-token input. Zero-shot Sonnet: 250 input tokens at $3/M = $0.00075. Haiku with 10 few-shot examples at 500 tokens each: 5,250 input tokens at $0.80/M = $0.0042. The few-shot Haiku call costs 5.6x MORE than zero-shot Sonnet on input tokens alone. The pattern generalizes: few-shot examples are a token multiplier that can erase the per-token savings of small models. Break-even on input cost falls where the input-size multiplier equals the per-token price ratio, which is $3 / $0.80 = 3.75x at these prices. Output tokens and other price tiers shift that point, so as a rule of thumb, check the math once few-shot examples increase input tokens by more than 3-5x. Better alternatives: (1) use prompt caching on the few-shot prefix so you pay the write surcharge once and then read at a 90% discount, (2) fine-tune the small model on the examples instead, (3) use frontier zero-shot, which often matches few-shot small-model quality at lower total token cost.
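The caching alternative can be sized the same way. This is a rough sketch and not a definitive cost model: the 90% read discount comes from the text above, while the 1.25x cache-write multiplier is an assumption to verify against your provider's pricing. It also assumes every call lands while the cache entry is still live, and it treats the 200-token instruction plus the 5,000 example tokens as the cached prefix and the 50-token input as the uncached suffix. All function names are hypothetical.

```python
def cached_avg_cost(prefix_tokens: int, suffix_tokens: int,
                    price_per_million: float, calls: int,
                    write_multiplier: float = 1.25,   # assumed write surcharge
                    read_multiplier: float = 0.10) -> float:  # 90% read discount
    """Average input cost per call when the prefix is written to the cache
    on the first call and read from it on every later call."""
    per_token = price_per_million / 1_000_000
    suffix_cost = suffix_tokens * per_token
    first_call = prefix_tokens * per_token * write_multiplier + suffix_cost
    later_call = prefix_tokens * per_token * read_multiplier + suffix_cost
    return (first_call + (calls - 1) * later_call) / calls


def calls_to_beat(target_cost: float, max_calls: int = 10_000, **kwargs):
    """Smallest call count whose average cached cost drops below target_cost,
    or None if it never does within max_calls."""
    for n in range(1, max_calls + 1):
        if cached_avg_cost(calls=n, **kwargs) < target_cost:
            return n
    return None


# Numbers from the worked example: 250-token zero-shot Sonnet call at $3/M.
sonnet_zero_shot = 250 * 3.00 / 1_000_000

break_even_calls = calls_to_beat(
    sonnet_zero_shot,
    prefix_tokens=5_200,   # 200-token instruction + 10 examples x 500 tokens
    suffix_tokens=50,      # per-request input, not cached
    price_per_million=0.80,
)
print(f"Cached few-shot Haiku beats zero-shot Sonnet after {break_even_calls} calls")
```

With these assumed multipliers, the cached few-shot Haiku prompt averages below the zero-shot Sonnet input cost from the 17th call onward, so caching pays off only for prompts that are reused often enough within the cache lifetime.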
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:44:23.690487+00:00— report_created — created