Report #83567
[cost_intel] Using few-shot examples with a cheap model when zero-shot on a frontier model costs less in total tokens
Calculate the total per-request input token cost, including few-shot examples. If the few-shot prompt on Haiku at $0.25/MTok (for example, 10 examples at 200 tokens each, 2000 extra input tokens) costs as much as or more than a 200-token zero-shot prompt on Sonnet at $3/MTok, use Sonnet zero-shot. The crossover: on a model that is 12x cheaper, input cost reaches parity when the full few-shot prompt is 12x the zero-shot prompt length, so few-shot stops saving money once example tokens exceed roughly 11x the zero-shot prompt length.
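The comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the prices and token counts quoted in this report, counts input tokens only, and the function names are hypothetical helpers rather than part of any SDK. The break-even formula comes from setting (prompt + examples) x cheap price equal to prompt x frontier price.

```python
# Prices in USD per million input tokens, as quoted in this report.
HAIKU_PER_MTOK = 0.25
SONNET_PER_MTOK = 3.00


def input_cost(tokens: int, price_per_mtok: float) -> float:
    """Input-side cost in USD for a single request."""
    return tokens * price_per_mtok / 1_000_000


def breakeven_example_tokens(
    prompt_tokens: int, cheap_price: float, frontier_price: float
) -> float:
    """Example tokens at which few-shot on the cheap model costs the same
    as zero-shot on the frontier model for the same base prompt.

    Derived from: (prompt + examples) * cheap_price == prompt * frontier_price
    """
    return prompt_tokens * (frontier_price / cheap_price - 1)


prompt_tokens = 200          # zero-shot task prompt
example_tokens = 10 * 200    # ten examples at 200 tokens each

# The few-shot request carries the task prompt plus the examples.
few_shot_haiku = input_cost(prompt_tokens + example_tokens, HAIKU_PER_MTOK)
zero_shot_sonnet = input_cost(prompt_tokens, SONNET_PER_MTOK)
breakeven = breakeven_example_tokens(
    prompt_tokens, HAIKU_PER_MTOK, SONNET_PER_MTOK
)

print(f"Haiku few-shot:   ${few_shot_haiku:.6f}")
print(f"Sonnet zero-shot: ${zero_shot_sonnet:.6f}")
print(f"Break-even example tokens: {breakeven:.0f}")
```

With these inputs the break-even lands at 2200 example tokens, so the 2000-token example set sits just under it. Output tokens are left out on purpose; if responses are long, output pricing should be added to both sides before deciding.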
Journey Context:
The intuition that a cheaper model plus examples saves money breaks down when the examples themselves are expensive. 2000 tokens of few-shot examples on Haiku: 2000 × $0.25/MTok = $0.0005. A 200-token zero-shot prompt on Sonnet: 200 × $3/MTok = $0.0006. Adding the same 200-token task prompt to the Haiku request brings it to $0.00055, so the two costs are nearly identical, yet Sonnet zero-shot is likely to outperform Haiku with examples on complex reasoning tasks. This situation is common in structured extraction, where developers include 5-10 full input-output examples. Two fixes: use prompt caching on the few-shot prefix, which removes most of the per-request example cost (cached reads are billed at a fraction of the normal input rate, after a one-time surcharge to write the cache), or switch to zero-shot on a frontier model. The quality tradeoff: few-shot on small models excels at format replication but fails on reasoning; zero-shot on frontier models excels at reasoning but may need output format constraints such as JSON schema enforcement.
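To see how much the caching fix changes the picture, here is a rough amortization sketch. The cache multipliers are assumptions (a 1.25x surcharge to write the prefix and a 0.10x rate to read it); check the provider's current pricing before relying on them. The sketch also assumes the cache never expires between requests, and `cached_few_shot_cost` is a hypothetical helper, not an SDK call.

```python
HAIKU_PER_MTOK = 0.25    # USD per million input tokens, from this report
SONNET_PER_MTOK = 3.00   # USD per million input tokens, from this report

# Assumed cache pricing multipliers relative to the base input price.
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: one-time cost to write the prefix
CACHE_READ_MULTIPLIER = 0.10   # assumption: cost to read the cached prefix


def cached_few_shot_cost(
    prompt_tokens: int,
    example_tokens: int,
    price_per_mtok: float,
    requests: int,
) -> float:
    """Average input cost per request when the example prefix is written to
    the cache on the first request and read from it on every later one."""
    per_token = price_per_mtok / 1_000_000
    write = example_tokens * per_token * CACHE_WRITE_MULTIPLIER
    reads = example_tokens * per_token * CACHE_READ_MULTIPLIER * (requests - 1)
    prompts = prompt_tokens * per_token * requests  # uncached task prompts
    return (write + reads + prompts) / requests


prompt_tokens = 200
example_tokens = 2000

uncached = (prompt_tokens + example_tokens) * HAIKU_PER_MTOK / 1_000_000
sonnet_zero_shot = prompt_tokens * SONNET_PER_MTOK / 1_000_000
single = cached_few_shot_cost(prompt_tokens, example_tokens, HAIKU_PER_MTOK, 1)
amortized = cached_few_shot_cost(
    prompt_tokens, example_tokens, HAIKU_PER_MTOK, 1000
)

print(f"Haiku few-shot, no cache:        ${uncached:.6f}")
print(f"Haiku few-shot, cached, 1 req:   ${single:.6f}")
print(f"Haiku few-shot, cached, 1k reqs: ${amortized:.6f}")
print(f"Sonnet zero-shot:                ${sonnet_zero_shot:.6f}")
```

Under these assumptions caching only pays off when the prefix is reused: a single cached request costs more than an uncached one because of the write surcharge, while the average over many requests falls well below both the uncached few-shot cost and the Sonnet zero-shot cost. The calculation says nothing about quality, which still has to be measured on the actual task.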
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:51:26.522620+00:00— report_created — created