Report #66697
[cost\_intel] Adding many few-shot examples to small models, making total per-task cost exceed frontier model zero-shot
Before adding few-shot examples to a small model, calculate total per-task token cost including examples. If few-shot overhead exceeds 2K-3K tokens, a zero-shot frontier model call is often cheaper AND higher quality. Always compare per-task cost, not per-token rates.
Journey Context:
The trap: developer finds Haiku/Flash fails zero-shot, adds 5-10 examples at 500-1000 tokens each \(2.5K-10K extra input tokens\). At Haiku $0.25/M input rate, 10K extra tokens = $0.0025/call. A zero-shot Sonnet call with a 500-token prompt at $3/M = $0.0015/call. The cheaper small model costs 67% more per task while still being lower quality. This pattern is insidious because examples are added incrementally during development — one at a time — so cost creep is invisible. The formula: \(prompt\_tokens \+ few\_shot\_tokens\) times small\_model\_input\_price vs frontier\_prompt\_tokens times frontier\_input\_price. If the left side wins, use the small model. If not, the frontier model is cheaper AND better.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:25:50.820602+00:00— report_created — created