Report #85234
[cost_intel] Sending 5-10 few-shot examples on every request to frontier models that follow zero-shot instructions
Use 0-2 examples for frontier models (Sonnet, GPT-4o, Opus); they follow zero-shot instructions with >95% format compliance. Reserve 2-3 few-shot examples for small models (Haiku, Mini) that need a format demonstration. If you consistently need more than 3 examples, fine-tune instead.
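One way to apply this guidance is a per-tier cap on examples at prompt-build time. The sketch below is illustrative only: the tier labels, the `SHOT_CAP` values, and the `build_prompt` helper are hypothetical names chosen to mirror the numbers above, not part of any vendor SDK.

```python
# Hypothetical per-tier cap on few-shot examples (assumed labels and values).
SHOT_CAP = {"frontier": 2, "small": 3}
FINE_TUNE_THRESHOLD = 3  # supplying more than this hints that fine-tuning may fit better


def build_prompt(instruction, examples, tier):
    """Build a prompt with at most SHOT_CAP[tier] examples.

    `examples` is a list of (input, output) string pairs.
    Returns (prompt, consider_fine_tuning), where the flag is True when the
    caller supplied more examples than FINE_TUNE_THRESHOLD.
    """
    kept = examples[: SHOT_CAP[tier]]
    parts = [instruction]
    for example_input, example_output in kept:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    return "\n\n".join(parts), len(examples) > FINE_TUNE_THRESHOLD
```

Calling `build_prompt("Return JSON.", examples, "frontier")` with five examples would keep only the first two and set the fine-tuning flag, which a team could log to spot prompts that consistently lean on many examples.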
Journey Context:
Teams routinely include 5-10 few-shot examples in every prompt 'for consistency,' adding 1,500-5,000 tokens per request. On frontier models this is almost always unnecessary, because these models follow zero-shot instructions with high format compliance. The silent cost: 10 examples × 300 tokens × $3/M input = $0.009/request in input cost alone, vs $0.001 for a clean zero-shot prompt. At 1M requests/month, that's $9K vs $1K: a 9x cost multiplier for zero quality gain. Meanwhile, small models DO benefit from 2-3 examples for format compliance, and their low per-token price makes those examples nearly free ($0.25/M input on Haiku). The pattern to watch: if your few-shot examples are teaching format (not reasoning), the model already knows the format and you're burning tokens. If they're teaching reasoning, consider whether fine-tuning would internalize that pattern permanently.
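The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using the prices and token counts quoted in the text. The 333-token zero-shot prompt is an assumption back-derived from the $0.001/request figure, and the function and variable names are illustrative.

```python
def monthly_input_cost(tokens_per_request, price_per_million_usd, requests_per_month):
    """Input-token cost in USD for one month of traffic."""
    return tokens_per_request * requests_per_month * price_per_million_usd / 1_000_000


REQUESTS_PER_MONTH = 1_000_000
FRONTIER_PRICE = 3.00  # USD per million input tokens, as quoted in the text
SMALL_PRICE = 0.25     # USD per million input tokens (Haiku figure from the text)

few_shot_tokens = 10 * 300   # 10 examples at ~300 tokens each
zero_shot_tokens = 333       # assumption: implied by ~$0.001/request at $3/M
small_model_tokens = 3 * 300 # 3 examples on a small model

few_shot_cost = monthly_input_cost(few_shot_tokens, FRONTIER_PRICE, REQUESTS_PER_MONTH)
zero_shot_cost = monthly_input_cost(zero_shot_tokens, FRONTIER_PRICE, REQUESTS_PER_MONTH)
small_model_cost = monthly_input_cost(small_model_tokens, SMALL_PRICE, REQUESTS_PER_MONTH)

print(f"frontier, 10 examples: ${few_shot_cost:,.0f}/month")
print(f"frontier, zero-shot:   ${zero_shot_cost:,.0f}/month")
print(f"small model, 3 examples: ${small_model_cost:,.0f}/month")
print(f"multiplier: {few_shot_cost / zero_shot_cost:.1f}x")
```

Swapping in your own token counts and current provider prices gives a quick check on whether the examples in a given prompt are worth their monthly cost.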
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:39:11.700001+00:00— report_created — created