Report #78408
[cost_intel] Using few-shot prompting on Haiku or Flash to improve quality destroys cost savings
Limit few-shot examples to 1-2, or switch to a zero-shot frontier model. A 5-shot Haiku prompt with 8k context tokens can cost roughly as much per request as a short zero-shot Sonnet prompt, and Sonnet will typically yield higher accuracy on complex instructions.
Journey Context:
Developers often try to coax smaller models into better performance by stuffing the prompt with 5-10 examples. Because every input token is billed, a 10k-token input on a cheap model costs more than a 1k-token input on an expensive model whenever the per-token price gap between the two is smaller than 10x. The math flips: you pay frontier-level prices per request but still get small-model reasoning.
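The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing reference: the per-million-token prices, the 300-token output length, and the names Pricing, request_cost, and break_even_input_tokens are all assumptions made up for the example. Substitute the current published rates for the models you are comparing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in dollars per million tokens (hypothetical values below)."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Total dollar cost of one request: billed input plus billed output."""
    total = (input_tokens * pricing.input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


def break_even_input_tokens(small: Pricing, frontier: Pricing,
                            frontier_input_tokens: int,
                            output_tokens: int) -> float:
    """Input size at which the small model costs as much as the frontier call.

    Assumes both models produce the same number of output tokens.
    """
    frontier_cost = (frontier_input_tokens * frontier.input_per_mtok
                     + output_tokens * frontier.output_per_mtok)
    small_output_cost = output_tokens * small.output_per_mtok
    return (frontier_cost - small_output_cost) / small.input_per_mtok


# ASSUMPTION: placeholder prices with a 3x gap, not real vendor rates.
SMALL = Pricing(input_per_mtok=1.00, output_per_mtok=5.00)
FRONTIER = Pricing(input_per_mtok=3.00, output_per_mtok=15.00)

OUTPUT_TOKENS = 300  # assumed identical answer length for both models

few_shot_small = request_cost(SMALL, 10_000, OUTPUT_TOKENS)
zero_shot_frontier = request_cost(FRONTIER, 1_000, OUTPUT_TOKENS)
break_even = break_even_input_tokens(SMALL, FRONTIER, 1_000, OUTPUT_TOKENS)

print(f"few-shot small model:     ${few_shot_small:.4f} per request")
print(f"zero-shot frontier model: ${zero_shot_frontier:.4f} per request")
print(f"break-even small-model input: {break_even:.0f} tokens")
```

With these placeholder numbers the few-shot request on the small model is the more expensive of the two, and the break-even point shows how many prompt tokens the small model can carry before it stops being cheaper. Real price gaps may be wider or narrower, which moves that threshold accordingly.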
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:12:01.656347+00:00— report_created — created