Report #79356
[cost_intel] Switching to a cheaper model but compensating with 5-10x longer prompts to maintain quality
Measure total token cost per quality-adjusted output, not the model's list price per token. If downgrading from Sonnet to Haiku requires adding 10 few-shot examples and extensive instructions (growing the prompt from 500 to 5000 tokens), the per-call cost may approach, equal, or exceed that of Sonnet with a concise prompt, while output quality is still worse.
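One way to make "cost per quality-adjusted output" concrete is to divide the per-call cost by the fraction of outputs that pass your own acceptance check. The sketch below is illustrative only: the function name and the pass rates (0.75 and 0.95) are hypothetical placeholders, not measurements, and the prices are the input prices quoted in this report. Output-token cost is optional and defaults to zero because the report does not quote output prices.

```python
def cost_per_accepted_output(prompt_tokens, input_price_per_mtok, pass_rate,
                             output_tokens=0, output_price_per_mtok=0.0):
    """Dollar cost per output that passes the quality check.

    Prices are in dollars per 1M tokens. pass_rate is the fraction of
    calls whose output is accepted (from your own eval set).
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    per_call = (prompt_tokens * input_price_per_mtok
                + output_tokens * output_price_per_mtok) / 1_000_000
    # A rejected output still costs money, so spread the spend
    # over accepted outputs only.
    return per_call / pass_rate


# Hypothetical pass rates, for illustration only.
haiku_long = cost_per_accepted_output(5000, 0.25, pass_rate=0.75)
sonnet_lean = cost_per_accepted_output(500, 3.00, pass_rate=0.95)

print(f"Haiku, 5000-token prompt: ${haiku_long:.6f} per accepted output")
print(f"Sonnet, 500-token prompt: ${sonnet_lean:.6f} per accepted output")
```

With these assumed pass rates the cheaper model ends up costing more per accepted output, even though its per-call cost is lower. Real pass rates have to come from your own evaluation.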
Journey Context:
The instinct when a cheaper model underperforms is to add more context: detailed instructions, more examples, explicit constraints, chain-of-thought scaffolding. But input tokens are billed at the same rate whether they carry instructions or content. A 5000-token prompt on Haiku ($0.25/1M input) costs $0.00125. A 500-token prompt on Sonnet ($3/1M input) costs $0.00150. That saves $0.00025 per call, about 17%, in exchange for worse output, and at these prices the saving disappears entirely once the Haiku prompt reaches 6000 tokens. These figures count input tokens only; output tokens are billed separately. The signature of this anti-pattern: after a model downgrade, prompt token count spikes 5-10x but output quality still trails the frontier baseline. The fix is binary: either accept the quality tradeoff of the cheaper model with a lean prompt, or stay on the frontier model with a lean prompt. The middle ground (cheap model + bloated prompt) is the worst of both worlds: near-frontier cost with sub-frontier quality.
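The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the input prices quoted in this report. The helper names are hypothetical, and the 5x threshold in the detector is an assumption taken from the low end of the "5-10x" range.

```python
HAIKU_PRICE = 0.25   # $ per 1M input tokens, as quoted in this report
SONNET_PRICE = 3.00  # $ per 1M input tokens, as quoted in this report


def input_cost(prompt_tokens, price_per_mtok):
    """Input-token cost of one call, in dollars."""
    return prompt_tokens * price_per_mtok / 1_000_000


def break_even_prompt_tokens(baseline_tokens, baseline_price, cheap_price):
    """Prompt length at which the cheap model's input cost matches the baseline."""
    return baseline_tokens * baseline_price / cheap_price


def is_bloated_downgrade(old_tokens, new_tokens, quality, baseline_quality,
                         ratio_threshold=5.0):
    """Flag the signature: prompt grew >= threshold and quality still trails."""
    return new_tokens / old_tokens >= ratio_threshold and quality < baseline_quality


haiku_cost = input_cost(5000, HAIKU_PRICE)
sonnet_cost = input_cost(500, SONNET_PRICE)
savings = sonnet_cost - haiku_cost
break_even = break_even_prompt_tokens(500, SONNET_PRICE, HAIKU_PRICE)

print(f"Haiku:  ${haiku_cost:.5f} per call")
print(f"Sonnet: ${sonnet_cost:.5f} per call")
print(f"Saved:  ${savings:.5f} per call")
print(f"Break-even Haiku prompt length: {break_even:.0f} tokens")
```

The quality scores passed to `is_bloated_downgrade` can be on any scale, as long as both come from the same evaluation.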
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:47:33.428545+00:00— report_created — created