Report #66707
[cost_intel] Comparing model costs using identical prompt lengths across model tiers makes small models seem cheaper per task than they really are
When comparing model costs, account for differences in prompt length. Frontier models often reach the target quality with prompts 50-80% shorter, sometimes more (no few-shot examples, simpler instructions). For example, a 500-token Sonnet prompt at $3/M input tokens costs $0.0015 per task, while a 3000-token Haiku prompt with examples at $0.25/M costs $0.00075 per task. The 12x per-token price difference shrinks to only 2x per task, and Sonnet still delivers higher quality.
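The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that counts input tokens only, as the example does; the function name `per_task_input_cost` is hypothetical, and the prices are the per-million input rates quoted in this report, not live pricing.

```python
def per_task_input_cost(prompt_tokens: int, price_per_million: float) -> float:
    """USD input cost of one task: prompt tokens times the per-token price."""
    return prompt_tokens * price_per_million / 1_000_000


sonnet = per_task_input_cost(500, 3.00)   # short, instruction-only prompt
haiku = per_task_input_cost(3000, 0.25)   # longer prompt with few-shot examples

per_token_ratio = 3.00 / 0.25             # list-price gap on input tokens
per_task_ratio = sonnet / haiku           # gap once prompt length is included

print(f"Sonnet ${sonnet:.5f}/task, Haiku ${haiku:.5f}/task")
# Sonnet $0.00150/task, Haiku $0.00075/task
print(f"per-token gap {per_token_ratio:.0f}x, per-task gap {per_task_ratio:.1f}x")
# per-token gap 12x, per-task gap 2.0x
```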
Journey Context:
The per-token price comparison (Sonnet is 12x Haiku on input) is misleading because it assumes identical prompts. In practice, frontier models need less hand-holding: fewer examples, shorter instructions, less context scaffolding. A task that needs 5 few-shot examples plus detailed instructions on Haiku might need only a 2-sentence instruction on Sonnet. The effective per-task cost gap is often 2-5x, not 12x. This does not mean frontier models are always cheaper; Haiku with short prompts is still cheaper for simple tasks. It does mean the comparison should always be made per task, with a prompt suited to each model, not per token with identical prompts. The mistake is choosing a small model on per-token pricing, then discovering it needs 5x the tokens to match frontier quality, which erases most of the expected savings (a 12x price gap divided by a 5x token gap leaves only 2.4x).
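One way to apply the per-task rule is sketched below, assuming you have already measured, on your own eval set, how long each model's prompt must be and whether it reaches the target quality. `ModelOption`, `effective_gap`, and `cheapest_adequate` are hypothetical helpers, and the token counts and quality flags in the example list are placeholders, not measured results. Output tokens are ignored, as in the worked example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """One model paired with the prompt it actually needs for the task."""
    name: str
    input_price_per_million: float  # USD per 1M input tokens
    prompt_tokens: int              # length of the model-appropriate prompt
    meets_quality_bar: bool         # placeholder: measure on your own evals

    def per_task_cost(self) -> float:
        # Input-token cost of one task; output tokens are not modeled here.
        return self.prompt_tokens * self.input_price_per_million / 1_000_000


def effective_gap(per_token_ratio: float, prompt_length_ratio: float) -> float:
    """Per-task cost ratio (large / small) when the small model needs
    `prompt_length_ratio` times as many prompt tokens as the large one."""
    return per_token_ratio / prompt_length_ratio


def cheapest_adequate(options: list[ModelOption]) -> ModelOption:
    """Lowest per-task cost among options that meet the quality bar."""
    adequate = [o for o in options if o.meets_quality_bar]
    if not adequate:
        raise ValueError("no option meets the quality bar")
    return min(adequate, key=lambda o: o.per_task_cost())


# Hypothetical candidates: each model is priced with its own prompt.
options = [
    ModelOption("haiku-short-prompt", 0.25, 300, meets_quality_bar=False),
    ModelOption("haiku-few-shot", 0.25, 3000, meets_quality_bar=True),
    ModelOption("sonnet-short-prompt", 3.00, 500, meets_quality_bar=True),
]

best = cheapest_adequate(options)
print(best.name, f"${best.per_task_cost():.5f}/task")
print(f"gap if the small model needs 5x tokens: {effective_gap(12, 5):.1f}x")
```

With these placeholder inputs the few-shot Haiku prompt still wins on cost, matching the 2x figure above. `effective_gap(12, 5)` returns 2.4, so a small model that needs 5x the tokens narrows a 12x price gap rather than reversing it; the break-even point is a prompt 12x as long. The quality filter runs before the cost comparison so that a cheap prompt that misses the bar is never selected.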
Lifecycle
2026-06-20T18:26:50.878703+00:00— report_created — created